The challenges of using libraries

With the Jia Tan saga and the xz utils backdoor bringing wider attention to the risks of supply chain tampering, I figured it was worth putting a few numbers to the scale of this challenge.

kube-audit-rest: a simple example

I maintain kube-audit-rest, so I figured it was fair to use this project as an example. While I have done my best to reduce external dependencies (for example, by not using the Kubernetes client-go package), I cannot escape them entirely.

Finding out the size of the problem

# from your source code repository for your binary
# in this case I've already cloned kube-audit-rest and changed directory to it.
docker run --rm -it -v "$(pwd)":/usr/src/code golang:1.22

# Moving to the code in the container
cd /usr/src/code
# build everything
go build -buildvcs=false  -o . ./...

# count the lines of all external source code downloaded for the build
find /go/pkg -type f -name \*.go | xargs cat | wc -l
# 2211035

# Finding all lines of code used by kube-audit-rest, ignoring autogenerated code
# This will include standard libraries from golang etc that are used
go tool objdump kube-audit-rest | cut -d " " -f 3 | cut -d $'\t' -f1 | grep : | grep -v "autogenerated" | grep ".go" | sort -u | wc -l
# 96406

# actual amount of source code in the repository
find . -type f -name \*.go | xargs cat | wc -l
# 1224
# too high: count only the real source code, excluding tests and mocks

find . -type f -name \*.go -not -name \*_test.go -not -name \*mock.go | grep -v testing | xargs cat | wc -l
# 518

Conclusion / What does this mean?

This simple project has 518 direct lines of code, but relies on 2,211,035 lines of code, of which 96,406 make it into the binary I ship.

In other words, only about 0.02% of the code used for this project is visible in my repository (518 of 2,211,035 lines), and only about 0.5% of the code I ship is code I wrote (518 of 96,406 lines). This doesn’t even include all the libraries needed to run the container, or the kernel it runs on!
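The ratios can be re-derived from the three counts measured earlier. This is a minimal sketch, assuming awk is available; the helper name pct and the variable names are mine and are not part of kube-audit-rest.

```shell
# Print `part` as a percentage of `whole`, rounded to two decimal places.
pct() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f\n", 100 * part / whole }'
}

own=518        # lines of non-test source in the repository
all=2211035    # lines of external source found under /go/pkg
shipped=96406  # distinct source lines referenced by the binary

echo "own code vs all dependency source: $(pct "$own" "$all")%"
echo "own code vs shipped lines:         $(pct "$own" "$shipped")%"
```

Note that the two denominators are measured differently (raw lines on disk versus distinct file:line entries from objdump), so the percentages are rough indicators rather than exact figures.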

I hope this serves as a wake-up call for others: securing the software supply chain and ensuring open source developers get suitable support is critically important.

Caveats

The total source of all libraries includes all test dependencies.
Autogenerated code is ignored because it’s hard to measure.
The line counts include blank lines, import statements, etc. for ease of calculation.
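If the blank-line caveat matters for your own project, a slightly stricter count is possible. The sketch below is an approximation, not the method used for the numbers above: it skips blank lines and lines holding only a // comment, still counts block comments and import statements, and leaves out the "grep -v testing" path filter for brevity. The function name count_code_lines is hypothetical.

```shell
# Count lines in non-test, non-mock .go files under a directory,
# skipping blank lines and lines that contain only a // comment.
# Block comments (/* ... */) and import statements are still counted.
count_code_lines() {
  find "$1" -type f -name '*.go' -not -name '*_test.go' -not -name '*mock.go' -print0 \
    | xargs -0 cat \
    | grep -cvE '^[[:space:]]*(//.*)?$'
}

# Run from the root of the repository being measured.
count_code_lines .
```

Using -print0 with xargs -0 also keeps the pipeline safe for file names that contain spaces, which plain find | xargs does not.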