The challenges of using libraries
With the Jia Tan saga and the xz utils backdoor bringing wider attention to the risks of supply chain tampering, I figured it was worth putting a few numbers to the scale of this challenge.
kube-audit-rest: a simple example
I maintain kube-audit-rest, so I figured it was fair to use this project as an example. While I have done my best to reduce external dependencies (for example, by not using the Kubernetes client-go package), I cannot escape them entirely.
Finding out the size of the problem
# from your source code repository for your binary
# in this case I've already cloned kube-audit-rest and changed directory to it.
docker run --rm -it -v$(pwd):/usr/src/code golang:1.22
# Moving to the code in the container
cd /usr/src/code
# build everything
go build -buildvcs=false -o . ./...
# shows all external source code
find /go/pkg -type f -name \*.go | xargs cat | wc -l
# 2211035
# Finding all lines of code used by kube-audit-rest, ignoring autogenerated code
# This will include standard libraries from golang etc that are used
go tool objdump kube-audit-rest | cut -d " " -f 3 |cut -d $'\t' -f1 | grep : | grep -v "autogenerated"| grep ".go" | sort -u | wc -l
# 96406
# actual amount of source code
find . -type f -name \*.go | xargs cat | wc -l
# 1224
# too high, let's only count the real source code rather than anything used for testing
find . -type f -name \*.go -not -name \*_test.go -not -name \*mock.go | grep -v testing | xargs cat | wc -l
# 518
Conclusion / What does this mean?
This simple project has 518 direct lines of code, but relies on 2,211,035 lines of code, of which 96,406 make it into the binary I ship.
In other words, only about 0.02% of the code used for this project is visible, and only 0.5% of the code I ship is code I wrote. This doesn’t even include all the libraries needed to run the container, or the kernel it runs on!
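The ratios follow directly from the three counts measured earlier. A minimal sketch of the arithmetic, where `pct` is a hypothetical helper name (it is not part of kube-audit-rest) and awk is used only for the floating-point division:

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE, to three decimal places
pct() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.3f\n", 100 * part / whole }'
}

# my code as a share of all the source code pulled in to build the project
pct 518 2211035
# 0.023

# my code as a share of the lines that end up in the shipped binary
pct 518 96406
# 0.537
```

The two denominators are measured differently (raw lines of source versus unique file:line pairs from objdump), so these are order-of-magnitude figures rather than exact ones.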
I hope this serves as a wake-up call for others: securing the software supply chain and ensuring open source developers get suitable support are critically important.
Caveats
- The total source of all libraries includes all test dependencies
- Autogenerated code is ignored because it’s hard to measure
- All line counts include blank lines, import statements, etc. for ease of calculation
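To get a feel for how much the last caveat matters, the count can be repeated while skipping blank lines and comment-only lines. A rough sketch, where `count_go_loc` is a hypothetical helper name; it assumes `//` comments only (it does not handle `/* ... */` block comments) and still counts import statements:

```shell
# count_go_loc DIR: count non-blank, non-comment-only lines in the .go files
# under DIR, skipping test and mock files as in the earlier find command.
count_go_loc() {
  find "$1" -type f -name '*.go' ! -name '*_test.go' ! -name '*mock.go' \
    -exec cat {} + |
    grep -c -v -E '^[[:space:]]*(//.*)?$'
}

# Example: count_go_loc .
```

Using `-exec cat {} +` instead of `xargs cat` keeps the helper working when paths contain spaces and when no files match.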