The challenges of using libraries

With the Jia Tan saga and the xz utils backdoor bringing wider attention to the risks of supply chain tampering, I figured it was worth putting a few numbers to the scale of this challenge.

kube-audit-rest: a simple example

I maintain kube-audit-rest, so it seemed fair to use this project as an example. While I have done my best to reduce external dependencies (for example, by not using the Kubernetes client-go package), I cannot escape them entirely.

Finding out the size of the problem

# Run this from the source code repository for your binary.
# In this case I've already cloned kube-audit-rest and changed directory to it.
docker run --rm -it -v "$(pwd)":/usr/src/code golang:1.22

# Moving to the code in the container
cd /usr/src/code
# build everything
go build -buildvcs=false  -o . ./...

# Count the lines of all external source code downloaded for the build
find /go/pkg -type f -name \*.go | xargs cat | wc -l
# 2211035

# Count the unique source lines that end up in the kube-audit-rest binary, ignoring autogenerated code
# This includes the parts of the Go standard library that are used
go tool objdump kube-audit-rest | cut -d " " -f 3 | cut -d $'\t' -f1 | grep : | grep -v "autogenerated" | grep ".go" | sort -u | wc -l
# 96406

# Count the project's own source code
find . -type f -name \*.go | xargs cat | wc -l
# 1224
# Too high: let's count only the real source code, excluding tests and mocks

find . -type f -name \*.go -not -name \*_test.go -not -name \*mock.go | grep -v testing | xargs cat | wc -l
# 518

Conclusion / What does this mean?

This simple project has 518 lines of its own code, but relies on 2,211,035 lines of external code, of which 96,406 make it into the binary I ship.

In other words, only about 0.02% of the code used for this project is visible in my repository, and I wrote only about 0.5% of the code I ship. This doesn’t even include all the libraries needed to run the container, or the kernel it runs on!
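As a quick check on the arithmetic, the two ratios can be recomputed from the counts above. This is only a sketch, assuming awk is available in the shell; pct is a hypothetical helper name used for illustration and is not part of kube-audit-rest.

```shell
# pct is a hypothetical helper: prints PART as a percentage of WHOLE, to three decimal places
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.3f\n", 100 * part / whole }'
}

# my own code as a share of all the external source pulled in
pct 518 2211035
# 0.023

# my own code as a share of the source lines in the shipped binary
pct 518 96406
# 0.537
```

Both figures inherit the caveats listed at the end of this post, so treat them as orders of magnitude rather than exact measurements.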

I hope this serves as a wake-up call for others: securing the software supply chain and ensuring open source developers get suitable support are both critically important.

Caveats

  • The total source of all libraries includes all test dependencies
  • Autogenerated code is ignored because it’s hard to measure
  • The counts include blank lines, import statements, etc., for ease of calculation
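For anyone who wants a slightly stricter count, blank lines and comment-only lines can be filtered out. The following is a rough sketch and not what I used for the numbers above: count_code_lines is a hypothetical helper name, it only skips test files, and it only recognises // comments, so /* ... */ blocks and import statements are still counted.

```shell
# count_code_lines is a hypothetical helper: counts lines in non-test Go files under DIR,
# skipping blank lines and lines that contain only a // comment
count_code_lines() {
    find "$1" -type f -name '*.go' -not -name '*_test.go' -exec cat {} + \
        | grep -c -v -E '^[[:space:]]*(//.*)?$'
}

# Example: count the code lines in the current directory
count_code_lines .
```

The result will be lower than the plain wc -l count, but the overall ratio between my code and the code I depend on should stay in the same order of magnitude.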

Copyright © 2024 Richard Finlay Tweed. All rights reserved. All views expressed are my own