A thorough introduction to bpftrace

rincebrain · on Aug 10, 2021

I went to use bpftrace to solve a real problem recently, and unfortunately found I had to resort to systemtap.

I wanted to print a debug log as a kernel module was saving it. That seemed easy: the module has an equivalent of save_log(char *foo), so just probe on entry and print the argument, right?

...except bpftrace has a hard cap of 200-odd bytes on how much of a char * you can read with str() at a time. [1]

Fine, so you just do some pointer math and print foo, foo+200, etc., right?

But there's no strlen and no printf return value, so you can't tell where the string ends.
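To make the problem concrete, here is a minimal Python simulation of the chunked-read workaround, under stated assumptions: kernel memory is modeled as a bytes object, `read_chunk` is a hypothetical stand-in for one bounded str() read, and the 200-byte cap is taken from the comment above. The loop terminates only because it can check whether a chunk came back short, which is exactly the "where does the string end?" signal the comment says the probe lacks.

```python
CHUNK = 200  # per-read cap, assumed from the comment above


def read_chunk(mem: bytes, addr: int, size: int = CHUNK) -> bytes:
    """Hypothetical stand-in for one bounded str() read: at most `size` bytes, stopping at NUL."""
    raw = mem[addr:addr + size]
    nul = raw.find(b"\x00")
    return raw if nul == -1 else raw[:nul]


def read_whole_string(mem: bytes, addr: int) -> bytes:
    """Read a NUL-terminated string in CHUNK-sized pieces: foo, foo+200, foo+400, ..."""
    out = bytearray()
    while True:
        piece = read_chunk(mem, addr)
        out += piece
        # A short piece means we hit the NUL terminator (or ran off the end of mem).
        # This length check is the step a print-only probe cannot perform.
        if len(piece) < CHUNK:
            return bytes(out)
        addr += CHUNK


mem = b"A" * 450 + b"\x00" + b"trailing bytes"
assert read_whole_string(mem, 0) == b"A" * 450
```

Note that a string whose length is an exact multiple of 200 still terminates: the next read starts on the NUL and comes back empty.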

At that point, I said "sod it" and broke out systemtap.

[1] - https://github.com/iovisor/bpftrace/issues/305

suifbwish · on Aug 9, 2021

What is the main advantage of this over native strace?

GrumpySloth · on Aug 9, 2021

bpftrace has lower overhead because, thanks to BPF, it can accumulate all the data it needs in the kernel and send only a summary to userspace for display. strace, on the other hand, uses the ptrace(2) syscall to set a breakpoint on every syscall, including ones you are not tracing (e.g. when you filter calls), and on each syscall, data travels between the kernel and strace.
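A toy Python model of the difference described above, counting only how many records cross from "kernel" to "userspace". It is an illustration under loud assumptions: the event list is made up, the `Counter` stands in for a BPF map (in bpftrace you might write something like `@[probe] = count();`), and it ignores the per-stop context-switch cost of ptrace, which is the larger part of strace's overhead.

```python
from collections import Counter


def per_event_records(events):
    """strace-like model (assumption): every traced event produces one record sent to userspace."""
    return [(name, 1) for name in events]


def aggregated_records(events):
    """bpftrace-like model (assumption): count in a kernel-side map, then send one row per key."""
    counts = Counter(events)  # stands in for a BPF map keyed by syscall name
    return sorted(counts.items())


# Hypothetical workload: 1503 syscalls of three kinds.
events = ["read"] * 1000 + ["write"] * 500 + ["openat"] * 3
print(len(per_event_records(events)))   # 1503
print(len(aggregated_records(events)))  # 3
```

The per-event model scales with the number of syscalls, while the aggregated model scales only with the number of distinct keys.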

It's possible that in the future strace will use BPF as well to lower its overhead.

bpftrace is also more versatile. For example, you could use it to collect stack traces for every syscall a program makes. You can also attach actions to much more than syscalls: e.g. kprobes let you inject code into kernel functions that aren't exposed as syscalls.

wmf · on Aug 9, 2021

strace only traces system calls, while bpftrace does much more. Just looking at the first example, vfs_read isn't a system call (read() may also operate on sockets or whatever), and strace doesn't calculate histograms AFAIK.
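For readers who haven't seen one, bpftrace's hist() summarizes values into power-of-two buckets. Below is a rough Python sketch of that kind of bucketing, assuming non-negative integers such as vfs_read byte counts. The bucket scheme ([0, 1), [1, 2), [2, 4), [4, 8), ...) is my approximation for illustration, not bpftrace's actual source, and the sample values are made up.

```python
from collections import Counter


def log2_bucket(value: int) -> int:
    """Power-of-two bucket index: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ..."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    return value.bit_length()


def log2_hist(values):
    """Return (low, high, count) rows, one per non-empty bucket covering [low, high)."""
    hist = Counter(log2_bucket(v) for v in values)
    rows = []
    for idx in sorted(hist):
        lo = 0 if idx == 0 else 1 << (idx - 1)
        hi = 1 << idx
        rows.append((lo, hi, hist[idx]))
    return rows


def print_hist(rows, width=40):
    """Print an ASCII bar chart, loosely in the style of bpftrace's histogram output."""
    peak = max(count for _, _, count in rows)
    for lo, hi, count in rows:
        bar = "@" * max(1, count * width // peak)
        print(f"[{lo}, {hi})".ljust(16), str(count).rjust(6), bar)


# Hypothetical read sizes in bytes.
print_hist(log2_hist([0, 1, 3, 100, 4000, 4096, 4096]))
```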

tych0 · on Aug 9, 2021

I use it a lot to figure out why things fail. For example, what if you get an -EPERM from mount()? Was it denied by seccomp, by an LSM, or because you don't own the user namespace that owns your mount namespace?

strace will tell you it failed, but bpftrace can help understand why.

Note that I said "help": bpftrace can tell you "this function failed with EPERM", but e.g. ovl_fill_super() can fail with EPERM for lots of different reasons. So it's a bit like printf debugging. And you're SOL if the error is generated within that function or in an inlined function :(

1vuio0pswjnm7 · on Aug 9, 2021

bpftrace requires about 318 MB to install

strace requires less than 2 MB

Bigger is better :)

Sevan777 · on Aug 9, 2021

The standalone binary the project publishes is 45MB. https://github.com/iovisor/bpftrace/releases/download/v0.13....

PennRobotics · on Aug 9, 2021

I just tried two commands.

apt install bpftrace: needs 1,201 kB

apt remove strace: frees 1,792 kB

1vuio0pswjnm7 · on Aug 9, 2021

You need to look at the bpftrace dependencies.

For example, is clang already installed?

1vuio0pswjnm7 · on Aug 9, 2021

How large is a standalone strace binary?

45MB is still huge; it's not even statically linked.

The largest programs I use, even when statically linked against musl, are all under 6MB.

1vuio0pswjnm7 · on Aug 10, 2021

And I can count on one hand the number of programs I use that are over 1MB.

elteto · on Aug 9, 2021

Title needs a [2019] tag.