
XDP, and the eBPF ecosystem in general, is quite neat. However, a word of caution:

* The BPF verifier's DX is not great yet. If it finds problems with your BPF code, it will spit out a rather inscrutable set of error messages that often require a good understanding of the verifier internals (e.g. the register nomenclature) to debug

* For the same source code, the bytecode generated by the compiler can change across compiler versions in a breaking way, e.g. because the new compiler version implemented an optimization that the verifier rejects (see https://github.com/iovisor/bcc/issues/4612)

* Checksum updating requires extra care. I believe you can only do incremental updates, not just because of better perf as the post suggests but also because the verifier does not allow BPF programs to operate on unbounded buffers (so checksumming a whole packet of unknown size is tricky / cumbersome). This mostly works but you have to be careful with packets that were generated with csum offload, don't have a valid checksum and whose csum can't be incrementally updated.

As the blog post points out, the kernel networking stack does a lot of work that we don't generally think about. Once you start taking things into your own hands you don't have the luxury of ignorance anymore (think not just ARP but also MTU, routing, RP filtering etc.), something any user of userspace networking frameworks like DPDK will tell you.

My general recommendation is to stick with the kernel unless you have a very good justification for chasing better performance. If you do use eBPF, save yourself some trouble and try to limit yourself to read-only operations, if your use case allows.

Also, if you are trying to debug packet drops, newer kernels record a drop reason that you can track with bpftrace, which gives you better diagnostics.

Example script (might have to adjust based on kernel version):

    bpftrace -e '
    kprobe:kfree_skb_reason {
        $skb = (struct sk_buff *)arg0;
        $ipheader = ((struct iphdr *) ($skb->head + $skb->network_header));
        printf("reason: %d %s -> %s\n", arg1, ntop($ipheader->saddr), ntop($ipheader->daddr));
    }'


We absolutely ran into these issues.

A couple notes that help quite a bit:

1. Always build the eBPF programs in a container. This is great for reproducibility of course, but it also makes DevX on macOS better for those who prefer to use that.

2. You actually can do a full checksum! You need to limit the MTU but you can:

  static __always_inline void tcp_checksum(const struct iphdr *ip_header, struct tcphdr *tcp_header, const __u16 tcp_len, const void *data_end) {
    __u32 sum = 0;
    __u16 *buf = (void *)tcp_header;
    ip_header_pseudo_checksum(ip_header, tcp_len, &sum);
    tcp_header->check = 0;
    /* Clamp to a compile-time constant so the verifier can see the loop is bounded. */
    __u16 max_packet_size = tcp_len;
    if (max_packet_size > MAX_TCP_PACKET_SIZE) {
        max_packet_size = MAX_TCP_PACKET_SIZE;
    }
    for (int i = 0; i < max_packet_size / 2; i++) {
        /* Check every 16-bit read against data_end, as the verifier requires. */
        if ((void *)(buf + 1) > data_end) {
            break;
        }
        sum += *buf;
        buf++;
    }
    /* Odd length: add the trailing byte (as the low byte, which assumes a little-endian host). */
    if ((void *)buf + 1 <= data_end && ((__u8 *)buf - (__u8 *)tcp_header) < max_packet_size) {
        sum += *(__u8 *)buf;
    }
    tcp_header->check = csum_fold_helper(sum);
  }

With that being said, it's not lost on me that XDP in general is something you should only reach for once you hit some sort of bottleneck. The original version of our network migration was actually implemented in userspace for this exact reason!
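
For anyone reading along: the snippet above calls ip_header_pseudo_checksum() and csum_fold_helper() without showing them. What follows is only a guess at what such helpers compute, written as plain userspace C so it can be tested outside the kernel. All names here (csum_fold, csum_add_bytes, pseudo_header_sum, l4_checksum) are hypothetical, and unlike the snippet it sums explicit big-endian words instead of reading native __u16 values.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator to 16 bits and invert it. */
static uint16_t csum_fold(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16); /* absorb the carry from the first fold */
    return (uint16_t)~sum;
}

/* Add a byte buffer to the accumulator as big-endian 16-bit words.
 * A trailing odd byte is padded with a zero low byte. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;
    return sum;
}

/* IPv4 pseudo-header: source, destination, zero byte + protocol, L4 length. */
static uint32_t pseudo_header_sum(const uint8_t saddr[4], const uint8_t daddr[4],
                                  uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum = csum_add_bytes(sum, saddr, 4);
    sum = csum_add_bytes(sum, daddr, 4);
    sum += proto;
    sum += l4_len;
    return sum;
}

/* Full TCP/UDP checksum over a segment whose checksum field is zeroed. */
static uint16_t l4_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
                            uint8_t proto, const uint8_t *seg, size_t len)
{
    uint32_t sum = pseudo_header_sum(saddr, daddr, proto, (uint16_t)len);

    return csum_fold(csum_add_bytes(sum, seg, len));
}
```

A handy sanity check for this kind of code: write the computed checksum back into the segment and run the same function again. A valid segment then comes out as 0.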


> You actually can do a full checksum

Indeed! This is what I had in mind when I wrote "cumbersome" :).

It's been too long for me to recall whether the problem was the verifier or me, and things may have improved since, but I remember the verifier choking on a static size limit too. Have you been able to use this trick successfully?

> Always build the eBPF programs in a container

That should work generally but watch out for any weirdness due to the fact that in a container you are already inside a couple of layers of networking (bridge, netns etc.).


Different kernels will be different levels of fussy about the bounded loop you're using there. Bounded loops are themselves a relatively recent feature.

Of course, checksum fixups in eBPF are idiomatically incremental.
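
For reference, "incremental" here means the RFC 1624 update: when one 16-bit word changes from m to m', the new checksum is ~(~HC + ~m + m') in one's-complement arithmetic. Below is a minimal userspace sketch of that formula. The function name csum_update16 is made up for illustration, and values are written as the numeric value of the big-endian word; it is not a copy of any kernel helper.

```c
#include <stdint.h>

/* RFC 1624 (eqn. 3): HC' = ~(~HC + ~m + m').
 * check    - checksum currently stored in the header
 * old_word - 16-bit word being replaced
 * new_word - its replacement */
static uint16_t csum_update16(uint16_t check, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    /* Fold the carries back in (end-around carry), twice to be safe. */
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Example: an IPv4 header with checksum 0xb861 whose TTL/protocol word goes from 0x4011 to 0x3f11 (TTL 64 to 63) ends up with checksum 0xb961, the same value a full recompute gives. No loop over the packet is needed, which is why the verifier has nothing to complain about.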


How do containers help when bpf is mostly a matter of kernel version?


They don't; it's just the poster wanting people to do what they prefer.


I figure it’s one way to keep your compiler version unchanged for eBPF work, while you might update/upgrade your dev OS packages over time for other reasons. The title of the linked issue is this:

“Checksum code does not work for LLVM-14, but it works for LLVM-13”

Newer compilers might use new optimizations that the verifier won’t be happy with. I guess the other option would be to find some config option to disable that specific incompatible optimization.


This post hits close to home, I've run into all of these myself.

On checksums: Incremental updates are the path of least pain only if the packet’s checksum is valid and not CHECKSUM_PARTIAL. With modern offloads (TSO/GSO/GRO/checksum offload), the checksum visible to XDP is often zero/garbage because the NIC fills it in later. In practice, either disable offloads for that traffic or recompute from scratch, e.g. with bpf_csum_diff() and a NULL "from" buffer. Note that bpf_l3_csum_replace() / bpf_l4_csum_replace() are incremental helpers that take an skb, so they are for tc programs rather than XDP.
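
To make the failure mode concrete, here is a small userspace sketch (not XDP code, and all function names are hypothetical) that rewrites one 16-bit word, patches the stored checksum incrementally, and compares the result with a from-scratch recompute. It assumes the buffer is passed with its checksum field zeroed and the stored value handed in separately. The point is that an incremental update preserves whatever error the stored checksum already had.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t fold(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* From-scratch checksum over big-endian 16-bit words (len must be even here). */
static uint16_t full_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    return fold(sum);
}

/* RFC 1624 style incremental update for one changed 16-bit word. */
static uint16_t incr_csum(uint16_t check, uint16_t old_word, uint16_t new_word)
{
    return fold((uint32_t)(uint16_t)~check + (uint16_t)~old_word + new_word);
}

/* Overwrite the word at byte offset off with new_word, patch stored_check
 * incrementally, and report whether that matches a full recompute. */
static int incremental_agrees(uint8_t *buf, size_t len, size_t off,
                              uint16_t stored_check, uint16_t new_word)
{
    uint16_t old_word = (uint16_t)((buf[off] << 8) | buf[off + 1]);
    uint16_t patched = incr_csum(stored_check, old_word, new_word);

    buf[off] = (uint8_t)(new_word >> 8);
    buf[off + 1] = (uint8_t)(new_word & 0xff);
    return patched == full_csum(buf, len);
}
```

With a valid stored checksum the two agree. With a placeholder such as 0x0000, which is roughly what you see when the checksum is left for offload, the incrementally patched value is just a different wrong value.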

The verifier: This is a fun one. You make a small change and suddenly the verifier rejects the program.

And the moment you start modifying packets too much yourself, you're on the hook for everything the kernel used to do for you.

I once went down the rabbit hole of building a minimal TCP stack, and the experience was exactly as you'd expect. Getting to 95% done felt quick, but that last 5% was a nightmare (if 100% is even achievable).


openonload is faster than the kernel even with the most basic configuration, which is pretty much drop-in and requires zero changes on your application.



