edenfed's comments | Hacker News

Speaking for Odigos (disclosure: I’m the creator), here are two significant differences between us and the other mentioned players:

- Accurate distributed traces with eBPF, including context propagation. Without calling out other tools by name, I highly recommend trying to generate distributed traces with any other eBPF solution and observing the results firsthand.

- We are agent-only. Our data is produced in OpenTelemetry format, allowing you to integrate it seamlessly with your existing observability system.

I hope this clarifies the differences.


I wonder if anyone has tried to integrate Odigos with Coroot - looks like it could be really powerful!


Definitely can relate; this is why I started an open-source project that focuses on making OpenTelemetry adoption as easy as running a single command: https://github.com/odigos-io/odigos


You can absolutely use just the OTel APIs with something other than the OTel SDK behind them. Here is a blog post about how we did it with eBPF: https://odigos.io/blog/Integrating-manual-and-auto
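
For concreteness, here is a minimal Go sketch of API-only instrumentation (the package, function, and attribute names are made up for illustration): only the OTel API packages are imported, no SDK or exporter is configured in-process, and an out-of-process agent is assumed to collect the spans.

    package checkout

    import (
        "context"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
    )

    // HandleCheckout uses only the OTel API. No SDK or exporter is wired up
    // in this process; an external (eBPF-based) agent is assumed to detect
    // the API calls and emit the spans.
    func HandleCheckout(ctx context.Context, orderID string) {
        tracer := otel.Tracer("checkout")
        _, span := tracer.Start(ctx, "HandleCheckout")
        defer span.End()

        span.SetAttributes(attribute.String("order.id", orderID))
        // ... business logic ...
    }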


By dropped data, do you mean data lost from exceeding the size of the allocated ring buffer/perf buffer? If so, that size is configurable by the user, so you can adjust it according to the expected load.
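
To illustrate the knob in question, here is a rough Go sketch of a userspace reader, assuming the cilium/ebpf library (not necessarily what Odigos uses), where the per-CPU perf buffer size is passed in by the user and lost samples are at least counted:

    package agent

    import (
        "log"

        "github.com/cilium/ebpf"
        "github.com/cilium/ebpf/perf"
    )

    // ReadEvents drains a BPF perf event array. perCPUBufferBytes is the
    // user-tunable knob: a larger buffer makes drops under load less
    // likely, at the cost of memory.
    func ReadEvents(events *ebpf.Map, perCPUBufferBytes int) error {
        rd, err := perf.NewReader(events, perCPUBufferBytes)
        if err != nil {
            return err
        }
        defer rd.Close()

        for {
            rec, err := rd.Read()
            if err != nil {
                return err
            }
            // The reader reports how many samples were lost when the
            // buffer overflowed, so drops are at least visible to the agent.
            if rec.LostSamples > 0 {
                log.Printf("perf buffer overflow: %d samples lost", rec.LostSamples)
            }
            // ... decode rec.RawSample ...
        }
    }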


eBPF can drop data silently under quite a few conditions, unfortunately. And -- most frustratingly -- it's silent, so it's not even entirely clear which condition you've fallen into. This alone is a pretty significant difference with respect to DTrace: when/where DTrace drops data, there is always an indicator as to why. And to be clear, this isn't a difference merely of implementation (though that too, certainly), but of principle: DTrace, at root, is a debugger -- and it strives to be as transparent to the user as possible as to the truth of the underlying system.


You can enrich the spans created by eBPF by using the OpenTelemetry APIs as usual; the eBPF instrumentation replaces the instrumentation SDK, not the API. The eBPF program will detect the data recorded via the APIs and add it to the final trace, combining both automatic and manually created data.
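
As a rough illustration (just the standard OTel Go API, not anything Odigos-specific; the attribute keys are invented), enriching whatever span is active in the context can look like this:

    package app

    import (
        "context"

        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/trace"
    )

    // Annotate adds application-level details to whatever span is currently
    // active in ctx. Span creation is owned by the auto-instrumentation;
    // this code only enriches it through the standard API.
    func Annotate(ctx context.Context, userID string, cacheHit bool) {
        span := trace.SpanFromContext(ctx)
        span.SetAttributes(
            attribute.String("app.user_id", userID),
            attribute.Bool("app.cache_hit", cacheHit),
        )
        span.AddEvent("cache lookup finished")
    }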


I don’t have a lot of experience using DTrace, but AFAIK the big advantage of eBPF over DTrace is that you do not need to instrument your application with static probes at development time.


DTrace (on Solaris at least) can instrument any userspace symbol or address, no need for static tracepoints in the app.

One problem that DTrace has is that the "pid" provider that you use for userspace app tracing only works on processes that are already running. So, if more processes with the executable of interest launch after you've started DTrace, its pid provider won't catch the new ones. Then you end up doing some tricks like tracking exec-s of the binary and restarting your DTrace script...


That's not exactly correct, and is merely a consequence of the fact that you are trying to use the pid provider. The issue that you're seeing is that pid probes are created on-the-fly -- and if you don't demand that they are created in a new process, they in fact won't be. USDT probes generally don't have this issue (unless they are explicitly lazily created -- and some are). So you don't actually need/want to restart your DTrace script, you just want to force probes to be created in new processes (which will necessitate some tricks, just different ones).


So how would you demand that they’d be created in a new process? I was already using pid* provider years ago when I was working on this (and wasn’t using static compiled-in tracepoints).


Thank you for reporting, will fix ASAP.


eBPF instrumentation does not require code changes, redeployment, or a restart of running applications.

We are constantly adding more language support for eBPF instrumentation and are aiming to cover the most popular programming languages soon.

Btw, I'm not sure that sampling is really the solution to combat overhead; after all, you probably do want that data. Trying to fix a production issue when the data you need is missing due to sampling is not fun.


All good points, thank you.

What's the limit on language support? Is it theoretically possible to support any language/runtime? Or does it come down to the protocol (HTTP, gRPC, etc) being used by the communicating processes?


We already solved compiled languages (Go, C, Rust) and JIT languages (Java, C#). Interpreted languages (Python, JS) are the only ones left; hopefully we will solve these soon as well. The big challenge is supporting all the different runtimes; once that is solved, implementing support for different protocols / open-source libraries is not as complicated.


Got to get PHP on that list :)


FWIW it's theoretically possible to support any language/runtime, but since eBPF operates at the level it does, there's no magic abstraction layer to plug into. Every runtime and/or protocol involves different segments of memory and certain bytes meaning certain things. It's all in service of having no additional requirements for an end user to install, but once you're in the eBPF world, everything is runtime-, protocol-, and library-specific.
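
A hedged Go sketch, assuming the cilium/ebpf library, of how runtime-specific this gets: attaching a probe means naming one particular symbol in one particular binary, and the symbol chosen here is purely illustrative.

    package probes

    import (
        "github.com/cilium/ebpf"
        "github.com/cilium/ebpf/link"
    )

    // AttachServeProbe attaches an already-loaded eBPF program to one
    // concrete symbol in one concrete binary. Nothing here generalizes: a
    // Go binary, a JVM, and a CPython interpreter expose entirely different
    // symbols and in-memory layouts, so each needs its own probes and its
    // own argument-decoding logic.
    func AttachServeProbe(binaryPath string, prog *ebpf.Program) (link.Link, error) {
        ex, err := link.OpenExecutable(binaryPath)
        if err != nil {
            return nil, err
        }
        // Illustrative symbol for a Go HTTP server; another runtime would
        // need completely different symbols (or offsets into interpreter
        // structs).
        return ex.Uprobe("net/http.(*conn).serve", prog, nil)
    }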


We currently support only Kubernetes environments. docker-compose, VMs, and serverless are on our roadmap and will be ready soon.


Thanks for the valuable feedback! We used a constant throughput of 10,000 rps. The exact testing setup can be found under “how we tested”.

I think the example you gave of the lock used by the Prometheus library is a great illustration of why generating traces/metrics is a good fit for offloading to a different process (an agent).

Patchyderm looks very interesting; however, I am not sure how you can generate distributed traces based on metrics. How do you fill in the missing context propagation?
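
For context, by "context propagation" I mean carrying trace/span IDs across process boundaries, usually via the W3C traceparent header; a minimal Go sketch of what that normally looks like (and what metrics alone don't give you):

    package tracing

    import (
        "context"
        "net/http"

        "go.opentelemetry.io/otel/propagation"
    )

    // W3C trace context propagator: it writes/reads the traceparent header.
    var propagator = propagation.TraceContext{}

    // InjectOutgoing stamps the caller's trace context onto an outbound request.
    func InjectOutgoing(ctx context.Context, req *http.Request) {
        propagator.Inject(ctx, propagation.HeaderCarrier(req.Header))
    }

    // ExtractIncoming recovers the upstream trace context from an inbound
    // request, so the server's spans become children of the client's span.
    func ExtractIncoming(req *http.Request) context.Context {
        return propagator.Extract(req.Context(), propagation.HeaderCarrier(req.Header))
    }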

Our way of dealing with eBPF's root requirements is to be as transparent as possible. This is why we donated the code to the CNCF and develop it as part of the OpenTelemetry community. We hope that being open will earn users' trust. You can see the relevant code here: https://github.com/open-telemetry/opentelemetry-go-instrumen...


> I am not sure how you can generate distributed traces based on metrics

Every log line gets an x-request-id field, and then when you combine the logs from the various components, you can see the propagation throughout our system. The request ID is a UUIDv4 but the mandatory 4 nibble in the UUIDv4 gets replaced with a digit that represents where the request came from; background task, web UI, CLI, etc. I didn't take the approach of creating a separate span ID to show sub-requests. Since you have all the logs, this extra piece of information isn't super necessary though my coworkers have asked for it a few times because every other system has it.
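
A rough sketch of that ID scheme; the origin codes and helper names here are illustrative, not our actual code:

    package requestid

    import (
        "crypto/rand"
        "fmt"
    )

    // Origin codes are illustrative; the real values differ.
    const (
        OriginBackground = 1
        OriginWebUI      = 2
        OriginCLI        = 3
    )

    // New generates a random 128-bit ID laid out like a UUIDv4, except that
    // the nibble which normally carries the version digit "4" is overwritten
    // with a code identifying where the request came from.
    func New(origin byte) (string, error) {
        var b [16]byte
        if _, err := rand.Read(b[:]); err != nil {
            return "", err
        }
        b[6] = origin<<4 | b[6]&0x0f // version nibble -> origin code
        b[8] = b[8]&0x3f | 0x80      // keep the usual UUID variant bits
        return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
    }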

Since metrics are also log lines, they get the request-id, so you can do really neat things like "show me when this particular download stalled" or "show me how much bandwidth we're using from the upstream S3 server". The aggregations can take place after the fact, since you have all the raw data in the logs.

If we were running this such that we tailed the logs and sent things to Jaeger/Prometheus, a lot of this data would have to go away for cardinality reasons. But squirreling the logs away safely, and then doing analysis after the fact when a problem is suspected ends up being pretty workable. (We still do have a Prometheus exporter not based on the logs, for customers that do want alerts. For log storage, we bundle Loki.)

