Depends on what you want to monitor. Grafana is pretty decent, but the real draw of Datadog is its APM stack. The UI for tracing and digging into requests is pretty awesome.
Though you could get most things into Grafana with something like Prometheus. The problem with Grafana is understanding what its limitations are: if you're not careful with the number of panels and such, it can get quite slow.
I've used Grafonnet before for doing Grafana at scale. Simply put, I hate it. Apparently an alternative is being worked on at Grafana, so I'm waiting for that. But if you need to make hundreds of panels, it works well enough.
If you need to monitor some infrastructure, you can just use Telegraf and chart the data in Grafana if needed. It kinda falls apart, though, because another great benefit of something like Datadog is not having to manage a time series DB yourself. That can get ugly real quick.
I guess it all just depends. If my bill were super high, I wouldn't mind spending some resources on Prom/Grafana if I were in the Kube space, or on Telegraf/InfluxDB if I weren't.
I've also heard good things about Timescale but haven't used it.
> I've used Grafonnet before for doing Grafana at scale. Simply put, I hate it. Apparently an alternative is being worked on at Grafana so I'm waiting for that. But if you need to make hundreds of panels, it works well enough.
Hi, I run the Grafana team at Grafana Labs. I'd love to learn more about your Grafonnet use to help us build something better. I'm david at grafana com.
Depends a lot on what sort of scale you are at, too. Grafana Cloud will be cheaper than DD but is not quite as end-user friendly.
Running it yourself is not too hard as long as you don't need clustering (say, up to 1M metric series and 100GB/day of logs). But different people have different comfort levels for that.
With any monitoring system, most of the work is actually making use of the data: tagging, alerts, dashboards and, especially, onboarding all the teams. You can spend a lot of time and money rolling something out and then barely anybody uses it.
(Disclaimer: I'm with Grafana.) We added a lot to our free Grafana Cloud tier, so you can kick it pretty hard (and harder during the first 14 days, when everything is beefed up). The free tier comes with 3 fully managed Grafana front-end users plus a backend with storage: 50GB of Loki logs, 50GB of Tempo traces, 10k monthly active series of Prometheus metrics, IRM/on-call, k6 virtual user hours for load testing and other stuff too. For quick-start integrations, we made a K8s monitoring solution with out-of-the-box dashboards, KPIs and alerts, and we did the same for many other integrations. We absolutely have more work to do in simplifying the user experience too.
Plug: If you're looking for something a bit more "few-clicks-and-you're-up-and-running", check out OpsVerse ObserveNow (https://opsverse.io/observenow-observability/). It's entirely powered by OSS tools, with ingestion-driven pricing and without the hassle of managing the stack and scaling it up.
Best of all, it can also run entirely within your own AWS/GCP/Azure account, so you only pay OpsVerse for maintaining the stack based on your ingestion (and we also monitor the monitoring system for you ;))
My small team had to choose between New Relic and DD, and I found New Relic's billing model more appetizing. It was per seat, and you could switch who was in the seat. Instances were unlimited, and most of the features were covered under that seat, besides some extras like HIPAA/finance-related stuff. They also have "regular" users who are free and can make dashboards and such. DD drove me nuts with their crazy number of sales calls that just seemed to balloon.
For others reading this - you can’t just switch back and forth a few times a week. A full platform user can be moved to a basic user only twice in a 12-month period.
Thanks for mentioning qryn! We are a non-corporate alternative and feature full ingestion compatibility with DataDog (including Cloudflare emitters, etc.), Loki, Prometheus, Tempo, Elastic and others for both on-prem (https://qryn.dev) and Cloud (https://qryn.cloud) deployments, without the killer price tag.
Note: in qryn s3/r2 are as close to /dev/null as it gets!
I'm saying most logs are pointless to keep and would be better directed to /dev/null. Keep important transaction related logs and sample the rest.
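For what "keep transaction logs, sample the rest" can look like in practice, here is a minimal Python sketch using a filter on the standard `logging` module. It is not any vendor's feature: the `transaction` flag passed via `extra=` and the 10% default rate are assumptions made up for illustration.

```python
import logging
import random
from typing import Optional


class SampleUnlessTransaction(logging.Filter):
    """Keep every transaction-related record; keep a random fraction of the rest.

    Hypothetical convention: callers flag transaction logs with
    logger.info(..., extra={"transaction": True}).
    """

    def __init__(self, sample_rate: float = 0.1, rng: Optional[random.Random] = None) -> None:
        super().__init__()
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0.0 and 1.0")
        self.sample_rate = sample_rate
        # An injectable RNG makes the sampling reproducible in tests.
        self.rng = rng if rng is not None else random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        # Records flagged as transaction-related are always kept.
        if getattr(record, "transaction", False):
            return True
        # Everything else survives with probability sample_rate.
        return self.rng.random() < self.sample_rate


if __name__ == "__main__":
    logger = logging.getLogger("app")
    handler = logging.StreamHandler()
    # Attaching the filter to the handler applies it to every record that reaches this handler.
    handler.addFilter(SampleUnlessTransaction(sample_rate=0.1))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info("payment captured", extra={"transaction": True})  # always emitted
    for _ in range(20):
        logger.info("cache miss")  # each emitted with ~10% probability
```

Sampling in-process means dropped records never get shipped or stored, so they cost nothing downstream; the trade-off is that a sampled-out line is gone for good.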
The notion that every single log or metric across your entire technical architecture is worth keeping is one implanted by SaaS providers with a financial interest in naive engineers doing just that.
Kubernetes with Grafana is free. The combo provides logging, performance stats and graphs and lets you auto-scale based on usage.
Unfortunately, avoiding insanely costly SaaS solutions requires engineers to plan ahead and design the entire stack on top of certain open source solutions. I suspect that many engineers today receive kickbacks from SaaS providers to lock in their employers. Employers are none the wiser and rarely push back when an engineer suggests a big-name SaaS solution with an insane lock-in factor. Nobody seems to care about lock-in these days. It's only when your costs reach almost $100 million and interest rates are going up that you start thinking, "Damn, I could have had all that for free if I had planned ahead and resisted all these platform lock-ins and unnecessary proprietary tools..."
> Unfortunately, I suspect that many engineers today receive kickbacks from SaaS providers to lock-in their employers.
C'mon man, really? Drop the conspiracy theories. I've personally been the guy advocating for Datadog at 4 startups, mainly because of opportunity cost: we have 10-100 engineers, I want them building product not figuring out how to deploy a whole ecosystem of observability tools. IF we get big, let's reevaluate… but in the meantime.
Am I doing it wrong? If others are getting kickbacks, I want in.
The difference between Datadog and doing it yourself is that Datadog is a well-thought-through product rather than a cobbled-together set of various tools.
Having a single interface for everything makes life so much easier across a number of different teams.
Search is fast and easy to use for logs and traces.
Being able to see what a user actually clicked on in their session is absolutely game-changing for support teams.
I’m not a huge fan of the bill but it’s so much better than anything we could do ourselves without a team of engineers dedicated to observability (which would cost far more than datadog)
I do a lot of negotiating on products like this. The most I've ever gotten was a shirt and some stickers for my kid. Definitely not enough to move the needle on $250k/year deals. I feel like I'm missing out!
I love how good DataDog is. It's a great product. Too expensive though. I love most of the people I've worked with at Grafana Cloud but it's a painful product. The price makes up for it though, so we use Grafana Cloud.
We may end up with something like SigNoz when we have the cycles, but the ROI is bad when I already have twice as much work as people, and that work is barely more than KTLO (keeping the lights on).
General usability. DataDog is intuitive and easy. Grafana is rough and requires a solid understanding of statistics and data analytics. The bar for using it is pretty high, so most engineers I know push it to someone else, which means there's one team doing the toil and working on creating simpler abstractions to hide the complexity.
HN has a tendency to explain every little thing with conspiracy theories. It can’t be a clear explanation based on incentives and people taking the path of least resistance, it must be malice. I’m not a psychologist, I don’t want to psychoanalyse why they think this way. But it is a bit tiring to interact with such people.
People who aren't harmed by these things don't notice them. Their incentive is to ignore as much as possible. Turning a blind eye is literally the safest option. But when you've had it rough, you literally can't stop seeing this stuff everywhere.
If you change your mindset from ignoring problems to looking for problems, you will find that there are problems everywhere. I'd rather be biased in that way than in the former. In my position, I can't afford to ignore even the tiniest problems.
> we have 10-100 engineers, I want them building product not figuring out how to deploy a whole ecosystem of observability tools. IF we get big let’s reevaluate…
Moving away from those SaaS tools can be extremely painful and a lot more costly due to vendor lock-in. In practice, this "let's reevaluate" moment typically never happens.
On the other hand, I don't really care. I normally suggest open source tools, but if people want to throw money at some vendor, fine by me.
Obviously SaaS providers will not offer kickbacks to startups; the deals usually aren't big enough. I've witnessed it in a big corporation once. One of the engineers was VERY insistent on using a specific solution even though it didn't make sense technically and everyone else was against it, but because that engineer was more senior, they made the final call. If they don't get outright bribes, they will get lucrative job offers from these big companies in the future.
Imagine being the guy who convinced Coinbase to use DataDog... That person will probably end up working at DataDog sooner or later if not already there... You can bet they will be getting a very cushy salary.
I could probably make a living out of extorting corrupt engineers. It's so predictable.
And hiring someone corrupt enough to sell out their previous employer for their current one is rarely a smart move, as they are liable to do the same when angling for their NEXT job.