
It's a terrible article. The author misunderstands competition and how much it drives products in this area. Snowflake is incentivized to make their product better on every dimension. If Snowflake don't improve, customers will leave in droves - like when they moved to Snowflake.

In practice, as has been pointed out in other comments, they do improve their performance (for competitive reasons) and it does cost them money when they do it. They did it a couple of quarters ago and left $97 million on the table.

https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...



There are many degrees of optimization and clearly there's some cost to bad performance, but Snowflake still has a massive perverse incentive to not spend too much effort on improving performance. If Snowflake is like every software company I've ever been involved with, there are many competing projects at any given time, and direct revenue impact is a big factor in what gets prioritized.

My own experience with Snowflake absolutely backs up the article's point. At my work we routinely encounter abysmal performance for certain types of queries, due to a flaw on Snowflake's side. We have had numerous talks with them and there is no question that they have an issue, but they have shown absolutely no urgency to fix it. Their recommendation is that we spend more money to work around the problem on their end.


>At my work we routinely encounter abysmal performance for certain types of queries, due to a flaw on Snowflake's side.

Do tell! I'm a current Snowflake customer, I'd like to know what to look out for.


Don’t you see this with any cost based query optimizer based product?


It is a terrible article. I’ve been on the engineering side of these big data platforms including snowflake in its early days, Paraccel (redshift’s code ancestor), redshift, and others you probably use but don’t realize are actually hyperscale database engines. The author missed the mark consistently. I chortled when he discussed the redshift WLM, which I helped design a very long time ago and which is absolute garbage. Snowflake’s entire point is that you can decouple the storage and the database from the warehouse query engine to provide total isolation from noisy neighbors. If you’re encountering noisy neighbors you’re using the product entirely wrong.

And you’re right. The motivation snowflake has to improve is survival. It’s not like their architecture is impossible to replicate. Redshift is doing a total reorganization of the product and rewrite to compete more directly with snowflake (redshift aqua etc).

They also seem to completely discount the value of SaaS outsourcing database and storage operations to snowflake whose only focus is operating the database product. Running your own clusters is an exercise that seems smart in the first few months then like a puppy when it grows up you’re stuck with a dog. If you love dogs and train them well then great. But fact is most people are terrible dog owners, and the same is true for MPP clusters. Being able to focus on the query management operations exclusively is really ideal. Highly stateful distributed products are a PITA.

He also rants about snowflake not telling him the hardware. Snowflake runs in ec2, gcp, azure. You can literally guess the hardware types - there’s just not that many saddle point instance types for that sort of workload. Discussing ssd vs hdd is also an obvious sign of ignorance - snowflake’s basic premise is that it does very wide, highly concurrent s3 gets and scans of the data, using a FoundationDB metadata catalog to help prune. Being in aws, it’s implausible they use hdd and realistically they could elide ssds (I do not remember if they use local disks for caching, but it’s stateless regardless).
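To make the "metadata catalog to help prune" idea concrete, here is a minimal sketch, assuming the catalog stores a min/max range per file for the filter column. `PartitionMeta`, `prune`, and the file keys are hypothetical names for illustration only; this is not a claim about how snowflake actually implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    """Hypothetical catalog entry for one immutable file in object storage."""
    key: str        # object key of the file
    min_value: int  # smallest value of the filter column in this file
    max_value: int  # largest value of the filter column in this file


def prune(catalog: list[PartitionMeta], lo: int, hi: int) -> list[str]:
    """Return keys of files whose [min, max] range overlaps [lo, hi]."""
    return [p.key for p in catalog if p.max_value >= lo and p.min_value <= hi]


catalog = [
    PartitionMeta("part-000", 0, 99),
    PartitionMeta("part-001", 100, 199),
    PartitionMeta("part-002", 200, 299),
    PartitionMeta("part-003", 150, 450),
]

# Only files that might hold rows with 120 <= value <= 180 need to be fetched.
to_fetch = prune(catalog, 120, 180)
print(to_fetch)  # ['part-001', 'part-003']
```

The surviving keys are what would then be fetched in parallel; everything else is skipped without touching storage.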

The unit costing being hardware agnostic is totally normal too - they don’t have to expose to you the details of their costing because they normalize it to a standard fictional unit.
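A minimal sketch of what normalizing to a "standard fictional unit" can look like, assuming a rate card that maps a warehouse size to abstract credits per hour. The sizes, rates, and price below are made-up numbers for illustration, not a vendor's price list.

```python
# Hypothetical rate card: abstract credits consumed per hour for each size.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}


def cost(size: str, seconds: float, price_per_credit: float) -> float:
    """Bill for one run in currency units, via the normalized credit unit."""
    credits = CREDITS_PER_HOUR[size] * seconds / 3600
    return credits * price_per_credit


# The bill depends only on size, duration, and the price of a credit.
# Whatever instance type sits behind "M" never appears in the formula.
print(cost("M", 1800, 3.0))  # 4 credits/h * 0.5 h * 3.0 per credit = 6.0
```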


I'm a snowflake customer and I've felt/am feeling all of the pain that this article talks about. There might be some handwaving over technical complexity that you don't like given your detailed understanding of how the thing is built, but the article is fundamentally right in its message.

The thing it's most right about is the power imbalance and the innovator's dilemma. I've had more than one instance of the case where we've found that query performance/cost is too high, complained about it, and Snowflake have "made a configuration change" (undisclosed) that has brought the cost down.


Don’t you have the same issue with any query optimized product? If I’m using redshift and hit a bad execution plan that I can’t get around by tweaking the query I’m SOL, and redshift engineers aren’t going to make a configuration change to help me.

This is why products like DynamoDB were created - cost based optimizers are imperfect and unpredictable, and once you’ve stepped over some limit or threshold performance wildly changes. The reasons can be your query, or the data has changed, or there’s a noisy neighbor consuming a resource you depend on for your query. If you need highly predictable times you can reason about you won’t get it from any RDB solution.
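A toy illustration of that threshold behavior, assuming a planner that picks between two plans from an estimated row count. The cost constants and function names are invented for the sketch and do not describe any real optimizer.

```python
def choose_plan(estimated_rows: int, table_rows: int) -> str:
    """Pick the cheaper plan under a toy cost model (all constants made up)."""
    index_cost = estimated_rows * 50.0  # one random lookup per matching row
    scan_cost = table_rows * 1.0        # one sequential read per table row
    return "index_lookup" if index_cost < scan_cost else "full_scan"


def runtime(plan: str, actual_rows: int, table_rows: int) -> float:
    """Cost actually paid, using the true row count rather than the estimate."""
    return actual_rows * 50.0 if plan == "index_lookup" else table_rows * 1.0


table_rows = 1_000_000

# A small change in the estimate flips the plan.
print(choose_plan(19_000, table_rows))  # index_lookup
print(choose_plan(21_000, table_rows))  # full_scan

# If stale statistics say 19,000 rows match but 200,000 actually do, the
# chosen plan costs ten times what a full scan would have.
plan = choose_plan(19_000, table_rows)
print(runtime(plan, 200_000, table_rows))  # 10000000.0
```

Nothing about the query text changed between the good and bad cases, which is why this kind of regression is so hard to reason about from the outside.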

Given that, what about snowflake feels different? That the details are obscured from you so you don’t understand why things are happening? Is the lack of ability to deeply introspect making you uncomfortable? My experience has been that the ability to introspect rarely leads to any change in outcome; instead it leads me to identify that the query optimizer has done something stupid I cannot do anything about, but at least I can point to the specific resource being exhausted by it.


We regularly benchmark the "big 3" Cloud Data warehouses - Redshift, Snowflake and BigQuery - at SingleStore. Their performance is very close to the same (within 10-20%) on most benchmarks on reasonably sized data sets (10s of TB).

I agree: if the performance of one of them fell behind the others for any prolonged period of time, the cost to the laggard in market share would be much, much worse than the short term revenue gain of "being slow on purpose".


I don't think it misunderstands business competition. In fact it understands the concept of competition very well, and develops an insightful critique into the perverse incentives that are borne from competition.

It benefits no one except for a couple thousand people to so blatantly play their customers in this way. In fact, it's worse, as it incentivizes that same behavior of other market actors in the space.


What exactly in the article suggests the author understands the pressure of competition on incentives?

The author states that Snowflake are not incentivized to increase performance due to short term revenue concerns but doesn't mention they are also incentivized to do the opposite from a competitive perspective. The result is incomplete enough that it ends up being flat wrong with respect to the behavior that the company actually engages in.

The author missed the fact that Snowflake recently did the very thing he/she suggested they were incentivized not to do, at a cost of $97 million. The CEO explained why they are doing it and how they are actually incentivized. I don't know how the article could miss the mark by more than it has. The company literally does the opposite of what he/she suggested. It's not like they are the only one either: AWS has a history of reducing prices. Why? Once again, competition.


> The CEO explained why they are doing it and how they are actually incentivized.

The CEO explained why he thinks it's a good long term plan... but for now, they get money i.e. are actually incentivized by slow code. The CEO's incentives are theoretical ones.

And the market, which ultimately controls whether the CEO gets to continue that plan or not, did not seem to agree it was a good plan.


By this reasoning, everyone would shirk at work. If you think incentives only act over short time horizons, I don't know how you explain an enormous amount of human behavior.

The market didn't even understand it. Most of the people trading equities, especially around earnings announcements, don't know what a data warehouse is or what matters in that market. All they saw was "miss".


I didn't say the CEO was wrong or that long-term thinking is bad! I said the actual incentives are still misaligned. (I mean, a lot of people do shirk at work, and it even works out well for them.)

I think you have a weird and probably not useful definition of "actual" if "monthly revenue" is not actual but "projected monthly revenue two years from now" is actual. (Or maybe I've just lived in Germany too long.)


You are right, I've used the word "actual" incorrectly. What I should have said was "net". I.e., both short term and long term revenue incentivize behavior, and in this case the net result was increasing performance, i.e. long term incentive > short term incentive.


I think you’re providing a false dichotomy here. The structure may provide an opportunity to maximize short term profits, but there is no reason to believe they, or anyone, have to follow that opportunity, especially if they rationally believe investing energy and money now has a much higher NPV.
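For anyone who wants the NPV comparison spelled out, here is a minimal sketch. The two revenue streams and the 10% discount rate are entirely made-up numbers, chosen only to show how a lower payoff today can still win once later years are counted.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[t] arrives t years from now (t=0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical yearly revenue under two strategies.
coast = [100.0, 80.0, 60.0, 40.0]      # stay slow: more now, churn later
optimize = [90.0, 95.0, 100.0, 110.0]  # get faster: less now, growth later

rate = 0.10
# Compare the strategies on discounted value rather than on year 0 alone.
print(npv(rate, coast) < npv(rate, optimize))  # True
```

With these assumptions the "optimize" stream is worth more even though it earns less in year 0; with a high enough discount rate or flat enough growth the comparison would go the other way.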

When I read these comments about incentives to screw customers and a naked belief that everyone must be doing it, I really wonder who traumatized the authors. There are tons of excellent engineering cultures that prioritize excellence for long term gain. Find a better job.


While I think it is definitely in a company's best long term interest to implement features that benefit its customers, it might not be in the best interest of those who are currently running the company.

We have seen many, many examples of executives who are willing to sacrifice the future of the company to get a personal short-term gain. Jacking up revenues (or slashing costs) in ways that alienate customers is a great strategy when you plan to jump off with your golden parachute in a couple of years when all your stock options vest.


Sure, but to not even mention churn as something Snowflake is worried about is pretty silly. With the funding environment taking a dramatic turn, they (and every other SaaS company) are going to be deeply concerned about price competition and churn.


Agreed. But a good article should have shown an example rather than a counter example. Intel might have been a good example. A good article would have shown the competing incentives at play rather than a single incentive.


> It's a terrible article. The author misunderstands competition and how much it drives products in this area.

Agree, but the author has one thing right. Snowflake is not transparent about product behavior, which makes it hard to reason about costs and performance.

Open source data warehouses like ClickHouse and Druid don't have this problem. If you want to know how something works, you can look at the code. Or listen to talks from the committers. This transparency is an enduring strength of open source projects.


Sure but if you want full transparency you don't use Snowflake. They never sold themselves as that.

I wouldn't buy a Ferrari and complain about lack of trunk space.


I'm not complaining, of course. It's just an observation. Snowflake is very similar to Oracle in that respect, which is not surprising given where the founders came from.

Personally I think Snowflake is very impressive on the things they optimize for, which includes complex queries on enterprise data sources. The same could be said for BigQuery.


> The author misunderstands competition and how much it drives products in this area.

Snowflake compete on marketing.

Plenty of people rave about Snowflake and have never heard of Databricks, BigQuery or Redshift.


The main flaw of the article is not controlling for product category.

I suspect most data warehouses have similar NDRs.

In many companies a data warehouse is the place where you dump all your data and let everyone run poorly written programs against it.

Add to that poor engineering culture in data teams (often led by non-technical people) and costs are bound to skyrocket.


> In many companies a data warehouse is the place where you dump all your data and let everyone run poorly written programs against it.

Hilariously accurate description of a data warehouse.



