
It seems like Snowflake is going all-in on features and marketing that encourage their customers to build applications, serve operational workloads, etc. on top of them. Things like in-product analytics, usage-based billing, personalization, and so on.

Anyone here taking them up on it? I'm genuinely curious how it's going.



Disclaimer: I work at SingleStoreDB.

Building a database that can handle both analytics and operations is what we've been working on for the past 10+ years. Our customers use us to build applications with a strong analytical component to them (all of the use cases you mentioned and many more).

How's it going? It's going really well! And we're working on some really cool things that will expand our offering from being a pure data storage solution to much more of a platform[1].

If you want to learn more about our architecture, we published a paper about it at SIGMOD in 2022[2].

[1]: https://davidgomes.com/databases-cant-be-just-databases-anym...

[2]: https://dl.acm.org/doi/pdf/10.1145/3514221.3526055


After a series of calls, examples, and explanations with them, we never managed to get close to a reasonable projection of what our monthly costs would be on Snowflake. I understand why companies in this field use abstract notions of 'processing'/'compute' units, but it's a no-go finance-wise.

Without something close to real-world projections, we don't have the time to commit to an implementation just to find out for ourselves.


Snowflake is one of the easier tools to measure because it's a simple function of region, instance size, and uptime. If you can simulate some real loads and understand the usage, then you do have a shot at forecasting.

Of course the number is going to be high, but you have to remember it rolls up compute and requires less manpower. This is also a win for finance if they are comfortable with usage-based billing.
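
For a rough sense of that forecast, here is a back-of-the-envelope sketch in Python. The credits-per-hour figures follow Snowflake's published warehouse sizes (each size doubles the previous one), but the price per credit varies by edition, region, and contract, so the $3.00 below is only an assumed placeholder, and storage/cloud-services charges are left out entirely.

    # Back-of-the-envelope Snowflake compute forecast (compute only).
    # Credits/hour per warehouse size follow Snowflake's published scale;
    # PRICE_PER_CREDIT is an assumption -- check your own edition/region rate.
    CREDITS_PER_HOUR = {
        "XS": 1, "S": 2, "M": 4, "L": 8,
        "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
    }
    PRICE_PER_CREDIT = 3.00  # USD, placeholder assumption

    def monthly_compute_cost(size: str, hours_per_day: float, days: int = 30) -> float:
        """Estimate monthly spend for one warehouse running a steady load."""
        return CREDITS_PER_HOUR[size] * hours_per_day * days * PRICE_PER_CREDIT

    # e.g. a Medium warehouse kept warm 6 hours a day for dashboards:
    print(f"${monthly_compute_cost('M', hours_per_day=6):,.2f} / month")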


Whose finance team likes usage-based billing? It makes sense for elastic use cases and is definitely "fair", but there are a lot of issues: forecasting is hard, and then there are the "dev team had an oops" situations.

I had a frog-getting-boiled situation at one job that was exactly the process described in the posted article: usage of the cloud data warehouse grew as people trusted the infrastructure and used more fresh data for more and more use cases. They were all good, sane use cases. I repeatedly under-forecast our cost growth until we made large changes, and it really frustrated the finance people, rightly so.


I've noticed that too. I think the marketing is definitely working, I'm seeing a few organisations starting to shift more and more workloads onto them, and some are also publishing datasets on their marketplace.

One of their most interesting upcoming offerings is Snowpark, which lets you run a Python function as a UDF within Snowflake. This way you don't have to transfer data around everywhere; you just run it as part of your normal SQL statements. It's also possible to pickle a function and send it over... so conceivably one could train a data science model and run it as part of a SQL statement. This could get very interesting.
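
For context, a minimal sketch of what that looks like through the Snowpark Python client. The connection parameters, table, and scoring logic are invented placeholders; the point is the shape: the function is registered and executed inside Snowflake, next to the data.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import udf, col
    from snowflake.snowpark.types import FloatType, StringType

    # Connection parameters are placeholders -- fill in your own account details.
    session = Session.builder.configs({
        "account": "...", "user": "...", "password": "...",
        "warehouse": "...", "database": "...", "schema": "...",
    }).create()

    # Register a Python function as a UDF that runs inside Snowflake,
    # so the data never leaves the warehouse.
    @udf(name="score_review", return_type=FloatType(),
         input_types=[StringType()], replace=True)
    def score_review(text: str) -> float:
        # Stand-in for a real model; a pickled model could be loaded here instead.
        return 1.0 if "great" in text.lower() else 0.0

    # The UDF is now callable from Snowpark DataFrames or plain SQL:
    #   SELECT score_review(review_text) FROM reviews;
    df = session.table("reviews").select(
        col("review_text"), score_review(col("review_text")))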



In theory, fine. Then you look at the walled garden that is Snowpark: only "approved" Python libraries are allowed there. That makes for a very restrictive set of models you can train, and very restrictive feature engineering in Python. And wait, aren't Python UDFs super slow (GIL)? What about Pandas UDFs? (Wait, that's PySpark...)


Having worked with a team using Snowpark, there are a couple of things that bother me about it as a platform. For example, it only supported Python 3.8 until 3.9/3.10 recently entered preview mode. It feels a bit like a rushed project designed to compete with Databricks/Spark at the bullet-point level, but not quite at the same quality level.

But that's fine! It has only existed for around a year in public preview, and it appears to be improving quickly. My issue was with how aggressively Snowflake sales tried to push it as a production-ready ML platform. Whenever I asked questions about version control/CI, model versioning/ops, package managers, etc., the sales engineers and data scientists consistently oversold the product.


Yeah, it's definitely not ready for modelling. It's pretty rocking for ETL though, and much easier to test and abstract than regular SQL. Granted, it's a PySpark clone, but our data is already in Snowflake.
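
As a sketch of what that ETL style looks like (table and column names here are invented), the DataFrame API composes into small functions you can unit-test against fixture tables, unlike a monolithic SQL string:

    from snowflake.snowpark import Session, DataFrame
    from snowflake.snowpark.functions import col, sum as sum_

    def daily_revenue(orders: DataFrame) -> DataFrame:
        # Pure function over a DataFrame: test it against a small fixture
        # table, then run it unchanged against the real one.
        return (orders
                .filter(col("STATUS") == "COMPLETED")
                .group_by(col("ORDER_DATE"))
                .agg(sum_(col("AMOUNT")).alias("REVENUE")))

    # session = Session.builder.configs({...}).create()
    # daily_revenue(session.table("RAW.ORDERS")) \
    #     .write.save_as_table("MART.DAILY_REVENUE")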


Disclaimer: Snowflake employee here. You can add any Python library you want - as long as its dependencies are also 100% Python. Takes about a minute: pip install the package, zip it up, upload it to an internal Snowflake stage, then reference it in the IMPORTS=() directive in your Python. I did this with pydicom just the other day - worked a treat. So yes, not the depth and breadth of the entire Python ecosystem, but 1500+ native packages/versions on the Anaconda repo, plus this technique? Hardly a "walled garden".
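
Roughly, that workflow looks like the sketch below (the stage name and library are just examples, and the shell steps are shown as comments since they happen outside Python):

    # 1) Locally: pip install the pure-Python package into a folder and zip it.
    #      pip install pydicom --target ./pydicom_pkg
    #      cd pydicom_pkg && zip -r ../pydicom.zip .
    #
    # 2) Upload the zip to an internal stage (via SnowSQL's PUT, or from Snowpark):
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import udf
    from snowflake.snowpark.types import StringType

    # session = Session.builder.configs({...}).create()   # your connection details
    session.file.put("pydicom.zip", "@my_python_libs", auto_compress=False)

    # 3) Attach it to the session -- the Snowpark-side equivalent of the
    #    IMPORTS=() directive on CREATE FUNCTION.
    session.add_import("@my_python_libs/pydicom.zip")

    @udf(name="pydicom_version", return_type=StringType(),
         input_types=[], replace=True)
    def pydicom_version() -> str:
        import pydicom  # resolved from the uploaded zip at runtime
        return pydicom.__version__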


Good luck with trying to install any non-trivial Python library this way. And with AI moving so fast, do you think people will accept that they can't use the libraries they need because you haven't approved them yet?


> run a Python function as a UDF

Is that a differentiator? I'm unfamiliar with Snowpark's actual implementation, but I know SQL Server introduced Python/R in-engine around 2016, something like that.


Snowflake is capturing a large market share in analytics industries thanks to the fact that it "just works". I'm a massive fan.

But in the end, Snowflake stores the data in S3 as partitions. If you want to update a single value, you have to rewrite the entire S3 partition. Similarly, you need to read a reasonable amount of S3 data to retrieve even a single record. Thus you're never going to get responses shorter than half a second (at best). As long as you don't try to game around that limitation, it works great.

Materialize, mentioned up the page, also follows the same model in the end, FWIW.


Yeah, they're providing a path-of-least-resistance for getting stuff done in your existing data environment.

A common challenge in a lot of organizations is IT acting as a roadblock to deploying internal tools coming from data teams. Snowflake is answering this with Streamlit: you get an easy platform for data people to build and deploy on, and it can all stay behind the business firewall, under data governance within Snowflake.
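
A minimal sketch of what such an internal app looks like (the table name is invented). With Streamlit running inside Snowflake, the app picks up the active session from the environment, so there are no credentials in the app and existing access policies on the table still apply:

    import streamlit as st
    from snowflake.snowpark.context import get_active_session

    # Inside Streamlit-in-Snowflake the session comes from the environment --
    # no connection details to manage in the app itself.
    session = get_active_session()

    st.title("Weekly signups")  # simple internal dashboard

    # Table name is a placeholder; governance on it is enforced by Snowflake.
    df = session.table("ANALYTICS.SIGNUPS_BY_WEEK").to_pandas()
    st.line_chart(df.set_index("WEEK")["SIGNUPS"])
    st.dataframe(df)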


It's really bad for in-product analytics. Slow and expensive to keep running on a 24/7 SLA.

Even with Vertica doing that, we're seeing 10x costs just doing back-office DWH. My job now is keeping Vertica running so we can pay our Snowflake bill.


I assume they're angling for a Salesforce acquisition as they move towards being a micro-hosting service like Salesforce.


Snowflake is worth at least 25% of Salesforce, so such an acquisition is very unlikely unless Salesforce has $60 billion or more burning a hole in their pocket.



