We're willing to spend money, but I've had the "Datadog billing problem" before, where it starts out reasonable, then grows to a non-trivial percentage of the SaaS budget, and then there's a scramble to refactor. Trying to get ahead of that, as the LLM logs are MUCH larger than my APM logs.
Then I'd try to integrate via a standard connector, OTel for example, so the cost of switching later is much lower. But yeah, I'm not sure myself how this will scale, or how expensive or even useful it will be.
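A minimal sketch of what I mean, using only the vendor-neutral @opentelemetry/api package so the backend stays swappable (the callLLM function and the attribute names are placeholders, loosely following the still-evolving GenAI semantic conventions, not a spec):

```ts
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('llm-logging');

// Wrap any LLM call in a span. Exporter/SDK setup (Datadog, an OTel collector,
// whatever vendor) is configured separately, which is the whole point:
// this instrumentation code doesn't change when the backend does.
export async function tracedCompletion(
  prompt: string,
  callLLM: (prompt: string) => Promise<{ text: string; totalTokens: number }>,
) {
  return tracer.startActiveSpan('llm.completion', async (span) => {
    try {
      const result = await callLLM(prompt);
      // Placeholder attribute names; adjust to whatever conventions you adopt.
      span.setAttribute('gen_ai.usage.total_tokens', result.totalTokens);
      span.setAttribute('llm.prompt.length', prompt.length);
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```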
Makes sense. I'm not currently in Snowflake because I'm mostly working with local parquet files. I'd prefer not to have to pay for Snowflake just to explore my data. I'm interested in better data UIs though, so I might need to check it out.
I started Hyperparam one year ago because I knew that the world of data was changing, and existing tools like Python and Jupyter Notebooks were not built for the scale of LLM data. The weights of LLMs may be tensors, but the input and output of LLMs are massive piles of text.
No human has the patience to sift through all that text, so we need better tools to help us understand and analyze it. That's why I built Hyperparam to be the first tool specifically designed for working with LLM data at scale. No one else seemed to be solving this problem.
This is a Q&A I did on what I learned from a year of open source data transformation. Most of all, it reinforced my belief that browser-native tools aren’t “toys” that don’t work for real systems. When Hugging Face integrated my libraries, it confirmed that the browser can handle serious data work, and maybe there's an opportunity for more browser-based data tools.
As with anything, there are engineering tradeoffs.
What I've found is that moving data processing toward the browser is, for one, a refreshing developer experience, because I don't need to build a matching backend and frontend pair. And from a user experience point of view, I think you can build MORE interactive data applications by pushing the processing to the frontend.
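To make the "more interactive" point concrete, here's a toy sketch (the row shape and field names are made up, not a real schema): once rows are in browser memory, filtering is just a synchronous array operation, so the view can re-render on every keystroke with no server round trip.

```ts
// Toy example: filtering rows entirely client-side.
interface Row { id: number; text: string }

function filterRows(rows: Row[], query: string): Row[] {
  const q = query.toLowerCase();
  return rows.filter(row => row.text.toLowerCase().includes(q));
}

// Wire it straight to an input so every keystroke re-filters locally:
// input.addEventListener('input', () => renderTable(filterRows(rows, input.value)));
```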
Why not? We are trying to evaluate AI's capabilities. It's OBVIOUS that we should compare it to our only prior example of intelligence -- humans. Saying we shouldn't compare or anthropomorphize machines is a ridiculous hill to die on.
If you are comparing the performance of a computer program with the performance of a human, then using terms that imply they both "understand" wrongly suggests they work in the same human-like way, and that ends up misleading lots of people, especially those who have no idea (understanding!) how these models work. Great for marketing, though.
Funny you say that, because I built these tools because I wanted to build something very much like what you're describing!
I was trying to look at, filter, and transform large AI datasets, and I was frustrated with how bad the existing tools were for working with datasets containing huge amounts of text (web scrapes, GitHub dumps, reasoning tokens, agent chat logs). Jupyter notebooks are woefully bad at helping you look at your data.
So I wanted to build better browser tools for working with AI datasets. But to do that I first had to build these libraries (there was no working parquet implementation in JS when I started).
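To give a flavor of what parquet-in-the-browser buys you (an illustrative sketch of mine, not code from the libraries themselves): a parquet file ends with a 4-byte little-endian footer length followed by the "PAR1" magic, so the browser can locate the metadata with a single HTTP Range request and then fetch only the row groups it needs, no backend required.

```ts
// Illustrative sketch: read just the last 8 bytes of a remote parquet file
// from the browser with an HTTP Range request (requires a server that
// supports range requests and, cross-origin, the right CORS headers).
async function parquetFooterLength(url: string): Promise<number> {
  const res = await fetch(url, { headers: { Range: 'bytes=-8' } });
  if (res.status !== 206) throw new Error('server does not support range requests');
  const tail = await res.arrayBuffer(); // exactly 8 bytes
  const magic = new TextDecoder().decode(new Uint8Array(tail, 4, 4));
  if (magic !== 'PAR1') throw new Error('not a parquet file');
  // First 4 bytes: length of the Thrift-encoded footer metadata.
  return new DataView(tail).getUint32(0, true);
}
```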
Anyway, I'm still working on building an app for data processing that uses an LLM chat assistant to help a single user curate entire datasets singlehandedly. But for now I'm releasing these components to the community as open source. And having them each "do a single task" was very much intentional. Thanks for the comment!
That's fair criticism... to be honest, when I started the project it was more focused on hyperparameters, and it evolved into this JavaScript-for-AI mission. But by now I just kind of like the name.