We're willing to spend money, but I've had the "Datadog billing problem" before, where it starts out reasonable, then grows to a non-trivial percentage of the SaaS budget, and then there's a scramble to refactor. Trying to get ahead of that, as the LLM logs are MUCH larger than my APM logs.
Then I'd try to integrate via a standard connector, OTel for example, so the cost of switching later is much lower. But yeah, I'm not sure myself how this will scale, or how expensive or even useful it will be.
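A minimal sketch of what I mean, using only the vendor-neutral @opentelemetry/api package so the backend stays swappable (the callLLM function and the attribute names are placeholders, loosely following the still-evolving GenAI semantic conventions, not a spec):

```ts
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('llm-logging');

// Wrap any LLM call in a span. Exporter/SDK setup (Datadog, an OTel collector,
// whatever vendor) is configured separately, which is the whole point:
// this instrumentation code doesn't change when the backend does.
export async function tracedCompletion(
  prompt: string,
  callLLM: (prompt: string) => Promise<{ text: string; totalTokens: number }>,
) {
  return tracer.startActiveSpan('llm.completion', async (span) => {
    try {
      const result = await callLLM(prompt);
      // Placeholder attribute names; adjust to whatever conventions you adopt.
      span.setAttribute('gen_ai.usage.total_tokens', result.totalTokens);
      span.setAttribute('llm.prompt.length', prompt.length);
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```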
Makes sense. I'm not currently in Snowflake because I'm mostly working with local parquet files. I'd prefer not to have to pay for Snowflake just to explore my data. I'm interested in better data UIs though, so I might need to check it out.
I started Hyperparam one year ago because I knew that the world of data was changing, and existing tools like Python and Jupyter Notebooks were not built for the scale of LLM data. The weights of LLMs may be tensors, but the input and output of LLMs are massive piles of text.
No human has the patience to sift through all that text, so we need better tools to help us understand and analyze it. That's why I built Hyperparam to be the first tool specifically designed for working with LLM data at scale. No one else seemed to be solving this problem.
This is a Q&A I did on what I learned from a year of open source data transformation. Most of all, it reinforced my belief that browser-native tools aren’t “toys” that don’t work for real systems. When Hugging Face integrated my libraries, it confirmed that the browser can handle serious data work, and maybe there's an opportunity for more browser-based data tools.
As with anything, there are engineering tradeoffs.
What I've found is that moving data processing toward the browser is, for one, a refreshing developer experience, because I don't need to build a matching backend and frontend pair. And from a user experience point of view, I think you can build MORE interactive data applications by pushing the processing to the frontend.
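To make the "more interactive" point concrete, here's a toy sketch (the row shape and field names are made up, not a real schema): once rows are in browser memory, filtering is just a synchronous array operation, so the view can re-render on every keystroke with no server round trip.

```ts
// Toy example: filtering rows entirely client-side.
interface Row { id: number; text: string }

function filterRows(rows: Row[], query: string): Row[] {
  const q = query.toLowerCase();
  return rows.filter(row => row.text.toLowerCase().includes(q));
}

// Wire it straight to an input so every keystroke re-filters locally:
// input.addEventListener('input', () => renderTable(filterRows(rows, input.value)));
```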
Why not? We are trying to evaluate AI's capabilities. It's OBVIOUS that we should compare it to our only prior example of intelligence -- humans. Saying we shouldn't compare or anthropomorphize machines is a ridiculous hill to die on.
If you are comparing the performance of a computer program with the performance of a human, then using terms that imply they both "understand" wrongly suggests they work in the same human-like way, and that ends up misleading lots of people, especially those who have no idea (understanding!) how these models work. Great for marketing, though.
Funny you say that, because I built these tools because I wanted to build something very much like what you're describing!
I was trying to look at, filter, and transform large AI datasets, and I was frustrated with how bad the existing tools were for working with datasets containing huge amounts of text (web scrapes, GitHub dumps, reasoning tokens, agent chat logs). Jupyter notebooks are woefully bad at helping you look at your data.
So I wanted to build better browser tools for working with AI datasets. But to do that I first had to build these libraries (there was no working parquet implementation in JS when I started).
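To give a flavor of what parquet-in-the-browser buys you (an illustrative sketch of mine, not code from the libraries themselves): a parquet file ends with a 4-byte little-endian footer length followed by the "PAR1" magic, so the browser can locate the metadata with a single HTTP Range request and then fetch only the row groups it needs, no backend required.

```ts
// Illustrative sketch: read just the last 8 bytes of a remote parquet file
// from the browser with an HTTP Range request (requires a server that
// supports range requests and, cross-origin, the right CORS headers).
async function parquetFooterLength(url: string): Promise<number> {
  const res = await fetch(url, { headers: { Range: 'bytes=-8' } });
  if (res.status !== 206) throw new Error('server does not support range requests');
  const tail = await res.arrayBuffer(); // exactly 8 bytes
  const magic = new TextDecoder().decode(new Uint8Array(tail, 4, 4));
  if (magic !== 'PAR1') throw new Error('not a parquet file');
  // First 4 bytes: length of the Thrift-encoded footer metadata.
  return new DataView(tail).getUint32(0, true);
}
```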
Anyway, I'm still working on building an app for data processing that uses an LLM chat assistant to help a single user curate entire datasets singlehandedly. But for now I'm releasing these components to the community as open source. And having them each "do a single task" was very much intentional. Thanks for the comment!
That's fair criticism... to be honest, when I started the project it was more focused on hyperparameters, and it evolved into this JavaScript-for-AI mission. But by now I just kind of like the name.