How Novy Built Zelma for Emily Oster Using OpenAI and Postgres (stepchange.work)
14 points by niallohiggins on June 2, 2024 | 2 comments


You can almost see the (slightly) richer information hidden behind this lazy LLM summary they are passing off as a blog post.


This rather low-content promotional blog post really doesn't portray the team involved in a good light.

> Database War Stories: Initial high traffic led to 100% CPU usage on Postgres. Solved by indexing and vacuuming the database.

Indexes? In a database? Get out of here. Vacuuming a Postgres database? No way!
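
To be fair, the post names no tables, so this is pure guesswork, but the "fix" being described is presumably something on the order of the following (hypothetical table and column names throughout, since the post never shows its schema):

    -- hypothetical names; an index on the hot filter columns
    CREATE INDEX CONCURRENTLY idx_scores_district_year
        ON scores (district_id, school_year);

    -- clear out dead tuples and refresh planner statistics
    VACUUM (ANALYZE) scores;

Two statements. That's the war story.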

> Accuracy Issues: Data partitioning and denormalization led to incorrect results. Solution involved creating a structured JSON schema for queries.

... given that the low-content promotional blog post doesn't include any relevant info, like how big the dataset was, this sentence just fills me with existential horror.

If you're using PG's partitioned tables for data partitioning, how would that affect accuracy?

Denormalisation: what led you to that choice? And why did it cause your queries to return incorrect results?

Was it because your queries were incorrect, or because the way you ingested data into the DB was incorrect, given that you no longer had a normalised schema acting as guard-rails?
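
One guess, and it's only a guess because the post shows no schema: if you denormalise district-level figures onto per-school rows, a naive aggregate will cheerfully double-count them (table and column names below are made up):

    -- hypothetical denormalised table: one row per school, with the
    -- district's total enrolment copied onto every school row
    SELECT district_name, SUM(district_enrollment) AS enrollment
    FROM school_results_denorm
    GROUP BY district_name;
    -- counts each district's enrolment once per school in the district;
    -- a normalised schema would have made this much harder to get wrong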

And how did using JSONB resolve this accuracy issue?

> Database Migration: Potential move from Postgres to ClickHouse for better handling of large-scale analytics queries.

Well, I suppose that'll improve things. Mainly because the people running ClickHouse know how to use a DB well. I bet they even know about indexes and autovacuum.



