
Glad you asked!

This system contains measurements and state of physical devices (time series). It’s designed for both heavy writes and reads, with a slight emphasis on writes. Each table is one type of device and contains 1 to 5 different measurements/states. But here’s the trick: because data is queried with a minimum bucket size of 15 minutes, I figured we could just create a column for each measurement + quarter of the day (e.g. measure0000, measure0015), so that’s 100 columns for each measurement (96 quarters + 4 for DST), include the date in the key, et voilà: excellent write performance (because it’s mainly UPDATE queries) and good read performance.
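Roughly like this, heavily simplified (table and column names here are just illustrative, not our real schema):

    -- One row per device per day, one column per 15-minute slot.
    CREATE TABLE device_temperature (
        device_id    bigint NOT NULL,
        measure_date date   NOT NULL,
        temp0000     real,
        temp0015     real,
        -- ... one column per quarter: 96, plus 4 spare for the long DST day ...
        temp2345     real,
        PRIMARY KEY (device_id, measure_date)
    );

    -- Writes then touch a single slot of an existing row, e.g. as an upsert:
    INSERT INTO device_temperature (device_id, measure_date, temp0015)
    VALUES (42, current_date, 21.5)
    ON CONFLICT (device_id, measure_date)
    DO UPDATE SET temp0015 = EXCLUDED.temp0015;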

Okay, the queries to make sense of the data aren’t pretty, but can be generated.
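For example, a generated query to unpivot one day back into rows could look something like this (same illustrative names as above):

    SELECT d.device_id,
           d.measure_date + q.t AS measured_at,
           q.val
    FROM device_temperature AS d
    CROSS JOIN LATERAL (VALUES
        (time '00:00', d.temp0000),
        (time '00:15', d.temp0015)
        -- ... generated for all 96 (+4) slots ...
    ) AS q (t, val)
    WHERE d.measure_date = date '2024-06-01'
      AND q.val IS NOT NULL;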

I find it really cool how effective this is for time-series data without Postgres extensions (we’re on RDS).



One interesting thing your team may want to look into (if you haven't already) is compression.

Of course there are a multitude of variables we don't have access to from the outside, but Postgres only compresses data that is TOASTed, and based on your description of the table, the data is not being TOASTed (and therefore not being compressed).

Instead, if you could somehow pack your timeseries entries into an array, you would get the benefits of compression automatically.

Given your write performance requirements, using an array may be out of the question (and you may get too much overhead from dead tuples) -- but who knows? It's always a good idea to benchmark.
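As a sketch only (names and types invented, not your schema):

    -- Packing a day's slots into one array gives TOAST a chance to compress
    -- the value once the row grows past the TOAST threshold.
    CREATE TABLE device_temperature_packed (
        device_id    bigint NOT NULL,
        measure_date date   NOT NULL,
        temps        real[],            -- element 1 = 00:00, element 2 = 00:15, ...
        PRIMARY KEY (device_id, measure_date)
    );

    -- Pre-fill the array once per day, then update individual slots.
    -- Each such update rewrites the whole array value (hence the caveat
    -- about write overhead and dead tuples above).
    INSERT INTO device_temperature_packed (device_id, measure_date, temps)
    VALUES (42, current_date, array_fill(NULL::real, ARRAY[100]))
    ON CONFLICT DO NOTHING;

    UPDATE device_temperature_packed
    SET temps[2] = 21.5                 -- slot 2 = 00:15
    WHERE device_id = 42 AND measure_date = current_date;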

I actually considered mentioning this in the post but figured it was too long already and it could be material for a future one :)


This is interesting! Turning the values into arrays is not suitable for our base tables, but it might be for the several materialization tables we run.


The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

I can see the advantage in terms of just needing a single tuple for reads. So a timestamp + value model would likely take twice as much heap space as your approach?

Given that you’re probably always just inserting new data, you could use a BRIN index to get fast reads on the date ranges. It would be interesting to see it in action and play around to see the tradeoffs. The model you’ve settled on sounds like it would be a pain to query.
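Something like this (made-up names):

    -- BRIN stores min/max per block range, so it stays tiny and works well
    -- when physical row order correlates with the date (append-mostly data).
    CREATE INDEX device_measurements_date_brin
        ON device_measurements USING brin (measure_date)
        WITH (pages_per_range = 64);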


> The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

They might be if the columns being updated weren’t indexed [0], but since it sounds like at least one is, no, not in-place.

Though interestingly, your comment on BRIN indexes is quite relevant, as that’s the one type of index that HOT can still work with.

[0]: https://www.postgresql.org/docs/current/storage-hot.html
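If you want to see how many of your updates actually end up HOT, pg_stat_user_tables tracks both counters (table name made up):

    SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'device_measurements';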


Good observation. The updates are as much HOT updates as possible. I wasn't familiar with BRIN indexes before, so I'll have to look into that.

At first glance our solution follows a similar approach; let me elaborate (a rough sketch follows the list):

- no indexed columns are ever updated, only inserted

- all tables are partitioned by date (partition range is 1 month)

- for some tables there is another layer of partitioning (3 sub-partitions, based on one specific column)

- finding an appropriate fillfactor is important to improve the speed of UPDATE statements (HOT updates)

- standard vacuum / autovacuum settings work great for us so far

- to improve ANALYZE performance, we set column statistics of value-only columns to 0
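Very roughly, extending the made-up example from my earlier comment (our real schema differs):

    -- Monthly range partitions on the date that is part of the key.
    CREATE TABLE device_temperature (
        device_id    bigint NOT NULL,
        measure_date date   NOT NULL,
        temp0000     real,
        -- ... remaining slot columns ...
        PRIMARY KEY (device_id, measure_date)
    ) PARTITION BY RANGE (measure_date);

    -- A lowered fillfactor leaves room on each page for HOT updates.
    CREATE TABLE device_temperature_2024_06
        PARTITION OF device_temperature
        FOR VALUES FROM ('2024-06-01') TO ('2024-07-01')
        WITH (fillfactor = 70);

    -- Value-only columns don't need planner statistics.
    ALTER TABLE device_temperature ALTER COLUMN temp0000 SET STATISTICS 0;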


Fascinating. How reliable is the measurement apparatus? Or to put it another way, how tolerant of missed measurements is this scheme? I’ve been in a similar situation in a past life and we took a different approach. I was never satisfied with how we ended up doing it though.


I see your point. Some of our measurements are counters, and we interpolate missing data via certain rules. We store these interpolations in a separate table with exactly the same format and join the two tables directly when querying the data. Since the real value always takes precedence over the interpolated value, this is just a COALESCE(realmeasure0000, interpolation0000) statement.

This is super fast when taking advantage of Postgres’ partition-wise joins.
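In simplified form (made-up names, only one slot column shown):

    -- Both tables share the same partitioning scheme, so with
    -- enable_partitionwise_join = on the join runs partition by partition.
    SET enable_partitionwise_join = on;

    SELECT m.device_id,
           m.measure_date,
           COALESCE(m.measure0000, i.measure0000) AS measure0000
           -- ... one COALESCE per slot column ...
    FROM device_measurements AS m
    LEFT JOIN device_interpolations AS i
        USING (device_id, measure_date)
    WHERE m.measure_date >= date '2024-06-01'
      AND m.measure_date <  date '2024-07-01';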


That’s really neat, thanks for explaining!


I assume you looked at the various Postgres functions, like width_bucket()?
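e.g. with the more conventional one-row-per-reading layout (hypothetical table), date_bin() on PG 14+ (or width_bucket() over epoch seconds on older versions) gets you 15-minute buckets at query time:

    SELECT device_id,
           date_bin('15 minutes', measured_at, timestamp '2000-01-01') AS bucket,
           avg(value) AS avg_value
    FROM readings
    GROUP BY device_id, bucket
    ORDER BY device_id, bucket;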

I’ve never run a PG-based TSDB, so I’m reluctant to pass judgment one way or the other, but as I hope you can understand, “we have hundreds of columns in a table” is alarming at first glance.


The schema is organized by how we need to query it, which is mostly by fetching all measurements for an entire day or month. Querying statistics on those metrics is not often needed, maybe a few times per week or day, so we do that in Snowflake if necessary, or we create a specialized streaming data flow when the stats are queried more often.



