
Very cool! pg_duckdb itself is missing fully integrated storage - it can query data stored externally (say in S3) in Delta/Iceberg formats, but it can't write data out in those formats via transactional writes to PG tables (inserts/updates/deletes). pg_mooncake is one neat way of solving that problem: it gives you a columnstore table in Postgres that handles both reads and writes as if it were any other PG table, while keeping the storage format open (Delta/Iceberg/etc.) and persisting the data to blob storage (like most cloud DWs do anyway).


Had a similar thought. Azure Postgres has something similar to pg_parquet (pg_azure_storage), but we're looking into replacing it with pg_duckdb assuming the extension continues to mature.

It would be great if the Postgres community could get behind one good open-source extension for the various columnstore use cases (querying data stored in an open columnstore format - Delta, Iceberg, etc. - being one of them). pg_duckdb seems to have the best chance of becoming the go-to extension for this.


Fun fact, I created pg_azure_storage :)


I was just wondering if pg_parquet could be combined with pg_azure_storage to write Parquet files to Azure Storage.

I had problems with pg_azure_storage in the past, because the roles pg_read_server_files and pg_write_server_files are unassignable on Azure PostgreSQL databases which makes the use of `COPY {FROM,TO}` impossible.


Azure is not supported as a backend in pg_parquet right now, but shouldn't be hard to add (contributions welcome!)

https://github.com/CrunchyData/pg_parquet

It would not be safe to let any user access object storage. Therefore, pg_parquet has two roles called parquet_object_store_read and parquet_object_store_write that give permission to COPY FROM/TO object storage (but not local file system).

In pg_azure_storage there is a comparable azure_storage_admin role that needs to be granted to users that need Azure Blob Storage permission.
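For anyone curious what that looks like in practice, something along these lines (Python/psycopg2 sketch; the user, table, and bucket names are made up, and the exact COPY options may differ by pg_parquet version - check the repo linked above):

    # Sketch: grant the pg_parquet object-store roles to app users and export a
    # table to Parquet on S3 via COPY TO. Assumes psycopg2, a superuser
    # connection, and S3 credentials already configured for the extension.
    import psycopg2

    conn = psycopg2.connect("dbname=app user=postgres")  # superuser for the GRANTs
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("GRANT parquet_object_store_write TO app_writer;")
        cur.execute("GRANT parquet_object_store_read TO app_reader;")
        # Export a table as Parquet directly to object storage (no local file
        # access needed, so pg_write_server_files never comes into play).
        cur.execute(
            "COPY events TO 's3://my-bucket/exports/events.parquet' "
            "WITH (format 'parquet');"
        )
    conn.close()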


Is pg_azure_storage available on GitHub?


Maybe good to mention torn pages somewhere too? Both MySQL and Postgres jump through some hoops to both detect and repair them [1][2]. So even in the scenario in the post where fsync is used to harden writes, the database still needs to handle torn pages (or it requires a file system / storage layer that guarantees atomic page writes at the page size the database uses, as several managed/cloud databases do).

[1] https://wiki.postgresql.org/wiki/Full_page_writes [2] https://dev.mysql.com/doc/refman/8.0/en/innodb-doublewrite-b...


Thanks Adam! I think torn writes would still be caught via checksums, no? Although that may be later than you'd wish.

I'm not confident, but from reading that page it seems that, for Postgres at least, if it did use checksums it might not need to count on page-level atomic writes?


Checksums can detect a torn page, but they can't always repair it. A good part of the database page is likely gone (i.e., an amount of data matching the disk / file system's atomic write unit size is probably missing). Torn page writes are a pretty common scenario too, so databases need to be able to fully recover from them - not just detect them and report a corruption (i.e., just pull the power plug on the machine during a heavy write workload and you're likely to get one - it doesn't take a cosmic ray flipping a bit :) ).
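To make the detect-vs-repair distinction concrete, here's a toy model (plain Python, not actual Postgres internals): a page is written as two sectors, a crash tears the write, a checksum catches it, and the only way to repair it is to fall back to a full copy of the page saved elsewhere - which is what full_page_writes / the doublewrite buffer give you.

    # Toy model of a torn page: an 8 KB page is written as two 4 KB sectors and
    # a crash lands between them. A per-page checksum detects the tear, but the
    # only way to *repair* it is a full copy of the page logged elsewhere.
    import zlib

    PAGE, SECTOR = 8192, 4096

    def page(fill: bytes) -> bytes:
        return fill * (PAGE // len(fill))

    def checksum(p: bytes) -> int:
        return zlib.crc32(p)

    old = page(b"A")
    new = page(b"B")
    full_page_image = new            # copy logged before the in-place write
    expected_crc = checksum(new)

    # Crash mid-write: first sector is new, second sector is still old.
    torn = new[:SECTOR] + old[SECTOR:]

    assert checksum(torn) != expected_crc    # detection works...
    # ...but half the new page simply isn't on disk anymore, so the checksum
    # can't reconstruct it. Recovery overwrites the page wholesale:
    recovered = full_page_image
    assert checksum(recovered) == expected_crc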


That's fair. In the post I did mention disk redundancy (and I guess I only implied recovery) as one additional level for safety. Which I think is what you're getting at too.


Disk redundancy won't guarantee torn page protection unless the writes across the redundant disks are coordinated so that one starts only after the other finishes, ensuring there is always one copy of the page that is not currently being written. So writing to a RAID1 array won't help here without knowing how that array schedules its writes.
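Roughly, the sequencing you'd need looks like this (toy Python sketch, file names made up): flush the first copy before touching the second, so at any crash point at least one copy holds an intact old or new page. RAID1 mirrors generally issue both writes concurrently, which is why the array alone doesn't give you this.

    # Sketch of write sequencing that does protect against torn pages across
    # two copies: never start overwriting the second copy until the first
    # copy's write has been flushed. At any crash point one file still holds a
    # complete old or new page. POSIX-only (os.pwrite); paths are placeholders.
    import os

    PAGE = 8192

    def write_page(path: str, offset: int, data: bytes) -> None:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.pwrite(fd, data, offset)
            os.fsync(fd)          # don't proceed until this copy is durable
        finally:
            os.close(fd)

    def mirrored_write(offset: int, data: bytes) -> None:
        write_page("copy_a.dat", offset, data)  # may be torn if we crash here...
        write_page("copy_b.dat", offset, data)  # ...but copy_b still has the old page

    mirrored_write(0, b"X" * PAGE)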


I'm not sure why you're getting downvoted. I think the pressure to appease Gartner usually starts when companies bring in CEOs whose primary background is enterprise sales. They tend to overvalue Magic Quadrant positioning (in my view).


For enterprise sales, the Gartner MQ and Forrester Wave are pretty big deals. (As is how they talk about you in client inquiries generally.) Maybe enterprise buyers place more stock in how the big analyst firms view a vendor than they should but it's generally the reality.


Based on the writings of two Turing Award winners for their work on databases: no, MongoDB doesn't have ACID transactions.


Control theory is also used by databases (probably not as often as it should be). It's great for "self-tuning" [1], for example tuning the various cache sizes a database has to maximize throughput under changing workload conditions. It's definitely worth spending the time to understand PID controllers if you're an engineer working on databases (rough sketch below).

[1] https://www.vldb.org/conf/2006/p1081-storm.pdf
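If you haven't run into one before, the core of a PID controller is only a few lines. A sketch of using one to nudge a buffer cache size toward a target hit rate - the gains, the target, and the fake hit-rate curve are all invented for illustration; real self-tuning (like the STMM paper above) is far more careful:

    # Minimal PID loop nudging a cache size toward a target hit rate. Gains are
    # deliberately untuned and the workload model is fake; this only shows the
    # shape of the feedback loop.
    def pid_controller(kp: float, ki: float, kd: float):
        integral, prev_err = 0.0, 0.0
        def step(error: float, dt: float) -> float:
            nonlocal integral, prev_err
            integral += error * dt
            derivative = (error - prev_err) / dt
            prev_err = error
            return kp * error + ki * integral + kd * derivative
        return step

    def observed_hit_rate(cache_mb: float) -> float:
        return min(0.99, cache_mb / (cache_mb + 2000))  # fake diminishing returns

    target_hit_rate = 0.90
    cache_mb = 512.0
    controller = pid_controller(kp=4000.0, ki=50.0, kd=0.0)

    for tick in range(20):
        error = target_hit_rate - observed_hit_rate(cache_mb)
        cache_mb = max(64.0, cache_mb + controller(error, dt=1.0))
        print(f"tick={tick:2d} cache={cache_mb:8.1f}MB hit={observed_hit_rate(cache_mb):.3f}")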


It's worth looking at any time you're doing dynamic alterations to the system.

There are a lot of terrible things people do, in ignorance of control theory, while trying to come up with solutions in that space.


RE: "Distributed SQL offers elasticity (no downtime resize)" - I'm not sure this is as much of an advantage of distributed databases over single-host databases anymore. Some of the tech for moving virtual machines between hosts quickly (without dropping TCP connections) is pretty neat; Neon has a blog post about it here [1]. Aurora Serverless V2 does the same thing (but I can't find a detailed technical blog post explaining how it works). You're still limited by "one big host", but it's no longer as big a deal to scale your compute up/down within that limit.

[1] https://neon.tech/blog/scaling-serverless-postgres


Databricks has kept their Photon [1][2] query engine for Spark closed source thus far. Unless EMR has made equivalent changes to the Spark runtime they use, Databricks should be much faster. Photon brings to Spark the vectorized execution techniques that SQL data warehouses have used for many years.

[1] https://docs.databricks.com/en/clusters/photon.html [2] https://dl.acm.org/doi/10.1145/3514221.3526054
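To illustrate what vectorized execution buys (the general idea only, not Photon's implementation): process a column a batch at a time instead of a value at a time, which amortizes interpretation overhead and lets the CPU use SIMD.

    # Row-at-a-time vs. vectorized (batch-at-a-time) evaluation of a simple
    # filter + aggregate. The vectorized version amortizes interpreter overhead
    # and lets NumPy use SIMD under the hood.
    import time
    import numpy as np

    prices = np.random.default_rng(0).uniform(0, 100, 5_000_000)

    def row_at_a_time(values) -> float:
        total = 0.0
        for v in values:              # one interpreted iteration per row
            if v > 50.0:
                total += v
        return total

    def vectorized(values: np.ndarray) -> float:
        return float(values[values > 50.0].sum())  # whole column per operator call

    for fn in (row_at_a_time, vectorized):
        start = time.perf_counter()
        result = fn(prices)
        print(f"{fn.__name__:14s} sum={result:,.0f}  {time.perf_counter() - start:.2f}s")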


I am a bit hazy on the exact details of how we did it since it's been some time, but we definitely did not use Photon as it was too expensive.

One of the issues was that we started experimenting with Delta Tables and EMR was horrible in leveraging that.


Yeah, that is part of the trade-off. Using an ephemeral SSD (for a database) means the database needs another means of making the data durable (replication, storing data in S3, etc.).

There are AWS instance types (I3en) with large and very fast SSDs (many times higher IOPS than EBS).


It's a popular design for SQL data warehouses. I think almost all of them (Snowflake, Redshift, etc.) store cold data in S3 and hot data on local disk [1][2].

It works well if the data is stored as immutable files (i.e., a log-structured merge tree) or is not indexed at all (classical columnstores). S3 doesn't provide an efficient way to update a file in place.

[1] https://dl.acm.org/doi/10.1145/2882903.2903741 (Snowflake SIGMOD paper) [2] https://dl.acm.org/doi/10.1145/3514221.3526055 (SingleStore SIGMOD paper)
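The read path looks roughly like this (Python sketch; bucket and paths are placeholders, and details vary by vendor) - because segments are immutable, a local SSD cache and S3 can hold byte-identical copies, and an "update" means writing a new segment rather than mutating an object in place.

    # Sketch of a hot/cold read path over immutable segment files: serve from
    # local SSD if cached, otherwise download the object from S3 and cache it.
    # Since segments are never modified in place, the cached copy can't go
    # stale. Uses boto3 for the S3 fetch; bucket/paths are made up.
    import os
    import boto3

    BUCKET = "my-warehouse-segments"        # placeholder
    CACHE_DIR = "/mnt/nvme/segment-cache"   # local SSD

    s3 = boto3.client("s3")

    def read_segment(segment_id: str) -> bytes:
        local_path = os.path.join(CACHE_DIR, segment_id)
        if os.path.exists(local_path):                       # hot: local SSD
            with open(local_path, "rb") as f:
                return f.read()
        obj = s3.get_object(Bucket=BUCKET, Key=segment_id)   # cold: fetch from S3
        data = obj["Body"].read()
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(local_path, "wb") as f:                    # cache for next time
            f.write(data)
        return data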

