
Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places [1].

A UUIDv7 primary key seems to reduce / eliminate those problems.

If there is also an indexed UUIDv4 column for an external id, I suspect it would be used less often than the primary key index, so it would not cancel out the performance improvements of UUIDv7.

[1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...
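RFC 9562 lays out UUIDv7 as a 48-bit Unix-millisecond timestamp followed by random bits, which is what gives inserts their locality. A minimal sketch of that layout (not a compliant library; in practice use a real implementation such as Python 3.14's `uuid.uuid7`):

```python
import os
import time
import uuid

def uuid7_sketch() -> uuid.UUID:
    """Simplified UUIDv7: 48-bit ms timestamp prefix + random tail (RFC 9562 layout)."""
    ts_ms = time.time_ns() // 1_000_000
    b = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    b[6] = (b[6] & 0x0F) | 0x70   # version 7
    b[8] = (b[8] & 0x3F) | 0x80   # RFC 4122 variant
    return uuid.UUID(bytes=bytes(b))

# Keys generated later sort later, so a B-tree index mostly appends
# on the right-hand side instead of dirtying random pages.
a = uuid7_sketch()
time.sleep(0.002)
b = uuid7_sketch()
assert a.bytes < b.bytes
```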


> Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places.

Very true, as detailed by the link you kindly provided. That is why a technique I have found useful is to have both an internal `id` PK `serial`[0] column (never externalized to other processes) and another column with a unique constraint holding a UUIDv4 value, such as `external_id`, explicitly for providing identifiers to out-of-process collaborators.
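A sketch of that split, using the stdlib's sqlite3 as a stand-in for Postgres (the table and column names are just the ones from my description):

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,   -- internal, never leaves the process
        external_id TEXT NOT NULL UNIQUE,  -- UUIDv4 handed to collaborators
        name        TEXT NOT NULL
    )
""")

def create_user(name: str) -> str:
    """Insert a row and return only the external identifier."""
    ext = str(uuid.uuid4())
    db.execute("INSERT INTO users (external_id, name) VALUES (?, ?)", (ext, name))
    return ext

# Out-of-process callers only ever see and query by external_id.
ext = create_user("alice")
row = db.execute("SELECT name FROM users WHERE external_id = ?", (ext,)).fetchone()
```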

0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...


> I suspect it would not be used as often as the primary key index

That doesn't matter, because it's the creation of the index entry that costs, not how often the index is used for lookups. The lookup cost is the same either way.


The page I linked shows uses after creation where the cost can differ.


Making the assumption:

> Since workloads commonly are interested in recently inserted rows

That's only true for very specific types of applications. There's nothing general about that.

Plenty of applications grab rows from all time, and there's nothing special about the most recent ones. The most recent might also be the least popular rows, since few things reference them.


SOPS can be part of the solution. It takes care of encrypting and decrypting config files.

https://github.com/getsops/sops


"Approval" / "Golden Master" / "Snapshot" / "Characterization" testing can be very helpful.

They all seem to be names for more or less the same idea.

The first time a test runs successfully it auto captures the output as a file. This is the "approved" output and is committed with the code or saved in whatever test system you use.

The next time the test runs, it captures the new output and auto compares it with the approved output. If identical, the test passes. If different, the test fails and a human should investigate the diff.

The technique works with many types of data:

* Plain text.

* Images of UI components / rendered web pages. This can check that a code change or a new browser version does not unexpectedly change the appearance.

* Audio files created by audio processing code.

* Large text logs from code that has no other tests. This can help when refactoring: hopefully an accidental side effect will show up as an unexpected diff.

See:

* https://approvaltests.com/

* https://cucumber.io/blog/podcast/approval-testing/

* https://en.wikipedia.org/wiki/Characterization_test



That's a doc site and a pull-through cache; neither is a package repository.


Python's builtin async always confuses me.

The Trio library felt easy to learn and just worked without much fuss.

https://trio.readthedocs.io/


It's a concept from Fielding's REST thesis that is important to the current meaning of REST:

Transfer Objects are not the Storage Objects.
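In other words, what you store and what you transfer are separately-shaped types. A toy illustration (the names here are mine):

```python
from dataclasses import dataclass

@dataclass
class UserRow:            # storage object: internal shape, internal key
    id: int
    password_hash: str
    email: str

@dataclass
class UserResource:       # transfer object: only what the API contract promises
    email: str

def to_resource(row: UserRow) -> UserResource:
    # The representation sent over the wire can evolve independently
    # of the storage schema, and internal fields never leak out.
    return UserResource(email=row.email)

res = to_resource(UserRow(id=1, password_hash="x", email="a@example.com"))
```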


The Pace soldering lessons on YouTube are good.

https://youtube.com/playlist?list=PL926EC0F1F93C1837&si=U4Jx...


As an aside, I've found IntelliJ very helpful in this situation: it can load many repos into one project, and doing commits / pushes / branches etc. across the various repos at the same time just seemed to work the way I wanted without much thought.


Mermaid is sort of a de facto standard because GitHub auto-renders it inside Markdown files.
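For example, GitHub renders a fenced block like this one directly in a README (a made-up diagram, just to show the syntax):

```mermaid
flowchart LR
    A[Commit] --> B[CI build]
    B --> C{Tests pass?}
    C -- yes --> D[Deploy]
    C -- no --> A
```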


And resume padding.

