I'm amazed at how "we" have managed to turn such a simple idea as "128 bits is a large enough address space for uncoordinated generation to be essentially collision free" into such a "heavy" concept with 7 different versions
If you want to do something smart like encoding your node ID within the value, or prefixing a timestamp for sortability, then sure, do that in your application. No one else really needs to care how you produced your 16 bytes. Just do some napkin math to make sure you're keeping sufficient entropy
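To make the napkin math concrete, here's a rough sketch (Python; the 48-bit millisecond prefix and the million-IDs-per-millisecond rate are illustrative numbers I'm assuming, not anything prescribed):

    import os
    import time

    def sortable_id() -> bytes:
        """16-byte ID: 48-bit Unix-millisecond prefix + 80 random bits."""
        ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
        return ts_ms.to_bytes(6, "big") + os.urandom(10)

    # Napkin math: with 80 random bits left, the birthday bound puts the
    # collision probability among n IDs sharing the same millisecond at
    # roughly n^2 / 2^81.
    n_per_ms = 1_000_000
    print(f"{n_per_ms**2 / 2**81:.1e}")  # ~4e-13, even at a million IDs per ms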
I'm not sure "UUID" even needed to be a column type, versus an "INT16" and some string formatting/parsing functions for the conventional representation (should you choose to use that in your application). You could also put IPv6 addresses in the same type. Though I guess this depends on how much you think the database should encode intention versus raw storage in types
> No one else really needs to care how you produced your 16 bytes
UUID isn't about how it's done, it's about what it is.
Instead of everyone doing something differently, everyone can just comply with UUID.
Instead of having to repeat it across your docs that the IDs of this entity are sortable, you can just say they are UUIDv7. If someone wants to extract the timestamp from your ID, they don't need to figure out which (32, 48, 50?) bits are the timestamp nor what resolution the timestamp has because you can tell them UUIDv7.
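For example, here's a minimal sketch of pulling the timestamp back out of a UUIDv7 with Python's standard uuid module (per RFC 9562 the 48 most significant bits are a Unix timestamp in milliseconds; the sample value below is just an arbitrary example):

    import uuid
    from datetime import datetime, timezone

    def uuid7_timestamp(u: uuid.UUID) -> datetime:
        """Read the 48-bit millisecond timestamp from the top of a UUIDv7."""
        if u.version != 7:
            raise ValueError("not a UUIDv7")
        ms = u.int >> 80          # keep the 48 most significant of 128 bits
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    print(uuid7_timestamp(uuid.UUID("01912d68-783e-7a03-8467-5661c1243ad4")))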
You don't have to write your own validation functions because you can tell the database that this hex string is a UUID and it can do it for you.
You're probably making the case for Sqlite here, which is very minimal, but for something more full-featured like Postgres, I prefer these conveniences. I can tell because whenever I use Sqlite in a case where I could've used Postgres, I regret it!
But I feel the point is: none of that is a concern IDs should take on.
Most functional things, e.g. embedding the record creation time within the ID, are one of those "that's cool, but I've never seen anyone do it" kind of things. If you need to sort records by when they were created, there are probably three or four happened_at fields on the record you'd use (created_at in this case). If you need the exact time, those are there for that.
Counter-argument: Well, you can save a few bytes on every record by getting rid of the created_at field and just using a UUIDv7. Maybe, but I've never seen anyone do it. What if you need to change the time the record was created? Are you planning to explain to all your integration providers the process of extracting a timestamp from a UUIDv7? What if you need to run complex SQL timestamp functions on created_at? Etc. It's cool, but it never actually happens.
Once we enter the domain of "using the node id or timestamp or something to reduce the probability of ID collision", that's a totally reasonable responsibility within an ID's set of concerns. But, that's a very different need.
> You don't have to write your own validation functions
Why are we validating IDs?
> but something more full-featured like Postgres, I prefer these conveniences.
Agreed. I am a vocal UUID hyper-hater. UUIDs should be destroyed, and humanity would be (oh so slightly) better off if they had never existed. But, they're still a thing, and I think it's cool that databases have hyper-specific types like this.
My wish is that Postgres would have other, more sane automatic ID types and generation capabilities, in addition to uuid & autoincrement.
The point of the timestamp in UUIDv7 is not to encode creation time, it is to provide some (coarse-grained) chronological sortability.
Random primary keys are bad, but exposing incremental indexes to the public is also bad, and hacking on a separate unique UUID for public use is also bad. UUIDs are over-engineered for historical reasons, and UUIDv7 as raw 128 bits without the version encoding would be nicer.
But, to the end-user it's just a few lost bits in a 128-bit ID with an odd standard for hyphenation. The standardization means you know what to expect as developer, instead of every DB rolling their own unique 128-bit ID system with its own guarantees and weirdnesses.
But my point is: When is that standardization actually leveraged? Literally, tactically, what does "you know what to expect as a developer" mean? When is this standardization used in a fashion that enables more capability than just "the ID is a string don't worry about it"?
The realistic answer is: it isn't, because pre-UUIDv7 there was literally nothing about the UUID spec that conferred more capability than just a random string. And, truly, people used them as "just gimme a random string" all the flipping time. The pipes of the internet are filled with JSON that contains UUIDs-in-a-string-field: 4 bytes wasted to hyphens, 1 byte wasted to a version number, and none of that is in service to anyone or anything.
1. The other UUID versions are actually used. However, the expectation is in what the developer gets when generating it. Even "random ID" can be messed up if the author tries to be smart - e.g., rolling their own secret chronological-sortability hack for their database but not telling you how much entropy and collision resistance you have left, or hacking in multi-server collision resistance by making some bits a static server ID.
People have reason to do those things, and oh boy do you want to know that it's happening. With UUID, over-engineered as it may be, you know what you're asking for and can see what you're getting - truly random, server namespaced, or chronologically sortable.
2. Being upset over 4 bytes wasted to hyphens but not being upset about JSON itself seems hypocritical. JSON is extremely wasteful on the wire, and if you switch to something more efficient you also get to just send the UUID as 16 bytes. That's a lot more than 4 bytes saved.
Over JSON you can still base64 encode the UUID if it's not meant to be user-facing.
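Roughly like this, as a sketch (Python; URL-safe base64 without the padding gets the 16 bytes down to 22 characters, versus 36 for the hyphenated hex form):

    import base64
    import uuid

    u = uuid.uuid4()

    text = str(u)                                             # 36 chars of hex + hyphens
    compact = base64.urlsafe_b64encode(u.bytes).rstrip(b"=")  # 22 ASCII bytes

    print(len(text), len(compact))   # 36 22
    # Round-trips: re-pad, decode, and rebuild the same UUID.
    assert uuid.UUID(bytes=base64.urlsafe_b64decode(compact + b"==")) == u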
What do you even mean by "Why are we validating IDs"?
zzzzyyyy-zyzy-zyzy-zzyyzzyyzzyy: does this look like a valid ID? I could totally store this in the database if there were no validation involved.
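To make it concrete, this is roughly the parse/validation step a typed column gives you for free (a Python sketch; the second value is just an arbitrary well-formed example):

    import uuid

    for candidate in ("zzzzyyyy-zyzy-zyzy-zzyyzzyyzzyy",
                      "0e3f8c9a-5b1d-4f6e-9a2b-7c8d9e0f1a2b"):
        try:
            uuid.UUID(candidate)   # requires 32 hex digits, rejects anything else
            print(candidate, "-> valid")
        except ValueError:
            print(candidate, "-> rejected")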
From the GP's (and my) perspective, the useful part of UUID is that it's 16 bytes. This is usually formatted as 32 hex digits with dashes in specific places.
The version/variant bits are the pointless part. Of course if you put the 16 bytes on the wire you would still have some encoding (perhaps 22 base64 characters?) that requires decoding/validation, but in memory and in your DB it's just 16 bytes of opaque data.
The UUID specs are still confusing (or at least were to me lol) because the words "version" and "variant" both just say that something changes, not what is changing or why it's changing.
version from Latin vertere "to turn, turn back, be turned; convert, transform, translate; be changed"
variant from Latin variare "change, alter, make different,"
> 4.1.1 The variant field determines the layout of the UUID. That is, the interpretation of all other bits in the UUID depends on the setting of the bits in the variant field. As such, it could more accurately be called a type field; we retain the original term for compatibility.
> 4.1.3 The version number is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field). The following table lists the currently-defined versions for this UUID variant. The version is more accurately a sub-type; again, we retain the term for compatibility.
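In concrete terms, here's a small sketch of where those bits sit, using Python's uuid module (the shift amounts just pick out the nibble and bits the RFC is describing):

    import uuid

    u = uuid.uuid4()

    # Version: the high nibble of octet 6 (the "V" in xxxxxxxx-xxxx-Vxxx-...).
    version = (u.int >> 76) & 0xF         # 4 for a uuid4()

    # Variant: the top bits of octet 8 (the "N" group); 0b10 is the RFC 4122 layout.
    variant_bits = (u.int >> 62) & 0b11   # 0b10

    print(version, bin(variant_bits), u.version, u.variant)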
It's recognized in the RFC, and all you've done is break compatibility for fashion.
In practice, UUIDs are treated as an opaque 128-bit field. In any sufficiently complex system, there is no practical way to standardize on a single blessed version. Furthermore, all of the standardized UUIDs are deficient in various ways for some use cases, so there are a large number of UUID-like types used in large enterprises that are not "standard" UUIDs but which are better fit for purpose. This is deemed okay because there is no way to even standardize on a single UUID version among the official ones. Furthermore, there are environments where some subset of the UUID standard types (including each of v3/v4/v5 in various contexts) are strictly forbidden for valid security reasons.
The practical necessity of mixing UUID versions, along with other 128-bit UUID-like values, means that the collision probabilities are far higher in many non-trivial systems than in the ideal case of having a single type of 128-bit identifier. There is a whole separate element of UUID-like type engineering that happens around trying to mitigate collision probabilities when using 128-bit identifiers from different sources, some of which you may not control.
Having 128 bits is the only common thread across these identifiers which everyone seems to agree on.
I think you're drastically underestimating the purpose and management of UUIDs in large scale systems.
If you're building for a single application or data type, sure do your thing, have at it. If you're trying to coordinate UUID spaces and generation across thousands of different applications and data types, like large data pipelines, then this matters a lot.
Also, having native database support (like indexing, filtering, etc.) improves efficiency for these types of workloads.
Because it turns out that trying to index/sort things by UUID doesn't work great. UUID, at least somewhere after version one, isn't just some large number. Different parts of the field have different meanings depending on the specification.