I don't have an answer for you but you do have my curiosity. Why did you choose ...

zomglings · on June 18, 2021

With integer keys as an alternative?

UUID v4 keys don't give away information about the number of rows in a relation. You can use them directly in API responses.

In recent events, IIRC, Parler was so easy to scrape precisely because they were using int keys exposed in their API GET endpoints.

JMTQp8lwXL · on June 18, 2021

You could start with a large, non-zero value for the initial key, to obfuscate the true number of records in the collection.

zomglings · on June 18, 2021

The difference between IDs of multiple resources would still leak count information.

It also makes it too easy to paginate through relations for certain use cases where you may want obfuscation.
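The point about ID differences leaking counts can be sketched in a few lines. This is only an illustration: the IDs and dates below are made up, and it assumes a gap-free sequence (real sequences can skip values after rollbacks, so the result is closer to an upper bound).

```python
from datetime import datetime


def estimated_rows_created(id_earlier: int, id_later: int) -> int:
    """Rows created between two observations, assuming gap-free sequential IDs."""
    return id_later - id_earlier


# Hypothetical observations: an outsider creates one resource per day and
# reads back the ID each time. The large starting offset does not help.
first_id, first_seen = 1_000_482, datetime(2021, 6, 17)
second_id, second_seen = 1_003_919, datetime(2021, 6, 18)

created = estimated_rows_created(first_id, second_id)
days = (second_seen - first_seen).days
print(f"about {created} rows created in {days} day(s)")
```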

steelbrain · on June 18, 2021

Security through obscurity is only sweeping the problem under the rug instead of addressing it IMPO. I don't know what Parler is or was, but I don't think sequential int IDs would be the major factor that would lead to a website being scraped.

There are generally two types of information in applications, "public" and "privileged". The former has the IDs and such discoverable by an index or explore page, the latter requires authentication and has user-specific permissions.

In both cases, if hiding IDs is the only access control on the backend, it's fundamentally flawed. In both cases, if the access control is implemented well (in addition to rate limiting), integers vs. UUIDs don't make a difference.

What do you think?

kevan · on June 18, 2021

No one will argue that opaque identifiers are a sufficient security control, but they're (slightly) better than nothing; at least they would've stopped a naive enumeration approach to scraping everything. Don't get distracted by that part, though: the main reason is that many companies wouldn't want to publicly post a dashboard of how many users/entities/widgets they have for all to see, but exposing sequential identifiers basically does that.

lsaferite · on June 18, 2021

> Security through obscurity

This _should_ be part of a multi-layer security plan though. You don't depend on it as a primary source of security, but why would you expose more internal information than needed? If something does go wrong with another layer, that obscurity _helps mitigate the damage_.

_3u10 · on June 18, 2021

In the case that there IS an index page that enumerates all entities you are correct. However, many systems don't provide such an index page.

In many cases it's useful for a page to be publicly accessible yet not indexable.

This is why sites like YouTube have an "unlisted" level of permission; UUID keys are a convenient way to implement that level of access control.

UUID keys are very useful for distributed systems where a local machine wants to generate a unique key locally, and then later upload it to a centralized store. It's especially helpful in third normal form databases where often you'll need to create objects that reference each other via the primary key.

blumomo · on June 18, 2021

We built a messenger app in which the client generates the message row locally and caches it immediately on submit, without waiting for the server's response. Because the primary key can be generated client side (it’s a UUID, so it’s virtually guaranteed to be unique), there’s no clash with existing IDs on the server. With the optimistic response pattern, the message appears in the frontend immediately. Once the response to the insert comes back, the message can be updated in case the server decided to set additional columns, all under the same message ID, which the server gladly accepted. Wonderful.
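A minimal sketch of that optimistic pattern, assuming an in-memory stand-in for the backend. `FakeServer`, `Client`, and the `status` and `server_seq` columns are hypothetical names for illustration, not the commenter's actual code.

```python
import uuid


class FakeServer:
    """Hypothetical backend that accepts the client's UUID as the primary key."""

    def __init__(self):
        self.rows = {}

    def insert_message(self, row: dict) -> dict:
        if row["id"] in self.rows:
            raise ValueError("duplicate primary key")
        # The server may set additional columns before returning the row.
        stored = {**row, "status": "delivered", "server_seq": len(self.rows) + 1}
        self.rows[row["id"]] = stored
        return stored


class Client:
    def __init__(self, server: FakeServer):
        self.server = server
        self.cache = {}

    def send(self, text: str) -> str:
        # Generate the primary key locally; no round trip is needed for an ID.
        message_id = str(uuid.uuid4())
        # Optimistic write: the UI could render this cached row right away.
        self.cache[message_id] = {"id": message_id, "text": text, "status": "pending"}
        # When the response arrives, update the cached row under the same ID.
        confirmed = self.server.insert_message(self.cache[message_id])
        self.cache[message_id] = confirmed
        return message_id
```

In a real client the server call would be asynchronous; it is synchronous here only to keep the sketch short.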

the_arun · on June 18, 2021

Why not use UUID as primary keys?

ddek · on June 18, 2021

Can't speak for all dbs, but many use a clustered index on the primary key. In this case, the physical rows are stored in the order of the index, rather than just pointers to the rows.

If you are inserting non-sequential data into a clustered index, every insert results in a non-trivial rearrangement of the rows. UUIDs are not sequential, so at scale you will experience performance issues if you are using UUID primary keys and the PK index is clustered.

You won't notice this until significant scale, however. You can still use a unique identifier alongside an incrementing primary key, and you could choose to use a more compact format than the UUID. 8 base32 characters have over a trillion combinations, and are nowhere near as unsightly in a URL.
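The arithmetic behind "8 base32 characters" is 32^8 = 2^40, which is a little over 1.09 trillion. A rough sketch of such an identifier follows; `short_public_id` is a hypothetical helper, and the alphabet is assumed to be the RFC 4648 base32 one. With only 40 bits, collisions become plausible at around a million rows, so this assumes a unique index and a retry on conflict.

```python
import secrets

# RFC 4648 base32 alphabet (an assumption; any 32-symbol alphabet works).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

# 32 symbols over 8 positions: 32**8 == 2**40 possible identifiers.
COMBINATIONS = len(ALPHABET) ** 8


def short_public_id(length: int = 8) -> str:
    """Return a random public identifier drawn from the base32 alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(COMBINATIONS)
print(short_public_id())
```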

dragonwriter · on June 18, 2021

> Can't speak for all dbs, but many use a clustered index on the primary key.

AFAIK, only MySQL (with the InnoDB engine) and SQL Server do it by default (always for MySQL/InnoDB; for SQL Server, by default unless you create a different clustered index before adding the PK constraint, and even then you can specify a nonclustered PK index).

PG doesn't have clustered indexes at all; DB2 has a thing called clustered indexes which isn’t quite the same thing; Oracle calls having a clustered index on the PK an “index organized table”, and it’s a non-default table option; and SQLite has what seems equivalent to a clustered index ONLY for INTEGER PRIMARY KEY tables not declared as WITHOUT ROWID.

> You can still use a unique identifier alongside an incrementing primary key, and you could choose to use a more compact format than the UUID.

A key point of using a UUID is distributed generation avoiding lock contention on a sequence generator, which is defeated by using both. Just “don’t use a clustered index where distributed key generation is important” seems a better rule, even if it precludes MySQL/InnoDB use.

Also, most DBs that explicitly handle UUIDs store them compactly as 128-bit values. If you want to transform them to something other than the standard format for UI reasons [0], that doesn’t preclude using UUIDs in the DB.

[0] seems like bikeshedding, but, whatever.
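To illustrate the last point: a UUID is 16 bytes (128 bits) underneath, and the 36-character hyphenated form is only one rendering of it. A sketch of a shorter URL form, assuming unpadded URL-safe base64; `to_short` and `from_short` are hypothetical helper names.

```python
import base64
import uuid


def to_short(u: uuid.UUID) -> str:
    """Render the 16 raw bytes as 22 URL-safe base64 characters (padding stripped)."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_short(s: str) -> uuid.UUID:
    """Restore the padding and decode back to the same 128-bit value."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


u = uuid.uuid4()
short = to_short(u)
print(len(str(u)), len(short))
```

The database column can stay a native UUID type; only the API or URL layer would need to know about the short form.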

sk5t · on June 19, 2021

PG does offer clustering: https://www.postgresql.org/docs/11/sql-cluster.html

dragonwriter · on June 19, 2021

That's not the same as a clustered index; it just does a one-time rearrangement of the table’s current contents. You have to run it after each change affecting the index to simulate a clustered index (with stable PKs and the PK index, that would be after each insert, I think).

You get closer to a clustered index (at the cost of more storage, but the benefit that you can have more than one on a table) with an index using INCLUDE to add all non-key columns in the index.

mping · on June 18, 2021

Can't UUIDs be time based?

Topgamer7 · on June 18, 2021

Prevents good data clustering on disk. Probably less of an issue since most DBs are probably run on SSDs now.

holtalanm · on June 18, 2021

Best solution is to use int/long primary keys, with a UUID column that has a unique index. Then the UUID can be used with public-facing APIs.
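One way that layout might look, sketched with Python's built-in `sqlite3` module. The `widgets` table and the helper functions are hypothetical; the idea is that the integer key stays internal while only `public_id` ever leaves the backend.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE widgets (
        id        INTEGER PRIMARY KEY,   -- internal sequential key, never exposed
        public_id TEXT NOT NULL UNIQUE,  -- UUID used in public-facing APIs
        name      TEXT NOT NULL
    )
    """
)


def create_widget(name: str) -> str:
    """Insert a row and return only its public identifier."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO widgets (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def get_widget(public_id: str):
    """Look a row up by public identifier; the UNIQUE constraint backs this with an index."""
    return conn.execute(
        "SELECT public_id, name FROM widgets WHERE public_id = ?", (public_id,)
    ).fetchone()
```

As noted elsewhere in the thread, this still relies on a central sequence for `id`, so it gives up the distributed-generation benefit of UUID keys.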

nirvdrum · on June 18, 2021

I guess it's just my default setup now. Beyond not wanting to leak data or have people mess around with URLs, I've had much better luck evenly sharding DB instances with UUIDs. Hasura ostensibly supports both (the option exists to create PKs using UUIDs), but then treats them entirely differently at the GraphQL level.
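A rough sketch of why random UUIDs tend to spread evenly across shards, assuming the simplest possible routing rule (key modulo shard count). `shard_for` is a hypothetical helper, and this is not necessarily how the commenter's setup routes rows.

```python
import uuid
from collections import Counter


def shard_for(key: uuid.UUID, shard_count: int) -> int:
    """Route a row to a shard using the UUID's 128-bit integer value."""
    return key.int % shard_count


# Because v4 UUIDs are random, the counts per shard come out roughly equal.
counts = Counter(shard_for(uuid.uuid4(), 4) for _ in range(10_000))
for shard, count in sorted(counts.items()):
    print(shard, count)
```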