ULIDs and Primary Keys (2022) (daveallie.com)
76 points by hu3 on April 12, 2024 | hide | past | favorite | 33 comments


ULID's spec includes an unnecessarily convoluted monotonically incrementing variant that requires serialization to generate IDs without conflicts. That makes the spec look like it's trying to solve multiple unrelated problems in one shot, which is confusing and reads like a design smell. If you need a sequentially incrementing identifier, just use a big enough int.
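For context, the spec's monotonic variant keeps the same millisecond timestamp and increments the 80-bit random component when two IDs collide in time, which is exactly what forces serialization. A minimal sketch (illustrative, not any particular library's implementation):

```python
import os
import threading
import time

# Crockford base32 alphabet used by ULID (no I, L, O, U)
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

class MonotonicULID:
    """Sketch of the spec's monotonic variant: within the same
    millisecond, the 80-bit random part is incremented rather than
    regenerated -- which forces serialization through a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_ts = -1
        self._last_rand = 0

    def generate(self) -> str:
        with self._lock:  # the serialization the comment objects to
            ts = int(time.time() * 1000)
            if ts == self._last_ts:
                # overflow past 80 bits is ignored in this sketch
                self._last_rand += 1
            else:
                self._last_ts = ts
                self._last_rand = int.from_bytes(os.urandom(10), "big")
            value = (ts << 80) | self._last_rand
            # encode 128 bits as 26 Crockford base32 characters
            chars = []
            for _ in range(26):
                chars.append(ALPHABET[value & 0x1F])
                value >>= 5
            return "".join(reversed(chars))
```

Because the Crockford alphabet is in ascending ASCII order, successive IDs from one generator sort lexicographically, which is the property the variant exists to provide.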

I think the issue above and UUIDv7’s existence make ULID completely unnecessary.


I've found ULID to be really useful in my API backend that runs on AWS Lambda and writes to DynamoDB. This is a use case where I need Lambda's parallelism, meaning no sharing of a common sequential counter. Meanwhile I can roughly order items by creation time, which makes pagination and selecting items by time range possible right on the sort key instead of as a filter. In other words, items might end up out of order by a few milliseconds, but in my use case that's not a big deal, and I get the benefit of communication-free parallelism.
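The time-range trick works because the first 10 Crockford characters of a ULID encode the 48-bit millisecond timestamp, so any timestamp can be turned into a lexicographic sort-key bound. A sketch (illustrative helper, not from a specific library):

```python
# Crockford base32 alphabet used by ULID (no I, L, O, U)
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_time_prefix(ms: int) -> str:
    """Encode a millisecond timestamp as the first 10 characters of a
    ULID. Usable as a lower/upper bound in a BETWEEN condition on a
    lexicographically sorted key (e.g. a DynamoDB sort key)."""
    chars = []
    for _ in range(10):  # 10 chars * 5 bits = 50 bits, covers 48-bit ts
        chars.append(ALPHABET[ms & 0x1F])
        ms >>= 5
    return "".join(reversed(chars))
```

For a range query you'd use `ulid_time_prefix(start_ms)` as the lower bound and `ulid_time_prefix(end_ms) + "Z" * 16` as the upper bound.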


I guess that was before UUIDv7 was a thing? Is there a reason to pick ULID over it?


The monotonic version of ULID is probably faster at generating the next ID than UUIDv7. Whether that performance difference matters to you is application-specific.


The monotonic version relies on locking to avoid conflicts, so it's not applicable in this case.


Can't you have a per-thread ULID without any locks?
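In principle yes: each thread (or worker) can keep its own monotonic state, giving per-thread monotonicity without locks, while global uniqueness still rests on the random component. A sketch (illustrative, not a library API):

```python
import os
import threading
import time

# Crockford base32 alphabet used by ULID (no I, L, O, U)
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

_state = threading.local()  # per-thread state: no locking needed

def ulid_thread_local() -> str:
    """Lock-free sketch: each thread keeps its own monotonic state, so
    IDs are monotonic per thread (not globally) without serialization.
    Overflow of the 80-bit counter is ignored in this sketch."""
    ts = int(time.time() * 1000)
    if getattr(_state, "last_ts", None) == ts:
        _state.last_rand += 1
    else:
        _state.last_ts = ts
        _state.last_rand = int.from_bytes(os.urandom(10), "big")
    value = (ts << 80) | _state.last_rand
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```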


What’s wrong with timestamp + worker number in your situation?


I agree that the spec is a bit messy on this, but monotonicity is not required so in the end it's not a big deal.


It’s a big deal because it causes semantic confusion: for any ID scheme that claims to be “universal”, users expect pervasive uniqueness without resorting to serialization. The “U” in ULID becomes misleading when devs bump into the monotonic variant. That can lead to hard-to-detect bugs, because conflicts or perf degradation may happen only in production. Bad idea to make it part of the spec.

It doesn’t even make sense to have such a scheme in ULID. The only advantage it has over a 64-bit integer is that ULID is 128 bits, but that doesn’t solve any problem or help with anything. It’s just waste, and will cause pain and confusion.


In my eyes, the fact that UUIDs (or specifically v4, as mentioned in the article) encode no information is an advantage. I don’t think encoding the creation date into the UUID is likely to be a huge security hole, but I can imagine some instances where you would want to keep that secret, or at least have no good reason to publish it.

I would also argue, strongly, that if you are planning to use the datetime an account/event/item was generated, you should track it separately from the UUID, in a dedicated column.

This isn’t intended to refute the article (it was an interesting read and I definitely learned some stuff), but rather to provide an argument for why you might want to stick with v4.


It's a security vs convenience tradeoff. For most people, making the ID practically unguessable, while also preserving index locality in databases, is the right balance.


> making the ID practically unguessable

80 bits of randomness is just on the edge of unguessable, though. Most people accept that 128+ bits is cryptographically secure now and in the future.
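A back-of-the-envelope check of "on the edge" (the guess rate of 10^12/s is a hypothetical attacker figure, not from the thread):

```python
import math

# ULID's random component is 80 bits; UUIDv4 has 122 random bits.
ulid_space = 2 ** 80
uuid4_space = 2 ** 122

# Hypothetical attacker testing a trillion guesses per second:
seconds_per_year = 60 * 60 * 24 * 365
years_to_enumerate_ulid = ulid_space / (1e12 * seconds_per_year)

# Birthday bound: expected number of IDs before a ~50% collision chance.
birthday_ulid = math.sqrt(2 * math.log(2) * ulid_space)  # roughly 2**40.2
```

That works out to tens of thousands of years to enumerate the 80-bit space, infeasible online but well short of the 128-bit margin usually demanded for cryptographic secrets.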


It is a huge security hole. The problem is that people are using v4 all over the place as a source of “unguessable” tokens. This is not the purpose of a UUID and is wrong of course, but it doesn’t change the reality that many services will become vulnerable if they change UUID schemes.


You can track it separately, but if you use a database like DynamoDB using a ULID as the SK leads to optimized access patterns. If you use a UUID as your SK, then sort order of two items created at the same second is not guaranteed.


> While there are 48 bits of time information in both UUIDv7 (with millisecond precision) and ULID, the ULID standard encodes the time as Unix Epoch in milliseconds for the whole 48 bits whereas UUIDv7 splits the time into 36 seconds bits and 12 millisecond bits.

This is incorrect / outdated. UUIDv7 switched to 48 bits of milliseconds instead of 36 seconds / 12 milliseconds: https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...


Ironically, the information and link that you've shared is incorrect / outdated.

The newer draft proposal allows for multiple different options of time information to be added to a UUIDv7 encoding, beyond the typical 48-bits of Unix epoch millisecond data.

https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc...
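Assuming the typical layout (48 bits of Unix epoch milliseconds at the top, which is the most common option), extracting the timestamp is a single shift. A sketch:

```python
import uuid

def uuidv7_timestamp_ms(u: uuid.UUID) -> int:
    """Assuming the typical UUIDv7 layout, the top 48 bits carry the
    Unix epoch timestamp in milliseconds."""
    return u.int >> 80

# Example: build a UUIDv7-shaped value with a known timestamp
# (version nibble 7 at bits 76-79, variant bits 10 at bits 62-63).
ts = 1_700_000_000_000
u = uuid.UUID(int=(ts << 80) | (0x7 << 76) | (0b10 << 62) | 12345)
```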


This article is a little old.

First thing that isn't correct or changed since this was written:

> There don't seem to be any existing built-in or extension functions that support generating UUIDv7s in PostgreSQL. To generate a UUIDv7, the function below could be tweaked in order to support the correct formatting.

This is the extension I use until UUIDv7 native support lands in PostgreSQL https://github.com/fboulnois/pg_uuidv7. As for native support, it's definitely underway, though it seems to have a few issues keeping up with the version 17 target release: https://commitfest.postgresql.org/47/4388/


It was already out of date when published in Feb 2022:

> PostgreSQL has built in support for UUIDv4 through the pgcrypto or the uuid-ossp extensions.

PostgreSQL hasn't needed an extension to generate UUIDv4 since v13 which was released in Sep 2020.


I like the ULID properties for the database, but it misses so many of the human factors that would make it excel.

Your tokens/identifiers should look like this:

namespace_crockfordEncodedEntropy

- There must be token/identifier prefixes, like Stripe's. While one could concatenate/prefix these bytes in application code, it's best to just accept the overhead and store them in your records for ease of lookup. It makes an engineer's life so much easier when debugging foreign keys on the fly or during an emergency SEV.

- There should be no hyphens (in string serialization). You probably want to encode and store the Crockford form directly rather than doing bytes-to-hex encoding, again for engineer ease of use.

- You may want to leave some trailing bytes for request routing in active-active high SLA systems. Especially for when record lookup cannot fail.
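A minimal sketch of that shape (the function name and entropy size are illustrative, not from any particular library):

```python
import os

# Crockford base32 alphabet (no I, L, O, U, and no hyphens)
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def typed_id(namespace: str, entropy_bytes: int = 16) -> str:
    """Sketch of a Stripe-style prefixed identifier: a short namespace,
    an underscore, then Crockford-base32-encoded random entropy."""
    value = int.from_bytes(os.urandom(entropy_bytes), "big")
    n_chars = (entropy_bytes * 8 + 4) // 5  # ceil(bits / 5)
    chars = []
    for _ in range(n_chars):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return f"{namespace}_{''.join(reversed(chars))}"
```

With 16 bytes of entropy this yields IDs like `user_` followed by 26 Crockford characters, so the type is visible at a glance in logs and foreign keys.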


I’ve seen this sort of design referred to as “typed IDs”: https://github.com/jetify-com/typeid

It doesn’t use the Crockford encoding, but it is another one that minimises confusables. Definitely looks pretty solid, so I'm keen to give it a shot in my next project.


I feel like the benchmarks and conclusion did not do any real comparison of uuidv7 vs ulid, and instead only compared uuidv4 and ulid.


uuidv7 will be supported in PostgreSQL 17, at which point generation should be as fast as uuidv4, or if you implement it in PL/pgSQL now it will be about as fast as the proposed ULID algorithm.

Insert performance could be even better; IIRC, for B-tree indexes monotonically increasing values are better than random ones, but feel free to correct me on that ;)


Commenting on this a bit late, but in case anyone reads this later too:

UUIDv7 support unfortunately didn't make it to Postgres 17, since the RFC wasn't completely finalized yet by the time of feature freeze (April 8), see discussion on pgsql-hackers:

https://www.postgresql.org/message-id/flat/ZhzFQxU0t0xk9mA_%...

So I guess we'll unfortunately have to rely on extensions or client-side generation for the time being, until Postgres 18.
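Client-side generation is straightforward in the meantime. A sketch assuming the common layout (48-bit millisecond timestamp, version/variant bits, 74 random bits):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Client-side UUIDv7 sketch: 48-bit Unix-epoch-millisecond
    timestamp, version nibble 7, variant bits 10, 74 random bits."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (
        (ts << 80)        # bits 80-127: timestamp
        | (0x7 << 76)     # bits 76-79: version 7
        | (rand_a << 64)  # bits 64-75: random
        | (0b10 << 62)    # bits 62-63: RFC variant
        | rand_b          # bits 0-61: random
    )
    return uuid.UUID(int=value)
```

The resulting values sort roughly by creation time like a ULID would, so they can be inserted into a `uuid` column today and coexist with a native generator later.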


In that case I don't understand why the author didn't go for uuidv7? It seems like existing tooling (both in database and outside) deals with it better, it seems like there are no downsides unless you expect your identifiers to be generated past 4147 but don't care if they are generated past 10889 (I'd love to hear that use-case, seems like it must be interesting).


As always with databases, it depends: for maximum insert performance you'd often go with random UUIDs so you don't get a hot page.


Hot page?

Using a monotonically increasing PK would cause pages in the index to be allocated and filled sequentially, increasing throughput.

Using random UUIDs would lead to page-splitting and partially-filled pages everywhere, negatively impacting performance and size-on-disk.


Not always, this article describes the problem: https://learn.microsoft.com/en-us/troubleshoot/sql/database-...


Exactly: with one big insert it's better to have a sequential value; for many small concurrent ones it's often better not to.

As with all databases, measure before you cut.


The way to take advantage of the bandwidth of multiple storage devices is to distribute concurrent writes across them, rather than forcing everything to commit sequentially using contended locks or rollbacks.


The DB (or any application) should not have any need to know what devices are underneath its mount point. If you’re striping across disks, that’s a device (or filesystem, for ZFS) level implementation.


Discussed at the time:

ULIDs and Primary Keys - https://news.ycombinator.com/item?id=34175639 - Dec 2022 (46 comments)


Recent and related:

The UX of UUIDs - https://news.ycombinator.com/item?id=39974465 - April 2024 (196 comments)


man, I'm glad someone finds this interesting so that I don't have to.



