UUID stands for Universally Unique ID. It is a 128-bit value that is usually represented by hexadecimal characters divided by dashes:

1
b54c9b1a-e19c-44e7-ab81-9528c195da02

They are called Universally Unique because in practice it is very hard to have collisions even if two(or more) independent systems generate these IDs independently. It is of course possible to have collisions, but the chances of it are low enough that it can be treated as impossible in most scenarios.

How are UUIDs generated

There are a few versions of the UUID specification, they differ on the information they use to generate the UUID. These are the most widely used versions:

Version 1 – Uses the MAC address of the system generating the UUID and the current timestamp. If two UUIDs are generated fast enough in the same system there is a possibility of a crash. Most implementations detect when this is going to happen and use different strategies to prevent the duplicate. This approach makes for very unique UUIDs, but has been criticized because it reveals the moment and location where the UUID was generated. Unless the MAC address is spoofed it is not possible to have crashes.

Version 4 – Generates a random number of 122 bits. It is very unlikely that there will be a crash, but it is theoretically possible.

Version 5 – It uses a namespace and some unique data to generate a SHA1 hash. This hash is then truncated to 128 bits. Crashes here depend on the uniqueness of SHA1.

Pros and Cons

The idea of having universally unique IDs is cool, but it is important to understand when they are useful and when they are not.

A common scenario where UUIDs are useful is when there is a need to merge two data sets. Imagine you are a company that just bought another company. Both companies have databases with a customers table. Both systems use auto increment integers to generate IDs for these customers. Merging the data would require changing the IDs of many customers and all the foreign keys associated with those customers. If UUIDs were being used this wouldn’t be necessary.

Some kind of UUID is also used in distributed systems that work with eventual consistency. Since the data is generated in different places, UUIDs make it easy to identify a particular piece of data without the need of having a central system generating the IDs.

Another minor advantage of UUIDs is that it makes your IDs opaque, which prevents people from guessing the size of your DB or the ID of another user.

There are also disadvantages associated with UUIDs. They are usually saved in a field 36 characters long. This is way bigger than an integer field. Since IDs are usually used for indexes, this makes it even larger.

A bigger problem is the fact that UUIDs are (kind of)random. Databases usually use trees to store indexes. Having the keys come in random order makes the tree have to change it’s shape often, which requires a lot of IO and could take very long compared to sequential indexes.

[ architecture  ]