brad's life - UUIDs in the database [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | bradfitz.com ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

UUIDs in the database [Nov. 11th, 2005|11:28 am]
Previous Entry Share Next Entry
[Tags|, ]

There's talk at work of changing an upcoming's product's schema to use UUIDs as primary keys instead of dual-column integers of the form (userid, per-user-assetid), and I'm not really comfortably with the idea, but I wanted to throw it out for discussion.

The advantages to UUIDs that motivated the discussion are:

  • No reliance on centralized number allocators. Any node in a cluster could inject a record without coordination with others. Multiple data centers could be merged without breaking numbers.
My concerns are:
  • Bigger keys -- 16 byte primary keys instead of 8. And for inter-asset relationships (for the same user) where normally a table would have 4 bytes (user) + 4 bytes (asset1) + 4 bytes (asset2) for a total of 12 bytes, now we'd have 16 bytes + 16 bytes for a total of 32 bytes.... that's a big increase, and there'd still be no way to do a prefix match on the key to get all relationships for a given users without adding YET ANOTHER index to the table, or doing a join. Let me make that more clear. Consider this schema:
    create table thingy_relation (
      userid  INT UNSIGNED NOT NULL,   /* 4 bytes */
      thing1  INT UNSIGNED NOT NULL,   /* 4 bytes */
      thing2  INT UNSIGNED NOT NULL,   /* 4 bytes */
      PRIMARY KEY (userid, thing1, thing2),  /* 12 bytes */
      INDEX       (userid, thing2)           /* 8 bytes  */
    )
    Now we'd have to do:
    create table thingy_relation_uuid (
      thing1  CHAR(16) BINARY NOT NULL, /* 16 bytes */
      thing2  CHAR(16) BINARY NOT NULL, /* 16 bytes */
      PRIMARY KEY (thing1, thing2),   /* 32 bytes */
      INDEX       (thing2)            /* 16 bytes */
    )

    This hypothetical schema tracks which assets are related to each other in some way, and lets you query either forward or backward from an asset.

    And notice in the thingy_relation_uuid table, there's no way to query and just get all relations from a userid.

  • Clustered indexes, now not clustered by userid (spatial locality) and assetid (temporal locality) -- with a dual-key (userid,assetid) schema, all of a user's rows are clustered together in the B-Tree (with InnoDB, at least), and better, sequential posts/photos/etc (potentially related and/or near each other in time) are also clustered next to each other. So a query like, "Give me the most recent 100 blog entries form userid 8232." generally means one real disk seek and very few database pages. But if you're using a UUID, records will most likely be all over the B-Tree, meaning a lot more disk seeks and database pages brought in to memory, meaning the "needed records / pages brought in" ratio goes down, wasting memory. The high bits of a UUID are generally low-res time, so you still get temporal locality for things like "Give me the most recent 100 blog posts for all users on the server", but we're already partitioning the data, so that's not a query we'd even do per-partition. So things like "Give me 100 most recent blog posts from bob" will be efficient if bob does 100 posts all back-to-back without a few million other posts from other users mixed in, but that's not going to happen on a busy server.
  • MySQL shell -- "select * from thingy_relation_uuid". Whoops. You just screwed your terminal with binary gibberish. And if you have to debug something and do some one-off queries for reports/investigation/etc, how do you type in the binary packaged UUID structure? You don't, not without a new MySQL shell and/or some UDFs to convert to/from UUID ASCII form. This is all solvable, but I'm afraid it'll never get done.
  • Silent replication failures -- another thing that was argued about UUIDs is that it makes really tricky replication topologies easier, due to PKs not colliding, but it also means it's possible to mess up the replication and not notice it for the same reasons... you won't get errors when you do something wrong. So I see this as a sort of non-bonus.

Counter argument to ease of number allocation: there are lots of ways to do number allocation, several of which LJ has used, several of which other people have done. I don't thing something as drastic as UUIDs should be seriously considered for that reason alone, without considering other approaches to number allocations, and all the potential traps of UUIDs.

It may seem like I'm leaning heavily against UUIDs, and that might be true currently, but I'm really curious what other people's experiences with UUIDs as PKs have been.

Enlighten me? Any other pros/cons?

LinkReply

Comments:
[User Picture]From: moonwick
2005-11-11 07:44 pm (UTC)

(Link)

Have you thought about creating your own custom UUID format? IIRC, 6 bytes of a standard UUID are consumed by the MAC of the system that created it; if you could come up with some way to guarantee a unique but smaller number that each node could use for UUID generation, you could shave a few bytes off each UUID.

For example, assign each node a unique 16-bit number to use for generating UUIDs, in place of using a MAC address. You'd save 4 bytes per UUID, in return for the overhead of having to manage this local numberspace. May or may not be worth the trouble, though, to save 4 * 2 bytes per record.

Of course, it goes without saying that these would only be LJUIDs, not UUIDs. :)

[User Picture]From: brad
2005-11-11 07:50 pm (UTC)

(Link)

We were talking about changing the UUID format, yes, in particular preprending the UUID with a userid, so we'd get user's data next to each other on disk, ....

.... but then I just heard something about why not making the userid be a UUID too!

In any case, I think some custom UUID format might be the best bet.
[User Picture]From: kfk2
2005-11-11 09:56 pm (UTC)

(Link)

IIRC at least MS stopped using the MAC address for generating UUIDs a while back.. due to being able to trace them back to the machine and all.