Log in

No account? Create an account
UUIDs in the database - brad's life — LiveJournal [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | bradfitz.com ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

UUIDs in the database [Nov. 11th, 2005|11:28 am]
Brad Fitzpatrick
[Tags|, ]

There's talk at work of changing an upcoming's product's schema to use UUIDs as primary keys instead of dual-column integers of the form (userid, per-user-assetid), and I'm not really comfortably with the idea, but I wanted to throw it out for discussion.

The advantages to UUIDs that motivated the discussion are:

  • No reliance on centralized number allocators. Any node in a cluster could inject a record without coordination with others. Multiple data centers could be merged without breaking numbers.
My concerns are:
  • Bigger keys -- 16 byte primary keys instead of 8. And for inter-asset relationships (for the same user) where normally a table would have 4 bytes (user) + 4 bytes (asset1) + 4 bytes (asset2) for a total of 12 bytes, now we'd have 16 bytes + 16 bytes for a total of 32 bytes.... that's a big increase, and there'd still be no way to do a prefix match on the key to get all relationships for a given users without adding YET ANOTHER index to the table, or doing a join. Let me make that more clear. Consider this schema:
    create table thingy_relation (
      userid  INT UNSIGNED NOT NULL,   /* 4 bytes */
      thing1  INT UNSIGNED NOT NULL,   /* 4 bytes */
      thing2  INT UNSIGNED NOT NULL,   /* 4 bytes */
      PRIMARY KEY (userid, thing1, thing2),  /* 12 bytes */
      INDEX       (userid, thing2)           /* 8 bytes  */
    Now we'd have to do:
    create table thingy_relation_uuid (
      thing1  CHAR(16) BINARY NOT NULL, /* 16 bytes */
      thing2  CHAR(16) BINARY NOT NULL, /* 16 bytes */
      PRIMARY KEY (thing1, thing2),   /* 32 bytes */
      INDEX       (thing2)            /* 16 bytes */

    This hypothetical schema tracks which assets are related to each other in some way, and lets you query either forward or backward from an asset.

    And notice in the thingy_relation_uuid table, there's no way to query and just get all relations from a userid.

  • Clustered indexes, now not clustered by userid (spatial locality) and assetid (temporal locality) -- with a dual-key (userid,assetid) schema, all of a user's rows are clustered together in the B-Tree (with InnoDB, at least), and better, sequential posts/photos/etc (potentially related and/or near each other in time) are also clustered next to each other. So a query like, "Give me the most recent 100 blog entries form userid 8232." generally means one real disk seek and very few database pages. But if you're using a UUID, records will most likely be all over the B-Tree, meaning a lot more disk seeks and database pages brought in to memory, meaning the "needed records / pages brought in" ratio goes down, wasting memory. The high bits of a UUID are generally low-res time, so you still get temporal locality for things like "Give me the most recent 100 blog posts for all users on the server", but we're already partitioning the data, so that's not a query we'd even do per-partition. So things like "Give me 100 most recent blog posts from bob" will be efficient if bob does 100 posts all back-to-back without a few million other posts from other users mixed in, but that's not going to happen on a busy server.
  • MySQL shell -- "select * from thingy_relation_uuid". Whoops. You just screwed your terminal with binary gibberish. And if you have to debug something and do some one-off queries for reports/investigation/etc, how do you type in the binary packaged UUID structure? You don't, not without a new MySQL shell and/or some UDFs to convert to/from UUID ASCII form. This is all solvable, but I'm afraid it'll never get done.
  • Silent replication failures -- another thing that was argued about UUIDs is that it makes really tricky replication topologies easier, due to PKs not colliding, but it also means it's possible to mess up the replication and not notice it for the same reasons... you won't get errors when you do something wrong. So I see this as a sort of non-bonus.

Counter argument to ease of number allocation: there are lots of ways to do number allocation, several of which LJ has used, several of which other people have done. I don't thing something as drastic as UUIDs should be seriously considered for that reason alone, without considering other approaches to number allocations, and all the potential traps of UUIDs.

It may seem like I'm leaning heavily against UUIDs, and that might be true currently, but I'm really curious what other people's experiences with UUIDs as PKs have been.

Enlighten me? Any other pros/cons?


[User Picture]From: scsi
2005-11-11 07:36 pm (UTC)
Hows the UUID going to be generated (if you can say)?
I dont really see any clear benefit short of not having PK conficts if its generated using something random (like the time/date). But if its going to be a binary form of a combined UID+eventid then whats the point?
(Reply) (Thread)
[User Picture]From: brad
2005-11-11 07:48 pm (UTC)
There are standards for this stuff:
(Reply) (Parent) (Thread)
[User Picture]From: moonwick
2005-11-11 07:44 pm (UTC)
Have you thought about creating your own custom UUID format? IIRC, 6 bytes of a standard UUID are consumed by the MAC of the system that created it; if you could come up with some way to guarantee a unique but smaller number that each node could use for UUID generation, you could shave a few bytes off each UUID.

For example, assign each node a unique 16-bit number to use for generating UUIDs, in place of using a MAC address. You'd save 4 bytes per UUID, in return for the overhead of having to manage this local numberspace. May or may not be worth the trouble, though, to save 4 * 2 bytes per record.

Of course, it goes without saying that these would only be LJUIDs, not UUIDs. :)

(Reply) (Thread)
[User Picture]From: brad
2005-11-11 07:50 pm (UTC)
We were talking about changing the UUID format, yes, in particular preprending the UUID with a userid, so we'd get user's data next to each other on disk, ....

.... but then I just heard something about why not making the userid be a UUID too!

In any case, I think some custom UUID format might be the best bet.
(Reply) (Parent) (Thread)
[User Picture]From: kfk2
2005-11-11 09:56 pm (UTC)
IIRC at least MS stopped using the MAC address for generating UUIDs a while back.. due to being able to trace them back to the machine and all.
(Reply) (Parent) (Thread)
[User Picture]From: loganb
2005-11-11 08:07 pm (UTC)
UUIDs are one of those solutions that squeezes a single central problem into many little ones on the fringe. And many little fringe problems almost always looks easier to fix than one central problem. But they aren't. They just make it a pain in the ass to work with the data, harder to uncover bugs, and harder to add features later (you have to jump through more hoops to get a prototype out the door).

But in terms of arguments you can expect others to accept: After doing intense DB perf-oriented design and testing for the last few months, I've learned that page-data locality is everything. Unless your working set is overwhelmingly in memory, I/O bandwidth falls like a rock when you go from sequential seeks to random ones (seriously, like 2 orders of magnitude, base 10).

Just ask them if they feel like running 100x the servers. :) (I know, it doesn't scale that bad, but it will be noticeable)
(Reply) (Thread)
[User Picture]From: brad
2005-11-11 08:09 pm (UTC)
That's been my experience too, that page-data locality is everything. I remember how fast our inter-partition user mover processes ran once we got the first two partitions on InnoDB and we were moving users between them.... it was so fast compared to before that I'd assumed there was a bug and it wasn't doing any work.
(Reply) (Parent) (Thread)
From: evan
2005-11-11 08:12 pm (UTC)
We encounter lots of these sorts of tradeoffs (with likely bigger data sets) and generally switch from one to the other depending on the particular use case.

Custom UUIDs with shared prefixes, as someone else proposed, have clustering problems: if you cluster (by that I mean split into disjoint clusters) by the prefix, you get locality, but you get nonuniform distribution of data; if you cluster by the suffix, you lose locality. Again, this tradeoff depends on a number of factors, like how many clusters you have and what the variance in your user dataset size is.
(Reply) (Thread)
From: jamesd
2005-11-11 08:16 pm (UTC)
You've explained why I would not use the typical somewhat arbitrary UUID. I've some reluctance to use a custom UUID based on desires for the way a particular database engine lays things out. What happens when MySQL buys or writes or someone else buys or writes some great new engine you want to use, but it happens to benefit from a different layout?

UUIDs are good. But they don't need to be the primary key. Can make them a unique key instead, so they are still a direct reference to a record. If you don't mind trusting that they are unique, the secondary key can be left non-unique in the DB to let InnoDB save some unique checks and use its insert buffer.

Why is there a desire to use a UUID? What's it supposed to gain? Does that logical desire need to be reflected in the database layer or should it be abstracted from it?

If you're using a recent enough version, autoincrement increment and autoincrement offset handle many of the replication architecture issues.

When tuning servers I think about cache effects and disk seeks more than almost anything else. Generic UUIDs as primary keys tend to be bad for both.
(Reply) (Thread)
[User Picture]From: krow
2005-11-11 10:44 pm (UTC)
Are you using 5.0? If so you can set offset increments so that a cluster of MySQL servers will never have clashes with autoincrements.

The longer the primary key, the worse the performance.

As far as printable goes, you could use the printable version of a UUID (don't call the function from MySQL though, it will not replicate without 5.1).
(Reply) (Thread)
[User Picture]From: brad
2005-11-11 10:45 pm (UTC)
As far as printable goes, you could use the printable version of a UUID

And make the primary key twice as long?
(Reply) (Parent) (Thread)
[User Picture]From: krow
2005-11-11 10:57 pm (UTC)
I wouldn't use one in the first place, but if I wanted it printable...
Just use 5.0 and make use of the new offset autoincrements. You could probably back port that patch with a minimum amount of effort if you wanted to.
(Reply) (Parent) (Thread)
[User Picture]From: bsdguru
2005-11-12 01:49 pm (UTC)
I personally find when using the autoincrement offset with MySQL replication a slight bit annoying as I end up with user_id's for example with big gaps between the numbers.
(Reply) (Parent) (Thread)
[User Picture]From: krow
2005-11-13 07:27 pm (UTC)
Are you not balancing your load between your hosts? Or are you just leaving large gaps so you can do division to keep a hundred or so servers in the network?
(Reply) (Parent) (Thread)