brad's life - UUIDs in the database [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | bradfitz.com ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

UUIDs in the database [Nov. 11th, 2005|11:28 am]
Previous Entry Add to Memories Share Next Entry
[Tags|, ]

There's talk at work of changing an upcoming's product's schema to use UUIDs as primary keys instead of dual-column integers of the form (userid, per-user-assetid), and I'm not really comfortably with the idea, but I wanted to throw it out for discussion.

The advantages to UUIDs that motivated the discussion are:

  • No reliance on centralized number allocators. Any node in a cluster could inject a record without coordination with others. Multiple data centers could be merged without breaking numbers.
My concerns are:
  • Bigger keys -- 16 byte primary keys instead of 8. And for inter-asset relationships (for the same user) where normally a table would have 4 bytes (user) + 4 bytes (asset1) + 4 bytes (asset2) for a total of 12 bytes, now we'd have 16 bytes + 16 bytes for a total of 32 bytes.... that's a big increase, and there'd still be no way to do a prefix match on the key to get all relationships for a given users without adding YET ANOTHER index to the table, or doing a join. Let me make that more clear. Consider this schema:
    create table thingy_relation (
      userid  INT UNSIGNED NOT NULL,   /* 4 bytes */
      thing1  INT UNSIGNED NOT NULL,   /* 4 bytes */
      thing2  INT UNSIGNED NOT NULL,   /* 4 bytes */
      PRIMARY KEY (userid, thing1, thing2),  /* 12 bytes */
      INDEX       (userid, thing2)           /* 8 bytes  */
    )
    Now we'd have to do:
    create table thingy_relation_uuid (
      thing1  CHAR(16) BINARY NOT NULL, /* 16 bytes */
      thing2  CHAR(16) BINARY NOT NULL, /* 16 bytes */
      PRIMARY KEY (thing1, thing2),   /* 32 bytes */
      INDEX       (thing2)            /* 16 bytes */
    )

    This hypothetical schema tracks which assets are related to each other in some way, and lets you query either forward or backward from an asset.

    And notice in the thingy_relation_uuid table, there's no way to query and just get all relations from a userid.

  • Clustered indexes, now not clustered by userid (spatial locality) and assetid (temporal locality) -- with a dual-key (userid,assetid) schema, all of a user's rows are clustered together in the B-Tree (with InnoDB, at least), and better, sequential posts/photos/etc (potentially related and/or near each other in time) are also clustered next to each other. So a query like, "Give me the most recent 100 blog entries form userid 8232." generally means one real disk seek and very few database pages. But if you're using a UUID, records will most likely be all over the B-Tree, meaning a lot more disk seeks and database pages brought in to memory, meaning the "needed records / pages brought in" ratio goes down, wasting memory. The high bits of a UUID are generally low-res time, so you still get temporal locality for things like "Give me the most recent 100 blog posts for all users on the server", but we're already partitioning the data, so that's not a query we'd even do per-partition. So things like "Give me 100 most recent blog posts from bob" will be efficient if bob does 100 posts all back-to-back without a few million other posts from other users mixed in, but that's not going to happen on a busy server.
  • MySQL shell -- "select * from thingy_relation_uuid". Whoops. You just screwed your terminal with binary gibberish. And if you have to debug something and do some one-off queries for reports/investigation/etc, how do you type in the binary packaged UUID structure? You don't, not without a new MySQL shell and/or some UDFs to convert to/from UUID ASCII form. This is all solvable, but I'm afraid it'll never get done.
  • Silent replication failures -- another thing that was argued about UUIDs is that it makes really tricky replication topologies easier, due to PKs not colliding, but it also means it's possible to mess up the replication and not notice it for the same reasons... you won't get errors when you do something wrong. So I see this as a sort of non-bonus.

Counter argument to ease of number allocation: there are lots of ways to do number allocation, several of which LJ has used, several of which other people have done. I don't thing something as drastic as UUIDs should be seriously considered for that reason alone, without considering other approaches to number allocations, and all the potential traps of UUIDs.

It may seem like I'm leaning heavily against UUIDs, and that might be true currently, but I'm really curious what other people's experiences with UUIDs as PKs have been.

Enlighten me? Any other pros/cons?

LinkReply

Comments:
[User Picture]From: loganb
2005-11-11 08:07 pm (UTC)

(Link)

UUIDs are one of those solutions that squeezes a single central problem into many little ones on the fringe. And many little fringe problems almost always looks easier to fix than one central problem. But they aren't. They just make it a pain in the ass to work with the data, harder to uncover bugs, and harder to add features later (you have to jump through more hoops to get a prototype out the door).

But in terms of arguments you can expect others to accept: After doing intense DB perf-oriented design and testing for the last few months, I've learned that page-data locality is everything. Unless your working set is overwhelmingly in memory, I/O bandwidth falls like a rock when you go from sequential seeks to random ones (seriously, like 2 orders of magnitude, base 10).

Just ask them if they feel like running 100x the servers. :) (I know, it doesn't scale that bad, but it will be noticeable)
[User Picture]From: brad
2005-11-11 08:09 pm (UTC)

(Link)

That's been my experience too, that page-data locality is everything. I remember how fast our inter-partition user mover processes ran once we got the first two partitions on InnoDB and we were moving users between them.... it was so fast compared to before that I'd assumed there was a bug and it wasn't doing any work.