June 1st, 2003

belize

Fun with genetic algorithms...

I was going to finish the memcached website today, and I probably will later, but first I wanted to play with optimizing the slab sizes for LJ's usage.

The first version of memcached used malloc for everything. Once the virtual address space was totally full and unfrequently accessed items started being removed, the address space got extremely fragmented and glibc's malloc eventually fell over and died, eating 99% CPU looking for space that wasn't contiguously there.

The solution, as suggested by ff, was to move to using a set of slab caches, each a different size. The most obvious set of sizes were the powers of two.

Unfortunately, that set (while flexible for any random input) resulted in a 30% waste of memory for LJ's usage pattern. If you're storing a 94 byte item and the only slab size available is of 128 byte items, you're wasting 34 bytes. If there are 232530 of them, you've lost 7.5 of memory in your cache. Why not have a slab size of 96 bytes?

So basically the problem is: given the item size distribution from a live cache, which set of, say, 15 slab class sizes produces the least waste for that distribution?

I'm sure linear algebra could have helped me, but I thought it'd be more fun to make a genetic algorithm.

Starting with the 5 copies of the powers-of-two set, I took each candidate in the list, randomly mutated one element (moving it somewhere between its neighbors, on a 4 byte alignment), and put it on the list. Then, I ran the waste() heuristic on each candidate, both old and new, and threw away the bad ones.

Then the algorithm starts anew, always mutating, calculating, and discarding.

While the memory utilization started out at 70%, it doesn't take long for it to climb to over 80%:

Collapse )