new memcached memory layout |
[Oct. 18th, 2003|07:21 pm]
Brad Fitzpatrick
|
Current memcached memory management: http://lists.danga.com/pipermail/memcached/2003-September/000214.html
New proposal: http://lists.danga.com/pipermail/memcached/2003-October/000302.html
If you've read doc/memory_management.txt in the memcached release, you'll know that currently we keep a dozen or so different LRUs, one for each slab class size (64 bytes, 128 bytes, .... 256k, 512k, 1MB).
And then we have slabs of each size.
Once a 1MB page is allocated to a size class and cut up, it's never used for any other size.
The problem is: if the application changes over time, or if you enable compression, it's likely one of your size classes will get disproportionately sized... objects living too long or too short compared to other objects.
Ideally you'd have one LRU.
Also not ideal about slabs of size 2^n is memory efficiency. Empirical evidence (41 GB of LiveJournal memcached data) shows that 30-35% of the memory is wasted. (if you want to store a 129 byte object, it has to go in the 256 byte class.)
I propose all objects are stored as a linked list of 64 byte chunks.
So we have one LRU, and one slab size class.
When an object rolls off the LRU, or when it's deleted or expired, all its chunks are freed.
The top efficiency for this is 60/64, or 93.7%. (or for 64 bit computers, 87.5%)
The whole reason we did a slab allocator was to avoid fragmentation, which makes it hard to allocate a contiguous region after running for hours. Early versions of memcached would hang after hours, with glibc's malloc totally confused.
But fragmentation isn't important. It is for disks, where seek times are terrible, but in memory we could put things wherever and it's always fast.
Pros: much better memory use, one LRU (so no slab class reassignment logic)
Cons:
The main complication is long keys, where the internal structure for the item is longer than 64 bytes. But that's already at the end, and we already have macros to get to it. The important stuff in the struct (refcounts, etc) will always be in the first chunk, so we can continue casting that first chunk to the item* struct.
More CPU, but this thing already takes no CPU, so I'm not concerned about that at all.
- Brad |
|