Brad Fitzpatrick (brad) wrote,
Brad Fitzpatrick
brad

new memcached memory layout

Current memcached memory management:
http://lists.danga.com/pipermail/memcached/2003-September/000214.html

New proposal:
http://lists.danga.com/pipermail/memcached/2003-October/000302.html

If you've read doc/memory_management.txt in the memcached release, you'll
know that currently we keep a dozen or so different LRUs, one for each
slab class size (64 bytes, 128 bytes, .... 256k, 512k, 1MB).

And then we have slabs of each size.

Once a 1MB page is allocated to a size class and cut up, it's never used
for any other size.

The problem is: if the application changes over time, or if you enable
compression, it's likely one of your size classes will get
disproportionately sized... objects living too long or too short compared
to other objects.

Ideally you'd have one LRU.

Also not ideal about slabs of size 2^n is memory efficiency. Empirical
evidence (41 GB of LiveJournal memcached data) shows that 30-35% of the
memory is wasted. (if you want to store a 129 byte object, it has to go
in the 256 byte class.)

I propose all objects are stored as a linked list of 64 byte chunks.

So we have one LRU, and one slab size class.

When an object rolls off the LRU, or when it's deleted or expired, all its
chunks are freed.

The top efficiency for this is 60/64, or 93.7%. (or for 64 bit computers,
87.5%)

The whole reason we did a slab allocator was to avoid fragmentation, which
makes it hard to allocate a contiguous region after running for hours.
Early versions of memcached would hang after hours, with glibc's malloc
totally confused.

But fragmentation isn't important. It is for disks, where seek times are
terrible, but in memory we could put things wherever and it's always fast.

Pros: much better memory use, one LRU (so no slab class reassignment logic)

Cons:

The main complication is long keys, where the internal structure for the
item is longer than 64 bytes. But that's already at the end, and we
already have macros to get to it. The important stuff in the struct
(refcounts, etc) will always be in the first chunk, so we can continue
casting that first chunk to the item* struct.

More CPU, but this thing already takes no CPU, so I'm not concerned about
that at all.

- Brad
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 5 comments