Relocating memory allocator - brad's life — LiveJournal

Relocating memory allocator [May. 26th, 2003|01:53 pm]
Brad Fitzpatrick
So, it turns out that memcached takes up no CPU and makes the site blazing fast (even though we're only caching a few object types so far!).

Unfortunately, it did what we suspected: died hard when its address space got fragmented. (In the meantime I restarted the daemons with much smaller sizes, accounting for the bloat which accumulates over a few hours....)

I jotted out notes for an arena allocation scheme where we only malloc large, fixed-size pages and handle our own placement and recompaction within those pages. However, I can't believe this hasn't been done before.

I want a memory allocator that lets you register a hook when it wants to move allocations around to defragment, so you can update pointers and whatnot. Anybody know of one?

Comments:
From: scsi
2003-05-26 02:42 pm (UTC)
Interesting.. How much ram were you allocating per memcached that caused the huge site improvement? Or were you running it on jesus?
From: brad
2003-05-26 03:07 pm (UTC)
Just three 1 GB processes now. We'll probably have 20 GB or so once we fine-tune the allocator.
From: taral
2003-05-26 03:38 pm (UTC)
You shouldn't be having problems with address space fragmentation unless you have a very strange allocation pattern.

The problem with heap-compaction allocators is that the overhead kills the average case. Few programs can really benefit from them.

Any chance I can look at this memcached thing? It may be possible to rework its allocation pattern to eliminate the problem.
From: brad
2003-05-27 01:28 am (UTC)
From: taral
2003-05-27 08:31 am (UTC)
Most obvious change:

if (daemonize) daemon(0, 0);
From: taral
2003-05-27 08:38 am (UTC)
Why doesn't libevent support realtime signal notification?
From: brad
2003-05-27 09:25 am (UTC)
epoll's faster, but yeah... it should.
From: ex_ff928
2003-05-26 07:39 pm (UTC)
Have you looked at a slab-based allocation scheme? There are a couple of papers: one by Jeff Bonwick presented at USENIX '94, and one by Bonwick and Adams at USENIX '01.
From: brad
2003-05-28 11:39 am (UTC)
I found the paper and was reading it, but it seems it deals with allocating only like-sized objects. Our allocations are all over the place.
From: ex_ff928
2003-05-28 12:36 pm (UTC)
(I'm using the same API as described in the Bonwick paper for a coherent frame of reference.)

You can resolve the fixed-size issue by creating multiple object caches in a power-of-two (or similar) size schema and populating cache_table[] below, then wrapping the slab allocator with a thin layer that rounds each allocation up to the closest matching bucket by indexing cache_table[] on the size of the object you're requesting.

So something like:

#define MAXOBJSIZE  8192  /* maximum size you want to create an object cache for */
#define ALIGN_SHIFT 4     /* 2^4 = 16 = minimum guaranteed alignment */

void *slabmem_alloc(int size, int flags)
{
    void *obj;
    int idx = (size - 1) >> ALIGN_SHIFT;

    if ((unsigned int)idx < (MAXOBJSIZE >> ALIGN_SHIFT)) {
        kmem_cache_t *cp = cache_table[idx];
        obj = kmem_cache_alloc(cp, flags);
        return (obj);
    } else {
        /* Fall through to non-slab allocation here */
        return (NULL);
    }
}

Basically you treat each slab cache as a backing-store bucket in a power-of-two (or similar) allocator. It gives you O(1) allocations and frees.

The only thing you have to watch for is internal fragmentation, which is not an issue if you pick your cache_table entries properly. It's also greatly mitigated by the slab allocator's slab management strategy, and by the fact that there are no O(n) coalescings performed as in most other allocators (e.g. Doug Lea's malloc, which, while pretty cool, is not O(1) for this reason).

Also, frees need the size of the object to be passed in as a parameter.

Make sense?
(Deleted comment)
From: brad
2003-05-27 01:26 am (UTC)
Good link to DLmalloc. Didn't realize that was glibc's malloc.

BTW, Judy isn't our trouble.

We thought our trouble was having a few GB of random small-to-large allocations. But actually it's doing fine. It's growing a little over time, and could use some work (maybe rounding all allocations to some multiple?), but it's not bad.

I think our main problem earlier was Linux not liking to mlockall() a huge process. Something made the machine die, and nothing got written to logs.
From: taral
2003-05-27 08:30 am (UTC)
Yes, mlockall() is your problem. The 2.5 kernel is _far_ better about this scenario.

What you really want is something I term "transient memory": memory that the kernel knows it can reclaim in the event of a low-real-memory condition. I'm sure you wouldn't care about it stealing pages away so long as it tells you about it.

But there's nothing like that in the Linux kernel. I'll see if I can't hack something up.
From: jeffr
2003-05-27 03:35 pm (UTC)
DLmalloc is interesting. It looks like a heavily tuned buddy allocator.

As raja pointed out, a slab allocator would probably be better for your purposes. Although it sounds like fragmentation isn't the issue so I'll move along now. ;-)
(Deleted comment)