September 23rd, 2004

belize

Solaris 10, memcached, ...

With some spare computers around the office now, we've been tinkering with things like LVM2/device-mapper, Solaris 10, OpenSSI, etc...

I'm installing Solaris 10 now, with the intention of playing with ZFS and DTrace, but the installer didn't even detect my network card (yet?), so I'm not sure it'll be any fun.

In particular, I want to write a DTrace script that shows in real time how long processes are blocked in syscalls on fds that are sockets, aggregated by the socket's endpoint. We've done this in the LiveJournal code, kinda, via a lot of manual instrumentation, but it'd be cool to do it system-wide, without modifying or restarting the application.
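I haven't gotten to try it yet, but a first stab might look something like this. This is just a rough sketch assuming Solaris 10's syscall provider: it only sums time per process and fd, and doesn't yet filter to socket fds or resolve an fd to its remote endpoint, which is the hard part.

    #!/usr/sbin/dtrace -s
    /* Rough sketch: wall time spent inside read(2)/write(2), summed
     * per process name and fd. Filtering to sockets and aggregating
     * by the socket's endpoint would need more work, so it's left
     * out here. */

    syscall::read:entry,
    syscall::write:entry
    {
            self->ts = timestamp;   /* nanosecond timestamp at entry */
            self->fd = arg0;        /* arg0 is the fd for read/write */
    }

    syscall::read:return,
    syscall::write:return
    /self->ts/
    {
            @blocked[execname, self->fd] = sum(timestamp - self->ts);
            self->ts = 0;
            self->fd = 0;
    }

    tick-5sec
    {
            printa("%-16s fd %-3d: %@d ns\n", @blocked);
            trunc(@blocked);
    }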

Well, the Solaris 10 install just failed, telling me to look at /tmp/install_log for more details, and it has none.

Okay, next project. I suppose I should learn LTT and OProfile.

Jeff Garzik (of Linux kernel fame) saw my OSCON presentation linked on Slashdot and checked out memcached, replying with a ton of suggestions. The foremost one: don't store data in the heap, where it has to be copied to the kernel every time it's sent to a user; instead, keep it in tmpfs, in the page cache, and just sendfile(2) it to sockets.

So now I've been thinking about new memory layouts for memcached on tmpfs. Things over 2k can each have their own inode (granularity is the 4k PAGE_SIZE), which is the same space they take in memcached today. Things over 8k immediately win: a 9k item takes 12k (3 pages) instead of the 16k memcached's powers-of-two allocation gives it. Things 2k and under get interesting... we could have a lot more slab granularities than powers of two, packed into 4k files in tmpfs. Then we're on the way to a global LRU, rather than an LRU per size class, if we can figure out how to rearrange data within slabs or something.
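To make the idea concrete, here's a minimal sketch of what the serve path for big items might look like. Everything here is hypothetical, just for illustration: the item-per-file layout, the path scheme, and send_item() aren't real memcached code. On Linux, sendfile(2) from a tmpfs file to a socket goes straight from page cache, with no copy through userspace.

    /* Hypothetical serve path: each item over 2k lives in its own
     * tmpfs file, and we sendfile(2) it to the client socket so the
     * data never passes through the daemon's heap. */
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    ssize_t send_item(int client_fd, const char *path)
    {
            /* path would be something like "/tmpfs-cache/<key>" (made up) */
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                    return -1;

            struct stat st;
            if (fstat(fd, &st) < 0) {
                    close(fd);
                    return -1;
            }

            /* Kernel moves pages from page cache to the socket directly. */
            off_t off = 0;
            ssize_t sent = sendfile(client_fd, fd, &off, st.st_size);

            close(fd);
            return sent;
    }

The win over read()+write() is that the item's bytes never cross into userspace at all; the tradeoff is an open()/close() (and an inode) per large item.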