| Linux on Delta |
[Feb. 18th, 2007|11:01 am] |
The in-seat entertainment on Delta (even in economy: impressive) was really good. It made my first flight from SFO to JFK quite bearable. I listened to music, watched a movie, etc.
And I noticed when it booted:

Heh. Apparently it runs Linux. You'd think they'd do some Delta-themed bootsplash. |
|
|
| "Generic AIO by scheduling stacks" |
[Jan. 30th, 2007|02:55 pm] |
Zach Brown just posted to lkml (a few minutes ago) ...
[PATCH 0 of 4] Generic AIO by scheduling stacks
It's a syscall to submit syscalls to run asynchronously, plus another syscall to gather the results of the submitted syscalls as they complete. One of the most wonderful things I've seen in a while! Any syscall!
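The patch does this inside the kernel. As a rough user-space analogy only (this is not Zach Brown's interface, and the names `submit_batch` and `gather` are hypothetical), the submit-then-gather-as-they-complete shape looks like this in Python, with a thread pool standing in for the kernel's scheduling stacks:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed

def submit_batch(pool, calls):
    """Submit (label, func, args) triples to run concurrently; return {future: label}."""
    return {pool.submit(func, *args): label for label, func, args in calls}

def gather(futures):
    """Yield (label, result) pairs in completion order, not submission order."""
    for fut in as_completed(futures):
        yield futures[fut], fut.result()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "hello.txt")
        with open(path, "w") as f:
            f.write("hello")
        calls = [
            ("size", os.path.getsize, (path,)),
            ("listing", os.listdir, (d,)),
            ("exists", os.path.exists, (path,)),
        ]
        with ThreadPoolExecutor(max_workers=4) as pool:
            for label, result in gather(submit_batch(pool, calls)):
                print(label, result)
```

The appeal of doing it in the kernel is that arbitrary blocking syscalls get this treatment without a user-space thread per outstanding call.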
And I'm especially happy that Linus loves it, so we should expect to see it in real kernels sooner rather than later.
Yay! |
|
|
| RAID-5 misc |
[Jan. 27th, 2007|03:11 pm] |
I never use RAID-5, so I'd never noticed this before:
-f, --force
Insist that mdadm accept the geometry and layout
specified without question. Normally mdadm will not
allow creation of an array with only one device, and
will try to create a raid5 array with one missing
drive (as this makes the initial resync work faster).
With --force, mdadm will not try to be so clever.
And indeed, when I created the array with 5 disks, it marked one as a spare:
# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Sat Jan 27 13:30:36 2007
Raid Level : raid5
Array Size : 1953545984 (1863.05 GiB 2000.43 GB)
Device Size : 488386496 (465.76 GiB 500.11 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Jan 27 13:52:08 2007
State : clean, degraded, recovering
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 27% complete
UUID : 5ad3ba82:30b256f3:c70f55c8:1f40abbd
Events : 0.194
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 80 2 active sync /dev/sdf
3 8 96 3 active sync /dev/sdg
5 8 112 4 spare rebuilding /dev/sdh
And you can see that 4 disks are reading, and 1 is writing:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
...
sdd 198.00 45824.00 0.00 45824 0
sde 198.00 45824.00 0.00 45824 0
sdf 200.00 45824.00 0.00 45824 0
sdg 203.00 46080.00 0.00 46080 0
sdh 203.00 0.00 46080.00 0 46080
...
Neat!
It makes sense why it's done this way: 4 disks doing sequential reads and 1 doing sequential writes is faster than 5 disks doing mixed reads and writes.
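Why it's four reads and one write: in RAID-5, each stripe's parity chunk is the XOR of its data chunks, so any single missing chunk (data or parity; md's left-symmetric layout rotates which disk holds parity) is the XOR of the other four. A toy sketch with short byte strings standing in for chunks; `xor_chunks` and `rebuild_missing` are hypothetical names. It also shows why pre-zeroed disks wouldn't need a resync: the parity of all-zero chunks is already zero.

```python
from functools import reduce

def xor_chunks(chunks):
    """Bytewise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def rebuild_missing(stripe, missing):
    """Recompute stripe[missing] from the surviving members of a RAID-5 stripe."""
    survivors = [c for i, c in enumerate(stripe) if i != missing]
    return xor_chunks(survivors)

# Four data chunks plus one parity chunk make a stripe across five disks.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55", b"\x0f\xf0"]
stripe = data + [xor_chunks(data)]

# Rebuilding disk 4 (sdh in the listing above) reads the other four and writes one.
assert rebuild_missing(stripe, 4) == stripe[4]

# All-zero disks: parity of zeros is zero, so the stripe is already consistent.
assert xor_chunks([bytes(2)] * 4) == bytes(2)
```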
But really? 6 hours?
I'd prefer an option where all disks are zeroed and the initial resync is skipped. Yes, the array wouldn't be immediately usable, the way it is now with the kernel doing the background sync for me, but I think I could zero a 460 GB disk in much less than 6 hours (based on the 100 MB/s filesystem writes I saw, and assuming I can do even better on a raw block device, it should take about an hour). But I can't see how. Maybe --assume-clean is what I'm looking for? Do I just zero all the devices myself first, then re-create the array?
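The numbers in the listings above roughly explain the 6 hours. A back-of-the-envelope sketch, assuming iostat's "Blk" columns are 512-byte sectors, that the sample interval was one second (Blk_wrtn equals Blk_wrtn/s), and that the resync rate stays constant (md also caps it via /proc/sys/dev/raid/speed_limit_min and speed_limit_max):

```python
SECTOR = 512  # assumption: iostat's "Blk" columns count 512-byte sectors

device_bytes = 488386496 * 1024  # mdadm "Device Size" is reported in KiB
resync_rate = 46080 * SECTOR     # sdh's Blk_wrtn/s during the rebuild, in bytes/s
zero_rate = 100 * 10**6          # the ~100 MB/s filesystem writes mentioned above

def hours(nbytes, rate):
    """Time in hours to write nbytes at a constant rate in bytes/s."""
    return nbytes / rate / 3600

resync_h = hours(device_bytes, resync_rate)
zero_h = hours(device_bytes, zero_rate)
print(f"resync at {resync_rate / 1e6:.1f} MB/s: {resync_h:.1f} h")
print(f"zeroing at {zero_rate / 1e6:.0f} MB/s: {zero_h:.1f} h")
```

This prints about 23.6 MB/s and 5.9 h for the resync versus about 1.4 h for zeroing one disk at 100 MB/s, so the rebuild is running at roughly a quarter of what the disk can write.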
I wouldn't normally mind, but I want to performance-test several configurations, and 6-hour waits seriously kill my flow. :) |
|
|
| HDR and Linux |
[Nov. 18th, 2006|08:24 pm] |
Anybody here do any HDR work with Linux?
I think I'm finally wrapping my head around all the concepts, formats, tools, processes, etc, but I've yet to do anything with any of it. Although I have some source images I'm eager to play with.
If my understanding is correct, there are basically two phases:
1) Get multiple source images (on a tripod) at different exposures (varying shutter speed, not aperture) and run them through a tool to convert them into an HDR file (floating-point pixel values, not 8 bits per channel). Are these file formats either *.hdr or ILM's OpenEXR? Or maybe some other formats. The tool mkhdr looks like it can do this, with some hand-holding (you have to give it PPM files and shutter speeds on the command line, since it can't read any raw files, but that's understandable because there are a dozen+ raw formats...).
2) Given an HDR file of some format (depending on the tool), do cool shit with it. Canonical examples are blurs that no longer clip the highlights the way they do on 8-bit images, and "tone mapping", for which there seem to be various algorithms, to reduce the HDR data down into something sexy for the screen (which is low dynamic range).
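If that understanding is right, both phases fit in a few lines. This is a toy on flat lists of grayscale values, not what mkhdr does internally: it assumes linear (raw-like) pixel data, merges with a Debevec-style hat weighting, and tone maps with Reinhard et al.'s global L / (1 + L) operator. `merge_exposures` and `reinhard_global` are hypothetical names.

```python
import math

def merge_exposures(exposures, times):
    """Merge bracketed shots of a static scene into per-pixel radiance values.

    exposures: list of images, each a flat list of pixel values in [0, 1]
    times:     matching shutter times in seconds
    Assumes linear pixel values (e.g. from raw data); tools like mkhdr instead
    estimate the camera's response curve before merging.
    """
    def weight(z):
        # Hat weighting: trust mid-tones, distrust near-black and near-clipped pixels.
        return min(z, 1.0 - z)

    radiance = []
    for samples in zip(*exposures):
        num = sum(weight(z) * z / t for z, t in zip(samples, times))
        den = sum(weight(z) for z in samples)
        if den > 0:
            radiance.append(num / den)
        else:
            # Every sample is pure black or fully clipped: plain average of z / t.
            radiance.append(sum(z / t for z, t in zip(samples, times)) / len(times))
    return radiance

def reinhard_global(radiance, key=0.18, eps=1e-6):
    """Global tone mapping: scale to the log-average luminance, then compress with L / (1 + L)."""
    log_avg = math.exp(sum(math.log(eps + L) for L in radiance) / len(radiance))
    scaled = [key * L / log_avg for L in radiance]
    return [L / (1.0 + L) for L in scaled]

# Three pixels, shot at 1/250 s and 1/60 s.
shots = [[0.02, 0.10, 0.50], [0.08, 0.40, 1.00]]
times = [1 / 250, 1 / 60]
hdr = merge_exposures(shots, times)
ldr = reinhard_global(hdr)  # values in [0, 1), ready to scale to 8 bits
print([round(v, 3) for v in ldr])
```

Note the clipped 1.00 sample gets zero weight, so the third pixel's radiance comes entirely from the shorter exposure.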
So I guess my questions are:
-- best Linux tool for creating HDR images? Is mkhdr good? I can automate the parameter hell.
-- best tools for converting between *.hdr and OpenEXR, etc.? The ImageMagick of HDR file formats?
-- best tone mapping algorithms to create the typical HDR photos you see online?
Any pointers appreciated. Thanks! |
|
|
| djabberd: c10k? hah! |
[Jun. 26th, 2006|10:09 pm] |
DJabberd just did 25,200 (fully set up) connections with 97 MB of RAM before my Xen instance ran out of memory. It's now 3.4 kB of overhead per connection (compared to 30 kB this morning), but there are still obvious ways to trim it down. Should be able to get it down to 2 kB. The big win was when I implemented a [forget design pattern name] system where libxml parsers are shared, returned, kept on a freelist, etc.
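The pattern described is usually called an object pool (or free list). A minimal sketch, with a hypothetical factory and reset hook standing in for libxml parsers; nothing about DJabberd's actual code is implied. In this sketch the saving comes from a connection holding a parser only while it needs one, so idle connections share a small pool instead of each owning one.

```python
class FreeList:
    """Reuse expensive objects (e.g. XML parsers) instead of allocating one per connection."""

    def __init__(self, factory, reset, max_idle=1000):
        self._factory = factory    # builds a fresh object when the list is empty
        self._reset = reset        # clears per-connection state before reuse
        self._idle = []
        self._max_idle = max_idle  # cap so a load spike doesn't pin memory forever

    def acquire(self):
        """Hand out an idle object if there is one, else build a new one."""
        return self._idle.pop() if self._idle else self._factory()

    def release(self, obj):
        """Return an object to the pool; drop it if the pool is already full."""
        if len(self._idle) < self._max_idle:
            self._reset(obj)
            self._idle.append(obj)
```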
From what Artur and I can tell, this is better than most/all the other jabber servers out there.
That means with 1 GB of RAM we can do 300k connections per process. (Our boxes have 8 GB of RAM and two dual-core CPUs.)
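A quick check of the 300k figure, treating 1 GB as 2^30 bytes and kB as 1000 bytes (both assumptions; the other conventions shift the result by a few percent) and ignoring the process's fixed baseline memory:

```python
GIB = 2**30

def connections_per_gib(bytes_per_conn):
    """How many connections fit in 1 GiB at a given per-connection overhead."""
    return GIB // bytes_per_conn

now = connections_per_gib(3400)     # ~3.4 kB per connection today
target = connections_per_gib(2000)  # the ~2 kB goal
print(now, target)
```

That gives about 316k connections per GiB at 3.4 kB, consistent with the 300k estimate, and about 537k if the 2 kB target pans out.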
<3 epoll. |
|
|
| readahead / blocking sendfile |
[Jun. 5th, 2006|11:34 pm] |
I've had a known inefficiency in Perlbal for ages now and finally broke down and fixed it. The inefficiency is that sendfile can block, even if the destination fd is a non-blocking socket, because the source fd (a disk-based file) can force a disk read if the data isn't already in the pagecache.
FreeBSD has a fancy sendfile that lets you request it not block, but Linux doesn't.
The solution on Linux is to do a readahead() call first in another thread, or just do the sendfile() in another thread; IO::AIO can do either. I wanted to test the theory while changing as little code as possible, so I went with the async readahead.
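The same idea, sketched in Python with asyncio as the event loop. Python's standard library doesn't expose readahead(2), so the hypothetical `prefetch` helper reads and discards the byte range on a worker thread to pull it into the pagecache (a stand-in for IO::AIO's aio_readahead); the sendfile then happens in the "callback", after the await. asyncio requires the socket to be non-blocking.

```python
import asyncio
import os
import socket

PREFETCH_CHUNK = 1 << 20  # 1 MiB reads; an arbitrary size for this sketch

def prefetch(path, offset, count):
    """Blocking: read [offset, offset + count) and discard it, pulling the
    pages into the pagecache. Stand-in for readahead(2)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        while done < count:
            data = os.pread(fd, min(PREFETCH_CHUNK, count - done), offset + done)
            if not data:  # file shorter than expected
                break
            done += len(data)
        return done
    finally:
        os.close(fd)

async def send_file(sock: socket.socket, path, offset=0, count=None):
    """Send part of a file on a non-blocking socket without stalling the loop on disk reads."""
    loop = asyncio.get_running_loop()
    if count is None:
        count = os.path.getsize(path) - offset
    # Step 1: the part that may block on the disk runs in a worker thread.
    await loop.run_in_executor(None, prefetch, path, offset, count)
    # Step 2 (the "callback"): the data should now be cached, so sendfile()
    # mostly waits on socket writability, which the loop handles.
    with open(path, "rb") as f:
        return await loop.sock_sendfile(sock, f, offset, count)
```

This narrows the window rather than closing it: under memory pressure (as in the "big" test below, where the files exceed RAM) pages can be evicted again before the sendfile runs.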
Before I did that, though, I wrote a test case.
The test case runs two processes in parallel. The first fetches 3 small hot files over and over again, measuring the mean speed of 100 requests. The second process is only there to mess with the first one and doesn't output anything: it either fetches the same 3 small files or, with the "big" parameter, fetches seven 100 MB files in a loop, more than this Xen instance's 512 MB of memory. The idea is to see whether the disk reads serving the big files stall the event loop and hurt turnaround time.
Yup:
lj@LJ_web:~$ ./parallel.pl small; ./parallel.pl small; ./parallel.pl small; ./parallel.pl big;./parallel.pl big;./parallel.pl big;
mean: 0.287987213134766, stddev: 0.0829109309255669
mean: 0.279777903556824, stddev: 0.0957734761804354
mean: 0.238886480331421, stddev: 0.0949280425469577
mean: 0.351436612606049, stddev: 0.0791952577383974
mean: 0.361295075416565, stddev: 0.0863025646086743
mean: 0.3904807305336, stddev: 0.173639453608837
The first three results are the times to serve small files while other small files are being served in the background. The second three are serving small files while big files are being served in the background.
After adding the async readahead() call and doing the sendfile in its callback (once the data to be sendfile'd is in the pagecache), the results even out a bunch:
lj@LJ_web:~$ ./parallel.pl small; ./parallel.pl small; ./parallel.pl small; ./parallel.pl big;./parallel.pl big;./parallel.pl big;
mean: 0.296060967445374, stddev: 0.0586433388625736
mean: 0.262518639564514, stddev: 0.0726501212827927
mean: 0.285000162124634, stddev: 0.0321991597111094
mean: 0.302280473709106, stddev: 0.0811435349061447
mean: 0.303003549575806, stddev: 0.0787071540895621
mean: 0.298841729164124, stddev: 0.0953137343692458
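To put a number on "even out": average each group of three means and compare the big-background runs to the small-background runs. This treats the reported means as times (larger is worse), which is an assumption based on how the big-file runs moved them.

```python
from statistics import mean

before = {"small": [0.287987213134766, 0.279777903556824, 0.238886480331421],
          "big":   [0.351436612606049, 0.361295075416565, 0.3904807305336]}
after = {"small": [0.296060967445374, 0.262518639564514, 0.285000162124634],
         "big":   [0.302280473709106, 0.303003549575806, 0.298841729164124]}

def slowdown(runs):
    """Fractional increase of the 'big' runs' average over the 'small' runs' average."""
    return mean(runs["big"]) / mean(runs["small"]) - 1

print(f"before: {slowdown(before):.0%}, after: {slowdown(after):.0%}")
```

That works out to roughly a 37% penalty from big-file background traffic before the change and about 7% after.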
Probably some more work to be done, but promising. |
|
|
| lutimes(2) and Linux |
[Apr. 1st, 2006|04:18 pm] |
This homeboy is sad that lutimes(2) isn't implemented in Linux.
I didn't even know of lutimes earlier today, because I didn't even know of utime()/utimes(). I was like, "How can I get brackup to restore modtimes? I know tar and rsync do it." So I straced tar, found utime/utimes, was happy, implemented it, then found it didn't work on symlinks (or rather, it tried to follow symlinks). I told Whitaker about lstat (vs stat), then he googled lutimes, found it, told me, and I found that Linux doesn't implement it.
And that was after I straced tar on symlinks and found it didn't do anything, so I had little hope anyway.
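For what it's worth, kernels newer than this post (2.6.22 and up) added utimensat(2), whose AT_SYMLINK_NOFOLLOW flag does what lutimes would. Python exposes it as `os.utime(..., follow_symlinks=False)`. A sketch of the restore step under that assumption; `restore_times` is a hypothetical helper, not Brackup's code.

```python
import os

def restore_times(path, atime_ns, mtime_ns):
    """Set atime/mtime on path itself, never on a symlink's target.

    Returns False when the platform can't set times on a symlink itself,
    so the caller can skip it instead of silently touching the target.
    """
    if os.path.islink(path):
        if os.utime not in os.supports_follow_symlinks:
            return False
        # utimensat(..., AT_SYMLINK_NOFOLLOW) under the hood on Linux.
        os.utime(path, ns=(atime_ns, mtime_ns), follow_symlinks=False)
    else:
        os.utime(path, ns=(atime_ns, mtime_ns))
    return True
```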
But this breaks my brackup test suite which compares the output of "ls -lR" on backup dir and restored dir. So now I have to compare instead some new serialization of the before/after directories, ignoring symlink modtimes. Lame. I thought I was done.
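One possible serialization for that comparison: a list of per-entry tuples built with lstat(), so symlinks are described by their target rather than their modtime. `snapshot` is a hypothetical helper, not part of Brackup, and truncating mtimes to whole seconds is an arbitrary choice here.

```python
import os
import stat

def snapshot(root):
    """Describe a tree roughly like `ls -lR`, minus symlink mtimes.

    Returns a list of (relative path, kind, permission bits, size, mtime,
    symlink target) tuples sorted by path; fields that don't apply are None.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):  # doesn't follow symlinks
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            if stat.S_ISLNK(st.st_mode):
                # Only the link target matters; its mode and mtime are ignored.
                entries.append((rel, "link", None, None, None, os.readlink(full)))
            elif stat.S_ISDIR(st.st_mode):
                entries.append((rel, "dir", stat.S_IMODE(st.st_mode), None, int(st.st_mtime), None))
            else:
                entries.append((rel, "file", stat.S_IMODE(st.st_mode), st.st_size, int(st.st_mtime), None))
    return sorted(entries, key=lambda e: e[0])
```

The test then becomes `assert snapshot(backup_dir) == snapshot(restored_dir)`.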
Oh yeah, work on Brackup continues. It restores now. Coming soon to svn and CPAN near you. |
|
|
| splice() |
[Mar. 30th, 2006|10:38 am] |
Is anybody else excited about the in-development splice() system call?
I've wanted this for, like, ever.
I need to add support to Sys::Syscall so Perlbal can use it, avoiding copies between userspace and sockets. |
|
|
| wish before I leave; cdrecord; Jörg |
[Feb. 17th, 2006|08:24 pm] |
When I come back from Belize, I hope somebody has kicked Jörg Schilling in the metaphoric nuts by forking cdrecord so I don't have to see his name on the lkml anymore.
I guess Debian's already done that, a bit, but I want it done more. |
|
|
| linux-kernel; inlines |
[Jan. 2nd, 2006|09:37 pm] |
I'm addicted to reading the linux-kernel list. There's a big thread going on about changing the meaning of "inline" in the kernel tree to mean "if gcc4 wants to" instead of the historical "always-inline!" that was required due to gcc3 quirks, and then introducing a new "__always_inline" that actually means __attribute__((always_inline)), for the few places in the kernel that require inlining.
I guess the whole argument is that inline has turned into a "ricing option" that programmers throw about for tons of bogus reasons, not understanding gcc, not understanding other architectures, etc. Hence the patches to remove them all and just let the compiler do it, because it can't get any worse.
I liked this post from Ingo:
... furthermore, there's also a new CPU-architecture argument: the cost of icache misses has gone up disproportionally over the past couple of years, because on the first hand lots of instruction-scheduling 'metadata' got embedded into the L1 cache (like what used to be the BTB cache), and secondly because the (physical) latency gap between L1 cache and L2 cache has increased. Thirdly, CPUs are much better at untangling data dependencies, hence more compact but also more complex code can still perform well. So the L1 icache is more important than it used to be, and small code size is more important than raw cycle count - _and_ small code has less of a speed hit than it used to have. x86 CPUs have become simple JIT compilers, and code size reductions tend to become the best way to inform the CPU of what operations we want to compute. ... |
|
|
| Xen |
[Oct. 9th, 2005|04:30 pm] |
My home server is now running atop Xen [about]. I love it so. |
|
|