October 15th, 2004

belize

busy busy

Work's been interesting lately. A dozen things going on in all directions.

I'll try to list some of them:

-- Mahlon's investigating using MySQL Cluster. He has 4 nodes up (2 sets of 2 redundant nodes) so we can learn how to admin it, monitor it, benchmark it, understand its limitations, etc. Fun stuff. This is the future.

-- We've been running an increasing number of distributed async jobs. The problem is: how do they get their work? They can all independently query what needs to be done, randomize the list, and iterate over it with proper locking on each work item, but that gets to be a waste of time. We're moving everything to a new model (called a "message board" or "Linda tuple space") where a job server is the hub between workers and populators, all working in parallel, and the job server atomically hands out jobs (the "grab" operation). This is speeding up async work a lot, and we're only just starting to use it in more places.

-- We've moved all userpics and phoneposts to MogileFS, stressing it in a way I didn't anticipate: ephemeral port exhaustion. 65k should be enough for anybody! (Well, more like half that.) Anyway, I came up with some fixes for that; some are done, some pending. (Also got a bunch of new hardware tonight that'll help quite a bit... we'd been running pretty low.)

-- Ephemeral port exhaustion led us to work on client-side persistent connections for Perlbal. Previously it only did persistent connections between Perlbal and backends. But because we do that funky reproxy (internal redirect) thing, those reproxied connections weren't cached and were wasting ephemeral ports pretty quickly. Junior made the webserver (mogstored, which is built on Perlbal's libraries) do persistent connections; next we need to make the internal-redirect HTTP client stay persistent.

-- Minor Perl profiling / optimization work with Perlbal. You could waste days and weeks on this... best to limit yourself and not get carried away. Makes for good work when everything else is frustrating you, though.

-- Been getting lots of new hardware in and set up. Ordered 8 new dual-proc 3.0 GHz Xeons (1 MB cache). Also 2 dual Opteron 246s w/ 8 GB of memory. Fought some NUMA bugs where Linux was unreliable in certain topologies. The lkml folks blamed the motherboard for allowing that topology at all, or for being unstable with it as configured. Got two new Itanium machines coming soon.

-- Jesse's investigating OpenLaszlo, which I can't wait to see work its magic. Also more JavaScript stuff, which finally feels almost stable enough as a platform to do reliable things. Almost stable. I watch the crap he puts up with and am glad I'm not a JavaScript programmer.

-- FotoBilder's new protocol is coming along. A few more of us need to review it fully and publish it officially to spur some client development.

-- Been doing a lot of late-night database maintenance, so I sit here while operations are pending, writing long-winded posts.
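The job-server model in the list above boils down to one rule: populators add work, workers "grab" it, and the server makes each grab atomic so no job is ever handed out twice. Here's a minimal in-process sketch in Python, assuming a single lock-protected queue. `JobServer`, `add`, `grab`, and `run_demo` are hypothetical names for illustration, not our job server's actual interface; a real job server sits on the network between machines, and this only shows the grab semantics. For brevity the demo fills the queue before starting the workers.

```python
import threading
from collections import deque


class JobServer:
    """Hypothetical in-process stand-in for the job server hub.

    Populators call add(); workers call grab(). Both run under one lock,
    so grab() is atomic: no two workers can receive the same job.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = deque()

    def add(self, job):
        # Populator side: queue up one work item.
        with self._lock:
            self._pending.append(job)

    def grab(self):
        # Worker side: atomically take one job, or None if nothing is queued.
        with self._lock:
            return self._pending.popleft() if self._pending else None


def run_demo(num_jobs=1000, num_workers=8):
    """Populate the server, let several workers drain it, return what they did."""
    server = JobServer()
    for n in range(num_jobs):
        server.add(n)

    done, done_lock = [], threading.Lock()

    def worker():
        # Keep grabbing until the server has nothing left to hand out.
        while (job := server.grab()) is not None:
            with done_lock:
                done.append(job)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


done = run_demo()
print(len(done), "jobs done,", len(set(done)), "distinct")
```

Compared with every worker querying, shuffling, and locking items on its own, the only coordination point here is the one atomic grab.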
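On the "more like half that" remark about ephemeral ports in the list above, here's a back-of-the-envelope sketch. It assumes Linux 2.6's usual default `net.ipv4.ip_local_port_range` of 32768 to 61000 and a 60-second TIME_WAIT; both are assumptions about a stock kernel, not measurements from our machines, and the real ceiling also depends on how connections spread across destination addresses and on which side closes first.

```python
# Assumptions, not measurements: Linux 2.6's usual default
# net.ipv4.ip_local_port_range ("32768 61000") and a 60-second TIME_WAIT.
PORT_LOW, PORT_HIGH = 32768, 61000
TIME_WAIT_SECONDS = 60


def ephemeral_ports(low=PORT_LOW, high=PORT_HIGH):
    """Number of usable local ports; the range is inclusive at both ends."""
    return high - low + 1


def max_new_conns_per_sec(ports, time_wait=TIME_WAIT_SECONDS):
    """Rough ceiling on new outbound connections per second to one
    destination, assuming our side closes first so each closed
    connection holds its local port in TIME_WAIT."""
    return ports / time_wait


ports = ephemeral_ports()
print(ports, "of 65536 ports")  # 28233 of 65536 ports
print(round(max_new_conns_per_sec(ports)), "new connections/sec")
```

Under those assumptions, a few hundred short-lived connections per second to the same storage node is enough to run out, which is why reusing connections matters more than tuning the range.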
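For the Perlbal persistent-connection work in the list above, the core idea is a pool of idle keep-alive connections per backend, so each reproxied request reuses an open socket instead of burning a fresh ephemeral port. Perlbal itself is Perl; this is only a Python sketch using the standard `http.client` module, and `ConnectionPool`, `fetch`, and `max_idle_per_host` are hypothetical names.

```python
import http.client
from collections import defaultdict


class ConnectionPool:
    """Hypothetical pool of idle keep-alive HTTP connections, keyed by backend.

    Reusing an idle connection for the next reproxied request avoids opening
    a new socket, and so avoids consuming another ephemeral port.
    """

    def __init__(self, max_idle_per_host=4):
        self.max_idle_per_host = max_idle_per_host
        self._idle = defaultdict(list)  # (host, port) -> [HTTPConnection, ...]

    def get(self, host, port):
        idle = self._idle[(host, port)]
        if idle:
            return idle.pop()  # reuse an already-open connection
        # HTTPConnection doesn't open its socket until the first request.
        return http.client.HTTPConnection(host, port, timeout=5)

    def put(self, host, port, conn, reusable):
        # Keep the connection only if it's reusable and there's room.
        idle = self._idle[(host, port)]
        if reusable and len(idle) < self.max_idle_per_host:
            idle.append(conn)
        else:
            conn.close()


def fetch(pool, host, port, path):
    """GET path from host:port, then hand the connection back to the pool."""
    conn = pool.get(host, port)
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()  # drain the body before the socket can be reused
    # HTTP/1.1 keeps the connection open unless the server says otherwise.
    reusable = (resp.version == 11
                and (resp.getheader("Connection") or "").lower() != "close")
    pool.put(host, port, conn, reusable)
    return resp.status, body
```

A connection goes back into the pool only after its response body has been fully read and the server hasn't asked to close it; otherwise it's closed and the next request opens a new one.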

MogileFS is fun

We keep making a bunch of new monitoring and admin tools for MogileFS.

This is a cute little command-line one that shows all the known hosts and devices. In addition to "sto1" and "sto2" shown below, we now have 8 new machines, each with 4 hot-swap SATA bays, and 3 or 4 others with 2 hot-swap SATA bays. If we ever fill up our 6.1 TB of free space, we can easily expand. We'll probably add a third host soon anyway, just so we can upgrade the file classes to a "min_replicas" of 3 instead of 2.

Currently MogileFS never puts two replicas of a file on devices in the same machine, because doing the indexes efficiently that way would be too big/slow. The danger is this: if you have 3 replicas split up with 2 on host_A and 1 on host_B, what happens if you lose that device on host_B? Now the replica count is down to 2, but it's not a "safe" 2, because both are on the same host. So we avoid hurting our brains over that and say that our per-file "devcount" value always means "devcount on different hosts".
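The "devcount on different hosts" rule, and why a third host is needed before "min_replicas" can go to 3, can be sketched in a few lines. This is a simplified illustration, not MogileFS's actual placement code: `safe_devcount` and `choose_devices` are hypothetical helpers, and the device-to-host map uses a handful of device names from the mogcheck output plus the host_A/host_B example from the text.

```python
def safe_devcount(replica_devices, device_host):
    """Count replicas as distinct hosts, not devices."""
    return len({device_host[dev] for dev in replica_devices})


def choose_devices(device_host, min_replicas):
    """Pick devices for a new file, at most one per host.

    Returns None if there aren't enough distinct hosts. Hypothetical and
    simplified: a real one would presumably also consider free space,
    device state, and so on.
    """
    chosen, used_hosts = [], set()
    for dev, host in sorted(device_host.items()):
        if host in used_hosts:
            continue  # never put a second replica on the same host
        chosen.append(dev)
        used_hosts.add(host)
        if len(chosen) == min_replicas:
            return chosen
    return None


# Two storage hosts, as in the mogcheck output (only a few devices shown).
device_host = {"dev1": "sto1", "dev2": "sto1", "dev15": "sto2", "dev16": "sto2"}
print(choose_devices(device_host, 2))  # ['dev1', 'dev15']
print(choose_devices(device_host, 3))  # None: only two hosts

# The host_A/host_B case from the text.
example = {"a1": "host_A", "a2": "host_A", "b1": "host_B"}
print(safe_devcount(["a1", "a2", "b1"], example))  # 2
print(safe_devcount(["a1", "a2"], example))        # 1, after losing b1
```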

lj@grimace:~$ mogcheck.pl
Checking mogilefsd availability...
        10.0.0.81:7001 ... responding.
        10.0.0.82:7001 ... responding.

Device information...
  hostname     device   age    size(G)       used       free    use%  delay
      sto1       dev1    7s    224.319     14.930    209.389   6.66% 0.005s
      sto1       dev2    7s    229.161      9.248    219.912   4.04% 0.004s
      sto1       dev3    7s    229.161      9.178    219.983   4.01% 0.004s
      sto1       dev4    7s    229.161      9.224    219.936   4.03% 0.004s
      sto1       dev5    7s    229.161      9.182    219.979   4.01% 0.004s
      sto1       dev6    7s    229.161      9.314    219.847   4.06% 0.076s
      sto1       dev7    7s    229.161      9.203    219.958   4.02% 0.026s
      sto1       dev8    7s    229.161      9.242    219.919   4.03% 0.004s
      sto1       dev9    7s    229.161      9.212    219.948   4.02% 0.051s
      sto1      dev10    7s    229.161      9.156    220.004   4.00% 0.117s
      sto1      dev11    7s    229.161      9.254    219.907   4.04% 0.019s
      sto1      dev12    7s    229.161      9.194    219.967   4.01% 0.004s
      sto1      dev13    7s    229.161      9.273    219.887   4.05% 0.005s
      sto1      dev14    7s    229.161      9.191    219.970   4.01% 0.004s
      sto2      dev15   41s    224.319      9.254    215.065   4.13% 0.005s
      sto2      dev16   41s    229.161      9.230    219.930   4.03% 0.006s
      sto2      dev17   41s    229.161      9.287    219.874   4.05% 0.004s
      sto2      dev18   41s    229.161      9.292    219.868   4.05% 0.008s
      sto2      dev19   41s    229.161      9.142    220.019   3.99% 0.005s
      sto2      dev20   41s    229.161      9.232    219.929   4.03% 0.004s
      sto2      dev21   41s    229.161      9.109    220.052   3.97% 0.004s
      sto2      dev22   41s    229.161      9.221    219.940   4.02% 0.016s
      sto2      dev23   41s    229.161      9.133    220.028   3.99% 0.018s
      sto2      dev24   41s    229.161      9.277    219.884   4.05% 0.004s
      sto2      dev25   41s    229.161      9.220    219.941   4.02% 0.004s
      sto2      dev26   41s    229.161      9.165    219.995   4.00% 0.004s
      sto2      dev27   41s    229.161      9.183    219.978   4.01% 0.008s
      sto2      dev28   41s    229.161      9.226    219.934   4.03% 0.020s
                total         6406.817    263.774   6143.043   4.12% 0.437s
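The total row looks like a plain sum of the per-device columns, with use% as total used over total size; summing the full table above reproduces the printed totals to within rounding (the per-row figures are already rounded). That's our reading of the output, not something taken from mogcheck.pl's source. Here's a small Python sketch of the same arithmetic over a few of the rows above, with `summarize` as a hypothetical helper.

```python
# A few rows copied from the mogcheck.pl output above, plus its header line.
SAMPLE = """\
  hostname     device   age    size(G)       used       free    use%  delay
      sto1       dev1    7s    224.319     14.930    209.389   6.66% 0.005s
      sto1       dev2    7s    229.161      9.248    219.912   4.04% 0.004s
      sto2      dev15   41s    224.319      9.254    215.065   4.13% 0.005s
      sto2      dev16   41s    229.161      9.230    219.930   4.03% 0.006s
"""


def summarize(text):
    """Recompute a 'total' row: sum size/used/free/delay, use% = used/size."""
    size = used = free = delay = 0.0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank lines (and a 6-column total row, if present)
        try:
            row_size, row_used, row_free = (float(f) for f in fields[3:6])
            row_delay = float(fields[7].rstrip("s"))
        except ValueError:
            continue  # skip the column-header line
        size += row_size
        used += row_used
        free += row_free
        delay += row_delay
    use_pct = 100.0 * used / size if size else 0.0
    return {"size": size, "used": used, "free": free,
            "use_pct": use_pct, "delay": delay}


t = summarize(SAMPLE)
print(f"total {t['size']:.3f} {t['used']:.3f} {t['free']:.3f} "
      f"{t['use_pct']:.2f}% {t['delay']:.3f}s")
```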


MogileFS needs better docs.