Hacking Update - brad's life (LiveJournal)
Brad Fitzpatrick

Hacking Update [Sep. 10th, 2006|06:14 pm]
Brad Fitzpatrick

Another hacking status update, following the one I did in June. Feel free to comment here with anything you'd like to see me fix/add in any of these (or other) projects.

mogilefs [svn]
Hacking level: Med Needs release: Med
Major refactoring and surge of new development in Jun 2006 for speed/robustness/readability. No protocol level changes. Works with any DAV server for storage nodes now too.

Todo:

  • make mogstored have an Apache/lighttpd assist mode, where they serve HTTP but mogstored still does per-node monitoring and usage-file writing.
memcached [svn]
Hacking level: Low Needs release: No
Two major releases happened on 2006-09-09. See this post.

Todo:

  • vbuckets
  • win32 merge (easy)
  • merge 3 authors/contributors files
perlbal [svn]
Hacking level: Low Needs release: No
Nothing major to do. Pretty much just works. Only adding new features/optimizations/tricks as needed.

Todo:

  • add splice syscall support
  • low-priority queue & async plugin support
  • ongoing performance work
  • use new Danga::Socket::SSL to give it non-blocking SSL support
  • gzip content support (parallel .gz files). partially done.
djabberd [svn]
Hacking level: None Needs release: High
Pretty much just works, so development has stopped. Needs a release and publicity, opening the floodgates to plugin hacking.

Todo:

  • write an example clustering plugin (Spread?)
  • rearrange repo to separate core vs. modules. complicated by dependencies with test suite. null (memory-storage-only) plugins in core for testing?
brackup [svn]
Hacking level: None Needs release: Low
Never started using it myself. It has a great test suite that includes backing up and restoring, both encrypted and not, but I never implemented the hooks for restoring in the Amazon plugin. I feel guilty about not using it, not fixing any usage quirks I find, and not releasing it. I'd love somebody to take this over for me if you're excited about it.

Todo:

  • start using
  • amazon s3 restore support
gearman [svn]
Hacking level: None Needs release: No
Using it in production, but lacks a website/docs/etc.

Todo:

  • website/docs
openid [svn]
Hacking level: None Needs release: No
David Recordon at VeriSign, the JanRain crew, and other members of the identity community are largely steering this boat now.

Todo:

  • catch up on latest changes
  • write a credential/token passing extension ("Authorize $FOO to post to your LJ as you?")

Comments:
From: dossy
2006-09-11 01:37 am (UTC)
It'd be neat to see brackup be able to use mogilefs as its storage backend. Is an authentication/authorization layer implemented in mogilefs?
(Reply) (Thread)
From: brad
2006-09-11 01:55 am (UTC)
No auth.
(Reply) (Parent) (Thread)
From: dossy
2006-09-11 02:03 am (UTC)
But, it would be easy enough to create a brackup-to-mogilefs gateway, one that implemented the auth and could speak to brackup on one end and mogilefs on the other. (Perlbal?)
(Reply) (Parent) (Thread)
From: brad
2006-09-11 04:54 am (UTC)
However it's actually done, it's not hard. But why? So you could offer your friends backup space on your servers, perhaps? Could be cool.
(Reply) (Parent) (Thread)
From: userunknown
2006-09-11 02:36 am (UTC)

memcache & gearman

re: memcache:
I was playing around with swig the other day trying to wrap libmemcache. It sort of worked, until I OOM'd myself. I could only figure out how to free() properly using python-ctypes (which directly loads the .so)...

I got interesting results: for small value sizes the ctypes version is 2x as fast as the native python client. But once you get up to about 50k, it starts dropping.

for size=1000
python 20000 in 2.79 seconds, 7169.65/sec
ctypes 20000 in 1.44 seconds, 13849.54/sec
...
for size=200000
python 2000 in 2.64 seconds, 757.29/sec
ctypes 2000 in 3.57 seconds, 560.65/sec

It doesn't seem to be a libmemcache problem, something with the conversion of char * -> python str must be slow... 200,000 at ~600 req/sec is over 100MB/s after all.
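The arithmetic behind that last figure can be checked with a short sketch. It assumes the benchmark's "size" is the value size in bytes (the output does not state the unit) and uses decimal megabytes; the function name is made up for illustration.

```python
# Figures copied from the benchmark output above.
# Assumption: "size" is the value size in bytes.
VALUE_SIZE = 200_000


def throughput_mb_per_sec(value_size_bytes, requests_per_sec):
    """Payload moved per second, in decimal megabytes (1 MB = 10**6 bytes)."""
    return value_size_bytes * requests_per_sec / 1_000_000


for client, rate in (("python", 757.29), ("ctypes", 560.65)):
    print(f"{client}: {throughput_mb_per_sec(VALUE_SIZE, rate):.1f} MB/s")
```

Under that assumption both clients move more than 100 MB/s of payload at this value size, which is consistent with the estimate above.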

I'll have to play with swig again... If I can figure out how to get it to not leak memory it should be even faster than ctypes.

re: gearman
I thought this was being replaced with theschwartz... I guess I'm not sure what the relationship between the two is. gearman looks cooler from my point of view since I can write a python client for it :-)
(Reply) (Thread)
From: brad
2006-09-11 05:00 am (UTC)

Re: memcache & gearman

gearman: low latency. no disks involved. optionally joins requests and multiplexes responses. can track multiple outstanding requests. but if you go away and stop listening, you don't know if job succeeded or failed, and it makes no promises about finishing your job once you go away. if you want to be failure-proof, you as a client have to resubmit it if it fails, which means you have to be there listening. gearman is supposed to be used from web context for lots of parallel, short things. (or expensive things or things in other languages you want to run elsewhere....) but synchronous with the caller.

theschwartz: fire and forget. higher latency (seconds or more), but it'll get done, according to the rules you put down for it. (regarding backoff policies and retries...) like sending an email. who cares if it takes 5 seconds, but fuck if you're gonna sit around and wait for it. you just want to be confident it'll work. and you're given a handle so you can check on its return code (or current errors or error history) later, if policy for the job you've submitted is set to retain that exit status.

The two complement each other nicely, and you can even imagine gearman jobs submitting TheSchwartz jobs, and TheSchwartz jobs being serviced by gearman worker processes.
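A toy sketch of the contrast described above, in plain in-memory Python. None of these names come from Gearman or TheSchwartz; `FlakyTask`, `call_synchronously`, and `JobQueue` are hypothetical stand-ins. The first model has the caller wait and resubmit on failure; the second hands back a handle at once and lets a worker retry under a policy (backoff delays are left out to keep it short).

```python
import itertools


class FlakyTask:
    """Hypothetical job that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.remaining_failures = failures

    def __call__(self, arg):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("transient failure")
        return arg.upper()


def call_synchronously(task, arg, max_attempts=3):
    """Gearman-like model: the caller waits and must resubmit on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == max_attempts:
                raise


class JobQueue:
    """TheSchwartz-like model: submit returns a handle; workers retry later."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.pending = []   # (handle, task, arg, attempts so far)
        self.status = {}    # handle -> "pending" | "done" | "failed"
        self.results = {}   # handle -> return value of finished jobs
        self._ids = itertools.count(1)

    def submit(self, task, arg):
        handle = next(self._ids)
        self.pending.append((handle, task, arg, 0))
        self.status[handle] = "pending"
        return handle

    def run_worker_once(self):
        """Try each pending job once; requeue failures under the retry limit."""
        jobs, self.pending = self.pending, []
        for handle, task, arg, attempts in jobs:
            try:
                self.results[handle] = task(arg)
                self.status[handle] = "done"
            except RuntimeError:
                if attempts + 1 >= self.max_retries:
                    self.status[handle] = "failed"
                else:
                    self.pending.append((handle, task, arg, attempts + 1))


# Synchronous: the caller is blocked until the retry succeeds.
print(call_synchronously(FlakyTask(1), "resize"))

# Fire and forget: the caller keeps only a handle and checks on it later.
queue = JobQueue()
handle = queue.submit(FlakyTask(2), "send email")
while queue.status[handle] == "pending":
    queue.run_worker_once()
print(queue.status[handle], queue.results[handle])
```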

We want to write a TheSchwartz network daemon, so you can submit into it and grab jobs to do from it using plain HTTP (GET, PUT/POST, DELETE) and use it from any language. It's only implemented as a Perl library because we had no immediate need for a daemon: most of the pieces we needed it for were Perl. And its logic can be embedded... no fancy behaviors or dependencies.
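One way such an HTTP interface could map verbs to queue operations is sketched below. Everything here is an assumption made for illustration: the daemon did not exist at the time of writing, and the paths (`/jobs`, `/jobs/next`, `/jobs/<id>`), status codes, and class name are invented. The sketch skips real networking and just dispatches on method and path.

```python
import itertools


class JobQueueService:
    """In-memory stand-in for a hypothetical HTTP job-queue daemon."""

    def __init__(self):
        self.waiting = {}   # job id -> payload, not yet handed to a worker
        self.grabbed = {}   # job id -> payload, handed out but not completed
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        """Return (status_code, response_body) for one request."""
        if method == "POST" and path == "/jobs":
            # Submit a new job; the response body is its id.
            job_id = next(self._ids)
            self.waiting[job_id] = body
            return 201, str(job_id)
        if method == "GET" and path == "/jobs/next":
            # Grab the oldest waiting job, if there is one.
            if not self.waiting:
                return 204, ""
            job_id = min(self.waiting)
            self.grabbed[job_id] = self.waiting.pop(job_id)
            return 200, f"{job_id} {self.grabbed[job_id]}"
        if method == "DELETE" and path.startswith("/jobs/"):
            # A worker reports a grabbed job as finished.
            job_id_text = path[len("/jobs/"):]
            if job_id_text.isdigit() and int(job_id_text) in self.grabbed:
                del self.grabbed[int(job_id_text)]
                return 200, "completed"
            return 404, "unknown job"
        return 405, "unsupported"


service = JobQueueService()
print(service.handle("POST", "/jobs", "send welcome email"))
print(service.handle("GET", "/jobs/next"))
print(service.handle("DELETE", "/jobs/1"))
```

Because the interface is only method, path, and body, a client in any language with an HTTP library could submit or grab jobs, which is the point made above.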

I need to formally document both of these projects.

BTW, TheSchwartz is probably going to be renamed Data::JobQueue, and the server interface djobqueued or something.
(Reply) (Parent) (Thread)
From: codetoad
2006-09-11 05:10 am (UTC)

Re: memcache & gearman

Check out Pyrex? It might not be worth it, though: it's doubtful it will be much faster, since memcache.py does so little (the looping may be faster, though).
(Reply) (Parent) (Thread)
From: mtbg
2006-09-11 04:38 am (UTC)
Because I'm too lazy to sign up and post to the djabberd list (feel free to tell me to do so):

I've been having a problem with LJ's Jabber service vis-a-vis OTR. It goes something like this: I have a client (Adium) logged in at home. I log in again at work (with Gaim). Both of my clients are set to autonegotiate OTR. Neither client sets a priority. At work, I message a friend whose client speaks OTR; his (unencrypted) response gets delivered to both of my clients, which each then attempt an OTR handshake, causing his client to resend his message, which gets delivered to both of my clients... This goes on and on at about .5Hz until at least one of the three clients involved gets taken offline.

Now, having just reread the relevant portion of RFC3921 (section 11.1.4.1), I understand that djabberd's behavior, viz., sending a message to each of two resources having equal priority, is certainly allowed:

> If two or more available resources have the same priority, the server MAY use some other rule (e.g., most recent connect time, most recent activity time, or highest availability as determined by some hierarchy of <show/> values) to choose between them or MAY deliver the message to all such resources.

However, given the snafu described above, I'd obviously prefer that "some other rule" be used, and the message delivered to only one resource.
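The two options the RFC allows for a priority tie can be sketched as follows. This is not djabberd's code: the `Resource` record, its `last_activity` field, and `pick_recipients` are hypothetical, and the sketch ignores every other delivery rule in the RFC.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    priority: int
    last_activity: float  # seconds since epoch; hypothetical bookkeeping


def pick_recipients(resources, deliver_to_all_on_tie=False):
    """Choose which connected resources receive a message sent to the bare JID."""
    if not resources:
        return []
    top_priority = max(r.priority for r in resources)
    tied = [r for r in resources if r.priority == top_priority]
    if deliver_to_all_on_tie:
        # "MAY deliver the message to all such resources"
        return tied
    # "some other rule": here, the most recently active resource wins.
    return [max(tied, key=lambda r: r.last_activity)]


home = Resource("Adium", priority=0, last_activity=1000.0)
work = Resource("Gaim", priority=0, last_activity=2000.0)

print([r.name for r in pick_recipients([home, work], deliver_to_all_on_tie=True)])
print([r.name for r in pick_recipients([home, work])])
```

With the tie broken by recent activity, only one client would see the incoming message and attempt the OTR handshake.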

Let me know if there's any more information I could provide. Thanks for your time.
(Reply) (Thread)
From: brad
2006-09-11 05:02 am (UTC)
Hm. So this is more of an OTR-vs-Jabber-spec issue, right?

Maybe djabberd could keep its policy but do some sort of loop detection and stop it.
(Reply) (Parent) (Thread)
From: mtbg
2006-09-11 05:17 am (UTC)
> So this is more of an OTR-vs-Jabber-spec issue, right?

Could be. Not that I've ever even been within eyeshot of the OTR source, but I wouldn't be surprised if the scenario I described was something the authors never really considered.

> Maybe djabberd could keep its policy but do some sort of loop detection and stop it.

It would still be pretty annoying if the loop restarted every time my friend sent me a message. If that's the easiest solution, though, I won't complain.

FWIW, I'm thrilled that LJ is providing all of my friends with Jabber accounts. It's a great step towards my goal of ridding my life of crappy protocols, AIM being the biggest current offender.
(Reply) (Parent) (Thread)
From: brad
2006-09-11 05:41 am (UTC)
And I'm thrilled to help get rid of the crappy protocols.
(Reply) (Parent) (Thread)
From: kvance
2006-09-11 05:33 am (UTC)
If no one else does in the next couple of months, I might take brackup and run with it. It's still running on my desktop every day, and I may want to extract all that data from amazon at some point :P
(Reply) (Thread)
From: brad
2006-09-11 05:41 am (UTC)
Hah. What do you pay per month lately?
(Reply) (Parent) (Thread)
From: kvance
2006-09-11 05:54 am (UTC)
Last month, 50 cents. But my computer was off for about a week, so maybe 60 cents this month. They actually charge my credit card that.

I have 1GB stored since June. This is for my local cvs and svn, and a couple of mysql hotcopies.
(Reply) (Parent) (Thread)
From: mart
2006-09-11 06:29 am (UTC)

I started thinking again the other day about the “let third party do operation X on identity Y as identity Z” thing. Got sidetracked this weekend and didn't write any code. Must put that back on my own todo list…

(Reply) (Thread)
From: edm
2006-09-15 08:34 pm (UTC)

Brackup

FYI, this person seems to be looking for something pretty much like Brackup and considering writing something themselves:

http://www.daemonology.net/blog/2006-09-13-encrypted-backup.html
http://www.daemonology.net/blog/2006-09-14-more-about-backup.html

I've pointed him at the CPAN page and this posting. If he's keen, he may be a possible new maintainer.

Ewen
(Reply) (Thread)
From: askbjoernhansen
2006-09-20 01:42 am (UTC)

perlbal gzip

> perlbal:
> gzip content support (parallel .gz files). partially done.

Where's that? I looked in trunk but didn't see any signs of it...


- ask
(Reply) (Thread)