Never ending feed of Atom feeds - brad's life
Brad Fitzpatrick


Never ending feed of Atom feeds [Aug. 16th, 2005|12:58 pm]
Brad Fitzpatrick

An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about.

Just to prove a point, I flooded a couple of them and found that sure enough, nobody can really keep up. It's even more annoying when they don't even support persistent HTTP connections.

So --- I decided to turn things on their head and make them get data from us. If they can't keep up, it's their loss.

Prototype: (not its final home)

$ telnet danga.com 8081
GET /atom-stream.xml HTTP/1.0<enter>

And enjoy the never ending XML stream of Atom feeds, each containing one entry. And if you get more than 256k behind (not including your TCP window size), then we start dropping entries to you and you see:

<sorryTooSlow youMissed="23" />
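The drop behavior described above can be sketched roughly like this. It is only a guess at the policy, not the actual server code: the class name `ClientBacklog`, its methods, and the choice to queue the notice just before the next accepted entry are all assumptions made for illustration. Only the 256k threshold and the `<sorryTooSlow youMissed="..." />` element come from the post.

```python
from collections import deque

MAX_BACKLOG_BYTES = 256 * 1024  # the "256k behind" threshold from the post


class ClientBacklog:
    """Per-client send queue that drops entries once the client lags too far.

    Hypothetical sketch: the names and exact policy are assumptions.
    """

    def __init__(self, max_bytes=MAX_BACKLOG_BYTES):
        self.max_bytes = max_bytes
        self.queue = deque()    # byte strings waiting to be written
        self.queued_bytes = 0   # total size of everything in the queue
        self.missed = 0         # entries dropped since the last notice

    def push(self, entry: bytes) -> bool:
        """Queue one entry; return False if it was dropped instead."""
        if self.queued_bytes + len(entry) > self.max_bytes:
            self.missed += 1
            return False
        if self.missed:
            # Client has room again: tell it how much it lost, then resume.
            notice = b'<sorryTooSlow youMissed="%d" />\n' % self.missed
            self._append(notice)
            self.missed = 0
        self._append(entry)
        return True

    def _append(self, data: bytes) -> None:
        self.queue.append(data)
        self.queued_bytes += len(data)

    def pop(self):
        """Return the next chunk to write to the socket, or None if idle."""
        if not self.queue:
            return None
        data = self.queue.popleft()
        self.queued_bytes -= len(data)
        return data
```

The point of this shape is that a slow reader costs the server a bounded amount of memory and nothing else; everyone else's stream is unaffected.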

I think soon we'll get TypePad and perhaps MovableType blogs all being sent through this. The final home will probably be on a subdomain of sixapart.com somewhere, including documentation better than this blog entry.

And yes, I'm sure my Atom syntax is bogus or something. I spent a good 2 minutes on that part of it.

From: kragen
2005-08-16 11:25 pm (UTC)

Re: Laaaame.

It's not such a BFD --- in the 4 minutes 6 seconds I listened, I got about 1.3 megabytes, comprising 733 posts. That's only about 5 kB/sec, which you could handle on a 56kbps modem, and a little under 3 posts per second. (Presumably this is the public half of LJ.) That's, like, half a gigabyte a day, or a few dimes' worth of bandwidth. The junky but relatively optimized full-text indexer I use on my email can reindex a gigabyte in about ten minutes on my old 600MHz laptop. (See kragen-hacks archives for details.)
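For anyone who wants to check the arithmetic, here it is spelled out. The only assumptions are that "1.3 megabytes" means decimal megabytes and that the modem runs at its nominal 56 kbit/s; the variable names are just for illustration.

```python
elapsed_s = 4 * 60 + 6          # 4 minutes 6 seconds = 246 s
received_bytes = 1.3e6          # "about 1.3 megabytes" (assumed decimal)
posts = 733

bytes_per_s = received_bytes / elapsed_s   # roughly 5.3 kB/s
posts_per_s = posts / elapsed_s            # a little under 3 per second
bytes_per_day = bytes_per_s * 86_400       # a bit under half a gigabyte
modem_bytes_per_s = 56_000 / 8             # 7 kB/s nominal for 56 kbit/s

print(f"{bytes_per_s / 1e3:.1f} kB/s, {posts_per_s:.2f} posts/s")
print(f"{bytes_per_day / 1e6:.0f} MB/day")
print("fits on a 56k modem:", bytes_per_s < modem_bytes_per_s)
```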

I am not really qualified to speculate on why LJ needs a big server farm to handle 6 posts per second (although bradfitz's talk at OSCON was really great), but I'm guessing that it includes items from the following list: pageviews, comments, authorization, usericons, reliability, friends pages. All of these are required to actually run LJ, but you don't need any of them to slurp from this particular firehose.
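A minimal sketch of slurping from the firehose, for illustration only. It assumes each item in the stream closes with `</feed>` (a guess based on "Atom feeds, each containing one entry") and that the drop notice looks exactly like the example in the post. `StreamScanner` and `watch` are hypothetical names, and the host, port, and path you pass in would be the prototype's, which is not its final home.

```python
import re
import socket

# Either a drop notice or the end of one feed (closing tag is an assumption).
TOKEN = re.compile(rb'<sorryTooSlow youMissed="(\d+)"\s*/>|</feed>')
TAIL_KEEP = 64  # longer than any token, so a split token is never lost


class StreamScanner:
    """Counts feeds seen and entries missed, tolerating arbitrary chunking."""

    def __init__(self):
        self.buf = b""
        self.feeds = 0
        self.missed = 0

    def feed(self, chunk: bytes) -> None:
        self.buf += chunk
        end = 0
        for m in TOKEN.finditer(self.buf):
            if m.group(1) is not None:
                self.missed += int(m.group(1))  # server dropped this many
            else:
                self.feeds += 1
            end = m.end()
        # Keep only unconsumed bytes, trimmed so memory use stays bounded.
        self.buf = self.buf[end:][-TAIL_KEEP:]


def watch(host: str, port: int, path: str, scanner: StreamScanner) -> None:
    """Send the same HTTP/1.0 request as the telnet example and read forever."""
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break  # server closed the connection
            scanner.feed(chunk)
```

The scanner never parses the XML properly; it just looks for the two markers, which is enough to measure throughput and notice when you are falling behind.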

FWIW I think this is really excellent. Thanks Brad! Yay perlbal!
From: scsi
2005-08-17 12:14 am (UTC)

Re: Laaaame.

Judging by how I can see the atom TIME counter increment (without having 3-6 posts shoved in between), the posts/second figure on the stats page (stats/latest.bml) is either taking into account private/friends posts or the feed isn't 100% of LJ's public posts.
LJ needs the server farm because the GETs >>>> POSTs.