Never ending feed of Atom feeds - brad's life (LiveJournal)
Brad Fitzpatrick


Never ending feed of Atom feeds [Aug. 16th, 2005|12:58 pm]

An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about.

Just to prove a point, I flooded a couple of them and found that sure enough, nobody can really keep up. It's even more annoying when they don't even support persistent HTTP connections.

So --- I decided to turn things on their head and make them get data from us. If they can't keep up, it's their loss.

Prototype: (not its final home)

$ telnet danga.com 8081
GET /atom-stream.xml HTTP/1.0<enter>

And enjoy the never ending XML stream of Atom feeds, each containing one entry. And if you get more than 256k behind (not including your TCP window size), then we start dropping entries to you and you see:

<sorryTooSlow youMissed="23" />
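The drop behavior described above can be sketched roughly as follows. This is only an illustration in Python, not the actual server code: the class name `SlowClientBuffer`, the choice to drop the newest entries rather than the oldest, and the way the notice is queued are all assumptions. Only the 256k limit and the `<sorryTooSlow youMissed="..." />` element come from the post.

```python
from collections import deque


class SlowClientBuffer:
    """Hypothetical per-client backlog that drops entries past a byte limit."""

    def __init__(self, max_bytes=256 * 1024):
        self.max_bytes = max_bytes   # the 256k figure mentioned in the post
        self.queue = deque()         # chunks waiting to be written to the client
        self.queued_bytes = 0        # total size of everything in the queue
        self.missed = 0              # entries dropped since the last notice

    def push(self, entry: bytes) -> None:
        """Queue an entry, or count it as missed if the client is too far behind."""
        if self.queued_bytes + len(entry) > self.max_bytes:
            self.missed += 1
            return
        if self.missed:
            # Tell the client how many entries it lost before resuming.
            # The small notice is allowed to nudge the queue past the limit.
            notice = b'<sorryTooSlow youMissed="%d" />\n' % self.missed
            self.queue.append(notice)
            self.queued_bytes += len(notice)
            self.missed = 0
        self.queue.append(entry)
        self.queued_bytes += len(entry)

    def pop(self):
        """Return the next chunk to send, or None if nothing is waiting."""
        if not self.queue:
            return None
        chunk = self.queue.popleft()
        self.queued_bytes -= len(chunk)
        return chunk
```

A writer loop would call `pop()` whenever the client's socket is writable, so a fast reader never sees a notice and a slow one sees a count of what it lost.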

I think soon we'll get TypePad and perhaps MovableType blogs all being sent through this. The final home will probably be on a subdomain of sixapart.com somewhere, including documentation better than this blog entry.

And yes, I'm sure my Atom syntax is bogus or something. I spent a good 2 minutes on that part of it.

[User Picture]From: crschmidt
2005-08-16 08:05 pm (UTC)
(Reply) (Thread)
[User Picture]From: caladri
2005-08-16 08:06 pm (UTC)
It's entertaining to watch, at a minimum.
(Reply) (Thread)
From: catamorphism
2005-08-16 08:10 pm (UTC)
I want a way to stream just my friends page!
(Reply) (Thread)
[User Picture]From: feignedapathy
2005-08-16 08:18 pm (UTC)
Wow, you could completely add a "Wankery Per Minute" counter using this data.
(Reply) (Thread)
(Deleted comment)
From: andr3
2005-08-17 12:12 am (UTC)
haha i second that. :D
(Reply) (Parent) (Thread)
[User Picture]From: mart
2005-08-16 08:40 pm (UTC)

Hmm. I don't understand how but somehow when I try this on my Windows box here it starts printing out gibberish and items from my command history going back weeks. Weird stuff. On my Linux box it messes up my terminal. I guess there are some control characters in there causing weirdness. Good thing that telnet isn't the recommended way to access this. ;)

Hopefully this stuff will start getting PubSubbed at some point, assuming that someone can keep up with it well enough to produce decent PubSub output.

(Reply) (Thread)
[User Picture]From: brad
2005-08-16 08:51 pm (UTC)
It's utf-8 data. The Russians are fucking with you.
(Reply) (Parent) (Thread) (Expand)
[User Picture]From: scsi
2005-08-16 08:53 pm (UTC)


"An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about."

Uhh, do they have rocks in their heads? At ~6 posts a second, that's like asking to take a shower via a firehose.

I'd just get something in writing just to confirm they really want to do this. Then when you open the proverbial floodgates upon them they can't nail you or 6A for blowing the servers off of the face of the planet.

In their defense it's probably some marketing/management drone pressing for you to do this, while the admins are already getting their white flags and upstream blocks in place.
(Reply) (Thread)
[User Picture]From: kragen
2005-08-16 11:25 pm (UTC)

Re: Laaaame.

It's not such a BFD --- in the 4 minutes 6 seconds I listened, I got about 1.3 megabytes, comprising 733 posts. That's only about 5 kB/sec, which you could handle on a 56kbps modem, and a little under 3 posts per second. (Presumably this is the public half of LJ.) That's, like, half a gigabyte a day, or a few dimes' worth of bandwidth. The junky but relatively optimized full-text indexer I use on my email can reindex a gigabyte in about ten minutes on my old 600MHz laptop. (see kragen-hacks archives for details.)

I am not really qualified to speculate on why LJ needs a big server farm to handle 6 posts per second (although bradfitz's talk at OSCON was really great), but I'm guessing that it includes items from the following list: pageviews, comments, authorization, usericons, reliability, friends pages. All of these are required to actually run LJ, but you don't need any of them to slurp from this particular firehose.

FWIW I think this is really excellent. Thanks Brad! Yay perlbal!
(Reply) (Parent) (Thread) (Expand)
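For anyone who wants to check the back-of-the-envelope numbers in kragen's comment above, here is the arithmetic written out in Python. It assumes a megabyte means 10^6 bytes and a 56kbps modem delivers at most 56,000 bits per second; the variable names are just for illustration.

```python
# Figures quoted in the comment: 4 min 6 s of listening, ~1.3 MB, 733 posts.
seconds = 4 * 60 + 6          # 246 seconds
bytes_received = 1.3e6        # assumes 1 MB = 10**6 bytes
posts = 733

bytes_per_sec = bytes_received / seconds      # roughly 5.3 kB/s
posts_per_sec = posts / seconds               # a little under 3 posts/s
bytes_per_day = bytes_per_sec * 24 * 60 * 60  # a bit under half a gigabyte
modem_bytes_per_sec = 56_000 / 8              # 7 kB/s ceiling for a 56kbps modem

print(f"{bytes_per_sec / 1000:.1f} kB/s, {posts_per_sec:.2f} posts/s")
print(f"{bytes_per_day / 1e9:.2f} GB/day")
print("fits on a 56kbps modem:", bytes_per_sec < modem_bytes_per_sec)
```

The observed rate sits under the modem's theoretical ceiling, which matches the claim that a dial-up line could just about keep up with the stream as measured.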
[User Picture]From: mart
2005-08-16 09:06 pm (UTC)

With a little bit of extra markup (plus ideally a little bit of support for handshaking on connection) this could become an XMPP stream and the existing XMPP client libraries would be able to suck it up, saving people from having to write new parsing code (which is a pain because many XML libraries won't play nice when there's no proper end to a document). I guess once it stops being HTTP Perlbal becomes less helpful, though. Last I checked the XMPP libraries for Perl were a little clunky as well.

(Reply) (Thread)
[User Picture]From: edm
2005-08-16 09:58 pm (UTC)
The two obvious ways of parsing it would be to either use a SAX API parser (i.e., process the start/stop tags as they come in), or do some high-level parsing to spot start/end of Atom entries and then wrap those in enough surrounding junk to make a DOM parser happy. Neither seems especially difficult to do.

I, too, love the <sorryTooSlow/> tag. And marvel at the people who ever thought their architecture could take 6+ incoming posts a second and do useful things with them, without an LJ-sized infrastructure.

(Reply) (Parent) (Thread) (Expand)
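As a rough illustration of the event-driven approach edm describes, here is a sketch using Python's standard `xml.etree.ElementTree.XMLPullParser`, which accepts data in arbitrary chunks and does not need the document to ever end. The `StreamReader` class and `local_name` helper are made-up names, and the element layout (a never-closed root element with feed elements and `<sorryTooSlow/>` notices as direct children) is an assumption based on the post and comments, not a documented format.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


class StreamReader:
    """Hypothetical incremental reader for a never-ending XML stream."""

    def __init__(self):
        self.parser = ET.XMLPullParser(events=("start", "end"))
        self.depth = 0    # how many elements are currently open
        self.root = None  # the never-closed root element

    def feed(self, chunk: bytes):
        """Feed raw bytes; return the completed top-level items found so far."""
        self.parser.feed(chunk)
        results = []
        for event, elem in self.parser.read_events():
            if event == "start":
                self.depth += 1
                if self.root is None:
                    self.root = elem
                continue
            self.depth -= 1
            if self.depth != 1:
                continue  # only direct children of the root are complete items
            name = local_name(elem.tag)
            if name == "sorryTooSlow":
                results.append(("missed", int(elem.get("youMissed", "0"))))
            else:
                results.append((name, elem))
            # Detach the finished child so the root does not grow forever.
            self.root.remove(elem)
        return results
```

In use, each block of bytes read from the socket would be passed to `feed()`, and the returned list handled before reading more. Detaching each finished child from the root is what keeps memory flat on a stream with no end.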
[User Picture]From: ydna
2005-08-16 09:10 pm (UTC)
Bloody brilliant. You're like that kid at school. The one in the school yard. With the firehose. Going full tilt. A kid behind you says, "Brad?," and you turn casually then watch as he skids on his ass across the blacktop into a cinder block wall. "Brad, could you?" You turn again.

Won't somebody think of the children?!!?!
(Reply) (Thread)
[User Picture]From: scsi
2005-08-16 09:37 pm (UTC)
"You get to drink from the firehose!!!! Are you READY?"
(Reply) (Parent) (Thread)
[User Picture]From: wetzel
2005-08-16 09:16 pm (UTC)
you guys should be like google and get a giant LCD screen for your office, and just have this scrolling past it all day.

it's the live angstweb!
(Reply) (Thread)
[User Picture]From: pfig
2005-08-16 09:28 pm (UTC)

live livejournal

i hesitate between spawn of satan or saviour of mankind :)
(Reply) (Thread)
[User Picture]From: lithiana
2005-08-16 09:57 pm (UTC)
(Reply) (Thread)
[User Picture]From: kragen
2005-08-16 11:34 pm (UTC)
Nice. Of course the next thing that occurred to me was, "What about usernames?" And apparently it was to you too... http://www.knams.wikimedia.org/~kate/lj/users.html
(Reply) (Parent) (Thread)
[User Picture]From: boggyb
2005-08-16 10:19 pm (UTC)
C:\Documents and Settings\Thomas>nc danga.com 8081
GET /atom-stream.xml HTTP/1.0

C:\Documents and Settings\Thomas>

Ideas? About the only thing I can think of is netcat only sends 0x0a, not 0x0d 0x0a.
(Reply) (Thread)
[User Picture]From: brad
2005-08-16 10:48 pm (UTC)
You need 0x0d.
(Reply) (Parent) (Thread) (Expand)
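To make the line-ending point above concrete: HTTP request lines end in CR LF (0x0d 0x0a), and the request is finished by an empty line. A minimal sketch in Python follows; `build_request` and `open_stream` are illustrative names, and the prototype address from the post may well no longer be live, so the connecting part is shown only as an assumption about how a client might be written.

```python
import socket


def build_request(path="/atom-stream.xml", host=None):
    """Build an HTTP/1.0 GET request with CRLF line endings."""
    lines = [f"GET {path} HTTP/1.0"]
    if host:
        lines.append(f"Host: {host}")
    # Each line ends with \r\n, and a blank line terminates the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def open_stream(host, port, path="/atom-stream.xml", timeout=30.0):
    """Connect, send the request, and return the socket for reading."""
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.sendall(build_request(path, host))
    return sock
```

A caller would then loop on `sock.recv(4096)` and hand each block of bytes to whatever parser it uses.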
[User Picture]From: allezbleu
2005-08-16 11:30 pm (UTC)
hilarious! thank you.
(Reply) (Thread)
[User Picture]From: smackfu
2005-08-16 11:34 pm (UTC)
Wow, Safari doesn't like that at all, if you turn it into a URL. Presumably because the atomStream tag is never terminated.
(Reply) (Thread)