brad's life - Never ending feed of Atom feeds
Brad Fitzpatrick

[ website | bradfitz.com ]

Never ending feed of Atom feeds [Aug. 16th, 2005|12:58 pm]

An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about.

Just to prove a point, I flooded a couple of them and found that sure enough, nobody can really keep up. It's even more annoying when they don't even support persistent HTTP connections.

So --- I decided to turn things on their head and make them get data from us. If they can't keep up, it's their loss.

Prototype: (not its final home)

$ telnet danga.com 8081
GET /atom-stream.xml HTTP/1.0<enter>
<enter>


And enjoy the never ending XML stream of Atom feeds, each containing one entry. And if you get more than 256k behind (not including your TCP window size), then we start dropping entries to you and you see:

<sorryTooSlow youMissed="23" />
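A minimal sketch of the drop-when-behind behaviour described above, in Python. The post only states the 256k limit and the shape of the notice; the class name `ClientBuffer`, the per-client queue, and the choice to emit the notice just before the next entry that fits are assumptions made for illustration, not the prototype's actual code.

```python
from collections import deque


class ClientBuffer:
    """Hypothetical per-client outgoing queue with a byte cap."""

    def __init__(self, max_bytes=256 * 1024):
        self.max_bytes = max_bytes
        self.queue = deque()      # chunks waiting to be written to the client
        self.queued_bytes = 0     # total size of everything in self.queue
        self.missed = 0           # entries dropped since the last notice

    def _enqueue(self, chunk):
        self.queue.append(chunk)
        self.queued_bytes += len(chunk)

    def push(self, entry):
        """Queue one serialized entry; drop it if the client is too far behind."""
        if self.queued_bytes + len(entry) > self.max_bytes:
            self.missed += 1
            return False
        if self.missed:
            # Tell the client how many entries it lost before resuming.
            self._enqueue(b'<sorryTooSlow youMissed="%d" />\n' % self.missed)
            self.missed = 0
        self._enqueue(entry)
        return True

    def pop(self):
        """Return the next chunk to write to the socket, or None if empty."""
        if not self.queue:
            return None
        chunk = self.queue.popleft()
        self.queued_bytes -= len(chunk)
        return chunk
```

The notice itself is allowed to push the queue slightly past the cap, which keeps the sketch short; a real server might account for it differently.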

I think soon we'll get TypePad and perhaps MovableType blogs all being sent through this. The final home will probably be on a subdomain of sixapart.com somewhere, including documentation better than this blog entry.

And yes, I'm sure my Atom syntax is bogus or something. I spent a good 2 minutes on that part of it.

Comments:
Page 1 of 2
<<[1] [2] >>
From: crschmidt
2005-08-16 08:05 pm (UTC)

Coolest.
Thing.
Ever.
From: caladri
2005-08-16 08:06 pm (UTC)

It's entertaining to watch, at a minimum.
From: catamorphism
2005-08-16 08:10 pm (UTC)

I want a way to stream just my friends page!
From: feignedapathy
2005-08-16 08:18 pm (UTC)

Wow, you could completely add a "Wankery Per Minute" counter using this data.
From: duskwuff
2005-08-16 08:39 pm (UTC)

Oh no! It's an endless stream of angst!
From: nothings
2005-08-16 08:39 pm (UTC)

sorryTooSlow is the best XML tag ever.
From: andr3
2005-08-17 12:12 am (UTC)

haha i second that. :D
From: mart
2005-08-16 08:40 pm (UTC)

Hmm. I don't understand how but somehow when I try this on my Windows box here it starts printing out gibberish and items from my command history going back weeks. Weird stuff. On my linux box it messes up my terminal. I guess there are some control characters in there causing weirdness. Good thing that telnet isn't the recommended way to access this. ;)

Hopefully this stuff will start getting PubSubbed at some point, assuming that someone can keep up with it well enough to produce decent PubSub output.

From: brad
2005-08-16 08:51 pm (UTC)

It's utf-8 data. The Russians are fucking with you.
From: scsi
2005-08-16 08:53 pm (UTC)

Laaaame.

An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about.

Uhh, do they have rocks in their heads? At ~6 posts a second, that's like asking to take a shower via a firehose.

I'd just get something in writing just to confirm they really want to do this. Then when you open the proverbial floodgates upon them they can't nail you or 6A for blowing the servers off of the face of the planet.

In their defense it's probably some marketing/management drone pressing for you to do this, while the admins are already getting their white flags and upstream blocks in place.
From: kragen
2005-08-16 11:25 pm (UTC)

Re: Laaaame.

It's not such a BFD --- in the 4 minutes 6 seconds I listened, I got about 1.3 megabytes, comprising 733 posts. That's only about 5 kB/sec, which you could handle on a 56kbps modem, and a little under 3 posts per second. (Presumably this is the public half of LJ.) That's, like, half a gigabyte a day, or a few dimes worth of bandwidth. The junky but relatively optimized full-text indexer I use on my email can reindex a gigabyte in about ten minutes on my old 600MHz laptop. (see kragen-hacks archives for details.)

I am not really qualified to speculate on why LJ needs a big server farm to handle 6 posts per second (although bradfitz's talk at OSCON was really great), but I'm guessing that it includes items from the following list: pageviews, comments, authorization, usericons, reliability, friends pages. All of these are required to actually run LJ, but you don't need any of them to slurp from this particular firehose.

FWIW I think this is really excellent. Thanks Brad! Yay perlbal!
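The estimate in kragen's comment can be rechecked with a few lines of Python. The inputs are the figures quoted in the comment; reading "1.3 megabytes" as 1.3 * 10**6 bytes and treating the modem as a nominal 56,000 bits per second with no protocol overhead are assumptions.

```python
# Figures quoted in the comment above.
seconds = 4 * 60 + 6            # 4 minutes 6 seconds of listening
total_bytes = 1.3e6             # "about 1.3 megabytes" (assumed decimal MB)
posts = 733

bytes_per_sec = total_bytes / seconds
posts_per_sec = posts / seconds
bytes_per_day = bytes_per_sec * 86_400

# Nominal 56 kbps modem line rate, ignoring overhead (an assumption).
modem_bytes_per_sec = 56_000 / 8

print(f"{bytes_per_sec / 1000:.1f} kB/s")
print(f"{posts_per_sec:.2f} posts/s")
print(f"{bytes_per_day / 1e9:.2f} GB/day")
print("fits on a 56k modem:", bytes_per_sec < modem_bytes_per_sec)
```

Under those assumptions the numbers line up with the comment: roughly 5 kB/sec, just under 3 posts per second, and a bit under half a gigabyte per day.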
From: mart
2005-08-16 09:06 pm (UTC)

With a little bit of extra markup (plus ideally a little bit of support for handshaking on connection) this could become an XMPP stream and the existing XMPP client libraries would be able to suck it up, saving people from having to write new parsing code (which is a pain because many XML libraries won't play nice when there's no proper end to a document). I guess once it stops being HTTP Perlbal becomes less helpful, though. Last I checked the XMPP libraries for Perl were a little clunky as well.

From: edm
2005-08-16 09:58 pm (UTC)

The two obvious ways of parsing it would be to either use a SAX API parser (ie, process the start/stop tags as they come in), or do some high-level parsing to spot start/end of Atom entries and then wrap those in enough surrounding junk to make a DOM parser happy. Neither seems especially difficult to do.

I, too, love the <sorryTooSlow/> tag. And marvel at the people who ever thought their architecture could take 6+ incoming posts a second and do useful things with them, without a LJ sized infrastructure.

Ewen
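The first approach edm mentions (handle elements as they arrive instead of waiting for the document to end) might look roughly like this in Python, using the standard library's `xml.etree.ElementTree.XMLPullParser`. The class name `StreamReader` is made up, and the element layout (a never-closed `atomStream` root wrapping `feed` elements with one `entry` each, plus `sorryTooSlow` notices) is pieced together from the post and comments here, so treat it as an assumption. The HTTP response headers would have to be stripped before feeding bytes in.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


class StreamReader:
    """Hypothetical incremental reader for the never-ending stream."""

    def __init__(self):
        self.parser = ET.XMLPullParser(events=("end",))
        self.missed = 0  # running total from <sorryTooSlow youMissed="N" />

    def feed(self, chunk):
        """Feed raw bytes; return titles of the feeds completed by this chunk."""
        self.parser.feed(chunk)
        # Newer Pythons provide flush() so events are not delayed by Expat's
        # reparse deferral; older versions simply lack the method.
        flush = getattr(self.parser, "flush", None)
        if flush is not None:
            flush()
        titles = []
        for _event, elem in self.parser.read_events():
            name = local_name(elem.tag)
            if name == "sorryTooSlow":
                self.missed += int(elem.get("youMissed", "0"))
            elif name == "feed":
                title = next(
                    (e.text for e in elem.iter() if local_name(e.tag) == "title"),
                    None,
                )
                titles.append(title)
                # Drop the children; the emptied element stays attached to
                # the root, so a long-running reader should also detach it.
                elem.clear()
        return titles
```

Because the parser is fed chunk by chunk, it never needs to see a closing root tag, which is the property that makes a never-ending document workable.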
From: ydna
2005-08-16 09:10 pm (UTC)

Bloody brilliant. You're like that kid at school. The one in the school yard. With the firehose. Going full tilt. A kid behind you says, "Brad?," and you turn casually then watch as he skids on his ass across the blacktop into a cinder block wall. "Brad, could you?" You turn again.

Won't somebody think of the children?!!?!
From: scsi
2005-08-16 09:37 pm (UTC)

"You get to drink from the firehose!!!! Are you READY?"
From: wetzel
2005-08-16 09:16 pm (UTC)

you guys should be like google and get a giant LCD screen for your office, and just have this scrolling past it all day.

it's the live angstweb!
From: pfig
2005-08-16 09:28 pm (UTC)

live livejournal

i hesitate between spawn of satan or saviour of mankind :)
From: lithiana
2005-08-16 09:57 pm (UTC)

From: kragen
2005-08-16 11:34 pm (UTC)

Nice. Of course the next thing that occurred to me was, "What about usernames?" And apparently it was to you too... http://www.knams.wikimedia.org/~kate/lj/users.html
From: boggyb
2005-08-16 10:19 pm (UTC)

C:\Documents and Settings\Thomas>nc danga.com 8081
GET /atom-stream.xml HTTP/1.0


C:\Documents and Settings\Thomas>


Ideas? About the only thing I can think of is netcat only sends 0x0a, not 0x0d 0x0a.
From: brad
2005-08-16 10:48 pm (UTC)

You need 0x0d.
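For anyone scripting this instead of typing into telnet or netcat, here is a rough Python sketch that sends the request with explicit CR LF (0x0d 0x0a) line endings, as the exchange above calls for. The helper names `build_request` and `stream` are made up for illustration; the only library calls used are from the standard `socket` module.

```python
import socket


def build_request(path):
    """Return an HTTP/1.0 GET request with CR LF line endings."""
    # "\r\n" is 0x0d 0x0a; the second pair is the blank line ending the headers.
    return ("GET %s HTTP/1.0\r\n\r\n" % path).encode("ascii")


def stream(host, port, path, chunk_size=4096):
    """Yield raw response chunks (HTTP headers included) until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(path))
        while True:
            chunk = sock.recv(chunk_size)
            if not chunk:
                break
            yield chunk
```

With the prototype address from the post, usage would be along the lines of `for chunk in stream("danga.com", 8081, "/atom-stream.xml"): ...`. The post calls that address "not its final home", so whether it answers is not something this sketch assumes.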
From: allezbleu
2005-08-16 11:30 pm (UTC)

hilarious! thank you.
From: smackfu
2005-08-16 11:34 pm (UTC)

Wow, Safari doesn't like that at all, if you turn it into a URL. Presumably because the atomStream tag is never terminated.
From: taral
2005-08-16 11:35 pm (UTC)

Must be pretty quiet... I have no trouble keeping up. :)
From: taral
2005-08-16 11:35 pm (UTC)

Is this public entries only?
From: brad
2005-08-16 11:59 pm (UTC)

Of course.
From: adamthebastard
2005-08-16 11:55 pm (UTC)

Now all I need is a machine and link fast enough to download, parse and forward all of my friends' entries to a jabber client.

Best Telnet session ever.
From: jamesd
2005-08-16 11:56 pm (UTC)

I can see it now. Someone monitoring that and producing the LJ sextalk sub-feed containing every sex-related post. Or the LJ censored images feed comparing this to the lj images feed and reporting differences.

I suppose a few places could keep up; those doing a billion queries per day already have probably got a handle on what keeping up means.
From: quindarprime
2005-08-17 12:21 am (UTC)

So, um... is it unreasonable to ask what "an increasing number of companies" want with a real-time feed of all public posts? Sounds a bit paranoia-inducing.
From: brad
2005-08-17 12:30 am (UTC)

Everybody loves this blogging thing lately and wants to link/index/aggregate/analyze the data. Nothing scary... just people trying to compete on making blog data the most interesting.
From: aredridel
2005-08-17 12:22 am (UTC)

Sweet!
From: evan
2005-08-17 12:33 am (UTC)

<link href='http://www.livejournal.com/users/buttonfeind/16132.html' />
<content type='html'>
Hey, does anyone know the address for grant hall? I have to get something sent there and it's impossible to find it on the website. That website is crap by the way, complete crap.

The post is "friends only" for possible stalker purposes, by the way.

</content>


Even if there are no bugs (likely), from a user-happiness perspective it might be nice to only push these posts a few minutes after they've been up and the security is still public.
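One way the "wait a few minutes, then re-check security" idea could be sketched in Python is below. Everything here is hypothetical: the class name `DelayedPublisher`, the `is_still_public` callback, and the dict-shaped posts are illustration only and do not describe how LiveJournal's code works.

```python
from collections import deque


class DelayedPublisher:
    """Hold posts for a fixed delay, then release only those still public."""

    def __init__(self, delay_seconds, is_still_public):
        self.delay = delay_seconds
        self.is_still_public = is_still_public  # callback: post -> bool
        self.pending = deque()                  # (ready_time, post), oldest first

    def submit(self, post, now):
        """Record a new post; it becomes eligible at now + delay."""
        self.pending.append((now + self.delay, post))

    def release(self, now):
        """Return posts whose delay has elapsed and that are still public."""
        ready = []
        while self.pending and self.pending[0][0] <= now:
            _, post = self.pending.popleft()
            # Re-check at release time, so a post switched to friends-only
            # during the delay never reaches the stream.
            if self.is_still_public(post):
                ready.append(post)
        return ready
```

The fixed delay keeps the queue ordered by submission time, which is why a plain deque is enough in this sketch.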
From: brad
2005-08-17 12:46 am (UTC)

The data comes from the same place as the recently updated data on the front page of the site, so if there are bugs about security, they're ancient bugs. And also, my code does a redundant filtering pass, doing:

foreach my $p (@$recent) {
    next unless $p->{security} eq 'public';

Also, the existing front-page data isn't artificially lagged. Though it might not be a bad idea... just kinda lame.
(Deleted comment)
(Deleted comment)
From: brad
2005-08-17 05:01 am (UTC)

Re: FeedMesh?

Fat content was the big motivator here.
From: (Anonymous)
2005-08-17 04:45 am (UTC)

This is basically changes.xml

This is essentially a changes.xml format. Why not just implement that format instead of inventing your own?

Not that changes is all THAT but a lot of people already support it.

Kevin
From: brad
2005-08-17 05:01 am (UTC)

Re: This is basically changes.xml

Fat content.