Never ending feed of Atom feeds - brad's life Page 2 [entries|archive|friends|userinfo]
Brad Fitzpatrick


Never ending feed of Atom feeds [Aug. 16th, 2005|12:58 pm]
Brad Fitzpatrick

An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about.

Just to prove a point, I flooded a couple of them and found that sure enough, nobody can really keep up. It's even more annoying when they don't even support persistent HTTP connections.

So --- I decided to turn things on their head and make them get data from us. If they can't keep up, it's their loss.

Prototype: (not its final home)

$ telnet danga.com 8081
GET /atom-stream.xml HTTP/1.0<enter>
<enter>


And enjoy the never ending XML stream of Atom feeds, each containing one entry. And if you get more than 256k behind (not including your TCP window size), then we start dropping entries to you and you see:

<sorryTooSlow youMissed="23" />
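One way to picture the drop-when-behind behaviour is a per-client backlog with a byte cap. The sketch below is not the server's actual code: the class name `ClientQueue`, the drop-newest policy, and the choice to emit the notice just before the next accepted entry are all assumptions made for illustration. Only the 256k figure and the `sorryTooSlow` element come from the post.

```python
from collections import deque

MAX_BACKLOG_BYTES = 256 * 1024  # the 256k limit mentioned in the post


class ClientQueue:
    """Hypothetical per-client backlog that drops entries once the client lags."""

    def __init__(self, max_bytes=MAX_BACKLOG_BYTES):
        self.max_bytes = max_bytes
        self.pending = deque()    # encoded chunks waiting to be written
        self.pending_bytes = 0
        self.missed = 0           # entries dropped since the last notice

    def _append(self, chunk):
        self.pending.append(chunk)
        self.pending_bytes += len(chunk)

    def push(self, entry):
        """Queue an entry, or count it as missed if the backlog is full."""
        if self.pending_bytes + len(entry) > self.max_bytes:
            self.missed += 1
            return
        if self.missed:
            # Tell the client how much it lost before resuming.
            # The notice itself may nudge the backlog slightly over the cap.
            self._append(b'<sorryTooSlow youMissed="%d" />\n' % self.missed)
            self.missed = 0
        self._append(entry)

    def pop(self):
        """Return the next chunk to write to the socket, or None if empty."""
        if not self.pending:
            return None
        chunk = self.pending.popleft()
        self.pending_bytes -= len(chunk)
        return chunk
```

A writer loop would call `pop()` whenever the socket is writable, so a slow reader only ever costs the server a bounded amount of memory.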

I think soon we'll get TypePad and perhaps MovableType blogs all being sent through this. The final home will probably be on a subdomain of sixapart.com somewhere, including documentation better than this blog entry.

And yes, I'm sure my Atom syntax is bogus or something. I spent a good 2 minutes on that part of it.

Comments:
Page 2 of 3
<<[1] [2] [3] >>
[User Picture]From: taral
2005-08-16 11:35 pm (UTC)
Must be pretty quiet... I have no trouble keeping up. :)
(Reply) (Thread)
[User Picture]From: taral
2005-08-16 11:35 pm (UTC)
Is this public entries only?
(Reply) (Thread)
[User Picture]From: brad
2005-08-16 11:59 pm (UTC)
Of course.
(Reply) (Parent) (Thread) (Expand)
[User Picture]From: adamthebastard
2005-08-16 11:55 pm (UTC)
Now all I need is a machine and link fast enough to download, parse and forward all of my friends' entries to a jabber client.

Best Telnet session ever.
(Reply) (Thread)
From: jamesd
2005-08-16 11:56 pm (UTC)
I can see it now. Someone monitoring that and producing the LJ sextalk sub-feed containing every sex-related post. Or the LJ censored images feed comparing this to the lj images feed and reporting differences.

I suppose a few places could keep up; those doing a billion queries per day already have probably got a handle on what keeping up means.
(Reply) (Thread)
[User Picture]From: quindarprime
2005-08-17 12:21 am (UTC)
So, um... is it unreasonable to ask what "an increasing number of companies" want with a real-time feed of all public posts? Sounds a bit paranoia-inducing.
(Reply) (Thread)
[User Picture]From: brad
2005-08-17 12:30 am (UTC)
Everybody loves this blogging thing lately and wants to link/index/aggregate/analyze the data. Nothing scary... just people trying to compete on making blog data the most interesting.
(Reply) (Parent) (Thread)
[User Picture]From: aredridel
2005-08-17 12:22 am (UTC)
Sweet!
(Reply) (Thread)
From: evan
2005-08-17 12:33 am (UTC)
<link href='http://www.livejournal.com/users/buttonfeind/16132.html' />
<content type='html'>
Hey, does anyone know the address for grant hall? I have to get something sent there and it's impossible to find it on the website. That website is crap by the way, complete crap.

The post is "friends only" for possible stalker purposes, by the way.

</content>


Even if there are no bugs (likely), from a user-happiness perspective it might be nice to only push these posts a few minutes after they've been up and the security is still public.
(Reply) (Thread)
[User Picture]From: brad
2005-08-17 12:46 am (UTC)
The data comes from the same place as the recently updated data on the front page of the site, so if there are bugs about security, they're ancient bugs. And also, my code does a redundant filtering pass, doing:

foreach my $p (@$recent) {
    next unless $p->{security} eq 'public';
    ...    # rest of the loop body not shown in the original
}

Also, the existing front-page data isn't artificially lagged. Though it might not be a bad idea... just kinda lame.
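For what the lag idea from this thread might look like, here is a rough Python sketch. Nothing here is LiveJournal code: `releasable`, the `posted_at` field, the `current_security` lookup, and the 300-second delay are all assumptions (the thread only says "a few minutes"). It holds each post back for a while and re-checks that it is still public before releasing it.

```python
LAG_SECONDS = 300  # assumed value; the thread only says "a few minutes"


def releasable(posts, now, current_security, lag=LAG_SECONDS):
    """Return the posts that are old enough and still public.

    posts: iterable of dicts with 'id' and 'posted_at' (epoch seconds).
    current_security: hypothetical callable mapping a post id to its
        security setting as of right now, e.g. 'public' or 'friends'.
    """
    out = []
    for post in posts:
        if now - post["posted_at"] < lag:
            continue  # too fresh; the author may still lock it down
        if current_security(post["id"]) != "public":
            continue  # security changed after posting
        out.append(post)
    return out
```

The second check is the point of the delay: a post that starts out public and is switched to friends-only within the window never reaches the stream.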
(Reply) (Parent) (Thread) (Expand)
(Deleted comment)
(Deleted comment)
[User Picture]From: brad
2005-08-17 05:01 am (UTC)

Re: FeedMesh?

Fat content was the big motivator here.
(Reply) (Parent) (Thread)
From: (Anonymous)
2005-08-17 04:45 am (UTC)

This is basically changes.xml

This is essentially a changes.xml format. Why not just implement that format instead of inventing your own?

Not that changes is all THAT but a lot of people already support it.

Kevin
(Reply) (Thread)
[User Picture]From: brad
2005-08-17 05:01 am (UTC)

Re: This is basically changes.xml

Fat content.
(Reply) (Parent) (Thread)
[User Picture]From: crschmidt
2005-08-17 12:57 pm (UTC)
I don't know what I'm doing wrong, but I've written some python code that looks at this, and it seems like there are way more entries than LiveJournal's front page is saying: as in, about 1300 entries per minute rather than just 200 that it says right now.

[crschmidt@creusa ~]$ python test.py
1124282728.47
Entries: 100. Entries/second: 19.2611058701. Time: 1124282733
Entries: 200. Entries/second: 22.5625200497. Time: 1124282737

http://crschmidt.net/python/ljentries.py is the code: am I really insane? did I do something wrong? I don't see obvious dupes in my code, and although I'm assuming I'd miss entries (if they broke over a 1024 barrier, since I'm not looking at the buffer) I don't think that I can think of a way I'd get extras...

Added in some extra collision checking, and found out that there are indeed clashes, but I can't seem to figure out why/where they're coming from. All my code is in that Python link up there.

*shrug* No clue what's up, but thought you might want to know. It almost seems like you're grabbing a full set of 100 new URLs from the cache every 10 seconds or so, and not checking if they're already printed out somehow... but that doesn't make any sense at all. So it's probably my code, but I can't figure out how.
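On the worry about entries that break across a 1024-byte read: a reader can carry leftover bytes from one read to the next, so nothing is lost at the boundary. A minimal sketch, assuming each item in the stream ends with a literal `</feed>` closing tag (the post does not spell out the framing); `split_feeds` is a made-up name.

```python
def split_feeds(chunks, end_tag=b"</feed>"):
    """Yield complete feed documents from an iterable of byte chunks.

    Bytes after the last complete document stay in the buffer until
    the next chunk arrives, so a document split across reads is
    reassembled rather than dropped.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            idx = buf.find(end_tag)
            if idx < 0:
                break  # no complete document yet; wait for more data
            cut = idx + len(end_tag)
            yield buf[:cut].strip()
            buf = buf[cut:]
```

The `chunks` argument could be a generator that repeatedly calls `sock.recv(1024)` on the connection to the stream.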
(Reply) (Thread)
[User Picture]From: crschmidt
2005-08-17 01:00 pm (UTC)
Okay, so I just checked it in telnet, and I'm definitely seeing the same entries over and over. So, ignore my earlier comment.
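Until the repeats are fixed at the source, a consumer could filter them out on its side by remembering recently seen entry URLs. This is a sketch only: `RecentlySeen` and its capacity are invented for illustration, and it assumes the `<link href>` URL identifies an entry.

```python
from collections import OrderedDict


class RecentlySeen:
    """Bounded memory of recently seen URLs, evicting the least recently seen."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_new(self, url):
        """Return True the first time a URL is seen, False for a repeat."""
        if url in self._seen:
            self._seen.move_to_end(url)  # refresh so it is evicted later
            return False
        self._seen[url] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # drop the oldest URL
        return True
```

The cap keeps memory flat on an endless stream, at the cost of treating a URL as new again once it has aged out.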
(Reply) (Thread)
[User Picture]From: brad
2005-08-17 09:37 pm (UTC)
My ghetto injector is surely at fault. I'll fix soon here. This was just a prototype, after all.
(Reply) (Parent) (Thread)
[User Picture]From: ghewgill
2005-08-17 01:13 pm (UTC)
I ran the stream through "grep sorry", curious whether I would get that sorryTooSlow tag. Of course, this shows me all the post lines with "sorry" in them; apparently LJ users are an apologetic bunch. However, I noticed that I saw the same posts over and over - does anybody else notice this?
(Reply) (Thread)
[User Picture]From: eichin
2005-08-17 09:29 pm (UTC)

geography

That's kind of neat. I grabbed a 300M snapshot, wrote 10 lines of python to break it into pieces, and fed it to a metacarta appliance, to see if it keeps up :-) I'll let you know if I find anything "interestingly geographic" out of it...
(Reply) (Thread)
[User Picture]From: kragen
2005-08-17 11:35 pm (UTC)

"bogus or something"

According to the current Atom spec, the <feed> is supposed to contain an <id> with the URI of the feed (rather than just a <link>) and an <updated> as well; these two, plus the <title>, are the only mandatory bits of <feed>.

The same three elements are the only mandatory bits of <entry>. The standard LJ .../data/atom Atom feeds seem to use some hokey URN for the entry id, but there's no need for that --- LJ posts have perfectly valid, dereferenceable perma-URLs, which are presently in the <link> elements in this stream.
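A quick way to see which of those mandatory elements a given document lacks is a check like the one below. It is a sketch under assumptions: it treats the root element as the feed, looks only for the three elements named above, and assumes the Atom 1.0 namespace, which the prototype stream may not declare (pass a different `ns` in that case). `missing_required` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # assumed; the stream may differ
REQUIRED = ("id", "title", "updated")


def missing_required(xml_text, ns=ATOM_NS):
    """Return (element, missing_child) pairs for a feed and its entries."""
    root = ET.fromstring(xml_text)

    def qname(name):
        return "{%s}%s" % (ns, name)

    problems = []
    for name in REQUIRED:
        if root.find(qname(name)) is None:
            problems.append(("feed", name))
    for i, entry in enumerate(root.findall(qname("entry"))):
        for name in REQUIRED:
            if entry.find(qname(name)) is None:
                problems.append(("entry[%d]" % i, name))
    return problems
```

An empty list means the three elements are present at both levels; it does not mean the document is valid Atom in every other respect.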

I'm no Atom expert, so I hope this is helpful. Presumably either you will change the stream format to be valid Atom, or people who want to feed these things to their current Atom-consuming software will have to transform your current format into valid Atom --- what do you plan to do? I'd hate to put effort into doing the mapping myself if you're about to fix it.

(Reply) (Thread)
[User Picture]From: brad
2005-08-17 11:37 pm (UTC)

Re: "bogus or something"

The Atom LJ is injecting is total absolute crap. It's just a demo.

TypePad's injecting real Atom and LJ's about to start.
(Reply) (Parent) (Thread)
From: rosskarchner
2005-08-18 04:12 pm (UTC)

corny matrix reference

All I see is blonde, brunette, redhead...
(Reply) (Thread)