brad's life (LiveJournal)
Brad Fitzpatrick


Size of journals [Nov. 27th, 2004|11:40 am]
I've been doing a lot of work this morning calculating "how big" journals are and how much disk space we'll need going forward, especially as our months-long migration from MyISAM to InnoDB continues.

First I calculated MB/journal on 4 InnoDB clusters and got:

0.139
0.184
0.091
0.625 -- what?

The first 3 numbers (and especially the first 2) were promising, but then I realized I would have to factor in account age.

So I calculated bytes per journal per day, where "days" is the total sum of account ages (in days) across all journals on a cluster:

1036.59
927.1858
955.3326
1071.819

Much more consistent!

BTW, these sizes include all data and indexes on a single machine, but not redundant copies of data in the cluster or backups.
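To make the difference between the two metrics concrete, here's a minimal Python sketch. The `ClusterStats` type, its field names, and both clusters' numbers are hypothetical (the post doesn't give raw totals); it only shows why dividing by summed account age ("journal-days") evens out clusters that hold accounts of different ages.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Hypothetical per-cluster inputs; the post does not give its raw totals."""
    total_bytes: int          # data + indexes on one machine (no replicas, no backups)
    account_ages_days: list   # age in days of every journal on the cluster


def mb_per_journal(c: ClusterStats) -> float:
    # First metric: plain average size. Assumes decimal MB (10**6 bytes).
    return c.total_bytes / len(c.account_ages_days) / 1_000_000


def bytes_per_journal_day(c: ClusterStats) -> float:
    # Second metric: divide by the summed account ages, so a cluster
    # holding older accounts isn't penalized just for their age.
    return c.total_bytes / sum(c.account_ages_days)


# Two made-up clusters whose accounts all grow at 1,000 bytes/day.
young = ClusterStats(total_bytes=600_000, account_ages_days=[100, 200, 300])
old = ClusterStats(total_bytes=2_000_000, account_ages_days=[1000, 1000])

for name, c in [("young", young), ("old", old)]:
    print(name, mb_per_journal(c), bytes_per_journal_day(c))
```

Here MB/journal differs fivefold (0.2 vs 1.0) while bytes per journal-day comes out at 1,000 for both, which is the same kind of evening-out seen between the two sets of numbers above.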

So, historically, once somebody signs up, their account grows by about 1 kB per day on average. With 5.3 million users, that's 5.04 GiB/day of new data. Counting redundant copies and backups, it's more like 20 GiB/day.
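The projection is just multiplication; a rough Python sketch follows. It treats "1 kB" as 1,024 bytes and assumes every existing account grows at the average rate. The `copies=4` factor for redundant copies and backups is an assumption read off the jump from ~5 to ~20 GiB/day, not a figure stated in the post, and per the update further down, the per-day input rate itself turned out to be skewed.

```python
GIB = 2**30


def daily_growth_gib(users: int, bytes_per_user_day: float, copies: int = 1) -> float:
    """Projected total growth in GiB/day, assuming every account grows at the average rate."""
    return users * bytes_per_user_day * copies / GIB


single = daily_growth_gib(5_300_000, 1024)                # ~1 kB/day taken as 1,024 bytes
replicated = daily_growth_gib(5_300_000, 1024, copies=4)  # copies=4 is an assumption
print(f"{single:.2f} GiB/day on one machine, {replicated:.1f} GiB/day with copies")
```

This prints about 5.05 GiB/day, close to the 5.04 above; the small gap likely comes from rounding in the inputs.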

See why I hate disks?

Update: the data above is bogus, because our moving process has been moving active users onto InnoDB first, and active accounts are bigger. I'll post better numbers later.

Comments:
From: scosol
2004-11-27 07:56 pm (UTC)
you need to modify the lj client progs and turn them all into p2p-style distributed storage nodes; then you can push all the storage back where it belongs ;P
From: brad
2004-11-27 08:00 pm (UTC)
And if reliability and latency are problems now, that'd be a whole new world of problems.
From: scosol
2004-11-28 01:04 am (UTC)
hahah but of course :P
heh- in all seriousness, latency is a problem but (properly designed) reliability isn't
what a coup that would be hahahah
(Deleted comment)
From: brad
2004-11-27 08:16 pm (UTC)
:-(

We are considering this, though, whenever we make new tables. We plan to redo all tables at some point so there's a global "revision number" for an account and given your rev number (not a date), you can replay all changes. That's a ways off, though.
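A per-account revision number like the one described could work roughly as follows. This is a hypothetical Python sketch, not LiveJournal's actual schema: `AccountLog`, `replay`, and the table names are made up for illustration. Every write bumps one counter for the account, and a client that sends its last-seen revision gets back exactly the changes it's missing, with no dates involved.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# (rev, table, row_id, payload); a payload of None marks a deleted row.
Change = Tuple[int, str, int, Optional[Dict[str, Any]]]


@dataclass
class AccountLog:
    """Hypothetical per-account change log keyed by one global revision number."""
    rev: int = 0
    changes: List[Change] = field(default_factory=list)

    def record(self, table: str, row_id: int, payload: Optional[Dict[str, Any]]) -> int:
        # Every write to any table bumps the account's single revision counter.
        self.rev += 1
        self.changes.append((self.rev, table, row_id, payload))
        return self.rev

    def changes_since(self, client_rev: int) -> List[Change]:
        # Linear scan for clarity; a real store would index on rev.
        return [c for c in self.changes if c[0] > client_rev]


def replay(mirror: Dict[Tuple[str, int], Dict[str, Any]],
           changes: List[Change], client_rev: int) -> int:
    """Apply changes in order to a client-side mirror; return the new revision."""
    for rev, table, row_id, payload in changes:
        if payload is None:
            mirror.pop((table, row_id), None)
        else:
            mirror[(table, row_id)] = payload
        client_rev = rev
    return client_rev


# Usage: the client has seen rev 1 and catches up to the latest revision.
log = AccountLog()
log.record("entry", 1, {"subject": "Size of journals"})
log.record("comment", 7, {"body": "Neato."})
log.record("entry", 1, None)  # entry deleted

mirror = {("entry", 1): {"subject": "Size of journals"}}
rev = replay(mirror, log.changes_since(1), client_rev=1)
```

Keying on a single counter rather than timestamps means clock skew between servers can't cause a client to miss or double-apply a change.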

How's that FB client? :-)
(Deleted comment)
From: brad
2004-11-27 08:23 pm (UTC)
Sounds like your Thanksgiving weekend is more eventful than mine.

In any case, I'm excited to see you and Whitaker making so much progress on the client.
(Deleted comment)
From: brad
2004-11-27 08:29 pm (UTC)
Heh.
From: evan
2004-11-27 09:20 pm (UTC)
Neato.
From: stephenbooth_uk
2004-11-27 10:14 pm (UTC)
Have you thought of looking at some sort of time-based, tiered managed storage, so only the most recent data is on quick, expensive storage and the rarely accessed stuff migrates onto slower but much cheaper storage?

I've been starting to look into something like that for my employer's Oracle databases (individually in the tens to hundreds of GB range, but collectively many TB and growing). We're looking at putting recent or frequently accessed data on fast direct-attached disk, older and less frequently accessed data on NAS, older and rarely accessed data on Bladestore, and archival data on WORM. For legislative reasons we have to keep data accessible online for up to 12 years on some systems.

I suspect that you're a ways off our volumes of data (and budget; our annual non-staff IT budget is about $70 million), but if you can migrate the older data onto cheaper mass storage, that should help some. Aside from memory-linked posts, how often do you look back at entries more than a few weeks old?
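The age-based tiering described here boils down to mapping time since last access onto a storage tier. A minimal Python sketch: the tier names come from the comment, but the cut-off ages are invented for illustration, since none are given.

```python
from datetime import date, timedelta

# Hypothetical cut-offs: the comment names the tiers but not the thresholds.
TIERS = [
    (timedelta(days=30), "direct-attached disk"),
    (timedelta(days=365), "NAS"),
    (timedelta(days=5 * 365), "Bladestore"),
]
ARCHIVE = "WORM archive"


def choose_tier(last_access: date, today: date) -> str:
    """Return the cheapest tier whose age limit still covers this data."""
    age = today - last_access
    for limit, tier in TIERS:
        if age <= limit:
            return tier
    return ARCHIVE  # older than every limit: archival storage
```

A periodic job could run this over every account (or row) and move anything whose tier changed; keying on last access rather than creation date keeps frequently reread old entries on fast disk.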
From: brad
2004-11-27 10:48 pm (UTC)
Actually we do move inactive accounts to cheaper storage.
From: controversial
2004-12-02 06:00 pm (UTC)

Hey Brad

I bring this directly to your personal journal because I haven't found another forum that answers my question.

How might I directly invest in LJ?
From: brad
2004-12-02 06:03 pm (UTC)

Re: Hey Brad

Well, given that we're a private company and not taking funding, there's not really a way. *shrug*