I'm building something that's going to store 2 billion media objects in the short term, with each object having 10+ related objects on top. Think crawler with some interesting extrapolations/filters/etc applied.
Would want to be able to power multiple views on data, mostly in aggregate form.
This is all greenfield: the data exists, but I'm starting from scratch to crawl and reformat it. I'm interested to know how you would consider storing the info?
Thanks :)
Does each object have a fixed size? (Or can be a fixed size with padding?)
What kind of index do you need? Do you need advanced SQL features, like aggregation and complex joins?
MySQL can handle billions of rows (http://dev.mysql.com/doc/refman/5.1/en/features.html), but it may be overkill for you. Pretty much any database can.
But that's the wrong question. The right question is what do you need the database to do. Mainly what kind of index, and what kind of joins.
PS. If all you need is the aggregate data, why store the details? Just store the aggregate and update it on the fly. (Presumably store the details offline, so you can regenerate the aggregate if needed.)
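To make the "update it on the fly" idea concrete, here's a minimal Python sketch of running aggregates that are maintained per incoming object, so you never have to re-scan billions of detail rows. The class and field names (`RunningAggregates`, `media_type`, `size_bytes`) are purely illustrative, not from any real library:

```python
# Hypothetical sketch: maintain aggregates as objects arrive,
# instead of storing 2B detail rows and querying them later.
from collections import defaultdict

class RunningAggregates:
    """Per-category counts and byte totals, updated incrementally."""

    def __init__(self):
        self.count = defaultdict(int)        # objects seen per media type
        self.total_bytes = defaultdict(int)  # cumulative size per media type

    def add(self, media_type, size_bytes):
        # O(1) work per object; the raw details can go to offline
        # storage so the aggregates can be regenerated if needed.
        self.count[media_type] += 1
        self.total_bytes[media_type] += size_bytes

    def mean_size(self, media_type):
        n = self.count[media_type]
        return self.total_bytes[media_type] / n if n else 0.0

agg = RunningAggregates()
agg.add("image", 120_000)
agg.add("image", 80_000)
agg.add("video", 5_000_000)
print(agg.count["image"])      # 2
print(agg.mean_size("image"))  # 100000.0
```

The same pattern extends to any aggregate that can be updated incrementally (counts, sums, means, histograms); aggregates that can't (e.g. exact medians) are the ones that force you to keep the details queryable.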
Read this: http://news.ycombinator.com/item?id=594889 for updating the aggregates on the fly. And this: http://news.ycombinator.com/item?id=587723 for one type of system for storing large numbers of objects.