
Privacy aside, have you ever tried to look something up from a few months ago, let alone (hypothetically) 20 years ago?

This is one of my biggest complaints about both Facebook and Twitter - you pile all your data in there, they keep it all and own it all, and they make it a royal pain for you to get at it again - even for simple things like "I wonder when I was in XYZ?" or "What was that link that Matt posted last year?".



I think it can be a pain to get at the information at facebook.com and twitter.com, but both have APIs that make it relatively easy to pull your data out.

I haven’t seen a similar product for Facebook, but Tweet Nest (http://pongsocket.com/tweetnest/) is a step in the right direction for Twitter. It makes it easy to download, browse, and search your tweets on your own server. Here’s an example (my own tweets): http://chasenlehara.com/tweets/ [Note: I have no affiliation with Tweet Nest, I just really like it!]


Those APIs are only superficially easy, especially in the case of Twitter — their API is purely based on a hideously broken pagination model that counts up from the present, and is cut off completely at 3200 tweets for your own account and 800 for others. It's completely impossible to access anything older than that through any means!
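The cutoff is easy to demonstrate with a toy simulation of that max_id-style cursoring. The endpoint, cap, and names below are illustrative, not Twitter's actual API surface; the point is that walking back page by page still bottoms out at the cap no matter how patiently you paginate.

```python
# A fake timeline and a fake API mimicking an endpoint that returns
# tweets newest-first, supports max_id cursoring, and enforces a hard
# cap on how far back it will go. All names are illustrative.
TIMELINE = list(range(1, 5001))  # tweet ids 1..5000; 5000 is newest
CAP = 3200                       # the service refuses anything older

def fetch_page(max_id=None, count=200):
    """Return up to `count` tweets with id <= max_id, newest first,
    but never anything beyond the CAP most recent tweets."""
    visible = TIMELINE[-CAP:]    # only the newest CAP tweets exist
    page = [t for t in reversed(visible)
            if max_id is None or t <= max_id]
    return page[:count]

def fetch_all():
    """Walk back with max_id until the API stops returning tweets."""
    tweets, max_id = [], None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        tweets.extend(page)
        max_id = page[-1] - 1    # next request: strictly older tweets
    return tweets

archive = fetch_all()
print(len(archive))              # → 3200, not 5000
print(archive[0], archive[-1])   # → 5000 1801
```

No sequence of requests recovers tweets 1 through 1800; they are simply unreachable through the API.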

Nearly everyone makes the fundamental pagination mistake (Blogger being the sole exception), but calendar-based archives are a basic assumption that everybody implements. Facebook doesn't expose it in their interface but it's possible through the API. For Twitter it's completely fucked — the only people with access to your old tweets are the Library of Congress.
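A calendar archive is also trivial to build and stable by construction. Here's a minimal sketch (schema and data invented for illustration) that buckets posts by month; an old month's page never changes once the month is over, so it can be cached or bookmarked forever.

```python
import sqlite3

# In-memory table of posts; schema and data are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created TEXT, body TEXT)")
db.executemany(
    "INSERT INTO posts (created, body) VALUES (?, ?)",
    [
        ("2011-06-03", "june post"),
        ("2011-07-14", "first july post"),
        ("2011-07-29", "second july post"),
        ("2011-08-02", "august post"),
    ],
)

def month_archive(db, year, month):
    """Fetch every post from one calendar month, oldest first.

    The page is addressed by a fixed bucket (year, month), so its
    contents never shift as new posts arrive in later months."""
    start = f"{year:04d}-{month:02d}-01"
    end = f"{year + (month == 12):04d}-{(month % 12) + 1:02d}-01"
    rows = db.execute(
        "SELECT created, body FROM posts "
        "WHERE created >= ? AND created < ? ORDER BY created",
        (start, end),
    )
    return [body for _, body in rows]

print(month_archive(db, 2011, 7))  # → ['first july post', 'second july post']
```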


fundamental pagination mistake

I imagine there are technical limitations, and the economic pressures that follow from them, behind these kinds of mistakes. With so much data sitting in so many massive silos, they have to design their systems on the assumption that people's access patterns only hit the most recent subset.

That, and the opposite economic reason - that you can charge for the older/richer/more complete dataset.

(EDIT: I know it's odd that I was whinging about the same thing a few posts up. I don't really know what the resolution is for that.)


Except that the technical limitations favor correct pagination — it's perfectly cacheable unlike the idiotic model: http://www.dehora.net/journal/2008/07/20/efficient-api-pagin...

I think the true reason is really just pervasive ignorance — everybody royally fucks this up and doesn't question it for a second.

If you have N items with M on each page, and the same request for 'page 2' returns items N-M down to N-2M+1, with the contents shuffling off the end as N increases, you're an abject failure. SELECT … ORDER BY id DESC LIMIT M OFFSET (P-1)*M is in almost every single web app and totally bullshit. It's incredibly depressing, but we'll probably be stuck with it for at least the rest of our lifetimes.
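To make the contrast concrete, here's a small sqlite sketch (schema illustrative) of the two models side by side: the offset query's 'page 2' changes the moment a new item arrives, while a keyset query ("everything older than id X") keeps returning the same rows, which is exactly what makes it cacheable.

```python
import sqlite3

# Ten items, newest-first paging, three per page; all data illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])

M = 3  # page size

def page_by_offset(page):
    """The broken model: page P is whatever happens to sit at that offset."""
    rows = db.execute(
        "SELECT id FROM items ORDER BY id DESC LIMIT ? OFFSET ?",
        (M, (page - 1) * M))
    return [r[0] for r in rows]

def page_before(max_id):
    """Keyset model: the page is addressed by a fixed boundary id."""
    rows = db.execute(
        "SELECT id FROM items WHERE id < ? ORDER BY id DESC LIMIT ?",
        (max_id, M))
    return [r[0] for r in rows]

before_offset = page_by_offset(2)   # → [7, 6, 5]
before_keyset = page_before(8)      # → [7, 6, 5]

db.execute("INSERT INTO items (id) VALUES (11)")  # a new item arrives

after_offset = page_by_offset(2)    # → [8, 7, 6] -- contents shifted
after_keyset = page_before(8)       # → [7, 6, 5] -- identical, cacheable
```

The keyset response for a given boundary id is immutable, so it can live in a CDN or browser cache indefinitely; every offset page is invalidated by every new item.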


I assure you someone, more than likely Google, will one day develop this. Could Greplin also position themselves in this space?


It is a truly remarkable engineering achievement that Facebook has managed to implement a less functional search feature than Google Reader, which, ironically, holds the unique distinction of being the least searchable system I interact with daily. We have extended conversations on Reader, mostly about recent papers that someone shares, and since the comments are not searchable you have to guess keywords in the title to find old threads. I assume they will fix this eventually, but it's a major usability problem at present.




