
Privacy aside, have you ever tried to look something up from a few months ago, let alone (hypothetically) 20 years ago?

This is one of my biggest complaints about both Facebook and Twitter - you pile all your data in there, they keep it all and own it all, and they make it a royal pain for you to get at it again - even for simple things like "I wonder when I was in XYZ?" or "What was that link that Matt posted last year?".



I think it can be a pain to get at the information at facebook.com and twitter.com, but both have APIs that make it relatively easy to pull your data out.

I haven’t seen a similar product for Facebook, but Tweet Nest (http://pongsocket.com/tweetnest/) is a step in the right direction for Twitter. It makes it easy to download, browse, and search your tweets on your own server. Here’s an example (my own tweets): http://chasenlehara.com/tweets/ [Note: I have no affiliation with Tweet Nest, I just really like it!]


Those APIs are only superficially easy, especially in the case of Twitter — their API is purely based on a hideously broken pagination model that counts up from the present, and is cut off completely at 3200 tweets for your own account and 800 for others. It's completely impossible to access anything older than that through any means!
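The cutoff is easy to demonstrate with a toy simulation of that max_id-style cursoring. The endpoint, cap, and names below are illustrative, not Twitter's actual API surface; the point is that walking back page by page still bottoms out at the cap no matter how patiently you paginate.

```python
# A fake timeline and a fake API mimicking an endpoint that returns
# tweets newest-first, supports max_id cursoring, and enforces a hard
# cap on how far back it will go. All names are illustrative.
TIMELINE = list(range(1, 5001))  # tweet ids 1..5000; 5000 is newest
CAP = 3200                       # the service refuses anything older

def fetch_page(max_id=None, count=200):
    """Return up to `count` tweets with id <= max_id, newest first,
    but never anything beyond the CAP most recent tweets."""
    visible = TIMELINE[-CAP:]    # only the newest CAP tweets exist
    page = [t for t in reversed(visible)
            if max_id is None or t <= max_id]
    return page[:count]

def fetch_all():
    """Walk back with max_id until the API stops returning tweets."""
    tweets, max_id = [], None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        tweets.extend(page)
        max_id = page[-1] - 1    # next request: strictly older tweets
    return tweets

archive = fetch_all()
print(len(archive))              # → 3200, not 5000
print(archive[0], archive[-1])   # → 5000 1801
```

No sequence of requests recovers tweets 1 through 1800; they are simply unreachable through the API.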

Nearly everyone makes the fundamental pagination mistake (Blogger being the sole exception), but calendar-based archives are a basic assumption that everybody implements. Facebook doesn't expose it in their interface but it's possible through the API. For Twitter it's completely fucked — the only people with access to your old tweets are the Library of Congress.
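A calendar archive is also trivial to build and stable by construction. Here's a minimal sketch (schema and data invented for illustration) that buckets posts by month; an old month's page never changes once the month is over, so it can be cached or bookmarked forever.

```python
import sqlite3

# In-memory table of posts; schema and data are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created TEXT, body TEXT)")
db.executemany(
    "INSERT INTO posts (created, body) VALUES (?, ?)",
    [
        ("2011-06-03", "june post"),
        ("2011-07-14", "first july post"),
        ("2011-07-29", "second july post"),
        ("2011-08-02", "august post"),
    ],
)

def month_archive(db, year, month):
    """Fetch every post from one calendar month, oldest first.

    The page is addressed by a fixed bucket (year, month), so its
    contents never shift as new posts arrive in later months."""
    start = f"{year:04d}-{month:02d}-01"
    end = f"{year + (month == 12):04d}-{(month % 12) + 1:02d}-01"
    rows = db.execute(
        "SELECT created, body FROM posts "
        "WHERE created >= ? AND created < ? ORDER BY created",
        (start, end),
    )
    return [body for _, body in rows]

print(month_archive(db, 2011, 7))  # → ['first july post', 'second july post']
```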


fundamental pagination mistake

I imagine there are technical limitations, and the economic pressures that follow from them, behind these kinds of mistakes. With so much data sitting in so many massive silos, they have to design their systems on the assumption that people's access patterns only hit the most recent subset.

That, and the opposite economic reason - that you can charge for the older/richer/more complete dataset.

(EDIT: I know it's odd that I was whinging about the same thing a few posts up. I don't really know what the resolution is for that.)


Except that the technical limitations favor correct pagination — it's perfectly cacheable unlike the idiotic model: http://www.dehora.net/journal/2008/07/20/efficient-api-pagin...

I think the true reason is really just pervasive ignorance — everybody royally fucks this up and doesn't question it for a second.

If you have N items with M on each page, and the same request for 'page 2' returns items N-M down to N-2M+1, with the contents shuffling off the end as N increases, you're an abject failure. SELECT … ORDER BY id DESC LIMIT M OFFSET (P-1)*M is in almost every single web app and totally bullshit. It's incredibly depressing, but we'll probably be stuck with it for at least the rest of our lifetimes.
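To make the contrast concrete, here's a small sqlite sketch (schema illustrative) of the two models side by side: the offset query's 'page 2' changes the moment a new item arrives, while a keyset query ("everything older than id X") keeps returning the same rows, which is exactly what makes it cacheable.

```python
import sqlite3

# Ten items, newest-first paging, three per page; all data illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])

M = 3  # page size

def page_by_offset(page):
    """The broken model: page P is whatever happens to sit at that offset."""
    rows = db.execute(
        "SELECT id FROM items ORDER BY id DESC LIMIT ? OFFSET ?",
        (M, (page - 1) * M))
    return [r[0] for r in rows]

def page_before(max_id):
    """Keyset model: the page is addressed by a fixed boundary id."""
    rows = db.execute(
        "SELECT id FROM items WHERE id < ? ORDER BY id DESC LIMIT ?",
        (max_id, M))
    return [r[0] for r in rows]

before_offset = page_by_offset(2)   # → [7, 6, 5]
before_keyset = page_before(8)      # → [7, 6, 5]

db.execute("INSERT INTO items (id) VALUES (11)")  # a new item arrives

after_offset = page_by_offset(2)    # → [8, 7, 6] -- contents shifted
after_keyset = page_before(8)       # → [7, 6, 5] -- identical, cacheable
```

The keyset response for a given boundary id is immutable, so it can live in a CDN or browser cache indefinitely; every offset page is invalidated by every new item.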


I assure you someone, more than likely Google, will one day develop this. Could Greplin also position themselves in this space?


It is a truly remarkable engineering achievement that Facebook has managed to implement a less functional search feature than Google Reader, which, ironically, holds the unique distinction of being the least searchable system I interact with daily. We have extended conversations on Reader, mostly about recent papers that someone shares, and since the comments are not searchable you have to guess keywords in the title to find old threads. I assume they will fix this eventually, but it's a major usability problem at present.




