A Developer's Worst Nightmare (The TinyGrab Story) (cocoacoding.com)
48 points by nam3d on April 3, 2011 | 46 comments


If anything this is a cautionary tale about how to screw up your startup.

I have no idea what TinyGrab's architecture looks like, but I can say with full confidence that backups were an afterthought, and they find themselves in deep shit because they did not have a proper backup in place. It is inexcusable to have the current situation and still claim they have good backups.

"The issues lays with the server being compromised and being forced to shut down. The old API and apps relied on a fixed IP address. It would have taken us months to get TinyGrab v1 back up to scratch"

Ok, computer scientists reading this: To which of you does this make sense? It's hard to keep count of all the fails in here:

- If your problem is a corrupted server, replace it and assign the same IP address.

- Why would anyone hard-code IP addresses for an API to work? Were you born yesterday?

- It should take at the very most hours to recover from a truly catastrophic failure. "Months" you say? You do not know what you're doing, I say.

- They have a bunch of backups and not one works, especially when there is no data loss? Smells to me like no source control, no off-site backup, and absolutely no testing of the backups. If you have lived a single day in system admin work, you know that it is not enough to do a backup. You need to test the damned restore.
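
To make the last point concrete, here is a minimal sketch of a backup that is only trusted once a restore has been checked. It assumes the thing being backed up is a plain directory of files; the function names (`make_backup`, `restore_matches`) are made up for illustration and have nothing to do with TinyGrab's actual setup.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_backup(source: Path, archive: Path) -> None:
    """Write every file under source into a zip archive (hypothetical helper)."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(source).as_posix())


def restore_matches(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it to source."""
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(scratch)
        return file_digests(Path(scratch)) == file_digests(source)
```

Run `restore_matches` right after `make_backup`, and again on a schedule against a known-good copy. A backup job that never exercises the restore path only proves that an archive file exists.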

Maybe this could be a story about when _not_ to self-finance your startup. Get outside funding and hire people that know more than you do about the technology involved.

Sorry for being an asshole in my comments, but this situation is full of fail at every step.


I believe there is a complete, one sentence explanation that would ring true for most developers, but is not being offered for one reason or another.

My best guess: It would have been easier to get 1.0 back up and working, but this was viewed as an opportunity to finally push out the 2.0 and convert over the vast majority of the existing user-base in short order. The release was not tested well enough, and so this was a bad plan in hindsight.


Well, the worst time to change technologies or perform version updates is when there is a fire under your ass to get the system working, but that is a minor screw up compared to all else going on.

This whole thing stinks of lack of experience. No one should repeat the steps these guys are taking.


No need to get outside funding for any of this. It can all be done on the cheap.

I agree though, how can it take months to set a server up again? Late last year I had both of our supposedly redundant webservers fail at the same time - never happens, right? ;) We were able to get two entirely separate boxes set up with the same static addresses in under an hour because we maintain all possible configuration in svn.

Worse, this was on windows servers where we couldn't use something like puppet or one of the other 20 solutions to script your configuration.


I guess a lot of this could be attributed to the fact that the TinyGrab developers are volunteers (at least according to the article). Like many open source projects they probably focused on the glamorous work of actually coding the app and not the boring stuff like backups. I'm not saying this is acceptable, I'm just suggesting an explanation.

I could sort of see not having a completely bulletproof backup strategy in place, but with all the options available, I can't imagine not working with source control.


Regarding the hard-coded IP piece, I've run into a similar scenario wherein the early versions of the distributed counters patch for Cassandra relied on vector-clocks which used IPs as part of those clocks. Certainly not a robust design in the case of machine failure, but I've seen smart people make awkward decisions, and if someone had gone into production with that patch and lost a machine, it would have been remarkably annoying to recover (it would have required rewriting all the SSTables to a new IP, which isn't impossible or anything, but kind of a pita).


You said it - you are indeed being an arsehole in your comments.

Firstly:

"...this could be a story about when _not_ to self-finance your startup"

That's easier said than done. There is absolutely no point in over-analysing the situation and then concluding that TinyGrab should have got "outside funding" and "hired people that know more than they do" about the technology involved - because what is done, is done. You are seriously insulting the intelligence of everyone behind TinyGrab if you think they don't already know that version one was problematic. The point here is that the popularity of TinyGrab was never predicted or planned and that it grew from strength to strength entirely organically. They did not think about scalability at the outset - so they acknowledged the problem and were rectifying this with a full rewrite in TinyGrab 2. They were simply unlucky in that they were unable to roll this update out quickly enough - they were hacked before they could complete TinyGrab 2, and therefore did the right thing and released TinyGrab 2, which was very nearly complete anyway. With the lack of income and voluntary manpower they had, they couldn't have changed anything in hindsight at all. So in that respect this was not "full of fail" as you so crudely put it.

Secondly:

What you need to do is put yourself in the shoes of those battling to actually keep TinyGrab online, and realise that they are actually trying their best to respond to every support request and resume the excellent level of service their users have come to expect - and frankly they've almost achieved that now.

What you also need to realise is that those behind TinyGrab are not in it for commercial gain - they are in it because they love the product and their users.

Since this is a completely free service they could have very easily - and, might I add, completely legally - closed TinyGrab as of today, with no risk of litigation against them. They would also most probably refund existing premium users, but would not be legally obliged to.

What they have instead chosen to do, is keep this service online, do their best to get TinyGrab 2 to the bare minimum in order to roll it out completely, and work day and night as a team to respond to every support request and @mention on Twitter to keep users informed and solve their issues individually. I think this is absolutely admirable given their 300,000+ user base and could never be described as "full of fail" especially since the vast proportion of their users do not pay them a penny - and especially since those behind TinyGrab work in their own free time on the project whilst receiving no commercial gain whatsoever.

TinyGrab will easily recover from this over the coming days - all they need at this moment in time is some moral support and some understanding - a bit of slack if you like - not a bunch of anonymous, cynical, patronising and moronic nobodies trying to tell them how it should have been done in hindsight. As I said above, it's frankly just insulting the intelligence of those behind it - any reasonable person can see that they were fully aware of the pitfalls of TinyGrab version one and were trying their best to roll out version two. Give them a break.


Maybe the problem is that I _have_ been in these guys' shoes (anyone who has been woken at 3am with their system completely offline and bleeding money, raise your hand).

The fact that it is a volunteer effort is pretty much irrelevant. One does not make worse decisions when one volunteers his/her time.

Rather than nitpick your points against my "full of fail" comment I will say this: The people involved had several choices to make in all of this, before and after the hack. Pretty much the key choices were mistaken: Not having a good backup in place. Not giving a second's thought to how to restore. "Taking the opportunity" to perform a major version change (to an unfinished version, no less) while the system was completely down. And on and on and on.

I really don't get why it matters if it is a volunteer effort or not. Volunteering does not mean "do low quality work", nor does it mean "not mission critical".

Had this all been handled differently, it would have been a "why we were down for a couple of hours yesterday" blog post instead of "why we're barely working, and will be for the next few weeks".


"TinyGrab was attacked and the attacker was able to gain access to one of their servers. Although the details are scarce, the attacker corrupted the codebase for TinyGrab 1. The TinyGrab team was forced to migrate their entire userbase to the newest version, a version which wasn’t completely ready for yet."

I didn't find any mention of it in TFA, but do people just not do backups anymore? Or rolling, dated backups? My wife owns a small retail/web store, and I back up her Quickbooks, the database, and the source code nightly, and keep at least a few weeks of those backups on the drive where it's stored, on 2 different backup drives, and at an offsite place.

Is this overkill? Perhaps, but it would have saved us from that.
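
For anyone who hasn't set up rolling, dated backups, the retention side is only a few lines. This is a rough sketch, not my actual script: the `backup-YYYY-MM-DD.zip` naming scheme and the 21-day window are assumptions picked for the example.

```python
import re
from datetime import date, timedelta

# Assumed naming scheme for the nightly archives: backup-YYYY-MM-DD.zip
NAME_PATTERN = re.compile(r"^backup-(\d{4})-(\d{2})-(\d{2})\.zip$")


def backup_name(day: date) -> str:
    """Build the dated file name for the backup taken on the given day."""
    return f"backup-{day.isoformat()}.zip"


def expired(names: list[str], today: date, keep_days: int = 21) -> list[str]:
    """Return the backup files that have fallen out of the retention window."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # not one of our backups, so never a deletion candidate
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # looks like a date but is not a real one; leave it alone
        if day < cutoff:
            old.append(name)
    return sorted(old)
```

The function only reports candidates; deleting them is a separate step, so a bug in the date handling can't silently wipe the archive directory. Run the same rotation on each copy (local drive, backup drives, offsite).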


Apparently part of Web 2.0 is deploy and forget? How could no one have at least a snapshot of the code and database?


I was a bit confused when I first read about it. I'm sorry if I worded it a bit weird in the article.

My best guess is they broke something with how the TinyGrab client interacts with their internal APIs. They did have backups, but it looks like it was deeper than that. That is my best guess.


I must be missing something. How is it possible that they don't have umpteen copies of the sources checked out of whatever source control system they are using? Even if the central repository was destroyed, every developer should have a more or less recent version checked out somewhere.


We deploy from a special deployment repository. This has a bunch of advantages: 1) You can revert to a known state. 2) You can see if anything has changed on individual servers using svn stat. 3) You have a history of what has changed on production.

Any scm would work just fine.
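
If you don't have a working copy on the server to run `svn stat` against, the same "has anything changed on this box?" check can be approximated with a checksum manifest. A minimal sketch, assuming the release is a plain directory tree; `manifest` and `drift` are names invented for this example, not part of any SCM.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Record a SHA-256 digest for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(expected: dict[str, str], deployed: Path) -> dict[str, list[str]]:
    """Compare a deployed tree against the manifest taken from the known release."""
    actual = manifest(deployed)
    return {
        "modified": sorted(p for p in expected if p in actual and actual[p] != expected[p]),
        "missing": sorted(p for p in expected if p not in actual),
        "unexpected": sorted(p for p in actual if p not in expected),
    }
```

Store the manifest somewhere other than the server being checked. Anything listed under "modified" or "unexpected" after an intrusion tells you which files to distrust.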


Looks like a PHP site. I'm guessing they weren't using version control and were instead just editing the files over FTP.


If that turns out to be the case .. I wouldn't go so far as to say they deserved it, but my sympathy would definitely be pretty limited.


I agree. There is absolutely no excuse for not having some sort of source control and deployment system in place at an organization with a commercially deployed solution. Why would anyone regard such a system as optional? It is irresponsible.


so php == ignorance? are we really going to make that assumption? that's just foolish.


No, but PHP easily allows an "edit source on server" deployment model.


Seriously, you don't even need a security attack to lose changes with this model. A nice little hard disk failure would do the trick for you. :)


No, but given how often I've seen PHP developers do this, even at larger organizations, it's a plausible explanation.


Not at all. Just trying to figure out how they might have a dev environment setup where there isn't a copy on local developer machines. Not all languages make that easy. PHP does.


ok, i get your point now.

They may have been using version control, but if they were checking out to a server and editing remotely, and that server happened to be the same server where the central repo was stored, it could explain how one attack would get it all.


We're a PHP shop and we have version control. All but one of the companies I worked at used _some_ sort of version control, so it just doesn't make any sense that deploying an old release should be so tough.


We have more backups than you can shake a stick at. The issue lies with the server being compromised and being forced to shut down. The old API and apps relied on a fixed IP address. It would have taken us months to get TinyGrab v1 back up to scratch, or a few weeks to perfect 2.0. No data was lost, what would you choose?

Chris Leydon TinyGrab Founder and Project Manager chris@tinygrab.com


Now I'm even more confused. What was stopping you redeploying the server side app onto a new box and changing the hardcoded IP over to the new machine?


Exactly. As a lot of other comments have mentioned, using URLs would have been a better idea, but it's irrelevant now. Assigning the same IP to a different machine would have been a perfectly acceptable (and the most straightforward) solution. I fail to see why they didn't choose to do that.

We're all interested here in learning from TinyGrab's mistakes, not in criticizing them after the fact. Why patronize us with that "would have taken _months_" quote instead of giving some technical details about what the problem really was?


Thanks for the explanation, but I still don't understand the problem. You re-image the server, fix whatever holes they exploited and re-install from a backup.

Applications should be designed for failure. Assume this will happen every week and plan for it. If not someone cracking your system then a hardware failure, or your ISP going bankrupt or any number of other disasters.


Fixed ip addresses? I apologise if I fail to understand why that should be necessary without assuming that someone is cutting corners. Surely such values should be configurable allowing you to easily move the site elsewhere?

I certainly think moving to 2.0 was a bad move as a broken service can be worse than a missing service (although that can depend on how "critical" the service is for people). Re-establishing 1.0 would be the sensible thing to do and if that is impossible then it sounds like you've been doing something wrong somewhere.


Could you explain the decision process behind using fixed IP addresses? It's not my area of expertise, but surely using a URL would provide you just as much control with vastly more flexibility.
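
To illustrate what I mean, a client could take its endpoint from configuration and fall back to a hostname, so moving servers is a DNS or config change rather than a new client release. This is only a sketch: the `SCREENSHOT_API_URL` variable, the `example.invalid` host, and the `/upload` path are all made up and say nothing about TinyGrab's real API.

```python
import ipaddress
import os
from urllib.parse import urlsplit

# Hypothetical default; a real client would ship its own hostname here.
DEFAULT_API_URL = "https://api.example.invalid/v1"


def api_base_url(environ=None) -> str:
    """Read the API base URL from configuration, falling back to the default."""
    env = os.environ if environ is None else environ
    url = env.get("SCREENSHOT_API_URL", DEFAULT_API_URL)
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a usable API URL: {url!r}")
    return url.rstrip("/")


def uses_hostname(url: str) -> bool:
    """Return False when the URL pins a literal IP address instead of a name."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal, so it resolves through DNS
    return False


def upload_url(environ=None) -> str:
    """Build the (hypothetical) upload endpoint from the configured base URL."""
    return f"{api_base_url(environ)}/upload"
```

With something like this, pointing every installed client at a replacement server means updating one DNS record.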


It was an early and foolish decision we made back in the early days of TinyGrab. The original system wasn't built, or even designed, to handle more than a handful of people using the service. Thanks to some great PR though we scaled to over 400,000 users almost overnight. The code base was horrible and buggy, and 2.0 was being written to fix all of that. It was supposed to be a smooth transition, but unfortunately someone got there first and forced us to launch the new system.


We are seeing the downside of the "get version 1.0 out in front of users as quickly as possible" mentality.

Fine, get the first version in front of users sooner rather than later. But if you are not in a position to recover from an intrusion, disk failure, fire in the server room, etc. you are not ready to have users. You really need to address all those concerns before the first line of code is written, as without it all your investment in development is at risk.


Were there backups of the source code? The article seems to indicate code corruption on the application hosts, which would also indicate a VCS restore would have reverted the systems to the state prior to the attack.

OTOH backups that cannot be used to restore service are, by definition, not backups.


But .. why can't you `grep|sed` your source code for the ip address? I'm confused.
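
Something along these lines would at least list every place an address is baked in. A rough sketch only; I obviously haven't seen their code, and the function name is made up.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted run.
CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_ipv4_literals(text: str) -> list[tuple[int, str]]:
    """Return (line number, address) for every valid IPv4 literal in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CANDIDATE.finditer(line):
            try:
                ipaddress.IPv4Address(match.group())
            except ValueError:
                continue  # e.g. 999.1.1.1 matches the pattern but is not an address
            hits.append((lineno, match.group()))
    return hits
```

Four-part version strings will show up as false positives, so treat the output as a list to review by hand, not something to feed straight into `sed`.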


I suspect it's hard-coded into all the clients -- thus they'd have to get everyone to download a new version of the client (which they are now doing, but straight to version 2.0).

Although it should not have been hard to get a new server up at the same IP.


This still doesn't make any sense. You have source code and the ip address. Why can you not configure a new server?


They did an extremely poor PR job. My account is no longer working. I would expect an email of apologies that explains the problem and buys them time. Instead, I got an account that mysteriously doesn't work anymore, and a password reset link that claims my account doesn't exist. I had to learn about what happened from a random HN post.


If you can shoot me an email, chris@tinygrab.com, I'll deal with your account and help get you set back up again.


Thanks Chris. Just sent an email.


"Like all premature babies, TinyGrab 2.0 just wasn’t ready for mainstream usage"

What the fuck?


Yes, this seemed like a particularly offensive metaphor. I suppose it works at a logical level, but to compare a human life to a piece of software seems kind of extreme.


I used to use TinyGrab all the time, it was a pretty indispensable part of my workflow, which is why, after nearly a week of it being dead, I got fed up and wrote my own:

http://dl.dropbox.com/u/651972/ScreenDuck-0.1a.zip

It's obviously very rough right now, but it actually works. Later on I'll write a website for it and put it up on screenduck.com.


This is the first I've heard of TinyGrab, but for Mac users looking for something else Cloud seems to do the same thing (that is, it can automatically upload screenshots and put the URL in the clipboard for you). http://getcloudapp.com/


A buddy of mine made this, works pretty well: http://grabbox.devsoft.no/


A good alternative is FileShuttle: http://getfileshuttle.com


Why wouldn't they just put the old code base back up and fix the security hole? Why go to a new code base? Now you have two problems!


This situation makes no sense at all based on the explanations here by chrisleydon.

You have backups of the code - can you explain why it isn't a pretty straightforward process to set up a new server? The fact that you have a fixed IP address has no importance as far as I can tell, and that's the only explanation offered.

> It would have taken us months to get TinyGrab v1 back up to scratch

Really?



