Hacker News
The Facebook Method of Dealing with Complexity (acm.org)
146 points by ingve on July 25, 2015 | 46 comments


> A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system. - John Gall

Mr Gall wrote a book about complex systems and how they fail back in the '70s, and most of it (from what I recall; it's been a while since I read it) applies as much now as it did then. It's a very good read.


Thanks for the book recommendation. Just had a Google and it looks great! Added it to my Amazon wish list.


You're welcome. It's genuinely excellent, but be warned: once you've read it, you'll see what it discusses everywhere.

For large organisations it's a bit like getting a look at Oz behind the curtain :)


Amazon lists three books about systems by John Gall that look very similar. Is there a specific one you're talking about, or one you would recommend starting with?

http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dap...


"Systemantics" is the original classic. "Systems Bible" is the same book, foolishly renamed, but otherwise great. That's the edition to start with.


Another Systemantics person! I profoundly wish that I had been exposed to this book during my education. Most traditional engineering education deals with predictable, small, compartmentalized systems. But the real world is none of these things, and engineering is the struggle to bring some semblance of order to chaos. Systemantics really outlines just what you can and can't expect out of any engineered system, and it stays in my head like a little crystal of knowledge that I draw from time and time again.


> ... decades of dealing with the public and the public servants have led to an accumulation of rules, regulations, practices and traditions clinging to everything like barnacles to a medieval man o’ war.

I strongly feel that "garbage collection" is grossly neglected in almost every organization and is part of the reason firms slow down as they grow older.

The article mentioned simplification of the Norwegian tax code. Are there governments with garbage collection and simplification of laws built into the legislative process?


In Texas, all state agencies other than those listed in the state constitution will be automatically abolished at some future date unless the legislature extends their mandate.

Before each agency's expiration, the Sunset Commission reviews its purpose and effectiveness and then recommends that the legislature abolish it (let it expire), continue it with modifications, or merge it with another agency.

Agencies subject to sunset review: https://www.sunset.texas.gov/reviews-and-reports/agencies

Since the commission was created ~40 years ago, it has abolished 79 state agencies.
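Mechanically, a sunset provision is close to garbage collection with an explicit keep-alive: nothing survives past its date unless someone actively renews it. The toy model below is a sketch under stated assumptions, not a description of how the Sunset Commission actually works: `Agency`, `sunset_sweep`, the dates and the `renewals` map are all hypothetical, and the review's "continue with modifications" and "merge" outcomes are reduced to a plain renew-or-drop decision.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Agency:
    name: str
    expires: date
    constitutional: bool = False  # agencies listed in the constitution are exempt


def sunset_sweep(agencies, today, renewals):
    """Return the agencies that survive a sweep on `today` (hypothetical model).

    `renewals` maps an agency name to a new expiration date granted by the
    legislature; anything past its date without a renewal is abolished.
    """
    survivors = []
    for agency in agencies:
        if agency.constitutional:
            survivors.append(agency)
        elif agency.name in renewals:
            # The legislature extended the mandate: keep it with a new date.
            survivors.append(replace(agency, expires=renewals[agency.name]))
        elif agency.expires > today:
            survivors.append(agency)
        # Otherwise it has expired and was not renewed, so it is dropped.
    return survivors
```

The default is removal; survival requires an explicit act, which is exactly what keeps the accumulated "barnacles" from growing unchecked.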


Personally, I'd like to see a US constitutional amendment more or less along the lines of:

> Any law that isn't routinely and consistently enforced is invalid.

...since in my view unenforced laws don't do anything except invite corruption.


This really bothers me, too. Common Good, started by Philip K. Howard, is one of the few organizations focused on the fact we're "drowning in law":

http://www.commongood.org/pages/the-problem

I'm a supporter.


> Are there governments with garbage collection and simplification of laws built into the legislative process?

I've often thought about the same thing. I think that most laws should have an expiration date, especially those that were enacted during "emergency" periods (e.g. everything legislated as a response to the Great Recession). The only reasonable exceptions are the core laws (e.g. the Penal Code), which should be replaced (not amended) by newer versions every so often.


> Are there governments with garbage collection and simplification of laws built into the legislative process?

Sort of, e.g. see:

http://www.bbc.co.uk/news/uk-politics-30334812


I admit that after finishing my CS degree I had the same mindset about software engineering: understand the context, capture as many requirements as possible, and stick to them while developing the subsequent parts of the system. On many occasions this is a trap: many details are secondary and would probably never have been articulated by the customer. Some seem important but turn out not to be. Sometimes important problems surface near the end of development, after we have already spent a lot of time on secondary issues.

Dealing with complexity by reducing or omitting some problems is also vital for small and medium projects. It encourages the rapid creation of a working prototype, and the overall process tends to shift toward more iterative development.


I was hoping for an article about code complexity, but this was good too.

This is one of the ideas behind building a Minimum Viable Product, and I think it's a big strength that startups have. The product doesn't do everything. People will adapt to the product and start using it in new and unexpected ways. And it's a lot cheaper and easier to build.

I'm curious whether there are examples of large companies successfully simplifying existing products. How many features could Gmail take away before alienating too many of its users? What about QuickBooks, Microsoft Office (the ribbon?), or any of the Adobe products?


I think Apple is the poster child for a big company doing this. They've been stripping away features on their hardware for decades. Now they sell a computer that is literally just a screen with one button and a switch or two. They also sell a single port laptop. Everyone screams bloody murder when they remove hardware features, but then people adjust and the industry follows.


> The product doesn't do everything

But it does everything for some people or some use cases. I think that's an important distinction.


I think code complexity is often caused by feature complexity. Keep on tacking on new features and the code gets complicated.


I usually try to tell my boss that there are 3-person-shop solutions and 20-person-shop solutions, and he's proposing a 20-person-shop solution to a 3-person shop.


I totally agree with this. Some people may say a system needs to cover every possible situation. That may be true in theory, but in practice we need trade-offs when designing a system: we have to weigh the gains against the costs when deciding which complexities to fix.


The catch tends to be whether you're able to detect when you're in a more complicated situation and bail out accordingly. Auto-labeling is a good example: it's very useful, but it turns out that getting it wrong is more problematic than one would expect.

Take for example the 'Gorilla' issue that Google encountered.

http://nyti.ms/1JlTBab [www.nytimes.com]

Then there was the issue with the bookmarks and suicide categories

http://bit.ly/1JFkmCr [productforums.google.com]

(This one isn't as big a problem, since the frustration was probably due more to the compulsory use of the new bookmarks, but it still shows some of the hazards of auto-categorization.)

With social things, it can be very hard to find a solution that works for 95% of the population without pissing off the remaining 5% (often with good justification). Unfortunately, detecting when you're in that situation in the first place is sometimes equivalent to the problem you're trying to solve.
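A minimal sketch of the "bail out" idea, assuming a classifier that already returns (label, confidence) pairs: instead of always emitting its best guess, the labeler abstains (returns `None`) when confidence is low, and it demands a much higher bar for labels where a mistake is especially costly. `choose_label`, both thresholds and the `SENSITIVE_LABELS` set are hypothetical. As noted above, deciding which cases are sensitive can be as hard as the labeling itself; the hand-written set only stands in for that judgment.

```python
from typing import Optional

# Hypothetical: labels where a wrong guess is far worse than no guess at all.
SENSITIVE_LABELS = {"gorilla", "suicide"}


def choose_label(predictions: list[tuple[str, float]],
                 min_confidence: float = 0.9,
                 sensitive_confidence: float = 0.99) -> Optional[str]:
    """Pick the top label from (label, confidence) pairs, or abstain with None."""
    if not predictions:
        return None
    label, confidence = max(predictions, key=lambda p: p[1])
    # Costly-to-get-wrong labels must clear a stricter threshold.
    required = sensitive_confidence if label in SENSITIVE_LABELS else min_confidence
    if confidence < required:
        return None  # "it's complicated": leave the item untagged
    return label
```

Abstaining trades coverage for safety: some items stay unlabeled, but the system stops making confident-looking mistakes in exactly the cases where they hurt most.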


Please don't use link shorteners. They just obfuscate the link target.


I don't understand why a big company like Google would put a feature online if it hasn't been thoroughly tested or thought through. Is it because it's a web product and they want the users to be the testers?


What would the testing plan look like?


Simple procedure: take a large sample (1M pictures or so), tag them with the system and then show each picture with the label to a person (using Mechanical Turk or similar) and ask "is this label offensive?"

Use simple statistics to identify trouble spots.
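The "simple statistics" step could look something like this sketch: group the crowd-sourced judgments by label, compute the share of ratings that called the label offensive, and report labels whose rate exceeds a threshold, skipping labels with too few ratings to mean anything. The (label, flagged) input format, `trouble_spots` and both thresholds are assumptions for illustration; a real analysis would probably use confidence intervals rather than a raw cutoff.

```python
from collections import defaultdict


def trouble_spots(judgments, min_ratings=50, max_flag_rate=0.01):
    """Return (label, flag_rate) pairs whose rate exceeds `max_flag_rate`.

    `judgments` is an iterable of (label, flagged) pairs, one per human
    rating, where `flagged` is True if the rater called the label offensive.
    Labels with fewer than `min_ratings` ratings are skipped as too noisy.
    Results are sorted worst-first.
    """
    totals = defaultdict(int)
    flags = defaultdict(int)
    for label, flagged in judgments:
        totals[label] += 1
        flags[label] += int(flagged)

    rates = {
        label: flags[label] / totals[label]
        for label in totals
        if totals[label] >= min_ratings
    }
    bad = [(label, rate) for label, rate in rates.items() if rate > max_flag_rate]
    return sorted(bad, key=lambda item: item[1], reverse=True)
```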


A related question: how do you design a database for something as complex as a tax system, given that tax laws can (and will) change, that changing legislation often adds possible combinations to any field, and that there is surely a very long tail of unusual circumstances and combinations?

Is this a good use case for a document-store NoSQL database?


The largest issue isn't actually storing the data (though that is a big issue); the biggest problem (from a systems point of view) is the 'business' logic.

I've worked on some large systems that are very heavy on business logic, and what you often find is that the database ends up as a dumb store and the logic is handled entirely on the application side. This is fine when the original system only talks to the one application, but it becomes a massive problem once you want to let other applications access that data store. From experience, this is one of the reasons large enterprise rewrites often fail: the specification on paper versus the massive complexity of edge cases in the application code, years (sometimes decades) of if/else statements.

Of course, the inverse is when the database holds all the business logic and the client is dumb. While better, that often means you can't easily get the business logic out if the database has become obsolete (some of these systems aren't even SQL-based, or use a dialect of SQL that barely looks like SQL; when you have stuff that was written in the '70s, all bets are off).

Things like this are why IBM still sells zSeries machines with COBOL support for code that was written 30-40 years ago.

I'm not really sure what the solution is. I'd say some kind of business-specific, almost domain-specific language that goes beyond SQL or Java (but we tried that; it was COBOL ;) ).

As for your suggestion about NoSQL, there is merit in it, but if you go that route you have all the complexity of the business logic in your application, as well as much of the complexity that the database would have handled for you via SQL (government systems are mostly amenable to normalisation).

Possibly some kind of SQL/document hybrid (something like Postgres with JSONB) with a business logic layer over the top might work, but I've no idea what that would look like (possibly something like SAP's ABAP, which I haven't used, though people I know who have don't like it very much). Most of our current languages care more about pointers, bytes and functions than about actually modelling the system they represent.


> Of course, the inverse is when the database holds all the business logic and the client is dumb. While better, that often means you can't easily get the business logic out if the database has become obsolete (some of these systems aren't even SQL-based, or use a dialect of SQL that barely looks like SQL; when you have stuff that was written in the '70s, all bets are off).

You have the same problem if the application's implementation language becomes obsolete, even when the logic is not in the database, as your COBOL example illustrates.

That has nothing to do with where you put your business logic; it has to do with one of the many reasons why you need to have, and maintain, technology-neutral specifications documentation.


> why you need to have, and maintain, technology-neutral specifications documentation.

In theory that is absolutely correct: with an unambiguous and current spec you can reimplement the software in something else without the original system. The problem there (other than trying to get programmers to keep the documentation in sync with the system...) is that a specifications document written in English (or any natural language) is far too ambiguous to get away with that. So then you decide to write your spec in a constrained version of English, and then someone says, "Well, why can't we get the machine to understand this?" and... COBOL again ;).


A good insight into this is to look at how e-filing works. It's been several years since I looked at it, but you could find the documentation and XML schema files online from the IRS that companies like Intuit and others use to transmit their e-filings.

From what I recall, the XML corresponded one-to-one with the paper forms. If you think about how the paper forms work, it's pretty straightforward: you go top to bottom entering data and making calculations. Sometimes there are rules (if this line is less than X, go to Y) that you would have to write somehow.

As to how they store it, I can only speculate. I imagine they're stored as blobs by return/form. I doubt they would be broken out into a normalized relational form, as that seems unnecessary and difficult. They don't need to query across all people's returns at once. They're probably batch processed return by return.
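To make the "top to bottom, with the occasional rule" idea concrete, here is a hypothetical sketch: a form is an ordered list of lines, each either a taxpayer-entered value or a calculation over earlier lines, and each return is a self-contained JSON blob processed on its own, as speculated above. The line numbers, field names, the 10,000 threshold and the 25% rate are invented for illustration and are not the IRS schema; "if this line is less than X, go to Y" is modelled as a line that is zeroed out when the condition holds.

```python
import json
from typing import Callable, Union

# Hypothetical form: each entry is (line number, rule). A string rule copies a
# taxpayer-entered value; a callable computes the line from earlier lines.
FORM: list[tuple[str, Union[str, Callable[[dict], float]]]] = [
    ("1", "wages"),
    ("2", "interest"),
    ("3", lambda v: v["1"] + v["2"]),  # total income
    # "If line 3 is less than 10,000, enter 0 and go to line 5."
    ("4", lambda v: 0.0 if v["3"] < 10_000 else (v["3"] - 10_000) * 0.25),
    ("5", lambda v: v["3"] - v["4"]),
]


def evaluate(form, entries: dict) -> dict:
    """Fill in the form top to bottom, the way you would on paper."""
    values: dict = {}
    for line_no, rule in form:
        if isinstance(rule, str):
            values[line_no] = float(entries.get(rule, 0))
        else:
            values[line_no] = rule(values)
    return values


def process_batch(blobs: list[str]) -> dict:
    """Evaluate each stored return (one JSON blob per return) independently."""
    results = {}
    for blob in blobs:
        doc = json.loads(blob)
        results[doc["return_id"]] = evaluate(FORM, doc["entries"])
    return results
```

Because each return only ever reads its own blob, nothing here needs a cross-return query, which is the property that makes per-return storage plausible.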


XBRL [1] is how some parts of the ATO (Australian Taxation Office) chose to do it, as have a variety of other places.

[1]: https://en.wikipedia.org/wiki/XBRL


This is interesting; it's actually used by the SEC as well [1]. I don't think it has 100% adoption yet, though.

[1]: http://www.sec.gov/Archives/edgar/data/1288776/0001288776140...


Here, have a look at this data model: https://github.com/OpenSourcePolicyCenter/Tax-Calculator


This is how you discriminate against the minority, quite literally.


Personally, I never really understood why companies/forms need things like Race or Sex (or Gender) or Title. What if, in letters they sent you, they addressed you as "Dear Tom" instead of "Dear Mr. Tom"? Problem solved. Even the government only needs these for running statistics and for the needs of the health system (in which case three sexes are more than sufficient, and often just two options will do: "has a uterus" and "doesn't have a uterus").


In many languages you need a gender to correctly conjugate verbs and adjectives.


Unfortunately, yes. It's a shame to see your comment currently at the bottom, since it's the point I wanted to make.

I think the example of the Norwegian tax system was a good one, because it demonstrates the article's point without any risk of discrimination. But the earlier example of Gender is a terrible one. Gender is more than just whether you have dangly bits hanging off your pelvis, it's also a very critical part of a person's core identity, and there are a lot of different variations. Lumping everything other than cis-gendered male/female into "it's complicated" is a great way to make anyone who doesn't fit into the gender binary feel like they don't belong.

Perhaps a more compelling example would be race. I know there's a fairly standard set of racial categories used when asked to self-report race (e.g. in those optional questions you can fill out when doing various standardized tests), which are fairly broad, and I'm pretty sure the last one is "Other". But at least with race, nobody feels like their racial identity is not recognized as legitimate by a large percentage of the world population (which is a serious problem that non-cisgendered or non-gender-binary people have). But even here, imagine if the test just said "White, Black, or It's Complicated". It should be easy to imagine that a lot of people would be pretty outraged over that.

All that said, if you capture the main categories, and then have a free-form "Other" that people can fill in the details, that's similar in spirit to "It's Complicated" but a lot more palatable. Heck, depending on why you're asking the question, you might not even need to save the answer to Other (e.g. if you're only ever looking at data in aggregate, although even then you might want to pull out keywords from Other in order to report more groupings than the form offers), assuming the people filling in the form have no way of knowing whether you're keeping that data or not. So in the backend you might still have the equivalent of "It's Complicated". But how you present that to the user is important.
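On the backend, "main categories plus a free-form Other" might be handled roughly as in this sketch: answers matching a listed category are counted directly, and free-text Other answers are scanned for keywords so aggregate reports can show more groupings than the form offers, without retaining the raw text. The category names, the keyword map and `aggregate` are hypothetical placeholders; a real deployment would need a far more careful mapping than single-keyword matching.

```python
from collections import Counter

# Hypothetical form options and a hypothetical keyword -> reporting-group map.
MAIN_CATEGORIES = {"Category A", "Category B", "Category C"}
OTHER_KEYWORDS = {"mixed": "Multiple", "multiple": "Multiple"}


def aggregate(answers):
    """Count answers for aggregate reporting.

    `answers` holds (choice, other_text) pairs: `choice` is one of
    MAIN_CATEGORIES or "Other", and `other_text` is the free-form text
    entered with "Other" (None otherwise). Only counts are kept.
    """
    counts = Counter()
    for choice, other_text in answers:
        if choice in MAIN_CATEGORIES:
            counts[choice] += 1
            continue
        words = set((other_text or "").lower().split())
        # The first keyword found decides the extra reporting group.
        group = next(
            (grp for word, grp in OTHER_KEYWORDS.items() if word in words),
            "Other (unclassified)",
        )
        counts[group] += 1
    return counts
```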

---

Edited to add: Looking back at this, I realize that it looks like I'm just talking about whether the user feels fairly represented, which is related to but not the same thing as whether the form actually discriminates against minorities. So on that note, I just want to emphasize the importance of picking an appropriate set of main categories, even if you have an Other, such that very few people should ever resort to using the Other, and those that do should not end up getting unfair treatment as a result. For example, if policy is made based on the demographics of a particular set of people, that policy needs to account for anyone who doesn't fit into the standard groupings. If you just have e.g. "Male, Female, It's Complicated", policy is almost certainly going to be made based on the standard gender binary, without accounting for anyone who doesn't fit.

And on a related note to that, it's not even just the person filling out the form who might feel unfairly represented by a poor choice of categories. Anyone else who does fit into those categories is going to see the categories as reinforcing their worldview, i.e. anyone who sees "Male or Female" as a set of gender choices will just be reinforcing the incorrect idea that everyone fits into the gender binary, and this helps cause discrimination.


This problem sneaks up on us, because in non-emotive contexts it's normally acceptable to say "what percentage of users will be affected by this?", and if it's below a certain percentage it may be the case that it's not worth the cost of fixing the problem. However, whilst it's acceptable to give a degraded service to people who use Internet Explorer 6, it being presumed that they have the option of using a more modern browser, it's not acceptable to do the same to people with non-binary gender identities (I mean, you can do it, and plenty of people do, but it will upset at least some people and you're not going to have any good answer for their complaints). What looks like an edge case from an engineering perspective looks like a fundamental part of their identity to the person who doesn't fit the available categories.

In these cases, I think there are two approaches that satisfy the requirements of simplicity and humanity: think very carefully about whether you need that field and, if not, just remove it; alternatively, make it a free input field. If you think that collecting as much data as possible is a good thing, then the free input field gives you the best possible scenario - the strictly most accurate, detailed data direct from the person who is in the best position to tell you. If you don't need it, then just don't ask.


Of course, depending on why you're asking the question in the first place, a free input field can be a problem, because there can be many ways of expressing the same concept. So if you have a free-form input, you're going to need to figure out how to analyze that data to produce the groupings that you really want. But if you are capable of doing that, then absolutely, a free-form input field is the best way of avoiding unintentionally discriminating against anyone.

There's also a distinction here between questions that are objective vs subjective. Many would argue gender is objective, but they'd be wrong; it's subjective. You can't look at a person and tell them what their gender is; it's something only they can decide. So a free-form input field for Gender would be great, because everybody can feel fairly represented. But if you're asking for, say, Age, that's objective, and you can get away with letting the user choose from a set of ranges without any risk of discrimination (assuming you cover the full range of ages of all possible users).
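For an objective field like age, choosing from ranges only avoids excluding anyone if the buckets leave no gaps and the last one is open-ended. A small sketch, with hypothetical bucket boundaries: defining each bucket by its lower bound alone means consecutive buckets can neither overlap nor leave holes, and the final bucket catches everything above it.

```python
import bisect

# Hypothetical lower bounds; each bucket runs up to the next bound,
# and the last one ("65+") is open-ended so no age falls outside.
LOWER_BOUNDS = [0, 18, 25, 35, 50, 65]
LABELS = ["Under 18", "18-24", "25-34", "35-49", "50-64", "65+"]


def age_bucket(age: int) -> str:
    """Map a non-negative age to exactly one bucket label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age.
    index = bisect.bisect_right(LOWER_BOUNDS, age) - 1
    return LABELS[index]
```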


> So if you have a free-form input, you're going to need to figure out how to analyze that data to produce the groupings that you really want.

That suggests the root of the problem. Presumably the model is based on identifying a particular feature of the world as worth measuring [e.g. gender]. But boxing gender [e.g. masculine | feminine] is not a feature of the world, and the boxing means that the model does not correspond to the world in regard to gender, even though that was the purpose of capturing gender in the model. The idea of "getting the groupings I want" means my methods are suboptimal scientifically. The objective truths are in the data, not in my interpretation.


Sometimes, you just want to find out if someone is a girl or a guy.

Not every question needs to become a forum for minorities to express themselves. Not every worksheet or webform should worry about "reinforcing the incorrect idea that everyone fits into the gender binary."

Walking on eggshells...


This comment here is a perfect representation of the problem. Most people do fit into the worldview of "girl or guy". But plenty of people don't fit into the gender binary at all, which means there's no way to find out if they're "a girl or a guy", because the answer is "neither". And there's plenty of people who do self-identify as one of those 2 genders, but it's a different gender than most of the world insists they should identify as (I'm talking about transgender people here, if that's not immediately obvious), and they may not necessarily be open about it yet, or open to everyone, and being forced to pick "guy or girl" may therefore be problematic because either they have to give an answer that they may not be comfortable with everyone knowing, or else they have to give an answer that doesn't match their personal identity (the ideal solution in this particular case is to allow them to give no answer).

> Not every question needs to become a forum for minorities to express themselves.

This is perhaps the most problematic part of your comment. If a question allows the majority to express themselves, but denies that same right to minorities, or in fact refuses to acknowledge that the minorities even exist, that's absolutely discrimination. And perhaps even worse, when you have questions like "what gender are you?" that are asked often and where most questioners only accept male / female, that reinforces the incorrect idea that gender is binary and that there are only 2 answers, and effectively tacitly condones other forms of discrimination centered around the same question of personal identity. Which comes right back to your question. The reason you think it's perfectly ok to say "Sometimes, you just want to find out if someone is a girl or a guy" is because you're part of a culture in which the majority of people either refuse to acknowledge that there are more possible answers to that question or think it's perfectly acceptable to pretend that the minorities don't even exist.


And the fact that a M/F radio button is being called discrimination shows just how far we've come since the days of firehoses, police dogs, and internment camps.


It sounds like you're being dismissive, like you're saying discrimination is only discrimination if someone is suffering bodily harm or being deprived of their constitutionally-granted rights. I hope that's not how you meant it.


Then you can have a radio button:

   Girl or guy
   ( ) Yes
   ( ) No


Eh, that's only assuming your market is everyone.


Truth systems: yes. no. it's complicated.



