Instagram was what, 12 employees when they got sold for a gazillion dollars? The...

ignoramous · on March 3, 2020

WhatsApp, similarly, had 30+ employees when they got acquired [0]. They built the fastest IM on the market with 450M+ users sending 1B+ messages everyday, and at one point surpassed Facebook in terms of number of images uploaded.

The engs they had were world-class, so really, I think, saying microservices (or latest-fad) get in the way etc is disingenuous since you also require world-class talent to begin with (if you're going to keep the team-size small and yet be able to manage crazy scale) and a competitor willing to pay through the nose for the acquisition.

[0] https://www.sequoiacap.com/article/four-numbers-that-explain...

nickbauman · on March 3, 2020

They chose Erlang. A language built for communication and managing wire protocols at scale. Which describes WhatsApp itself. That was probably the biggest impact single decision for WhatsApp technically.

ignoramous · on March 3, 2020

I am not sure if WhatsApp engs chose ejabberd because it was written in Erlang or because ejabberd was the defacto implementation of XMPP. They stumbled upon and fixed bugs in BEAM/OTP at their scale [0][1][2]. They also ran FreeBSD (for its superior networking?) on bare-metal hosts running customised system-images [3] and employed networking experts at some point.

[0] https://www.youtube-nocookie.com/embed/c12cYAUTXXs

[1] https://www.youtube-nocookie.com/embed/wDk6l3tPBuw

[2] https://www.youtube-nocookie.com/embed/93MA0VUWP9w

[3] https://www.youtube-nocookie.com/embed/TneLO5TdW_M

scurvy · on March 3, 2020

They ran FreeBSD because Jan came from Yahoo, and that's what Yahoo ran.

vaingloriole · on March 4, 2020

'customized systems (kernel)..FreeBSD..bare metal': Which is what most of us still do if we care about performance for a single application. Running openmp fortran code for both host/accelerator on oversubscribed vmware or vanilla kvm/qemu for a good laugh and a lot of pain. Simple systems approaches are best even when what you are doing is complicated.

dnautics · on March 3, 2020

I could be wrong but I think freebsd allowed multiplexing connections per port before Linux? Please correct me I am mistaken about this.

sitkack · on March 3, 2020

Erlang is like Bugman from the Nightmare Before Christmas. It is literally made out of microservices at every level.

Microservices are a design philosophy that people confuse as a deployment strategy.

cristaloleg · on March 3, 2020

> Microservices are a design philosophy that people confuse as a deployment strategy.

I'm going to print this and show everyone, great words, thank you.

dnautics · on March 3, 2020

Not necessarily, you can design erlang systems with more coupling than you'd expect in traditional microservices. Microservices mean that your codebase is segmented along business concern lines.

barrkel · on March 3, 2020

I would rather argue that microservices are best implemented when each service owns a bounded context in the lingo of domain-driven design. Business concerns might align with that, or they may not, depending on how technically complex (vs product complexity) the product is.

dnautics · on March 3, 2020

Yes, that is a better definition, but nonetheless you can design erlang systems with a single domain, or design erlang systems as a monolith, even if it has a billion actors running around underneath.

bcrosby95 · on March 3, 2020

Kinda, yeah, except Erlang is microservices made easier. I've loved how the language makes building distributed systems so comparatively easy since I ran across it back in 2005.

munchbunny · on March 3, 2020

That was probably the biggest impact single decision for WhatsApp technically.

As a rule of thumb based on my own experiences and the opinions of more experienced engineers I've had the good fortune to work with, language choice is far less important than the quality of the team using it.

While I have no doubt that trying to build WhatsApp in a language that would be the wrong tool for the job (say... PHP) would have been fatal, I have many more doubts that choosing Erlang was the key enabling decision.

Twisol · on March 3, 2020

Language choice I can perhaps agree with, but from what I've read, the biggest factor with Erlang is much more its runtime environment. BEAM + OTP is a very impressive piece of kit.

bcrosby95 · on March 3, 2020

I don't know. Erlang is a "funny" language that can somewhat easily do certain things that would be much more time consuming to accomplish in a more mainstream programming language. The difference between building whatsapp in Erlang vs any other programming language is probably larger than building it in php vs any other mainstream language.

ipnon · on March 3, 2020

Converse: Can you imagine WhatsApp parallel universe serving with Node?

jopsen · on March 5, 2020

Yes :)

Success is more to do with user experience..

If you hit gold and scale you can almost always optimize the implementation later.

skrebbel · on March 3, 2020

Iirc Slack's backend is PHP.

mkopinsky · on March 4, 2020

They've moved over everything to Hacklang, which is a derivative of PHP.

hinkley · on March 3, 2020

It's amazing how much effort it takes to do something with the wrong tools.

Sunk Cost Fallacy tends to fight any broad-stroke improvements. Until a competitor starts eating your lunch.

serbrech · on March 3, 2020

The thinking is backwards, if you are only 10 people, there I no need to shape the system in a microservicr fashion. You just extract the pieces that need scaling. If you are 60 devs, you need to split the system so that everyone can work on it without walking on each other.

therealdrag0 · on March 3, 2020

Also you can have so few devs when your value is in your network. Most other businesses' value is in their features. We have customers constantly begging for features, so we have to have more engineers to produce more value to our customers.

hinkley · on March 3, 2020

I believe that the instinct to over-engineer is based in part on bad prior experiences with trying to separate concerns after it's 'too late'. Either your own personal experiences, or those of your mentors.

Lacking any better skills to identify and avoid those problems when they begin, they try to stop it from happening in the first place. Fences get erected everywhere in case they might be needed, and they frequently turn out to be in not quite the right spot or shape. The code becomes coupled to the bad interface instead of to other code, and the fixes are just as bad.

YAGNI in theory is about trying to develop those other skills, but gets twisted into an excuse for bad tech debt loads.

papito · on March 3, 2020

As someone said, microservices is a technical solution to a people problem. Devs don't want to talk to each other so they wall off behind their own API. Boom, no need to talk to each other. Ever. Or is there?

barrkel · on March 3, 2020

Requiring everyone to talk to everyone so everyone has global context isn't just "devs don't want to talk to each other"; it is actually an information dissemination and coordination problem which scales non-linearly (at least n^2), and needs some kind of modularity to be tractable to normal humans.

Microservices are like modules but for SaaS rather than shrinkwrap, and are where you end up when you follow SLAs, encapsulation of resource consumption, etc. to their logical conclusion.

majormajor · on March 3, 2020

"Microservices" don't guarantee that everyone doesn't have to talk to everyone. Good thoughtful design is necessary regardless of how you're building/organizing/deploying/operating the code.

vikiomega9 · on March 3, 2020

> scales non-linearly

Indeed, https://en.wikipedia.org/wiki/The_Nature_of_the_Firm

marcosdumay · on March 3, 2020

Microservices are not the only way to split a system.

Far from it. They are the most onerous, least tractable way to get what you are going for.

krab · on March 3, 2020

I'm now working with a system where we add microservices. In my experience, they allow you to split a team that then owns its whole deployment cycle. This allows for easier hot-fixing and dealing with database schema changes.

There are now almost fifty developers and we already have a third (micro-)service. ;-)

I agree completely that starting with a monolith is a win for feature development, performance and operations. But if your team grows, your release cycle will slow down unless you decentralize it.

nradov · on March 3, 2020

That has nothing to do with the release cycle. A large monolith can still do continuous delivery and release multiple times per day. What tends to slow down is the feature delivery cycle.

krab · on March 3, 2020

Would you care to explain your reasoning? This topic genuinely interests me.

My experience is quite the opposite. To deliver a feature, you still need to integrate changes in multiple services. To do it in a monolith is IMHO easier. You have a single artifact you can test, the probability you have the right automation is higher. Also, the devs will run bigger part of the system in the development version.

What I was talking about was the latency of smaller changes. Think of bug fixes. Somebody has to notice it, triage, fix, (wait for automated tests) and deploy. Often, a bug fix affects just one service. With smaller services, this chain is simpler.

nradov · on March 4, 2020

There are no hard rules. In general a microservices architecture tends to enforce segmentation which makes it faster to iterate on features that only impact a single service. But coordinating larger changes across multiple services owned by separate teams can certainly be slower.

Monolith architectures often degenerate into the proverbial "big ball of mud" over time. But if the team has the discipline to maintain proper design through continuous refactoring then they can retain the ability to deliver features with short cycle times.

aguyfromnb · on March 3, 2020

Sure,but the value of Instagram came from the fact it was a good idea, well executed,at the right moment in time. I'm not sure they managed to solve a ton of complexity with a small team.

dandigangi · on March 3, 2020

Has to be true. They were dealing with major scale pre-purchase. Also... talented set of engineers.

jayd16 · on March 3, 2020

In my experience you do end up having external dependencies and more than one service. You do end up breaking out some code into special instance types, (high ram for video processing or what have you). These are problems you do have and do have to solve so you might as well come up with a plan. Deploying microservices really isn't that hard once you make it routine, imo.

But what do I know? I would not have expected Instagram to run things like user login on the same instance as photo upload and processing.

chrismarlow9 · on March 3, 2020

microservice vs monolithic is a shade of grey. what if all the code is in one codebase, but distributed to varying instance types that use some config flag to say the role it plays in that context. is that monolithic or microservice? I'd say the code is monolithic and the architecture is microservices. So it's some hybrid version.

what you're talking about (isolation of responsibilities) can be done and still be considered monolithic. you can also use the exact same instance type for everything and still be considered microservice.

I think what we're really talking about is containers vs machine images. And personally I think containers right now are suffering from the same abuse/hype that datastores suffered like redis/mongo/couch etc. Sure they have an application and solve problems, but they're being over used to the point of causing technical debt.

papito · on March 4, 2020

It doesn't mean that they do. Their system is partitioned to a point, and something like image processing is very easy to offload and expose via HTTP, for sure.

odiroot · on March 3, 2020

I would really love to read their codebase.

papito · on March 3, 2020

Read all about it: https://instagram-engineering.com/static-analysis-at-scale-a...

justaguyhere · on March 3, 2020

typically deploying to production around a hundred times per day

That is insane!

We make a release once a week at work and things still go wrong sometimes. I am in awe how they are able to pull this off, especially at their scale.

snovv_crash · on March 3, 2020

The more frequently you release, the less, or less serious, bugs there typically are, and the earlier you catch them.

The faster development cycle also helps people invest in testing infrastructure more effectively.

K0SM0S · on March 3, 2020

I'm trying to get a sense of magnitude, here.

Let's say 10-20 commits per day per dev. Over 5~10 hours that's what, on the order of 1 commit-test-release cycle every 15 to 60 minutes? (subjectively for each dev)

What do we actually write in that timeframe on average (thus including the ~90% of time we don't type code but think or read or test)? What's the "unit commit" here?

So I'm thinking... let's take an example: today I'll refactor a few functions to update our model handling; I wish to reflect our latest custom types in the code. So it's a lot of in-place changes, e.g. from some list to tuple or dict; and the syntax that goes with it. No external logic change, but new methods mean slight variations in the details of implementation.

- refactor one function: commit every testable change, like list to tuple? At least, I'm sure I'm not breaking other stuff by running the whole test suite every time it "works for me" in my isolated bubble. So I might commit every 5-10 minutes in that case.

- Now I'm touching the API so I can't break promises to clients: I actually need to test more rigorously anyway. I'm probably taking closer to 20-40 minutes per commit, it's more tedious. Assuming I commit every update of the model, even insignificant, I get immediate feedback (e.g. performance dump), so I know when to stop, backtrack, try again? And it's always just one "tiny" step?

- Later I review some code and have to go through all these changes. I assume it's easier to spot elementary mistakes; but what of the big picture? Sure I can show a diff over the whole process — I assume you'd learn to play with git with such an "extreme" approach.

Am I on the right track, here? I totally get your comment but I'm trying to get a feel for how it works. I typically commit-test-release prod 3-4 times a day at most (on simple projects), and typically more like once every 2-3 days, 2-3 times a week. Which is "agile" enough I reckon... So I'm genuinely interested here. I feel there's untapped power in the method I'm just beginning to grasp.

justaguyhere · on March 4, 2020

Are you assuming the tests are all 100% automated? If QA needs to take a look, how is it possible to have a commit-test-release cycle every 15-60 mins? I mean, it would take a human few mins to just read and understand what they need to test, isn't it?

The article talks about static analysis, I wonder if they do human code reviews at all?

Any which way we slice this, this is incredible! Sure instagram is not healthcare, transport or banking application - nobody is going to die if the website goes down, it is still an awesome achievement.

kamikaz1k · on March 4, 2020

Indeed, and not only are their tests automated, they also rely on production traffic to expose failure cases. Since some problems that are exposed are only applicable at scale. They use canary deployments to slowly ramp up traffic to the new version; rolling back if they detect anomalies.

Maybe you'll find this video interesting: https://youtu.be/2mevf60qm60

snovv_crash · on March 3, 2020

I think you've got it. Now put 10 people on the project, and have them all working at that pace.

K0SM0S · on March 3, 2020

Ah, awesome, thanks for the feedback.

pmarreck · on March 3, 2020

What’s their stack?

grouchoboy · on March 3, 2020

Primary I think is a Django project https://www.youtube.com/watch?v=lx5WQjXLlq8&t=10s

pmarreck · on March 4, 2020

I am straining to hide my lack of being impressed

nickbauman · on March 3, 2020

But microservices are an example of simpler systems. Each microservice does far less than the whole monolith does. You can read all the code in ~15 minutes.

I've worked at companies that have monoliths that are 50x more difficult to work on because of the size. Some of them millions of lines of code. Nobody really knows how they work anymore.

sgift · on March 3, 2020

> But microservices are an example of simpler systems. Each microservice does far less than the whole monolith does. You can read all the code in ~15 minutes.

Microservice usually means distributed system. A distributed system is more complex than a non-distributed system since it has to do everything the non-distributed system has to do and, additionally, handle all the distributed problems. Microservices just hide the complexity in places where people don't see them if they take a cursory look over the code, e.g. what is a function call in a monolith can be a call to a completely different machine in a microservice architecture. Often they look the same on the outside, but behave very differently.

The hierarchy of simplicity is: Monolith > multithreaded[1] monolith > distributed system. If you can get away with a simpler one it will save you from many headaches.

> I've worked at companies that have monoliths that are 50x more difficult to work on because of the size. Some of them millions of lines of code. Nobody really knows how they work anymore.

That is a bad architecture, not something inherent to a "monolith". There's probably also a wording problem here. A monolith can be build out of many components. Libraries were a thing long before microservices reared their ugly heads. What you describe sounds more like a spaghetti architecture where all millions of lines are in one big repository and every part can call every other part. Unfortunately, microservices are not immune from this problem.

[1] or whatever you want to call "uses more than one core/cpu"

jayd16 · on March 3, 2020

>Microservice usually means distributed system. A distributed system is more complex than a non-distributed system since it has to do everything the non-distributed system has to do and, additionally, handle all the distributed problems. Microservices just hide the complexity in places where people don't see them if they take a cursory look over the code, e.g. what is a function call in a monolith can be a call to a completely different machine in a microservice architecture. Often they look the same on the outside, but behave very differently.

This is a false assumption. Some problems are distributed. Sometimes you'll have an external data store or you'll need to deal with distribution across instances of the monolith. You really run into pain when you build a distributed system in your single monolithic code base and your monolithic abstractions start falling apart.

In my experience you end up solving these problems eventually, monolith or not. You might as well embrace the idea that you're deploying multiple services and some form of cross service communication. You don't need to go crazy with it though.

marcosdumay · on March 3, 2020

What assumption are you talking about?

Anyway, if your problem requires a distributed system, congratulations, you'll have to go to the top of that complexity hierarchy, and will have to solve all the problems that come with it.

That doesn't change anything about there being more problems. You just don't have any other option.

closeparen · on March 3, 2020

Simple example where this is not quite true: moving a slow, fallible operation out of band to a durable queue with a sensible retry policy will tend to make the system simpler and less brittle, even though it becomes distributed.

LargeWu · on March 3, 2020

Microservices, often, is just another word for "distributed monolith". Sure you can read the code of a single service quickly, but often in practice it's not possible to just make changes to one service. There are usually both explicit and implicit dependencies that span many service layers. What you gain in readability I think you often lose more in maintenance overhead.

nickbauman · on March 3, 2020

I worked on a microservices practice (with over 400 microservices by the time I left) where it definitely was not a "distributed monolith". I could change all kinds of individual services that did not require changes to other services.

SPBS · on March 3, 2020

I think monoliths can be written the same way such that you can change one module without changing the others. Microservices enforce that best practice.

goostavos · on March 3, 2020

"Enforce" gets thrown around a lot, but that's approaching silver bullet expectation levels, in my opinion.

A core skill problems on a team that would prevent building a maintainable monolith do not go away because you've added more things for them to manage.

krab · on March 3, 2020

The complexity is still somewhere. You can structure your monolith in a way that you can read each module's code in ~15 minutes. Each module also can be developed and tested in isolation.

However, if the organization is not capable of structuring the monolith, why should it be successful with microservices?

Such organization may lead to sharing data stores and custom libraries between microservices and that's when the real fun begins. Maybe even trying to deploy all of the services atomically to not worry about API compatibility.

darkerside · on March 3, 2020

Underrated point, although incomplete from my perspective. 2 simple systems > 1 complex system.

But 1 semi-complex system > 10 simple systems. Especially when you consider the points of integration between those systems increases geometrically with the number of systems.

dspillett · on March 3, 2020

Microservices are simple components of what could be a simple or a complex system. If things are overly broken down then unnecessary complexity could easily be added.

hopia · on March 3, 2020

Microservices make you architect your whole service differently. The communication is essentially asynchronous message passing which may or may not make things more difficult.

jjtheblunt · on March 3, 2020

Excellently said