The lighter let-it-crash is a circuit breaker. This is done quite frequently in ...

lostcolony · on Feb 9, 2016

'Let it crash' is a philosophy geared toward handling errors.

Circuit breakers are geared toward handling resources that may become unavailable.

While they seem similar, they're conceptually very, very different. Let it crash is mostly for things where one's own code, one's own state, may end up faulty, and where recovering in a known good state will solve the issue. And it turns out this is really effective for most 'bugs'.

A circuit breaker is where -external- state, environmental state if you will, may become faulty. This is really effective not for 'bugs', but for predictable periodic issues such as one's network going down, a database becoming inaccessible, etc.

Everyone who writes a reasonably complex system in Erlang that interfaces with external systems learns the shortcomings of applying 'let it crash' to those cases (a network hiccup overloads your supervisor's restart threshold with crashes, taking down parts of, or the entirety of, your system), and goes looking for (and hopefully finds) the circuit breaker pattern.
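To make the supervisor-threshold failure mode concrete, here is a minimal Python sketch of a restart limit in the spirit of an Erlang supervisor's restart intensity: more than `max_restarts` crashes within `period` seconds means the supervisor gives up. The class and method names (`RestartIntensity`, `record_crash`) are hypothetical and only illustrate the idea; this is not OTP's implementation.

```python
from collections import deque


class RestartIntensity:
    """Hypothetical model of a supervisor restart limit (not OTP code)."""

    def __init__(self, max_restarts, period):
        self.max_restarts = max_restarts  # crashes tolerated per window
        self.period = period              # window length in seconds
        self.crashes = deque()            # timestamps of recent crashes

    def record_crash(self, now):
        """Record a crash at time `now`.

        Returns True if the child may be restarted, False if the limit
        is exceeded and the supervisor itself would give up.
        """
        self.crashes.append(now)
        # Forget crashes that have fallen out of the sliding window.
        while self.crashes and now - self.crashes[0] > self.period:
            self.crashes.popleft()
        return len(self.crashes) <= self.max_restarts


# A network hiccup: every call to the external system crashes right away.
limit = RestartIntensity(max_restarts=3, period=5.0)
burst = [limit.record_crash(t) for t in (0.0, 0.1, 0.2, 0.3)]
print(burst)
```

In this sketch the first three crashes of the burst are tolerated and the fourth trips the limit, even though nothing is wrong with the code itself. The same four crashes spread over 30 seconds would never trip it, which is why the limit works for rare bugs and badly for an outage that fails every request.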

davidw · on Feb 8, 2016

Erlang has circuit breakers too, like this: https://github.com/jlouis/fuse

Sadly, they are not mentioned much in books or other documentation, despite being a potentially extremely useful piece of infrastructure for some kinds of projects.

jlouis · on Feb 8, 2016

Author of fuse here :)

What we do to make load-regulation tools (see https://github.com/jlouis/safetyvalve or https://github.com/uwiger/jobs ) and circuit breakers trustworthy is "prove" them correct through extensive use of property-based testing. That is, it is highly unlikely that these tools have errors under production runs, because the corner cases tested for them are far more complex than what a normal program would do.

The reason it is nice to have circuit breakers is what Fred touched on in another thread here: you want to gracefully degrade a system, even if parts of it are temporarily down, either due to error or for maintenance. You can thus keep up the processes that are proxying for the underlying cascading dependency, and turn faults into terms of the form `{error, system_unavailable}`, which lets you turn an implicit crash into an explicit error path.
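As a rough illustration of turning an implicit crash into an explicit error path, here is a minimal Python sketch of a circuit breaker. It is not the fuse API: the `CircuitBreaker` class, its parameters, and the tuple results are assumptions made for this example, and the policy is simplified to "open after N consecutive failures, try again after a fixed delay".

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after consecutive failures, then
    answers with an explicit error until `reset_after` seconds pass."""

    def __init__(self, max_failures, reset_after, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock        # injectable so the sketch can be tested
        self.failures = 0
        self.opened_at = None     # None means the breaker is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: do not touch the dependency at all.
                return ("error", "system_unavailable")
            # Reset delay elapsed: close the breaker and try again.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            # The fault becomes a value the caller can match on.
            return ("error", "system_unavailable")
        self.failures = 0
        return ("ok", result)


def flaky_database():
    # Stand-in for an external dependency that is currently down.
    raise ConnectionError("database unreachable")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
print(breaker.call(flaky_database))
```

The caller never sees an exception, so the process proxying for the dependency stays up and can serve a degraded response. Once the breaker is open, calls return immediately without hitting the dependency, which also keeps a struggling backend from being hammered while it recovers.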

mononcqc · on Feb 8, 2016

Chapter 3 of Erlang in Anger (http://www.erlang-in-anger.com/) does mention them among other strategies in handling overload (3.2.2). I tried to put as much concise production experience as I could into that manual. Hopefully it proves helpful!

thedudemabry · on Feb 9, 2016

Since you're here, I just want to thank you for the most thorough, accessible, and pragmatic Erlang writing I've run across. Cheers!

pron · on Feb 9, 2016

> well the JVM has a really shitty startup time

The JVM does not have a shitty startup time. Starting up a JVM takes 50-80ms. What takes time is HotSpot's warmup -- getting to peak performance. Erlang doesn't have this problem simply because it never gets anywhere near HotSpot's performance.

As to thread pools, that's an apples-to-oranges comparison. Erlang's processes should be compared to Java tasks or fibers, not to Java's heavyweight threads.

agentgt · on Feb 9, 2016

I agree with you and probably should have made that statement more specific (i.e., the extreme class loading that typically happens in most Java apps, and what exactly counts as a fully started app). A typical Clojure app for example is well well above 50-80ms time to being ready to receive requests.

As for the threads, the same goes. I agree with you that ideally that should be the case, but in practice there are so many libraries that boot up their own thread pool (for isolation reasons, or because they are using blocking IO... rabbitmq).

BTW I'm a big fan of all your concurrency work and I too agree that subscribers are sort of hard to get right in reactive-streams and could be easier (I think that was you) :)

pron · on Feb 9, 2016

> A typical Clojure app for example is well well above 50-80ms time to being ready to receive requests.

That is because Clojure does a lot of stuff when it loads. It's got very little to do with the JVM.

And thank you :)