A lighter-weight version of let-it-crash is a circuit breaker. This is done quite frequently in the JVM world because, well, the JVM has a really shitty startup time. Even restarting thread pools can be expensive.
I get the whole let-it-crash thing, but I would really like more tools for feedback control and backpressure handling (i.e. what's the right number of threads to allocate, how many failures/timeouts should you allow, etc.). Even monitoring is a pain (i.e. too many alarms). I don't know if Erlang provides libraries for this, but it's a hard problem (see https://github.com/Netflix/Hystrix/issues/131).
'Let it crash' is a philosophy geared toward handling errors.
Circuit breakers are geared toward handling resources that may become unavailable.
While they seem similar, they're conceptually very, very different. Let it crash is mostly for cases where one's own code, one's own state, may end up faulty, and where restarting from a known good state will solve the issue. And it turns out this is really effective for most 'bugs'.
A circuit breaker is for cases where -external- state, environmental state if you will, may become faulty. It is really effective not for 'bugs', but for predictable, recurring issues such as one's network going down, a database becoming inaccessible, etc.
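For anyone who hasn't met the pattern, here is a minimal sketch of the usual closed/open/half-open state machine, written in Python rather than Erlang. The class name, the parameters (`failure_threshold`, `reset_timeout`) and the injectable `clock` are assumptions made for illustration; this is not the API of Hystrix or of any Erlang breaker library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is short-circuited."""


class CircuitBreaker:
    """Minimal closed -> open -> half-open circuit breaker (illustrative sketch)."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"  # the next call is allowed through as a trial
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of piling more requests onto a dependency that looks down.
            raise CircuitOpenError("dependency unavailable")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) re-opens immediately;
            # while closed, trip once the threshold is reached.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a database call as `breaker.call(query, ...)` means that once the database has failed a few times in a row, callers get `CircuitOpenError` immediately instead of each one waiting on its own timeout; after `reset_timeout` seconds the next call is let through to check whether the database is back.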
Everyone who writes a reasonably complex system in Erlang that interfaces with external systems learns the shortcomings of applying 'let it crash' to those interactions (a network hiccup produces enough crashes to exceed your supervisor's restart threshold, taking down part or all of your system), and goes looking for (and hopefully finds) the circuit breaker pattern.
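To make that failure mode concrete: an OTP supervisor gives up (terminating its children and itself) when more than `intensity` restarts happen within `period` seconds. The Python sketch below models only that counting rule. The class and method names are hypothetical, and it glosses over how OTP actually tracks restarts, but it shows why a burst of environment-caused crashes trips the limit while occasional bug-caused crashes do not.

```python
from collections import deque


class RestartIntensity:
    """Models the idea behind an OTP supervisor's intensity/period limit (sketch only)."""

    def __init__(self, intensity=1, period=5.0):
        self.intensity = intensity  # max restarts tolerated inside the window
        self.period = period        # window length in seconds
        self.restarts = deque()     # timestamps of recent restarts, oldest first

    def record_crash(self, now):
        """Return True if the child may be restarted, False if the limit is exceeded."""
        self.restarts.append(now)
        # Forget restarts that have slid out of the time window.
        while self.restarts and now - self.restarts[0] > self.period:
            self.restarts.popleft()
        return len(self.restarts) <= self.intensity


# A brief network blip makes several connection workers crash almost at once.
sup = RestartIntensity(intensity=3, period=5.0)
print([sup.record_crash(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)])  # [True, True, True, False, False]

# Occasional, spread-out crashes (typical of bugs) never hit the limit.
sup = RestartIntensity(intensity=1, period=5.0)
print([sup.record_crash(t) for t in (0.0, 6.0, 12.0)])  # [True, True, True]
```

Raising `intensity` to absorb the burst would also hide real crash loops, which is part of why a breaker in front of the dependency tends to be the better fix.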
Sadly, circuit breakers are not mentioned much in books or other documentation, despite being a potentially extremely useful piece of infrastructure for some kinds of projects.
What we do to make load regulation (see https://github.com/jlouis/safetyvalve or https://github.com/uwiger/jobs ) and circuit breakers trustworthy is "prove" them correct through extensive use of property-based testing. That is, these tools are highly unlikely to have errors in production runs, because the corner cases tested for them are far more complex than anything a normal program would do.
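As a rough illustration of both ideas, here is a toy admission regulator (at most `max_running` jobs run, at most `max_queued` wait, the rest are rejected) plus a randomized invariant check. `Regulator` and `check_invariants` are hypothetical names and this is not the API of safetyvalve or jobs. The check is also only a naive seeded random test using Python's `random`; real property-based tools (PropEr on the Erlang side, Hypothesis in Python) add generators and shrinking on top of this.

```python
import random
from collections import deque


class Regulator:
    """Toy load regulator: bounded concurrency, bounded queue, reject the rest."""

    def __init__(self, max_running, max_queued):
        self.max_running = max_running
        self.max_queued = max_queued
        self.running = set()
        self.queue = deque()

    def submit(self, job):
        if len(self.running) < self.max_running:
            self.running.add(job)
            return "run"
        if len(self.queue) < self.max_queued:
            self.queue.append(job)
            return "queued"
        return "rejected"  # shed load instead of letting the backlog grow

    def finish(self, job):
        self.running.remove(job)
        if self.queue:
            # Hand the freed slot to the oldest waiting job.
            self.running.add(self.queue.popleft())


def check_invariants(seed, steps=500):
    """One random run of submit/finish operations, checking invariants after every step."""
    rng = random.Random(seed)
    reg = Regulator(max_running=rng.randint(1, 4), max_queued=rng.randint(0, 4))
    submitted = rejected = finished = 0
    for job in range(steps):
        if reg.running and rng.random() < 0.5:
            reg.finish(rng.choice(sorted(reg.running)))
            finished += 1
        else:
            submitted += 1
            if reg.submit(job) == "rejected":
                rejected += 1
        # Limits are never exceeded, nothing waits while a slot is free,
        # and no job is lost or duplicated.
        assert len(reg.running) <= reg.max_running
        assert len(reg.queue) <= reg.max_queued
        assert not reg.queue or len(reg.running) == reg.max_running
        assert submitted == rejected + finished + len(reg.running) + len(reg.queue)
    return True


for seed in range(200):
    check_invariants(seed)
print("all runs passed")
```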
The reason it is nice to have circuit breakers is what Fred touched on in another thread here: you want to degrade a system gracefully, even if parts of it are temporarily down, either due to an error or for maintenance. You can thus keep the processes that proxy for the underlying dependency alive, so its failure doesn't cascade, and turn faults into terms of the form `{error, system_unavailable}`, which turns an implicit crash into an explicit error path.
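A small Python sketch of that "implicit crash to explicit error" idea, using tuples to mirror Erlang's `{ok, Value}` / `{error, Reason}` convention. `proxy` and `render_profile` are hypothetical names; in Erlang the proxy would more likely be a process (e.g. a gen_server) sitting in front of the dependency, possibly combined with a breaker.

```python
from typing import Any, Callable, Tuple

Result = Tuple[str, Any]


def proxy(call: Callable[[], Any]) -> Result:
    """Call a dependency and turn environmental failures into an explicit error term."""
    # Only connection/timeout failures are translated; any other exception
    # (i.e. a bug) propagates, so let-it-crash still applies to it.
    try:
        return ("ok", call())
    except (ConnectionError, TimeoutError):
        return ("error", "system_unavailable")


def render_profile(user_id: int, fetch_recommendations: Callable[[int], list]) -> dict:
    """Degrade gracefully: the profile still renders when recommendations are down."""
    status, value = proxy(lambda: fetch_recommendations(user_id))
    if status == "ok":
        return {"user": user_id, "recommendations": value}
    return {"user": user_id, "recommendations": []}
```

Restricting the `except` clause to connection and timeout errors keeps the two ideas separate: environmental failures become values the caller can handle, while genuine bugs still crash and get restarted by a supervisor.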
Chapter 3 of Erlang in Anger (http://www.erlang-in-anger.com/) does mention circuit breakers among other strategies for handling overload (section 3.2.2). I tried to put as much concise production experience as I could into that manual. Hopefully it proves helpful!
The JVM does not have a shitty startup time. Starting up a JVM takes 50-80ms. What takes time is HotSpot's warmup -- getting to peak performance. Erlang doesn't have this problem simply because it never gets anywhere near HotSpot's performance.
As to thread pools, that's an apples-to-oranges comparison. Erlang's processes should be compared to Java tasks or fibers, not to Java's heavyweight threads.
I agree with you and probably should have made that statement more specific (i.e. the extreme class loading that typically happens in most Java apps, and what exactly counts as a fully started app). A typical Clojure app, for example, takes well over 50-80ms to be ready to receive requests.
The same goes for threads. I agree that ideally that should be the case, but in practice so many libraries boot up their own thread pools (for isolation reasons, or because they use blocking IO, e.g. RabbitMQ).
BTW, I'm a big fan of all your concurrency work, and I also agree that subscribers are sort of hard to get right in Reactive Streams and could be easier (I think that was you) :)