If you like the idea of this library you'll probably like the book
"How to Measure Anything" by Douglas Hubbard
(https://www.goodreads.com/book/show/20933591-how-to-measure-...). It's all about how to get sensible confidence intervals for things that are often considered unmeasurable, such as the value of IT security. The book mostly uses Excel for the modelling, but it looks like riskquant would be an excellent alternative to that approach for the more technically minded practitioner.
The relevant book for this is Measuring and Managing Information Risk: A FAIR Approach by Freund and Jones[0].
Both books are worth reading; Hubbard's influence on FAIR is noticeable and positive. FAIR has the advantage that it comes with a fairly built-out ontology for assembling data or estimates. The OP touches on the top level (Loss Event Magnitude and Loss Event Frequency), but the ontology goes quite deep and can be used at multiple levels of detail.
The calculations are not difficult; I've implemented them twice in proofs of concept, including one that produces pretty charts.
The hard part, to be honest, is developing good estimates: it's difficult, frequently uncomfortable, and the gains are not easily internalised.
Additionally, serious tool support is lacking in the places where it would make a difference -- issue trackers, for example.
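For anyone wondering what "not difficult" looks like in practice, here is a minimal sketch of the top level (Loss Event Frequency × Loss Event Magnitude) in plain NumPy. This is not riskquant's API; the Poisson frequency, the lognormal magnitude, and every parameter below are made-up assumptions for illustration.

    # Minimal sketch of the FAIR top level: a year is a Poisson number of loss
    # events, each with a lognormal magnitude; look at the annual total.
    # Not riskquant's API; all parameters are assumed for illustration.
    import numpy as np

    rng = np.random.default_rng(0)

    lef = 0.5                        # Loss Event Frequency: expected events/year (assumed)
    mag_mu, mag_sigma = 13.0, 1.0    # lognormal parameters for per-event magnitude (assumed)

    years = 100_000
    event_counts = rng.poisson(lef, size=years)
    annual_loss = np.array([
        rng.lognormal(mag_mu, mag_sigma, size=n).sum() for n in event_counts
    ])

    print(f"mean annual loss:      ${annual_loss.mean():,.0f}")
    print(f"95th percentile:       ${np.percentile(annual_loss, 95):,.0f}")
    print(f"P(any loss in a year): {(annual_loss > 0).mean():.1%}")

The FAIR ontology then lets you decompose frequency and magnitude further (threat event frequency, vulnerability, primary vs. secondary loss, and so on) when you have data to support it.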
> b) it's still just a guess - normalising my guess and your guess is hard
True, but that's where the calibrated probability assessment stuff comes in. If you can at least establish obviously correct (if very coarse-grained) upper and lower bounds for estimates and then gradually shrink them until you aren't confident doing so anymore, you have something at least slightly better than a complete guess. And if you can choose the right distribution for how the actual values vary, then you can apply Hubbard's approach with a Monte Carlo simulation and get some insight into likely outcomes.
No, it's not a perfect approach, but it gives you a little something to hang your hat on.
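As a concrete (and entirely made-up) illustration of that shrinking-the-bounds idea: fit a lognormal whose 5th/95th percentiles match the calibrated 90% interval, simulate, and see how narrowing the interval tightens the tail estimate. This is a sketch of the general recipe Hubbard describes, not any particular tool's implementation.

    # Sketch: turn a calibrated 90% confidence interval for a loss into a
    # lognormal and Monte Carlo it. Both intervals are made up; the second is
    # a "shrunk" version of the first, as if the estimator tightened their bounds.
    import numpy as np

    rng = np.random.default_rng(1)
    Z90 = 1.645   # z-score at the 5th/95th percentiles

    def simulate(lower, upper, trials=100_000):
        # Lognormal whose 5th/95th percentiles sit at the calibrated bounds.
        mu = (np.log(lower) + np.log(upper)) / 2
        sigma = (np.log(upper) - np.log(lower)) / (2 * Z90)
        return rng.lognormal(mu, sigma, trials)

    for lower, upper in [(10_000, 10_000_000), (100_000, 2_000_000)]:
        losses = simulate(lower, upper)
        print(f"90% CI ${lower:,}..${upper:,}: "
              f"median ${np.median(losses):,.0f}, "
              f"95th pct ${np.percentile(losses, 95):,.0f}")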
I have that book. Basically I'm the last in a giveaway chain and can't honestly recommend it enough for someone to lug it home. Next time I move it's going in the trash.
It's really not very good, even for executives who shouldn't care about technicalities. The best part is the calibration exercises. But my advice is: skip this one.
I'm halfway through it. I know most of the general stuff already, but my knowledge comes from lots of sources I've mostly forgotten. This book seems to be a good collection on the topic. At least, I don't know of any substitute.
It overstates its claims. An admittedly unfair "take" would be that it encourages people to pin made-up numbers on things and take solace in having quantified the qualitative.
For example: it tells you to do Monte Carlo simulations with made-up probability distributions, but silently drops the issue of the joint distribution -- how the made-up random variables correlate. But so much risk is driven not by marginals (say, I know the physical parameters of my chair and have a certain confidence that it won't crumble like cardboard) but by correlations, even apparently distant ones (there's fracking or whatever going on, destabilizing the upper layers of the Earth and increasing the chances of earthquakes).
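To make the correlation point concrete, here's a toy example with entirely made-up numbers: two loss sources with identical lognormal marginals, coupled through a Gaussian copula. The marginals never change, but the tail of the total does.

    # Same two lognormal marginals, different joint behaviour. Correlating the
    # underlying normals (a Gaussian copula) leaves each marginal untouched but
    # changes how often both go bad at once. All numbers are made up.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000
    mu, sigma = 12.0, 1.0   # identical marginals for both loss sources

    def tail_prob(rho, threshold=5_000_000):
        cov = [[1.0, rho], [rho, 1.0]]
        z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        total = np.exp(mu + sigma * z).sum(axis=1)   # sum of the two losses
        return (total > threshold).mean()

    for rho in (0.0, 0.8):
        print(f"rho={rho:.1f}: P(total loss > $5M) ≈ {tail_prob(rho):.2%}")

Same marginals, noticeably fatter tail for the correlated case -- and nothing in a marginal-only analysis warns you about it.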
An even broader critique is the McNamara fallacy:
> "The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide."
----
There's this book I like called "Guesstimations in the back of a napkin" or something -- it's a book of exercises in Fermi-type estimation. It encourages you to consider the whole, to think bottom-up, top-down and core-out. It preserves a keen sense of the qualitative complexity of problems even as it encourages you to, well, make wild guesses.
Not trying to start an argument here, I'm genuinely curious, as I consider How To Measure Anything to be one of the best books I've ever read (and I read a lot of books), and I recommend it highly to, well, pretty much everybody. If you feel that there's a better resource out there that relates to these topics, I'd be curious to know about it.
I'm not very fond of Taleb, but well -- anything by Taleb.
Exercise books for Fermi estimates like Guesstimation, etc.
Further out, something in systems thinking, maybe Donella Meadows' "Thinking in systems". Further further out, maybe those Stafford Beer papers about the Viable Systems Model? At one point Beer and Allende thought they were about to implement Red Plenty.
---
I understand the businessy logic that nothing is so fundamentally qualitative that it shouldn't be quantified. But you'll always be safer if you keep rich qualitative models and treat quantification as gravy on top of that.
The extreme opposite of rich qualitative models is the Soviet method of material balances. Halfway in between sits the McNamara fallacy quoted above.
> I'm not very fond of Taleb, but well -- anything by Taleb
Yeah, Fooled by Randomness and The Black Swan were both pretty good. I haven't necessarily thought of them as significantly overlapping with the HTMA stuff up until this point, but now that you mention it I can see a connection. I should probably go back and re-read both, and read Antifragile.
> maybe those Stafford Beer papers about the Viable Systems Model?
Hmm... never heard of "Viable Systems Model" before, so I'll have to go read up on that. Thanks for the pointer.
> Exercise books for Fermi estimates like Guesstimation, etc
I'll take a look at Guesstimation. Thanks for the pointer on that as well.
> But you'll always be safer if you keep rich qualitative models and treat quantification as gravy on top of that.
I can buy that. I'm a fan of using approaches like Hubbard's to quantify things to a point. I do think his approach can supply a bit of extra rigor and some useful bounds to things that otherwise seem impossible to quantify at all. But it's not a perfect system by any means. The two biggest risks, so far as I can tell, would be leaving a variable (or more than one) out of your model completely, or using the wrong probability distributions for the various variables when doing the simulation part.
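A small illustration of the wrong-distribution risk (numbers made up): a normal and a lognormal matched to the same mean and standard deviation can look interchangeable on paper, yet disagree wildly about the tail, which is usually the part you care about.

    # Two distributions with the same mean and standard deviation but very
    # different tails. If real losses are heavy-tailed and you model them as
    # normal, you badly understate extreme outcomes. Numbers are made up.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 500_000

    mu, sigma = 12.0, 1.5
    lognorm = rng.lognormal(mu, sigma, n)
    normal = rng.normal(lognorm.mean(), lognorm.std(), n)   # matched mean/std

    for name, sample in [("lognormal", lognorm), ("matched normal", normal)]:
        print(f"{name:>15}: mean ${sample.mean():,.0f}, "
              f"99.9th pct ${np.percentile(sample, 99.9):,.0f}")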
You can quantify in normal domains, e.g. the probability of seeing a human over 2.05m tall.
You CAN'T quantify tail risk in unbounded (for practical purposes) domains, period.
This is just ridiculous: "For this example, there’s about a 2% chance losses would exceed $60 million in a year."
If you see you can lose everything, you are not quantifying; you are simply not going that way. If you don't know whether you can protect yourself from losing everything, you are not going that way either. If you can afford to lose some limited amount, you just write down that loss from the very start. It is that simple.
Netflix is just going straight to hell with such prediction approaches. And mind you, this is not a prediction; this is a fragility statement.
Insurance and reinsurance just clip the tail contractually; they never take on "infinity multiplied by some small percent" risk! They don't go under when their client does.
I didn't mention "insurance against economic crashes" - I meant that when there is a crash, it has the effect of triggering lots of payouts.
Some event may cause an economic crash, companies can have insurance against business disruption. If an event suddenly causes a lot of insured business disruption, that might mean a large total payout.
The choice of distribution is subjective, especially if you have little data to start with and are going on theoretical grounds. Agree on expected shortfall/TVaR.
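For anyone who hasn't met the terms: a quick sketch of the difference between a loss-exceedance figure like the "2% chance losses would exceed $60 million" quote above (a VaR-style number) and expected shortfall/TVaR, using a made-up sample of simulated annual losses.

    # VaR vs. expected shortfall (TVaR) on a made-up sample of annual losses.
    # VaR_98 is the level exceeded 2% of the time; TVaR_98 is the average loss
    # given that you are in that worst 2% -- it actually looks into the tail.
    import numpy as np

    rng = np.random.default_rng(4)
    annual_losses = rng.lognormal(mean=16.0, sigma=1.2, size=100_000)

    var_98 = np.percentile(annual_losses, 98)
    tvar_98 = annual_losses[annual_losses >= var_98].mean()

    print(f"VaR 98%:  ${var_98:,.0f}  (2% chance of losing more than this)")
    print(f"TVaR 98%: ${tvar_98:,.0f}  (average loss within that worst 2%)")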
Read The Black Swan, by NNT.
His thesis is that the most significant impacts on an individual and/or a collective are a function of outliers, or "Black Swans". In a Malcolm Gladwell kind of way, he uses a bunch of anecdata to prove his thesis. It's a pretty good book, I suppose, but it also came out before all the faux-philosophy Malcolm Gladwell-type books became a thing.
>I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
What exactly do you mean by that? In all sincerity, I think you misused an idiom or at least didn't make a clear connection as to why you chose that one.
> In a Malcolm Gladwell kind of way, he uses a bunch of anecdata to prove his thesis.
In my opinion this is perhaps the best 1 line description of his work in general. I think there are some interesting ideas which are worth discussing, but sometimes his attitude and use of "anecdata" tires me out.
Neither; the idea has been around for many years now (source: I worked professionally on exactly the calculations this library does, in the same space). It's just basic probability theory and Monte Carlo simulations; nothing anywhere near as complex as what the quants/insurance folks do.
IME, the biggest problem in this space is this: on the one side, you have all kinds of metrics related to hardware, software and operations in your company; on the other side, you have FAIR and related ideas that let you model scenarios in a sane way and plug in those statistical methods. But there's no good way to connect the two sides. There is a vast gulf between what you can reliably and accurately measure and what you actually want to know, and it's hard to cross it while maintaining any kind of accuracy or statistical validity. I've been doing work on this "middle layer", but never got the chance to test it out in the open, as the company I worked for suddenly imploded.
(If anyone is working in this space and is accepting remote workers, I'm available for a chat.)
This is part of an attempt to make information security risk modelling more quantitative that has been going on for a few years. There's very little in the way of data to really back most of this up, but actually putting numbers to things is significant progress IMO.
Agree it seems like a better way to inform decisions and manage risk. Is it backed by any sort of real understanding of litigation, settlement, or statutory experience? I often hear reputational risk cited by security teams at public companies... it usually follows some sort of indecision, or an appeal to a higher level of management or a bureaucratic tool.
I think it's also worth saying that when people say reputational risk they're generally not thinking about litigation, they're thinking about consumer perception, which it's not crazy to believe would have large impacts for large companies.
I suppose this is the impetus. I think this gives them a way to prioritize which issues to fix in a better way than an arbitrary hunch. We assign "weights" to issues all the time, based either on expected "time to complete" or on the severity of not fixing them. However, one person's severe is another's meh. This could produce a score in a systematic way and help alleviate the guesswork.
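As a sketch of what that kind of systematic score could look like (the issue names and every number are hypothetical, and this is not riskquant's interface): give each issue an estimated annual frequency and a 90% interval for loss magnitude, simulate, and rank by mean annual loss.

    # Hypothetical sketch: score issues for prioritisation by simulated mean
    # annual loss, from an estimated frequency and a 90% CI for magnitude.
    # Issue names and all numbers are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(5)
    Z90 = 1.645

    issues = {
        # name: (events/year, low magnitude, high magnitude) -- all assumed
        "stale TLS certs":      (2.0,   1_000,    50_000),
        "unpatched edge hosts": (0.3, 100_000, 5_000_000),
        "leaked API keys":      (0.8,  10_000, 1_000_000),
    }

    def mean_annual_loss(freq, low, high, years=50_000):
        mu = (np.log(low) + np.log(high)) / 2
        sigma = (np.log(high) - np.log(low)) / (2 * Z90)
        counts = rng.poisson(freq, size=years)
        return np.mean([rng.lognormal(mu, sigma, n).sum() for n in counts])

    scores = {name: mean_annual_loss(*params) for name, params in issues.items()}
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:<22} expected annual loss ≈ ${score:,.0f}")

The ranking won't be "true", but it is consistent, auditable, and arguable -- which is already better than one person's severe versus another's meh.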
I was not trying to "educate" you, not that I see anything wrong with being educated. I was brainstorming with you, hence the "I think" and "I suppose". I laid out my thoughts in writing.
Your reply could have quoted my reply, with a comment that said "maybe, but [insert counter argument, or poke holes in my statement's logic]".
I feel "riskquant" is a little vague for a project name. It's like there are a million internal libraries called "riskLib" - "risk" is a very general concept!
If the authors are reading: I think you mistakenly marked this as a paywalled article. I can't believe this was done deliberately by someone on the Netflix side.