Hacker News | malf's comments

This is why it annoys me when journalists say P!=NP is about scheduling flights.


If I leave other people’s stuff that I promised to take care of on the street and it gets stolen, I would be to blame.


Blame isn't mutually exclusive. You can still blame the person who stole it too!


> It's a solved problem on the CPU side of things

Is it? *gestures at pile of CPU bugs*


I always assumed you had to talk your way into some committee to submit one of these. Apparently it’s wide open. But why do it? Resume padding?


> common sense

75% of people surveyed drink coffee every day?


Everybody breathes a mixture that is roughly 78% nitrogen. Nitrogen at elevated pressures has a narcotic effect. It's common sense that even at atmospheric pressure nitrogen would have some narcotic effect. That is in fact true: breathing a helium mixture at atmospheric pressure improves reaction times.

People don't necessarily do what common sense would say if they have to do something else for other reasons.

Caffeine doesn't stimulate you; it just makes you unaware of how tired you are. You are still tired, with many of the downsides of being tired, like the negative impact on learning. People who drink coffee routinely have to do it to mask how tiring their lives are.


I dunno. I drink coffee because I like to get up in the morning, grind the beans with my hand grinder, boil the water in the kettle, and put everything together in the French press. It's a transition moment that I take for myself from bedtime to daytime. I like to have it after lunch to transition from lunch time, where I am typically reading a novel, back to work, where I'm sitting and writing code. I like to have a coffee at the coffee shop on the weekend because I can go for a walk with my gf and talk about random shit or plan a holiday or something. I'm not convinced my life is especially tiring, except for all the drama I seem to create for myself inside my own head. That's a bit tiring. And so far, coffee seems to offer no respite to that self-imposed exhaustion.


You can do all that with decaf beans. Besides, turning something into a quaint ritual doesn't alter its purely biological impact in any way. You could describe lighting a joint every day after coming home in very similar terms. Or drinking a beer. It doesn't make you any less high or drunk.

Caffeine strongly dysregulates the nervous system and sleep patterns. The surprising thing for many people is that after stopping caffeine, a lot of the nervous drama inside your head might simply fade away.


I love my coffee too and would never give it up. But if you know some smokers they will say the exact same thing about their smoking addiction.

“I don’t really smoke for the nicotine; I enjoy rolling the cigarette, the sensation of holding the cigarette, it gives me something to do with my hands, I like to get fresh air, it’s a social activity, etc.” The human mind is extremely good at rationalizing addiction.


Interesting interview with an award-winning barista and coffee entrepreneur who says that caffeine ingestion is something you should be mindful about:

https://www.youtube.com/watch?v=TqNrJNhcf5g


It’s also delicious, aside from the psychoactive effects. Good coffee, that is.


Decaf is exactly the same.


I'm going to have to argue that point. The best decaf is just OK; typical decaf is pretty noticeably bad.

I go on decaf stints occasionally, and I'm not opposed to it. I'm grateful it exists. But you do pay for it in taste.


I wonder how it fares in double blind comparisons.

Maybe you can tell, maybe not?

https://www.huffpost.com/entry/decaf-vs-regular-coffee-taste...


Hey, I wasn't aware of nitrogen's narcotic effects. That led me down a rabbit hole to this hilarious madness: "Hydreliox is an exotic breathing gas mixture of hydrogen, helium, and oxygen" https://en.wikipedia.org/wiki/Hydreliox

Still, comparing breathing air with drinking coffee is one hell of a bad analogy. Ironically, you did succeed in showing the ambiguity of common sense via your own lack thereof.


Some people may prefer that to “oh no, Steam did rm -rf $typo/“


This page tries to send expensive SMS messages.


“Using modern experiment frameworks, all 3 ideas can be safely tested at once, using parallel A/B tests (see chart).”

Nooo! First, if one actually works, you’ve massively increased the “noise” for the other experiments, so your significance calculation is now off. Second, xkcd 882.


> Nooo! First, if one actually works, you’ve massively increased the “noise” for the other experiments

I get that a lot from some of my clients. It's a common misconception. Let's say experiment B is 10% better than control, but we're also running experiment C at the same time. Since C's participants are evenly distributed across B's branches, by default they should have no impact on the other experiment.

If you do a pre/post comparison, you'll notice that for whatever reason, both branches of C are doing 5% better than prior time periods, and this is because half of them are in the winner branch of B.

NOW - imagine that the C variant is only an improvement _if_ you also include the B variant. That's where you need to be careful about monitoring experiment interactions, which I called out in the guide. But better to spend half a day writing an "experiment interaction" query than two weeks waiting for the experiments to run in sequence.
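(To make the "evenly distributed" point concrete, here's a toy simulation; the experiment names, hashing scheme, and user counts are made up for illustration, but real frameworks assign buckets in a similar independent-hash way:)

```python
import hashlib

def bucket(user_id: int, experiment: str) -> str:
    # Independent assignment per experiment: hash (experiment, user) together,
    # so membership in one experiment tells you nothing about the other.
    h = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 2 else "control"

users = range(100_000)
# How does C's treatment group distribute across B's branches?
c_treatment = [u for u in users if bucket(u, "C") == "treatment"]
in_b_treatment = sum(bucket(u, "B") == "treatment" for u in c_treatment)
share = in_b_treatment / len(c_treatment)
print(f"{share:.3f}")  # close to 0.5: C's users split ~evenly across B's branches
```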

> Second, xkcd 882 (https://xkcd.com/882/)

I think you're referencing p-hacking, right?

That is a valid concern to be vigilant about. In this case, the xkcd is calling out the "find a subgroup that happens to be positive" hack (also here: https://xkcd.com/1478/). However, here we're (a) testing 3 different ideas and (b) only testing each of them once, on the entire population. No p-hacking as far as I can tell (happy to learn otherwise), but good that you're keeping an eye out for it.


The more experiments you run in parallel, the more likely it becomes that at least one experiment's branches do not have an even distribution across all branches of all (combinations of) other experiments.

And the more experiments you run, whether in parallel or sequentially, the more likely you are to get at least one false positive, i.e. p-hacking. The xkcd uses "find a subgroup that happens to be positive" to make it funnier, but it's really just "find an experiment that happens to be positive". To correct for this, you would have to lower your significance threshold for each experiment, requiring a larger sample size and negating the benefits you thought you were getting by running more experiments with the same samples.


... and one such correction is the (simple, conservative, underused) Bonferroni Correction.
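(For the curious, Bonferroni is simple enough to sketch in a few lines; the p-values below are invented for illustration:)

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni: reject H0 only for tests whose p-value clears alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# 5 parallel experiments; 0.05 / 5 = 0.01, so only the smallest p-value survives
print(bonferroni([0.04, 0.009, 0.20, 0.03, 0.51]))
# [False, True, False, False, False]
```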


Super helpful - looked it up, will aim to apply next time!

Curious how the Bonferroni correction applies in cases where the overlap is partial, i.e. experiment A ran from day 1 to 14, and experiment B ran (on the same group) from days 8 to 21. Do you just apply the correction as if there was full overlap?


I believe you would apply the correction for every comparison you make regardless of the conditions. It's a conservative default to avoid accidentally p-hacking.

There might be other more specific corrections that give you power in a specific case. I don't know about that, I went Bayesian somewhere around this point myself.


There are a bunch of procedures under the label family-wise error correction; some have issues in situations with non-independence (Bonferroni can handle any dependency structure, I think).

If there are a lot of tests/comparisons, you could also look at controlling the False Discovery Rate (usually increases power at the expense of more type I errors).
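(A sketch of the Benjamini-Hochberg step-up procedure, the usual FDR-controlling method; the p-values are invented, and in practice you'd probably reach for a library like statsmodels rather than roll your own:)

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up: control the false discovery rate at q.
    Returns a reject/keep flag per test, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q ...
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k = rank
    # ... and reject the k smallest p-values.
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Thresholds here are 0.01, 0.02, 0.03, 0.04, 0.05 for ranks 1..5
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.27]))
# [True, True, False, False, False]
```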


Thanks, that is a well reasoned argument!

My take is that for small n (say, 5 experiments at once) with lots of subjects (>10k participants per branch) and a decent hashing algorithm, the risk of uneven bucketing remains negligible. Is my intuition off?

False positives for experiments are definitely something to keep an eye on. The question to ask is what our comfort level is for trading off false positives against velocity. This feels similar to the IRB debate to me, where being too restrictive hurts progress more than it prevents harm.


No, the risk of uneven bucketing of more than 1% is minimal, and even when it’s the case, the contamination is much smaller than other factors. It’s also trivial to monitor at small scales.

False positives do happen (Twyman's law is the most common way to describe the problem: an underpowered experiment with spectacular results). The best solution is to ask whether the results make sense using product intuition, and to continue running the experiment if not.

They are more likely to happen with very skewed observations (like how much people spend on a luxury brand), so if you have a goal metric that is skewed at the unit level, maybe think about statistical correction, or bootstrapping confidence intervals.
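(The "trivial to monitor" part can be as simple as a sample-ratio-mismatch check; this normal-approximation version is a sketch with made-up counts:)

```python
import math

def srm_check(n_control: int, n_treatment: int, expected_ratio: float = 0.5) -> float:
    """Sample-ratio-mismatch check via a normal approximation to the binomial.
    A large |z| means the observed split is very unlikely under the intended
    ratio, so the bucketing is off and results shouldn't be trusted."""
    n = n_control + n_treatment
    expected = n * expected_ratio
    std = math.sqrt(n * expected_ratio * (1 - expected_ratio))
    return (n_control - expected) / std

print(round(abs(srm_check(50_210, 49_790)), 2))  # 1.33: ordinary noise
print(round(abs(srm_check(52_000, 48_000)), 2))  # 12.65: investigate the bucketing
```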


You are confusing:

a. the Family-Wise Error Rate (FWER, what xkcd 882 is about) and the many solutions for Multiple Comparison Correction (MCC: Bonferroni, Holm-Šidák, Benjamini-Hochberg, etc.) with

b. Contamination or Interaction: your two variants are not equivalent because one has 52% of its members part of Control from another experiment, while the other variant has 48%.

FWER is a common concern among statisticians when testing, but one with simple solutions. Contamination is a frequent concern among stakeholders, but it is very rare to observe even with a small sample size, and it even more rarely has a meaningful impact on results. Let's say you have a 4% overhang, and the other experiment has a remarkably large 2% impact on a key metric. The contamination is only 4% * 2% = 0.08%.
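(The back-of-envelope bound spelled out, using the same numbers as the comment above:)

```python
# A 4% bucketing overhang times the other experiment's 2% effect
# bounds how much of the metric movement can leak between experiments.
overhang = 0.04
other_effect = 0.02
contamination = overhang * other_effect
print(f"{contamination:.4%}")  # 0.0800%
```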

It is a common concern and, therefore, needs to be discussed, but as Lukas Vermeer explained here [0], the solutions are simple and not frequently needed.

[0] https://www.lukasvermeer.nl/publications/2023/04/04/avoiding...


My SO had a Fairphone and broke the screen. It turned out the screen replacement module was vaporware.


Was it out of stock? Non-functional?


Noob question: the Top type here is exactly the Node constructor of the stree type. It seems useful to declare a type that is “this ADT, but limited to these constructors”, but even Haskell seems to avoid it. Why?

