Hacker News | rd11235's comments

Yeah - my guess is this was just a very roundabout solution for setting axis limits.

(For some reason, plt.bar was used instead of plt.plot, so the y axis would start at 0 by default, making all results look the same. But when the log scale is applied, the lower y limit becomes the data’s minimum. So, because the dynamic range is so low, the end result is visually identical to having just set y limits using the original linear scale).
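The effect is easy to reproduce. A small sketch (assuming matplotlib; the three values are the ones discussed in this thread):

```python
# Reproduces the axis-limit quirk: bars pin the y-axis to start at 0,
# so near-identical values look identical; switching to a log scale
# raises the lower limit above 0, making the differences visible again.
import matplotlib
matplotlib.use("Agg")              # render off-screen; no display needed
import matplotlib.pyplot as plt

vals = [2.0000, 1.9671, 1.9998]    # the three results discussed here

fig, (ax_lin, ax_log) = plt.subplots(1, 2)
ax_lin.bar(range(3), vals)         # linear scale: y starts at 0
ax_log.bar(range(3), vals)
ax_log.set_yscale("log")           # log scale: lower limit becomes positive

fig.canvas.draw()                  # force autoscale so get_ylim() is final
print(ax_lin.get_ylim())           # lower limit is 0
print(ax_log.get_ylim())           # lower limit is > 0
```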

Anyhow, for anyone interested, the values for those 3 points are 2.0000 (exact), 1.9671 (trapezoid), and 1.9998 (Gaussian). The relative errors are 1.6% vs. 0.01%.
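For intuition on why the Gaussian result lands so much closer: at equal point counts, Gauss–Legendre quadrature integrates much higher-degree polynomials exactly than the trapezoid rule does. The original integrand isn't shown here, so as a stand-in (an assumption, not the actual problem) here is ∫₀^π sin(x) dx = 2 with 5 points each:

```python
# Trapezoid vs. Gauss-Legendre with the same number of points.
# Stand-in integrand (assumption): integral of sin(x) over [0, pi] = 2.
import numpy as np

exact = 2.0

# Composite trapezoid rule on 5 equally spaced points.
xs = np.linspace(0.0, np.pi, 5)
ys = np.sin(xs)
h = xs[1] - xs[0]
trap = h * (ys[0] / 2 + ys[1:-1].sum() + ys[-1] / 2)

# 5-point Gauss-Legendre, nodes mapped from [-1, 1] to [0, pi].
nodes, weights = np.polynomial.legendre.leggauss(5)
t = 0.5 * np.pi * (nodes + 1.0)
gauss = 0.5 * np.pi * np.sum(weights * np.sin(t))

print("trapezoid rel. error:", abs(trap - exact) / exact)   # a few percent
print("gauss rel. error:    ", abs(gauss - exact) / exact)  # orders of magnitude smaller
```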


I don’t understand the trend of individual studies making it to the top of HN while meta-analyses are ignored, despite the latter examining tens or hundreds of such individual studies in aggregate to reduce noise.

There are we-are-human reasons, but are there any logical reasons?


Individual studies don’t exist in a vacuum. They can cite tens to hundreds of papers, just like a meta-analysis. The difference is that in addition to the literature review they had to do anyway to develop their research question, they also contributed novel data to the field and tried to put it into context. Meta-analyses don’t get published in big journals like Nature; novel data does.


I wouldn't necessarily assume people's interest in science is limited to what meta-analyses tell us.

A single study probably can't answer the high-level questions (like, "does creatine help build muscle?") but can nevertheless be pretty interesting to read and discuss. (Personally, I found this discussion interesting, and I don't care about the high-level question at all.)


Thank you for reminding people of this.


If by "logical" you mean "practical", then the reason is that the constituent studies, the ones that generate data and cost a lot of money to run, are more likely to get a press release, and that is how non-specialists learn about new studies. From the university's perspective, being able to point to the result of spending a lot of money to contribute to human knowledge is important, so they publicize these expensive studies. The meta-analysis is cheap: it takes a few people's time to do the analysis, with no new expensive data involved.


Because if studies are trash the meta-result is also trash.

This is what is happening in all nutrition science, where most studies are trash.

That is why you can prove any hypothesis you want by picking (meta) studies in nutrition science.


It's like the CDO of science.


Good motivation for a PSA:

This happens more and more often, and there is a fairly easy and popular workaround (which also comes with ~99% ad blocking as a bonus): either set up Pi-hole locally, or use a hosted DNS service that does essentially the same thing.

Main idea: Ads, updates, etc. typically (not always) need to resolve hosts before connecting to servers. Simply resolve these hosts to 0.0.0.0 instead of a real IP.
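On the DNS side, the block itself is just a host-to-0.0.0.0 mapping. With plain dnsmasq (which Pi-hole builds on), a minimal sketch looks like this; the hostnames are hypothetical placeholders, not any vendor's real update servers:

```
# /etc/dnsmasq.d/block-updates.conf  (hostnames are hypothetical)
# Resolve update/ad hosts to 0.0.0.0 so connections fail fast.
address=/update.example-tv-vendor.com/0.0.0.0
address=/ads.example-tv-vendor.com/0.0.0.0
```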

Arguments for Pi-hole or another local solution: Free. Private.

Arguments for a hosted solution: No setup headache, no local Raspberry Pi or other machine to maintain. Overall a bit simpler.

Guide for blocking updates after the service is set up (I went through this a month or two ago to block updates on my LG TV):

Step 1: Search around for servers that correspond to updates for your device.

Step 2: Test these lists; realize that they are often incomplete.

Step 3: Shut your device off. Open your Pi-hole-like service and watch queries live. While doing so, turn your device on (and, if you have the option, check for updates).

Step 4: Put all of the queried hosts you see into your block list.

Step 5: Later, you may encounter broken functionality. When this happens, look at your logs, and see which server(s) were blocked at that moment. Remove only those from the blocklist. (And cross your fingers that the manufacturer doesn't use the same hosts for typical functionality and updates.)


> Step 5: Later, you may encounter broken functionality. When this happens, look at your logs, and see which server(s) were blocked at that moment

Eventually you end up with advertisements being served anyway, because the application refuses to show the content without the advertisements.

So let me cut back to your main idea:

> Main idea: Ads, updates, etc. typically (not always) need to resolve hosts before connecting to servers. Simply resolve these hosts to 0.0.0.0 instead of a real IP.

Better solution: resolve these hosts to an address you control on your network. You could even resolve them to a "public" address and add a static route on your router.

You can then choose to serve no-content from that address.
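A minimal sketch of the "serve no-content" part, assuming a plain-HTTP sink on a machine you control (the handler and helper names are made up for illustration; pinned or HTTPS-only clients will still balk at this):

```python
# Tiny "no-content" sink: answer every request with 204 No Content.
# Point blocked hostnames at the machine running this instead of 0.0.0.0.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

class NoContentHandler(BaseHTTPRequestHandler):
    def _reply(self):
        self.send_response(204)  # success, but nothing for the client to render
        self.end_headers()

    do_GET = do_POST = do_HEAD = _reply

    def log_message(self, *args):
        pass  # keep the console quiet

def start_sink(host="0.0.0.0", port=8080):
    """Run the sink in a background thread; returns the server object."""
    server = HTTPServer((host, port), NoContentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Then point your DNS overrides at the host running this, and blocked apps get a fast, empty success response instead of a connection timeout.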


Maybe that worked 10 years ago, but nowadays they've figured out SSL certificate pinning.


Even easier: just don't connect it to your network. Everything connected to my network runs free software that I control. If you absolutely must have something on your local network, put it in a VLAN that has no internet access and can't reach your main LAN.


> This happens more and more often, and there is a fairly easy + popular workaround (which also comes with 99% ad blocking as a bonus). Just either set up pi-hole locally OR use a hosted DNS service that does essentially the same thing.

DNS over HTTPS is going to render this method ineffectual eventually. Smart devices are going to stop trusting anything on the local network.


Why connect the junk to the internet to begin with? It's a TV. I can buy a better streaming box and plug it in. People really overcomplicate things sometimes, IMO.


You didn’t mention an important point: speed.

Suppose conda had projects. Still, it is somewhat incredible to see uv resolve + install in 2 seconds what takes conda 10 minutes. It immediately made me want to replace conda with uv whenever possible.

(I have actively used conda for years, and don’t see myself stopping entirely because of its non-Python support, but I do see myself switching primarily to uv.)


It's true conda used to be slow, but that was mostly at a time when pip had no real dependency resolver at all. Since I started using mamba, I haven't noticed meaningful speed problems. I confess I'm always a bit puzzled by how much people seem to care about the speed of things like install. Yes, 10 minutes is a problem, but these days mamba often takes 15 seconds or so. Okay, that could be faster, but installing isn't something I do very often, so I don't see it as a huge problem.


The near-instant install speed is just such a productivity boost. It's not the time you save; it's how it enables you to stay in flow. In my previous job we had a massive internal library hosted on Azure that took about 5 minutes to install with pip or conda. Those on my team not using uv either resorted to a single global environment for everything, which they dreaded experimenting with, or made a new environment once in the project's history and avoided installing new dependencies like the plague. uv took less than 30 seconds to install the packages, so it freed up a much better workflow of disposable envs that I could just nuke and recreate if they went bad.


Agreed. When I first tried uv I was immediately struck by the sheer speed. The functionality seemed OK, but the speed - woah. Inspiring. So I kept using it. Got used to it now.


> The meticulous PT format and exercise selection allows them to achieve more muscle gain in 20 minutes per week than median trainees achieve in 2-3 hours.

This is a strong statement presented without evidence, and one which conflicts with the scientific consensus.

Studies show again and again that the most important factors are consistency and training volume.

If the program is more effective, which is questionable in itself, then the reasons are likely 1. reduction of burnout and/or 2. much more intense training. Not PT format or exercise selection.


I agree but the opposite can be true too. Sometimes the narrator seems to target some general audience that doesn’t fit me at all, in a way that makes me cringe when I listen, until I stop listening altogether. In these cases I’d rather listen to a relatively flat narration from a tool like this.


An obvious question that isn’t answered (in this article; I’m not sure about the paper itself) is whether feeding fructose results in MORE tumor growth than feeding glucose (or other sources of calories).

Without knowing this, it doesn’t make sense to assume there is anything inherently bad about fructose, at least beyond the mechanistic arguments mentioned in the article (which are weak if not backed up by empirical evidence).


> it was just the wrong abstraction: too easy to start with, too difficult to create custom things

Couldn’t agree with this more. I was working on custom RNN variants at the time, and for that, Keras was handcuffs. Even raw TensorFlow was better for that purpose (which in turn still felt a bit like handcuffs after PyTorch was released).


What causes comments like this one to get faded out? I guess many people flagged it?

Isn’t it just a logical statement?


No, he is being downvoted for trying to be a smartass.

The OP never said the main reason you should wear your seat belt is so you don't get sucked out the side of the airplane, just that massive plane construction failure is a reason. I remember an awful incident some years ago where the roof of the plane came off during landing, and everyone but a flight attendant lived because they were preparing to land. The flight attendant was in the aisle checking seat belts and preparing for landing, and sadly was sucked out of the plane.

Also the only way to win the lottery is to play.


I can see how it would be taken that way. My point was that it's not a risk at all, in any meaningful sense; people are way overreacting to the risk of these events affecting them individually (not to Boeing's potential manufacturing issues).

> Also the only way to win the lottery is to play.

Seriously? The way to 'win' the lottery is to not play. I mean, have some fun if you like, but don't play for profit.


It was faded before because its score was <1. As far as I know, flagging a post doesn't do that; enough flags will [flagged] the post and even more flags will [dead] the post.


Anyone who believes that this completes their understanding of automatic differentiation is tricking themselves.

When your graph is a TREE, then everything is very simple, as in this post.

When your graph is instead a more general directed acyclic graph (e.g., x = 5; y = 2x; z = xy), then the IMPLEMENTATION is still very simple, but understanding WHY that implementation works is not as simple (repeat: if you think it’s ‘just the ordinary chain rule’, you are tricking yourself).
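For concreteness, here is that DAG pushed through a reverse-mode sweep by hand; the thing to notice is the `+=` where the adjoint contributions from x's two outgoing paths are summed:

```python
# Reverse-mode AD on the DAG  x = 5;  y = 2*x;  z = x*y.
# x feeds z through two paths (directly, and via y), so its adjoint
# is the SUM of the contributions from each outgoing edge.
x = 5.0
y = 2.0 * x          # y = 10
z = x * y            # z = 50

# Backward pass: seed dz/dz = 1, then walk the graph in reverse.
dz_dz = 1.0
dz_dy = dz_dz * x    # z = x*y  ->  dz/dy = x
dz_dx = dz_dz * y    # direct edge x -> z: contributes y
dz_dx += dz_dy * 2.0 # path x -> y -> z: contributes 2 * dz_dy

# Closed form check: z = 2*x**2, so dz/dx = 4*x = 20.
print(dz_dx)
```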

One of the earliest descriptions of this was by Paul Werbos. He called the required rule “the chain rule for ordered derivatives”, which he proved by induction from the ordinary chain rule. But it is nevertheless not immediately evident from the ordinary chain rule.

I welcome anyone who believes otherwise to prove me wrong. If you do I will be very happy.


Then where can one read more about this? The people who built autograd and other frameworks like PyTorch, MXNet, etc. must have learned this in detail somewhere. Where? AFAIK MXNet came out of academia (probably CMU).


Here's what you do: watch this video by Andrej Karpathy [1] called "Becoming a Backprop Ninja". Then pick a function that you like and implement backprop for it (backprop is another name for reverse-mode automatic differentiation) using just numpy. If you use some numpy broadcasting, an np.sum, and some for-loops, you'll start getting a good feel for what's going on.

Then you can go and read this fabulous blog post [2], and if you like what you see, you go to the framework built by its author, called Small Pebble [3]. Despite the name, it's not all that small. If you peruse the code you'll get some appreciation of what it takes to build a solid autodiff library, and if push comes to shove, you'll be able to build one yourself.

[1] https://www.youtube.com/watch?v=q8SA3rM6ckI

[2] https://sidsite.com/posts/autodiff/

[3] https://github.com/sradc/SmallPebble


I don't have a great answer. Most modern descriptions are shallow and/or unclear. My favorite discussions were actually in Werbos's original papers.

A nice overview is "Backpropagation through time: what it does and how to do it" (1990). The rule itself is stated very clearly there, but without proof. The proof can be found in "Maximizing long-term gas industry profits in two minutes in Lotus using neural network methods" (1989), which I believe was copied over from his earlier thesis, of which I could never find a copy.


To be honest, I never get what people want with all this business, and I wonder if it's because the implied abstraction ("ordered derivatives") is not ideal.

If we follow the ordinary chain rule (for a single coordinate if you want) through the edges of the computational (DAG) graph, we get the right thing in each step.

The only other rule you need is that "if you use one variable several times in a calculation (i.e. several edges from(fw)/to(bw) the same node), you need to add the gradients computed for each", but IMHO that is pretty basic and intuitive, too. (So if you plug in z for both x and y into f(x, y), you have d/dz f(z, z) = f_x(z, z) + f_y(z, z), where the subscript indicates partial derivative.)

To me this seems both mathematically simpler than mixing the two into a "more than chain rule" thing and closer to what is actually going on algorithmically in a given implementation (the one I'm most familiar with is probably PyTorch's).


But the chain rule for ordered derivatives is exactly the backprop rule. It's just the mathematical representation of 'the simple implementation' I mentioned.

I think what you're saying is that you find the process intuitive. I don't have much of a way to argue with that. But I think it's important to note that we're dealing with two things: 1. a process that we follow (backprop), 2. a true answer that is obtainable using only the chain rule. And yes it turns out that (1) and (2) both give the same answer. But (2) requires much more work, and I question anyone who claims that (1) is 'obvious' from (2): getting (1) from (2) requires work.

I'm guessing you'll agree that using only the chain rule takes much more work, but in case you don't: consider a fully connected graph with at least 5 variables, say a = 5; b = 2 a; c = 2 a b; d = 2 a b c; e = 2 a b c d. If you use backprop, you can compute de/da rapidly. If you use only the chain rule, it will take a long time to compute de/da, because the number of terms you have to deal with increases exponentially fast with the number of variables.
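A hand-unrolled backward sweep for that five-variable graph, as a sketch: each adjoint is a sum over the node's consumers, and de/da comes out in a single pass rather than the exponentially many chain-rule path products:

```python
# The example graph: a=5; b=2a; c=2ab; d=2abc; e=2abcd.
# Reverse-mode computes de/da in one backward sweep by summing the
# contribution of every edge leaving each node.
a = 5.0
b = 2 * a
c = 2 * a * b
d = 2 * a * b * c
e = 2 * a * b * c * d

# Backward sweep: adjoint of a node = sum over its consumers.
de = 1.0
dd = de * 2 * a * b * c                               # only e consumes d
dc = de * 2 * a * b * d + dd * 2 * a * b              # e and d consume c
db = de * 2 * a * c * d + dd * 2 * a * c + dc * 2 * a # e, d, c consume b
da = de * 2 * b * c * d + dd * 2 * b * c + dc * 2 * b + db * 2

# Closed form check: e = 256*a**8, so de/da = 2048*a**7.
print(da)
```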


The chain rule is defined for partial derivatives, so it's still technically just the chain rule.


OC's point is that the chain rule for partial derivatives shouldn't be assumed because the ordinary chain rule holds, there's more depth to it than that, and the proof is harder than you might instinctively expect based on the ordinary chain rule.

It's epistemically acceptable to understand these both as "the chain rule" once we're satisfied they've both been proved, and apply liberal amounts of synecdoche from there (and I don't think OC disagrees with you on that).


Actually by 'ordinary chain rule' I am referring to what you're referring to as 'the chain rule for partial derivatives'. It seems like backprop follows very quickly even from that, but it does not.


> chain rule is defined for partial derivatives

I agree. That's what I'm referring to as 'the ordinary chain rule'.

> so it's still technically just chain rule

No. Go try to derive backprop for general DAGs using only the chain rule. If you complete the proof, then you will agree that the proof was more elaborate than you ever expected.

