The article seems to suggest that the loop buffer provides no performance benefit and no power benefit.
If so, it might be a classic case of "Team of engineers spent months working on new shiny feature which turned out to not actually have any benefit, but was shipped anyway, possibly so someone could save face".
I see this in software teams when someone suggests it's time to rewrite the codebase to get rid of legacy bloat and increase performance. Yet, when the project is done, there are more lines of code and performance is worse.
In both cases, the project shouldn't have shipped.
> but was shipped anyway, possibly so someone could save face
It was shipped anyway because it can be disabled with a firmware update, and because drastically altering the physical hardware layout mid-design would likely have had worse impacts.
What you describe would be shipped physically but disabled, and that certainly happens a lot, for exactly those reasons. What the GP described was shipped not only physically present but not even disabled, because of politics. That would be a very different thing.
No kidding. I was adjacent to a tape-out with some last-minute tweaks - ugh. The problem is that the current cycle time is very slow and costly, and you spend as much time validating things as you do designing. It’s not programming.
I once interviewed at a place which made sensors that were used a lot in the oil industry. Once you put a sensor on the bottom of the ocean, 100+ meters (300+ feet) down, it's not getting serviced any time soon.
They showed me the facilities, and the vast majority was taken up by testing and validation rigs. The sensors would go through many stages, taking several weeks.
The final stage had an adjacent room with a viewing window and a nice couch, so a representative from the client could watch the final tests before taking the sensors back with them.
Quite the opposite to the "just publish a patch" mentality that's so prevalent these days.
If you work on a critical piece of software (especially one you can't update later), you absolutely can spend way more time validating than you do writing code.
The ease of pushing updates encourages lazy coding.
> The ease of pushing updates encourages lazy coding.
Certainly in some cases, but in others it just shifts the economics. Fault tolerance can be laborious and time-consuming, and that time and labor is taken from something else. When the nature of your dev and distribution pipelines renders faults less disruptive, and you have a good foundational codebase and a code review process that pays attention to security and core stability, quickly creating three working features can be much, much more valuable than making sure one working feature will never ever generate a support ticket.
Even for software it’s often risky to remove code once it’s in there. Lots of software products ship with tons of unused code and assets because no one’s got time to validate that nothing’s gonna go wrong when you remove them. Check out some game teardowns; they often have dead assets from years ago, sometimes even completely unrelated things from the studio’s past projects.
The article also mentions they had trouble measuring power usage in general, so we can't necessarily (and, really, shouldn't) conclude that it has no impact whatsoever. I highly doubt that AMD's engineering teams are so unprincipled as to allow people to add HW features with no value (why would you dedicate area and power to a feature which doesn't do anything?), so I'm inclined to give them the benefit of the doubt here and assume that Chips 'n Cheese simply couldn't measure the impact.
Note - I saw the article through from start to finish. For power measurements I modified my memory bandwidth test to read AMD's core energy status MSR, and modified the instruction bandwidth testing part to create a loop within the test array. (https://github.com/clamchowder/Microbenchmarks/commit/6942ab...)
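Roughly, the MSR-based energy read looks like this (a simplified sketch rather than the actual harness in the linked commit; the MSR addresses are AMD's documented RAPL-style energy MSRs for Zen, and /dev/cpu/*/msr assumes the Linux msr kernel module is loaded):

```c
// Simplified sketch of reading AMD's core energy counter on Linux.
// Requires root and the msr kernel module (modprobe msr).
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_RAPL_PWR_UNIT    0xC0010299  // energy unit in bits [12:8]
#define MSR_CORE_ENERGY_STAT 0xC001029A  // 32-bit core energy accumulator

static uint64_t read_msr(int fd, uint32_t reg) {
    uint64_t value = 0;
    pread(fd, &value, sizeof(value), reg);
    return value;
}

int main(void) {
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    // Energy ticks are 1 / 2^ESU joules; ESU is bits [12:8] of the unit MSR.
    uint64_t unit = read_msr(fd, MSR_RAPL_PWR_UNIT);
    double joules_per_tick = 1.0 / (double)(1ULL << ((unit >> 8) & 0x1F));

    // Sample the accumulator around the workload (ignores counter wraparound).
    uint64_t before = read_msr(fd, MSR_CORE_ENERGY_STAT) & 0xFFFFFFFFULL;
    sleep(1);  // run the instruction bandwidth loop of interest here instead
    uint64_t after = read_msr(fd, MSR_CORE_ENERGY_STAT) & 0xFFFFFFFFULL;

    printf("core energy: %.3f J\n", (double)(after - before) * joules_per_tick);
    close(fd);
    return 0;
}
```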
Remember, most of the technical analysis on Chips and Cheese is a one-person effort, and I simply don't have infinite free time or equipment to dig deeper into power. That's why I wrote "Perhaps some more mainstream tech outlets will figure out AMD disabled the loop buffer at some point, and do testing that I personally lack the time and resources to carry out."
Sorry, I totally didn't mean this as a slight to your work--I've been a fan for quite a while :)
It's more that estimating power when you don't have access to post-synthesis simulations or internal gas gauges is very hard. For something so small, I can easily see this being a massive pain to measure in the field and the kind of thing that would easily vanish into the noise on a real system.
But in the absence of any clear answer, I do think it's reasonable to assume that the feature does in fact have the power advantages AMD intended, even if small.
> engineering teams are so unprincipled as to allow people to add HW features with no value
This is actually pretty common, as the performance characteristics are often unknown until late in the hardware design cycle - it would be "easy" if each cycle were just changing that single unit with everything else static, but that isn't the case, as everything is changing around it. And by the time you've got everything together complete enough to actually test end-to-end pipeline performance, removing things is often the riskier choice.
And that's before you even get to the point of low-level implementation/layout/node specific optimizations, which can then again have somewhat unexpected results on frequency and power metrics.
Working at... a very popular HW company... I'll say that we (the SW folks) are currently obsessed with 'doing something' even if the thing we're doing hasn't fully been proven to have benefits outside of some narrow use cases or targeted benchmarks. It's very frustrating, but no one wants to put the time in to do the research up front. It's easier to just move forward with a new project because upper management stays happy and doesn't ask questions.
Is it the expectation of major updates coming at a fixed cycle? Not only expected by upper management but also by end users? That's a difficult trap to get out of.
I wonder if that will be the key benefit of Google's switch to two "major" Android releases each year: it will get people used to nothing newsworthy happening within a version increment. And I also wonder if that's intentional, and my guess is not the tiniest bit.
Yeah, we've made great progress and folks are used to it. Now we've got to deliver, but most of the low-hanging fruit has been picked (some of it also incurred tech debt).
Do you have new software managers/directors who are encouraging such behavior? In my experience, new leaders tend to lean on these tactics to grab power.
Strangely no. Our management hasn't really changed in several years. Expectations have risen though and we've picked a lot of the low-hanging fruit. We also failed to invest in our staffing and so we don't have enough experienced devs to actually do the work now.
Well, the other possibility is that the power benchmarks are accurate: the buffer did save power, but then they figured out an even better optimization at the microcode level that made the regular path save even more power, so the buffer actually became a power hog.
>> when the project is done, there are more lines of code and performance is worse
There is an added benefit though - the new programmers are now fluent in the codebase. That benefit might be worth more than LOCs or performance.
"The article seems to suggest that the loop buffer provides no performance benefit and no power benefit."
It tests the performance benefit hypothesis in different scenarios and does not find evidence that supports it. It makes one best effort attempt to test the power benefit hypothesis and concludes it with: "Results make no sense."
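For a sense of what those performance tests look like, the general shape is a tiny loop of cheap instructions, small enough to fit in a loop buffer, timed for front-end throughput. This is an illustrative sketch only, not the article's actual harness, and it assumes x86-64 with GCC/Clang inline asm:

```c
// Illustrative loop-buffer-sized microbenchmark: time a tiny hot loop and
// estimate instruction throughput. Real tests also sweep loop sizes and
// instruction mixes; this just shows the shape.
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void) {
    const uint64_t iters = 1ULL << 30;
    uint64_t n = iters;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    // 8 NOPs plus the loop overhead: small enough to sit in a loop/uop buffer.
    __asm__ volatile(
        "1:\n\t"
        "nop\n\tnop\n\tnop\n\tnop\n\t"
        "nop\n\tnop\n\tnop\n\tnop\n\t"
        "dec %0\n\t"
        "jnz 1b\n\t"
        : "+r"(n));
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    // 10 instructions per iteration (8 NOPs + dec + jnz).
    printf("%.2f billion instructions/s\n", iters * 10 / secs / 1e9);
    return 0;
}
```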
I think the real takeaway is that performance measurements without considering power tell only half the story. We've come a long way on the performance measurement half, but power measurement is still hard. We should work on that.
Tell that to the shareholders. As a public company, they can very quickly lose enormous amounts of money by falling behind or coming in below expectations on just about anything.
Someone elsewhere quotes a game-specific benchmark of about 15%, which will mostly matter when your FPS is low enough that gameplay becomes difficult.
There will be a certain number of people who will delay an upgrade a bit longer because the new machines don’t have enough extra oomph to warrant it. Little’s Law can apply to finance when the quantity of interest is the interval between purchases.
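Roughly how that mapping works (symbols and numbers here are illustrative, not from the thread):

```latex
% Little's Law: L = \lambda W
%   L       -- installed base (machines currently in service)
%   \lambda -- purchase rate (units bought per year)
%   W       -- average interval between purchases (years)
\[
  L = \lambda W \quad\Longrightarrow\quad \lambda = \frac{L}{W}
\]
% With a roughly fixed installed base, stretching the average upgrade
% interval W directly lowers the purchase rate \lambda: going from a
% 5-year to a 6-year cycle cuts yearly unit sales by about 17%.
```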