Hacker News | _chris_'s comments

Longer time horizon -- the mortgage inflates away. In the short term, you only need to beat the property tax bill, especially if the interest rate is <3% and the property is appreciating faster than that.


> A lot of the time this is a hint to the compiler on what the expected paths are so it can keep those paths linear. IIRC, this mainly helps instruction cache locality.

The real value is that the easiest branch to predict is a never-taken branch. So if the compiler can turn a branch into a never-taken branch, with the common path being straight-line code, then you win big.

And it takes no space or effort to predict never-taken branches.
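
To make this concrete, here's a minimal sketch (mine, not anything from the comment above) of the kind of hint in question, using GCC/Clang's __builtin_expect; the likely/unlikely wrappers and the function are just illustrative:

  #include <stdio.h>
  #include <stdlib.h>

  /* Common wrappers around the GCC/Clang __builtin_expect hint. */
  #define likely(x)   __builtin_expect(!!(x), 1)
  #define unlikely(x) __builtin_expect(!!(x), 0)

  long sum_nonnegative(const int *data, size_t n) {
      long total = 0;
      for (size_t i = 0; i < n; i++) {
          /* The rare error path becomes a forward, (hopefully) never-taken
             branch; the hot path stays fall-through, straight-line code. */
          if (unlikely(data[i] < 0)) {
              fprintf(stderr, "negative input at %zu\n", i);
              abort();
          }
          total += data[i];
      }
      return total;
  }

  int main(void) {
      int xs[] = {1, 2, 3, 4};
      printf("%ld\n", sum_nonnegative(xs, sizeof xs / sizeof xs[0]));
      return 0;
  }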


> And it takes no space or effort to predict never-taken branches.

Is that actually true, given that branch history is stored lossily? What if other branches that have the same hash are all always taken?


A BPU (branch prediction unit) needs to predict 3 things:

  - 1) Is there a branch here?
  - 2) If so, is it taken?
  - 3) If so, where to?
If a conditional branch is never taken, then it's effectively a NOP, and you never store it anywhere, so you treat (1) as "no there isn't a branch here." Doesn't get cheaper than that.

Of course, (1) and (3) are very important, so you pick your hashes to keep aliasing at an acceptably low level. Otherwise you just have to eat mispredicts if you alias too much.

Note: (1) and (3) aren't really functions of history; they're functions of the branch's static location in the binary (I'm simplifying a tad but whatever). You can more freely alias on (2), which is very history-dependent, because (1) will guard it.
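
A toy software model of those three predictions (the sizes, names, and indexing scheme are all made up for illustration; real predictors are far more elaborate):

  #include <stdio.h>
  #include <stdint.h>
  #include <stdbool.h>

  #define BTB_SIZE 256   /* answers (1) "is there a branch?" and (3) "where to?" */
  #define DIR_SIZE 1024  /* 2-bit counters answering (2) "is it taken?"          */

  typedef struct {
      bool     valid;
      uint64_t tag;      /* the branch's static PC */
      uint64_t target;   /* prediction (3)         */
  } btb_entry_t;

  static btb_entry_t btb[BTB_SIZE];
  static uint8_t     dir[DIR_SIZE];  /* saturating counters, 0..3      */
  static uint64_t    history;        /* global taken/not-taken history */

  /* Fetch-time lookup: report a taken branch and its target, or nothing. */
  static bool predict(uint64_t pc, uint64_t *target) {
      btb_entry_t *e = &btb[pc % BTB_SIZE];
      if (!e->valid || e->tag != pc)    /* (1): "no, there isn't a branch here" */
          return false;                 /* never-taken branches land here: free */
      uint64_t idx = (pc ^ history) % DIR_SIZE;  /* (2): may alias across branches */
      if (dir[idx] < 2)
          return false;
      *target = e->target;              /* (3) */
      return true;
  }

  /* Retire-time update with the real outcome. Only taken branches ever get
   * installed in the BTB, so a never-taken branch costs no storage at all. */
  static void update(uint64_t pc, bool taken, uint64_t target) {
      uint64_t idx = (pc ^ history) % DIR_SIZE;
      if (taken) {
          btb_entry_t *e = &btb[pc % BTB_SIZE];
          e->valid = true; e->tag = pc; e->target = target;
          if (dir[idx] < 3) dir[idx]++;
      } else if (dir[idx] > 0) {
          dir[idx]--;
      }
      history = (history << 1) | (taken ? 1u : 0u);
  }

  int main(void) {
      uint64_t tgt;
      for (int i = 0; i < 16; i++)
          update(0x1000, true, 0x2000);   /* train an always-taken branch */
      printf("0x1000 predicted taken: %d\n", predict(0x1000, &tgt));
      printf("0x1040 predicted taken: %d\n", predict(0x1040, &tgt));
      return 0;
  }

The branch at 0x1040 was never taken, so it was never installed anywhere and (1) answers "no branch here" for free; all the lossy, history-dependent state lives in the direction table behind (2).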


> The thing holding back VLIW was compilers were too dumb

That’s not really the problem.

The real issue is that VLIW requires branches to be strongly biased, statically, so a compiler can exploit them.

But in practice branches are highly dynamic yet trivially predicted by branch predictors, so branch predictors win.

Not to mention that even VLIW cores use branch predictors, because the branch resolution latency is too long to simply stall and wait for the outcome.
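
For a concrete (made-up) example of a branch that can't be statically biased but is easy dynamically:

  #include <stddef.h>

  /* How often this branch is taken is a property of the input data, not of
   * the code, so a compiler can't pick a "most likely" path for a static
   * VLIW schedule. A branch predictor watching it at run time usually learns
   * its pattern (sorted input, mostly-small values, etc.) within a few
   * iterations. */
  long count_over_threshold(const int *samples, size_t n, int threshold) {
      long count = 0;
      for (size_t i = 0; i < n; i++) {
          if (samples[i] > threshold)
              count++;
      }
      return count;
  }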


Wouldn’t a smart TV do something ... smarter than just using the default DNS given to it by the network?

I’m not up to speed on this stuff, but I thought Pi-hole only blocked the simplest stuff from devices that play nice?


> Wouldn’t a smart TV do something ... smarter than just using the default DNS given to it by the network?

It could certainly try... but usually you would block that in your firewall. Hardcoded DNS servers or fixed server IP addresses are tricky because if you ever need to change them, you can't: you'd need to update the device (which you can't, since it sits behind a firewall).

It could try to use things like Google's DNS server, but that is easily blocked in your router.

Not a lot the TV can do except trust your (internal) DNS server...


Why should the programmers of the TV's OS look for edge cases, and do you think the TV makers would give them budget for that? For 90+% of users the standard config of trusting the DHCP-provided DNS server will work fine, and the Pi-hole users will probably not give them money anyway, and will be dedicated to defeating their workarounds...


I've been worried about companies that make software like this (applications with embedded telemetry or advertisements) starting to do their own DoH-style lookups.

I don't KNOW of any doing it but I can't imagine it'd be too hard for them to do.
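
I don't know of anyone doing it either, but as a sketch of how little it would take: libcurl (7.62+) can resolve hostnames over DoH with a single option. Everything below is hypothetical, the URLs are placeholders, and this isn't claiming any particular product does this:

  #include <curl/curl.h>

  /* Sketch: an app fetching its telemetry/ad endpoint while resolving the
   * hostname over DNS-over-HTTPS, bypassing the LAN's DNS (and thus a
   * Pi-hole). CURLOPT_DOH_URL is a real libcurl option (7.62+). */
  int main(void) {
      curl_global_init(CURL_GLOBAL_DEFAULT);
      CURL *curl = curl_easy_init();
      if (!curl)
          return 1;

      curl_easy_setopt(curl, CURLOPT_URL, "https://telemetry.example.com/ping");
      /* Names get resolved by sending DNS queries to this resolver over
       * HTTPS, so the local DNS server never sees them. */
      curl_easy_setopt(curl, CURLOPT_DOH_URL, "https://1.1.1.1/dns-query");

      CURLcode res = curl_easy_perform(curl);

      curl_easy_cleanup(curl);
      curl_global_cleanup();
      return res == CURLE_OK ? 0 : 1;
  }

Blocking that means intercepting or dropping HTTPS to known DoH resolvers, which is a lot more work than redirecting plain port-53 DNS.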


It’s not that hard to debug -- your signal names and register names all carry through. Sure, lots of temp wires get generated but that’s never where your bug is.


Maybe. I haven't used it. But with the compile-to-SV language I do use, you're right: it generates a lot of temporary wires and the bugs are never in them, but they make it extremely tedious to trace drivers from the point of failure back to the cause.


> allow to select the purchase price within the last 2 years

I don't think that's true. My reading of that is "you lock in the price on your start date and can keep that for the next 2 years going forward". That doesn't help anybody joining at >$1k / share. :D (and that's only ESPP, not standard stock compensation).


Can't speak for NVIDIA, but another company I know of uses the lowest price over the last 4 periods (so the lowest of 8 timestamps).


ESPP is a very small amount vs RSUs. You’re limited to buying $25,000 per year (that you still have to shell out for even if it’s at a discount) vs just being given several hundred thousand (or more) in RSUs.
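
Rough back-of-the-envelope sketch of why it's small, assuming a typical 15% discount and a lookback to the lower of the period's start/end price; the prices are invented and none of this is specific to NVIDIA:

  #include <stdio.h>

  /* Illustration only: typical ESPP terms (15% discount, lookback) and the
   * $25,000/year cap mentioned above. All numbers are made up. */
  int main(void) {
      double start_price = 100.0, end_price = 140.0;
      double discount    = 0.15;
      double cap         = 25000.0;

      double purchase_price = (start_price < end_price ? start_price : end_price)
                              * (1.0 - discount);
      double shares = cap / purchase_price;
      double gain   = shares * end_price - cap;  /* paper gain at the end price */

      printf("purchase price: $%.2f, shares: %.1f, paper gain: $%.0f\n",
             purchase_price, shares, gain);
      return 0;
  }

Even in that favorable scenario the benefit is in the tens of thousands, which is the point: it's small next to a large RSU grant.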


> I’d be interested in understanding why the compilers never panned out but have never seen a good writeup on that. Or why people thought the compilers would be able to succeed in the first place at the mission.

It's a fundamentally impossible ask.

Compilers are being asked to look at a program (perhaps watch it run a sample set) and guess the bias of each branch to construct a most-likely 'trace' path through the program, and then generate STATIC code for that path.

But programs (and their branches) are not statically biased! So it simply doesn't work out for general-purpose codes.

However, programs are fairly predictable, which means a branch predictor can dynamically learn the program path and regurgitate it on command. And if the program changes phases, the branch predictor can re-learn the new program path very quickly.
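
What "learning the path" looks like, as a toy (the table size and indexing are made up; this is not any real core's predictor): a small table of 2-bit counters indexed by the last few branch outcomes picks up a repeating taken/not-taken pattern within a few iterations, and when the program changes phase the same table re-learns the new pattern just as quickly.

  #include <stdio.h>
  #include <string.h>

  /* Toy history-indexed predictor: 2-bit counters indexed by the last 4 outcomes. */
  #define HIST_BITS 4
  #define TABLE     (1 << HIST_BITS)

  static int run_phase(const char *pattern, int iters, unsigned char *ctr) {
      unsigned hist = 0;
      int mispredicts = 0, len = (int)strlen(pattern), total = iters * len;
      for (int i = 0; i < total; i++) {
          int taken = pattern[i % len] == 'T';
          int pred  = ctr[hist] >= 2;
          if (pred != taken) mispredicts++;
          /* train the saturating counter, then shift the outcome into the history */
          if (taken  && ctr[hist] < 3) ctr[hist]++;
          if (!taken && ctr[hist] > 0) ctr[hist]--;
          hist = ((hist << 1) | taken) & (TABLE - 1);
      }
      return mispredicts;
  }

  int main(void) {
      unsigned char ctr[TABLE] = {0};
      /* Phase 1: the branch repeats T T T N; Phase 2: the program changes phase. */
      printf("phase 1 mispredicts: %d / 400\n", run_phase("TTTN", 100, ctr));
      printf("phase 2 mispredicts: %d / 400\n", run_phase("TNTNN", 80, ctr));
      return 0;
  }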

Now if you wanted to couple a VLIW design with a dynamically re-executing compiler (dynamic binary translation), then sure, that can be made to work.


> Now if you wanted to couple a VLIW design with a dynamically re-executing compiler (dynamic binary translation), then sure, that can be made to work.

RIP Transmeta


Transmeta lived on in Nvidia's Project Denver, but Denver was originally optimized for x86 and the Intel settlement precluded that. It ended up being too buggy/inefficient to compete in the market and was effectively abandoned after the second generation.


This makes a lot of sense to me, thanks for boiling it down. Compilers can predict the upcoming code decently, but not really the upcoming data, so VLIW doesn't work that well on branch-heavy commercial/database server workloads compared to the branch prediction, speculation, and out-of-order execution complexity it tried to simplify away. Does that sound right?


I think it could have worked if the IDE had performance instrumentation (some kind of tracing) which would have been fed into the next build. (And perhaps several iterations of this.)

Another way to leverage Itanium's power would have been to make a Java Virtual Machine go really fast, with dynamic binary translation. That way you'd sidestep all the C UB optimization caveats.


> L1i matters, people!

RISC-V consistently wins on L1i footprint.

The complaining is about number of dynamic instructions ("path length"), which can hit you if you don't fuse. Of course, path length might not actually be the bottleneck to raw performance, but it's an easy metric to argue, so a lot of people latch on to it.


>The complaining is about number of dynamic instructions ("path length"), which can hit you if you don't fuse.

Ironically, RISC-V does great there[0]. Note this is despite the researchers not even considering fusion.

0. https://dl.acm.org/doi/pdf/10.1145/3624062.3624233


Dunno about "great" - "For 6 out of 10 mini-app+compiler pairs, Arm has a shorter path length, with the overall average difference when weighting each benchmark equally being 2.3% longer for RISC-V."


Even applying the worst possible reading to RISC-V, and despite not considering fusion, it is not worse than ARM.

That's awesome.


Isn't shorter path length the goal here? And ARM is better by both those metrics. Am I misunderstanding something?

ARM would of course benefit from fusion too; but camel-cdr's mention of it being only rv64g is a pretty significant caveat.


Yes, shorter path is the goal.

No, winning 4 and losing 6, by a small margin, isn't "being worse than ARM". The paper's authors even explicitly conclude it is not losing to ARM.

This is even ignoring whether code is inside or outside loops, counting fuseable instructions as always non-fused, and not considering any instructions from extensions beyond rv64g as ratified in 2019 (actually unchanged since 2017)... any of those would have a favorable effect on RISC-V.

This is an excellent result for RISC-V, one that clears any doubts in terms of path length, on top of what we already know about RISC-V leading in 64-bit code density.


Might not be "worse" (I'd definitely agree that the difference is small enough to be considered equal within error bounds), but it's certainly not something worth calling "great" either.

Excluding extensions is perhaps a significant question, but, for example, Debian RISC-V currently targets rv64gc, which should have the same instruction counts as rv64g does, so software compiled for Debian can't use the later extensions for most code anyway. (never mind that ARMv8 also has excluded extensions, namely NEON, which is always present on ARMv8 and is not designed to be ignored)

And, of course, even being better than ARM is not equivalent to being the best it could be; ARMv8 isn't some attempt at a magically optimal instruction set, but rather designed for whatever ARM needed, and that includes being able to efficiently share hardware with ARMv7 for backwards compatibility.


If RISC-V is not worse (it is not) and yet it is much simpler (it is), that is a huge win.

Simplicity has enormous value.


it's also targeting just rv64g


Right. Bitmanip would also, on its own, reduce instruction count considerably.


Also the difference in number of instructions on real programs is in the 10% range, which could well be compensated by other factors. For example, keeping to simpler instructions might well result in a 10% higher clock speed and lower silicon area too, equalising matters if not gaining an advantage.


> > Cascade discovered 4 bugs in BOOM and CVA6 that produce wrong output values regardless of the microarchitectural state

> These are unacceptable bugs, showing a lack of architectural tests. It means no one ever ran those instructions and checked the result. The community should be able to fix this.

For BOOM it looks like the only 2 bugs found were miscounting of the instructions-retired perf counter when software overwrote it, and fdiv.s/fsqrt.s always listening to the dynamic rounding mode instead of the statically provided rounding mode, when one was specified. Not great, but recoverable.


From Appendix D, it looks like only 2 bugs were found in BOOM:

> 1. Inaccurate instruction count when minstret is written by software

I don't know what that means, but having minstret written by software was definitely not something I ever tested. In general, perf counters are likely to be undertested.

> 2. Static rounding is ignored for fdiv.s and fsqrt.s

A mistake was made in only listening to the dynamic rounding mode for the fdiv/sqrt unit. This is one of those bugs that is trivially found if you test for it, but it turns out that no benchmark ever cared about this, and of all the fuzzers I used when I worked on BOOM, NONE of them hit it (including commercial ones...). Oops.

Fixed here: https://github.com/riscv-boom/riscv-boom/pull/629/files
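
For anyone wondering what "static rounding" means: RISC-V FP instructions carry a rounding-mode field that can either name a mode directly or say "use the frm CSR" (dyn). A sketch using GCC-style inline asm, only meaningful on a RISC-V target with the F extension (the wrapper names are mine):

  #include <stdio.h>

  #if defined(__riscv) && defined(__riscv_flen)

  /* Static rounding: the mode (here rtz, round-toward-zero) is encoded in the
   * instruction itself, regardless of what the frm CSR says. */
  static inline float fdiv_rtz(float a, float b) {
      float r;
      asm volatile ("fdiv.s %0, %1, %2, rtz" : "=f"(r) : "f"(a), "f"(b));
      return r;
  }

  /* Dynamic rounding: "dyn" tells the hardware to read the mode from frm.
   * The BOOM bug was treating the static case as if it were this one. */
  static inline float fdiv_dyn(float a, float b) {
      float r;
      asm volatile ("fdiv.s %0, %1, %2, dyn" : "=f"(r) : "f"(a), "f"(b));
      return r;
  }

  int main(void) {
      printf("%a %a\n", fdiv_rtz(1.0f, 3.0f), fdiv_dyn(1.0f, 3.0f));
      return 0;
  }

  #else
  int main(void) { puts("build for a RISC-V target to run this"); return 0; }
  #endif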

