As someone who did assembly coding on the 8086/286/386 in the 90s, the xH and xL registers were quite useful to write efficient code. Maybe 64-bit mode should have gotten rid of them completely though, rather than only when REX.W=1.
AAA/AAS/DAA/DAS were used quite a lot by COBOL compilers. These days ASCII and BCD processing doesn't use them, but writing efficient routines without them takes very fast data paths (the microcode sequencer in the 8086 was pretty slow), large ALUs, and very fast multipliers (to divide by constant powers of 10).
To be honest, validating bibliographies does not cost valuable time. Every research group will have their own bibtex file to which every paper the group ever cited is added.
Typically when you add it you get the info from another paper or copy the BibTeX entry from Google Scholar, but it's really at most 10 minutes' work, more likely 2-5. Every paper might have 5-10 new entries in the bibliography, so that's an hour or less of work?
Yes, I wish it were that simple. :) There are many other complications:
* Some instructions require VEX.L or VEX.W to be 0 or 1, and some encodings result in completely different instructions if you change VEX.L.
* Different bits of the EVEX prefix are valid depending on the opcode byte.
* Some encodings (called groups) produce different instructions depending on bits 3-5 of the modrm byte (the second byte after all prefixes). Some encodings further produce different groups depending on whether bits 6-7 (mod) of the modrm byte identifies a register or not.
* Some instructions read a whole vector register but only a scalar if the same instruction has a memory operand. Sometimes this is clear in the manual, sometimes it is not, sometimes the manual is downright wrong.
* Some instructions do not allow using the legacy high-8-bits registers even though they don't do anything with bits 8 and above of the operand: they only want a 32- or 64-bit register as their operand.
* APX (EVEX map 4) looks a lot like legacy map 0, but actually a few instructions were moved there from other maps for good reasons, a few more were moved there for no apparent reason (SHLD/SHRD iirc), and a few more are new.
* REX2 does not extend SSE and AVX instructions to 32 registers even though REX does extend them to 16.
* Intel defines a thing called VEX instruction classes, which makes sense except for a dozen or two instructions where it doesn't. For these, sometimes AMD uses a different class and sometimes it doesn't; sometimes AMD's choice makes sense, sometimes it doesn't.
And there are many more that I found out while writing QEMU's current x86 decoder (which tries to be table-based, but sometimes that's just impossible).
> Some instructions require VEX.L or VEX.W to be 0 or 1, and some encodings result in completely different instructions if you change VEX.L.
There is even an instruction where AMD got this wrong! VPERMQ requires VEX.W=1, but some AMD CPUs also happily execute it when VEX.W=0 even though that is supposed to raise an exception.
Sort of; at least some degree of relativism exists, though how much is debated. Would you ever talk about the sea having the same color as wine? But that's exactly what Homer called it.
This is still quite clearly something different than being unable to see the different colors, though.
Their mental model, sure. The way they convey it to others, sure.
But you can easily distinguish between two colors side by side that are even closer in appearance than wine and the sea, even if you only know one name for them. We can differentiate between colors before we even know the words for them when we're young, too.
Indeed. On the other hand, ARM has recently added explicit load-acquire primitives which are relatively cheap, so converting a consume to an acquire is not a big loss (and Linus considered doing it for the kernel a while ago, just to avoid having to think too hard about compiler optimizations).
It is cheaper on ARM and POWER. But I'm not sure it is always safe. The standard has very complex rules for consume to make sure that the compiler doesn't break the dependencies.
edit: and those rules were so complex that compilers decided they were not implementable or not worth it.
The rules were there to explain what optimizations remained possible. Here no optimization is possible at the compiler level, and only the processor retains freedom because we know it won't use it.
It is nasty, but it's very similar to how Linux does it (volatile read + __asm__("") compiler barrier).
This is still unsound (in both C and Rust), because the compiler can break data dependencies by e.g. replacing a value with a different value known to be equal to it. A compiler barrier doesn't prevent this. (Neither would a hardware barrier, but with a hardware barrier it doesn't matter if data dependencies are broken.) The difficulty of ensuring the compiler will never break data dependencies is why compilers never properly implemented consume. Yet at the same time, this kind of optimization is actually very rare in non-pathological code, which is why Linux has been able to get away with assuming it won't happen.
In principle a compiler could convert the data dependency into a control dependency (for example, after PGO, by checking against the most likely value), and those are fairly fragile.
I guess in practice mainstream compilers do not do it and relaxed+signal fence works for now, but the fact that compilers have been reluctant to use it to implement consume means that they are reluctant to commit to it.
In any case I think you work on GCC, so you probably know the details better than me.
edit: it seems that ARM specifically does not respect control dependencies. But I might be misreading the MM.
I/O ports have always been weird though. :)