As someone who did assembly coding on the 8086/286/386 in the 90s, the xH and xL registers were quite useful to write efficient code. Maybe 64-bit mode should have gotten rid of them completely though, rather than only when REX.W=1.
AAA/AAS/DAA/DAS were used quite a lot by COBOL compilers. These days ASCII and BCD processing doesn't use them, but writing efficient routines without them takes very fast data paths (the microcode sequencer in the 8086 was pretty slow), large ALUs, and very fast multipliers (to divide by constant powers of 10).
To be honest, validating bibliographies does not cost valuable time. Every research group will have their own bibtex file to which every paper the group ever cited is added.
Typically when you add it you get the info from another paper or copy the BibTeX entry from Google Scholar, but it's really at most 10 minutes' work, more likely 2-5. Every paper might have 5-10 new entries in the bibliography, so that's an hour or less of work?
Yes, I wish it were that simple. :) There are many other complications:
* Some instructions require VEX.L or VEX.W to be 0 or 1, and some encodings result in completely different instructions if you change VEX.L.
* Different bits of the EVEX prefix are valid depending on the opcode byte.
* Some encodings (called groups) produce different instructions depending on bits 3-5 of the modrm byte (the second byte after all prefixes). Some encodings further produce different groups depending on whether bits 6-7 (mod) of the modrm byte identifies a register or not.
* Some instructions read a whole vector register but only a scalar if the same instruction has a memory operand. Sometimes this is clear in the manual, sometimes it is not, sometimes the manual is downright wrong.
* Some instructions do not allow using the legacy high-8-bits registers even though they don't do anything with bits 8 and above of the operand: they only want a 32- or 64-bit register as their operand.
* APX (EVEX map 4) looks a lot like legacy map 0, but actually a few instructions were moved there from other maps for good reasons, a few more were moved there for no apparent reason (SHLD/SHRD iirc), and a few more are new.
* REX2 does not extend SSE and AVX instructions to 32 registers even though REX does extend them to 16.
* Intel defines a thing called VEX instruction classes, which makes sense except for a dozen or two instructions where it doesn't. For these, sometimes AMD uses a different class and sometimes it doesn't; sometimes AMD's choice makes sense, sometimes it doesn't.
And there are many more that I found out while writing QEMU's current x86 decoder (which tries to be table-based, but sometimes that's just impossible).
> Some instructions require VEX.L or VEX.W to be 0 or 1, and some encodings result in completely different instructions if you change VEX.L.
There is even an instruction where AMD got this wrong! VPERMQ requires VEX.W=1, but some AMD CPUs also happily execute it when VEX.W=0 even though that is supposed to raise an exception.
Sort of; at least some degree of relativism exists, though how much is debated. Would you ever talk about the sea having the same color as wine? But that's exactly what Homer called it.
This is still quite clearly something different than being unable to see the different colors, though.
Their mental model, sure. The way they convey it to others, sure.
But you can easily distinguish between two colors side by side that are even closer in appearance than wine and the sea, even if you only know one name for them. We can differentiate between colors before we even know the words for them when we're young, too.
Indeed. On the other hand, ARM has recently added explicit load-acquire primitives which are relatively cheap, so converting a consume to an acquire is not a big loss (and Linus considered doing it for the kernel a while ago, just to avoid having to think too hard about compiler optimizations).
It is cheaper on ARM and POWER. But I'm not sure it is always safe. The standard has very complex rules for consume to make sure that the compiler doesn't break the dependencies.
edit: and those rules were so complex that compilers decided they were not implementable or not worth it.
The rules were there to explain what optimizations remained possible. Here no optimization is possible at the compiler level, and only the processor retains freedom because we know it won't use it.
It is nasty, but it's very similar to how Linux does it (volatile read + __asm__("") compiler barrier).
This is still unsound (in both C and Rust), because the compiler can break data dependencies by e.g. replacing a value with a different value known to be equal to it. A compiler barrier doesn't prevent this. (Neither would a hardware barrier, but with a hardware barrier it doesn't matter if data dependencies are broken.) The difficulty of ensuring the compiler will never break data dependencies is why compilers never properly implemented consume. Yet at the same time, this kind of optimization is actually very rare in non-pathological code, which is why Linux has been able to get away with assuming it won't happen.
In principle a compiler could convert the data dependency into a control dependency (for example, after PGO, by checking against the most likely value), and those are fairly fragile.
I guess in practice mainstream compilers do not do it and relaxed+signal fence works for now, but the fact that compilers have been reluctant to use it to implement consume means that they are reluctant to commit to it.
In any case I think you work on GCC, so you probably know the details better than me.
edit: it seems that ARM specifically does not respect control dependencies. But I might be misreading the MM.
I/O ports have always been weird though. :)