Even though I see a graph of how memory accesses have gotten slower relative to everything else at every conference I attend, it's still striking to see how much relative cycle times have changed. IN and OUT are now serializing instructions with a penalty of hundreds of clocks, not even including time spent waiting for the I/O itself: they flush the machine or cause some scheduled instructions to replay, depending on whether or not your micro-architecture supports dependent replay. The memory access instructions aren't quite so bad, since they might hit in the cache and they don't create a serializing memory barrier, but a miss can still cost hundreds of cycles.
Correctly predicted branches now cost 1 cycle, and almost all branches are correctly predicted (and they're actually free on the P4, if they're already in the trace cache). The arithmetic instructions are now single cycle, too, and multiple copies of them can execute in the same cycle (and they're only half a cycle on early P4 micro-architectures).
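To put rough numbers on the "might hit in the cache, but can still be hundreds of cycles" point, here is a minimal sketch of the usual expected-cost calculation for a single memory access. The cycle counts and hit rates are illustrative assumptions, not measurements of any particular chip, and `expected_cycles` is a made-up helper name.

```python
def expected_cycles(hit_cycles, miss_cycles, hit_rate):
    """Expected cost of one memory access in a simple hit-or-miss model."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Weighted average of the hit cost and the miss cost.
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Assumed, illustrative numbers only: a cache hit costs a few cycles,
# a miss that goes all the way to memory costs a couple of hundred.
HIT_CYCLES = 4
MISS_CYCLES = 200

for hit_rate in (1.0, 0.99, 0.95, 0.90):
    cost = expected_cycles(HIT_CYCLES, MISS_CYCLES, hit_rate)
    print(f"hit rate {hit_rate:.0%}: about {cost:.1f} cycles per access")
```

Under these assumed numbers, even a small miss rate dominates the average, which is why a load looks cheap most of the time and very expensive occasionally.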
If you want to put this info to use, check out Stefan Tramm's JavaScript 8080 and CP/M emulator: http://www.tramm.li/i8080
Aside: The 8085, despite its similar part number, is not actually an x86 chip. It used the 8-bit 8080 architecture. The Z80 was an enhanced 8080 that achieved much greater popularity than its ancestor.
I don't get the impression this was posted for its "hotness," but rather for its nostalgia. Well, partly nostalgia: some HN readers may never have seen an 8086 in their lives, so in that way it's a history lesson, too.
Intel 64 and IA-32 Architectures Optimization Reference Manual http://www.intel.com/Assets/PDF/manual/248966.pdf