Hacker News

I was a bit turned off by the way assembly is described as "tricky" and "tedious"... All programming languages can be tricky and tedious. Assembly (especially on the 6502) is conceptually very simple and, while it may not be trivial to translate higher-level concepts into its simplicity, as long as what you want can be readily expressed in it, it's trivially easy.

You will, of course, need to know what bits to set at what addresses to make the NES auxiliary chips work and do what you expect them to do and that may represent some work but that complexity is from the platform, not the language. You can have a library (or a set of macros) that deals with it written in any language you can compile for the platform.



>Assembly (especially on the 6502) is conceptually very simple and, while it may not be trivial to translate higher-level concepts into its simplicity, as long as what you want can be readily expressed in it, it's trivially easy.

That's precisely why I would qualify ASM as "tricky" and "tedious". You're bogged down in the tiny details that nowadays a compiler would probably solve as well as (or even better than) you do. If you want to express "I want to make a function that adds two integers" then yeah, it's trivial. If instead it's "I want to iterate through the leaves of a binary tree and compute its standard deviation" then have fun; good luck.

"This is a multiplication by 5, it's pretty expensive, probably better off shifting by two and adding once. Oh but then I need an intermediary register, do I have one available?"

"I guess I could inline this bit of assembly here instead of making a function call. Oh but wait now I need to re-do my register allocation to match."

"I need to reserve a bit more stack space to bank this register, let's modify my function's prologue and epilogue to match."

Ain't got no time for that.

I think writing assembly is a skill any coder should have, especially if you usually deal with close-to-the-metal languages like C, C++ or Rust for instance. I think having a good grasp of assembly will make you a better coder, if only because it'll let you read the assembly output of your compiler to figure out what gets optimized and how (and also maybe what you could change to speed things up). There are also a few situations where writing assembly is the right solution.

That being said you'd have to pay me a lot to write a full application in assembly, regardless of how many layers of macros and pre-processing steps I'm allowed to use (unless gcc is allowable as a pre-processing step...)


You've captured exactly what it feels like -- this is the constant tension among asm developers. I love really high performance code, but I also like accomplishing more than one trivial thing per hour. It's hard to even remember the time before I knew C, when coding in assembly was all there was. When it was "normal." But I do remember the first time I wrote for the M68000 CPU and found multiply and divide instructions. "What? I don't have to write my own divide routine?" Tears of joy!

I will typically build something out in a high level language, profile to see where the time is going, and take a closer look at the algorithm first to see if it's sensible or some different approach should be taken. If it's an appropriate algo, then it's time to look at the implementation to see where we can shave cycles.

For the most part, even if you're talking to hardware that requires exceedingly precise timing (interface/bus protocols, certain chips), C will probably do the job. (It's no coincidence it was referred to as "universal assembly language".) Only where one is absolutely starved for resources (as the NES, and many 8-bit systems were/are) is ASM necessary.

"high level when you can, assembly when you must"


> "high level when you can, assembly when you must"

I was already following that on MS-DOS with its 640 KB, using all the Basic, Pascal, C and C++ compilers from Borland, as well as Clipper.

Although I did spend one year still focused on using TASM for everything and playing with how much I could cram into a COM file.


It's the difference between being a plumber and being someone who makes custom jewelry. Both are artisans of a kind, the one is doing production and trying to 'get the job done' the other is making one-offs that will have a vastly inefficient time:product ratio where a lot of the value created will be in the eye of the beholder. Both are valid paths.


It's more like someone making custom jewelry using modern tools vs. somebody making custom jewelry only using methods available in ancient Rome. You might not see any obvious difference looking at the result, but once you know how they're made, one is definitely more impressive than the other. I'm also sure that one is more "tricky" and "tedious" than the other, which is what I was addressing.

If you code something for artistic or "competitive" reasons then of course it makes complete sense. Like people making 4KB demos, speedrunners or people folding thousands of paper cranes. There are no invalid paths if you're an artist.

On the other hand, if you consider it from a practical engineering perspective, there are few use cases where I'd go with ASM nowadays: well-optimized C code will be easier to write, probably nearly as fast, way easier to modify and maintain, and much more portable. Some paths are wildly superior to others if you're an engineer.


Modern CPUs are insanely complicated, with thousands of different instructions, countless layers of cruft and bewildering performance variations across architecture generations - what's optimal in one gen can be a worst case scenario two generations down the road. For any modern CPU I'd never attack a problem ASM first and, most probably, never even touch it.

But we are talking about the 6502 inside the NES, running at less than 2 MHz, with one accumulator and two index registers, a couple status flags and an 8-bit stack pointer within a 256-byte stack. It's a simple machine, for simpler problems.

And yet, guys like Paul Lutus were doing real-time 3D wireframes in FORTH, with fast 8-bit scaled trigonometric functions (no floats involved). It was pure badass of a magnitude not heard of since.

My own contribution was a windowing library for the Apple II that fit in 1024 bytes and used self-modifying code to display overlapping windows.


> "I want to iterate through the leaves of a binary tree and compute its standard deviation"

Assuming this is a recurring problem, I'd just use two libraries - a binary tree and a floating point one (I cut my teeth on the Apple II+, so I could count on FP routines already in place in the ROM). If a tree library is not available, writing a tree walker is not a particularly difficult task.

Libraries, subroutines and macros were a huge helper then.

> I need an intermediary register, do I have one available?

If we are talking 6502, the answer is "no". You had page zero, which was almost as good as having 256 8-bit registers.


> I think writing assembly is a skill any coder should have, especially if you usually deal with close-to-the-metal languages like C, C++ or Rust for instance.

It also applies to using languages like Java and C#.

Occasional devs might not be aware of it, but there are ways to see what the JIT/AOT compilers generate.


Part of the tedium is only having a few registers so programming in assembly begins to resemble solving the towers of Hanoi puzzle. But, it's certainly something every programmer should give a try at some point, if for no other reason than just a mental exercise.


Have you done http://nand2tetris.org ? It involves (virtually) building up a CPU that has two registers, which I'm guessing is the minimum possible.


You'll need some internal state (one register?) and a program counter. If instructions are long enough, you can encode the address of the next instruction in the instruction itself (sort of a perverse VLIW where everything ends with a jump to a given address) and ditch the program counter. You can make tests destructive, so your status bits end up overwriting the register - it would force some "interesting" programming techniques, but it seems doable.

But remember: there is always a lot of state flying around that's not exposed to the programmer through architectural registers.


You could make every operation of the form (op src dst nxt). Something like https://codegolf.stackexchange.com/questions/11880/build-a-w..., where results are always written to RAM and 'register' indexes are just addresses. You'd still need some transient/architectural registers to hold the current instruction/memory address/ALU input-output on its way through the CPU, though.


> resemble solving the towers of Hanoi puzzle

That's the relaxing part.


> Part of the tedium is only having a few registers so programming in assembly begins to resemble solving the towers of Hanoi puzzle.

Solving the towers of Hanoi puzzle can be done completely mechanically (exercise: derive the really simple algorithm). Programming in such a constrained environment is something where you have to think for yourself, since I am not aware of any mechanical solution.


A mechanical solution would be using other languages like C


Using C you cannot get the advantages that 6502 assembly provides. On the other hand, to stay with the towers of Hanoi, one can show that the simple mechanical algorithm one can derive for it will solve the problem with the minimum possible number of moves.


So, what the C compiler does.


What always scares me away from assembly every time I try to learn it is the principle that code becomes more rigid and harder to evolve the more manual work it requires. It's like how C is harder to refactor than Python, because the implementation and the algorithm get closer together the lower you go. And since assembly gives you almost nothing, changing an algorithm seems to mean massive changes. Which implies heavy up-front design work. Which is contrary to the kind of exploration and experimentation I like to have in my side projects, especially video games, which is presumably the main type of program for the NES.


Assemblers with good macro languages help a lot.

For example using something like MASM or TASM back in the MS-DOS days was like having your own high level language.

So you end up creating a kind of DSL with those macros.


Whenever i see people talk about languages (and CPU assembly is a language, each CPU arch has its own dialect), what it comes down to seems to be how "verbose" the programmer has to be about things.

Meaning that if they have to care about memory structures, or even variable types (just look at how popular JS and Python are), they see it as a bothersome language to work with.

End result though is software that balloons to multiple gigs and maxes out a gigahertz CPU just by being launched and sitting idle.


It is a little misleading isn't it?

I miss the old assembler. It could just be that I have a poor memory, but with 8/16-bit assembly, I could remember almost everything about my toolset and focus on the problem.

With modern APIs and languages--even modern assembly languages--I spend more time googling than anything else.


I remember the 16-bit "real mode" assembly of Intel's 286 CPU very well.

> focus on the problem

Think harder, you might remember that you weren't solving many problems either.


Not true. I programmed micros for practical things--many of which are still in service. Modest hardware forced us to focus on the task rather than presentation.

Modern hardware raised the stakes and forced us to spend more time (in some cases, the majority of the time) on packaging and presentation--even where it didn't really matter that much after the sale.

I'm not talking about iPhones, where a human interacts with the device constantly. I'm talking control systems, where after the sale, human interaction is simple and rare.


> I remember the 16-bit "real mode" assembly of Intel's 286 CPU very well.

Those CPUs already started being complicated enough to make ASM programming less attractive. IIRC, on 286 the string copy instructions were faster than the DMA transfers people used on earlier PCs and, because of that, if you wanted to have optimal performance, you'd need to check which CPU it was running and branch accordingly to the best code path.


I'd blame it more on 286 being trash than actual complexity. I remember doing MIPS ASM in the early 90s, and it was a joy.


When I learned MIPS at university using SPIM, 75 MHz Pentiums were still fresh and I knew Z80 and x86 Assembly quite well.

The way some instructions were neither macros nor actual CPU instructions always felt strange to me.


> if you wanted to have optimal performance, you'd need to check which CPU it was running and branch accordingly to the best code path.

The Intel C Compiler still does that. It compiles multiple versions of your code, each optimized for different processor versions.



