Localization of sound is primarily based on the time difference between the ears. Localization is also pretty precise, to within a few degrees under good conditions.
Nit: time difference, phase difference, amplitude difference, and the head-related transfer function (HRTF) are all involved. Different cues dominate localization at different frequencies.
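For the time-difference cue specifically, a rough sketch: the interaural time difference (ITD) for a source at a given azimuth can be estimated with the classic spherical-head (Woodworth) approximation. The head radius and speed of sound below are typical textbook values, not anything from this thread.

```python
import numpy as np

HEAD_RADIUS = 0.0875    # meters, roughly an average adult head
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth approximation: ITD = r * (theta + sin(theta)) / c,
    for a source at the given azimuth (0 = straight ahead)."""
    theta = np.radians(azimuth_deg)
    return HEAD_RADIUS * (theta + np.sin(theta)) / SPEED_OF_SOUND

# A source 90 degrees to the side gives the maximum ITD, ~0.66 ms.
print(f"ITD at 90 deg: {itd_seconds(90) * 1e3:.2f} ms")
```

That sub-millisecond scale is why the time-difference cue only works at lower frequencies: above ~1.5 kHz the wavelength is shorter than the head width and the phase becomes ambiguous, so amplitude and spectral cues take over.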
There's this excellent (German?) website that lets you play around with and understand these via demos. I'll see if I can find it.
This is true, but a high density of loudspeakers allows the use of Wave Field Synthesis, which recreates a full physical sound field where all 3 cues can be used.
At least video games use way more complex models for that, AFAIK. It might be tricky to apply to mixes of recorded media, so loudness is commonly used there.
Unreal Engine, the only engine I'm reasonably familiar with, implements VBAP (which is just amplitude panning when played through loudspeakers) for panning of 3D moving sources. It also allows Ambisonics recordings for ambient sound, which is then decoded into 7.1.
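To make "just amplitude panning" concrete, here's a minimal sketch of the 2-speaker special case of VBAP, using a constant-power pan law. This is just the underlying idea, not Unreal's actual implementation.

```python
import numpy as np

def constant_power_pan(pan: float) -> tuple[float, float]:
    """pan in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right.
    Returns (left_gain, right_gain) with left^2 + right^2 == 1,
    so the perceived loudness stays constant across the pan range."""
    angle = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] -> [0, pi/2]
    return float(np.cos(angle)), float(np.sin(angle))

left, right = constant_power_pan(0.0)
# Center: both gains are cos(pi/4) ~= 0.707, and power sums to 1.
```

Full VBAP generalizes this to an arbitrary loudspeaker layout by picking the pair (2D) or triple (3D) of speakers surrounding the source direction and solving for the gains, but it is still only adjusting amplitudes.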
For headphone-based spatialization (binaural synthesis), virtual Ambisonics fed into HRTF convolution is usually used, which is not amplitude based; in particular, height is encoded using spectral filtering.
So loudspeakers -> mostly amplitude based, headphones -> not amplitude based.
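The binaural signal flow (mono source convolved with a per-ear impulse response) can be sketched like this. Real head-related impulse responses (HRIRs) come from measured datasets; the ones below are fake stand-ins (a pure delay plus attenuation for the far ear) just to show the structure, NOT a measured HRTF.

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
mono = np.sin(2 * np.pi * 440 * t)  # 1 second of a 440 Hz tone

# Fake HRIRs: near ear hears the source immediately at full level,
# far ear hears it ~0.68 ms later (30 samples) and quieter.
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[30] = 0.6

left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)
binaural = np.stack([left, right], axis=1)  # (samples, 2) stereo buffer
```

A measured HRIR additionally encodes the pinna's direction-dependent spectral filtering, which is what carries the height cue mentioned above.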
Which makes sense: there is only so much you can do with loudspeakers to affect the perceived location, since you don't really know where the loudspeakers and the listener are located relative to each other.
Actually, the farther away the speakers are from the angles specified in the 7.1 format (see https://www.dolby.com/about/support/guide/speaker-setup-guid...), the worse the localization accuracy will be. And if the person is not sitting centered relative to the loudspeakers, but closer to one of them, localization can completely collapse, and it will sound like the sound only comes from the closest loudspeaker.
In the case of gamers, they are usually centered relative to the loudspeakers, and the loudspeakers tend to be placed symmetrically around the computer screen, so the problem is not so bad.
For viewers sitting in a cinema the problem is much worse; most of the audience is off center. That is why 7.1 has a center loudspeaker: the dialogue is sent directly there to make sure that at least the dialogue comes from the right direction.
In music, simple panning works okay, but it never exceeds the stereo base of a speaker arrangement. For a truly immersive listener experience, audio engineers employ timing differences and separate spectral treatments of the stereo channels, HRTF being the cutting edge of that.
Atmos, as used in cinemas, is as far as I know amplitude based (probably VBAP), and it is impressive and immersive. Immersion depends more on the number and placement of loudspeakers. Some systems do use Ambisonics, which can encode time differences as well, at least from microphone recordings.
HRTF as used in binaural synthesis is for headphones only, not relevant here.
This wouldn't work well because in the frequency domain representation, different "pixels" have very different importance for the overall appearance of the image: The pixels at the center of the frequency domain representation represent low frequencies, so compressing them will drastically alter the appearance of the image. On the other hand, the corners/edges of the frequency domain representation represent high frequencies, i.e. image details that can be removed without causing the image to change much. That's the crucial benefit of the Fourier transform for compression: it decomposes the image into important bits (low frequencies) and relatively less important bits (high frequencies). Applying compression that doesn't take that structure into account won't work well.
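A minimal sketch of that asymmetry: keep only the low-frequency center of an image's 2D FFT and zero out the high-frequency rest, and the reconstruction stays recognizable even though most coefficients were discarded. (Illustrative only; real codecs like JPEG use a block DCT plus quantization, not a global FFT.)

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for a real grayscale image

# fftshift moves the low frequencies to the center of the array.
spectrum = np.fft.fftshift(np.fft.fft2(image))

keep = 8  # half-width of the retained low-frequency square
mask = np.zeros_like(spectrum)
c = spectrum.shape[0] // 2
mask[c - keep:c + keep, c - keep:c + keep] = 1

# Only 256 of 4096 coefficients survive (~94% thrown away),
# yet these carry most of the image's coarse structure.
compressed = spectrum * mask
reconstructed = np.fft.ifft2(np.fft.ifftshift(compressed)).real
```

Zeroing the *center* instead (a high-pass) would leave only edges and fine texture, which is exactly why uniform, structure-blind compression of the spectrum fails.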
Minor note.
If the original data is a time-signal like in electrical engineering (amplitude vs. time function), then the "frequency domain pixels" (its transform) are different frequencies (points in frequency domain: how many repetitions in a second, etc.) and the time-signal's transform function becomes an amplitude vs. frequency graph.
But if the original data is an image (matrix or grid of pixels in space), then the "frequency domain pixels" are different wave-numbers (aka spatial frequencies: how many repetitions in a meter, etc.) and the Fourier transform (of the pixel grid) is an amplitude vs. wave-number function.
I'm into the glitch art scene and this makes me wonder what happens if you crop/erase patterns of the frequency domain representation and put it back together...
I think this persistent state is one of the main advantages of the notebook environment, or the Matlab workspace, which I guess inspired it. It allows you to quickly try alternative values for certain variables without having to re-calculate everything. Saving snapshots would not be feasible if the project contains large amounts of data. If you want to reset everything, just "run all" from the beginning, or use a conventional IDE with a debugger.
And that came from Emacs and old Lisp environments---and perhaps something yet earlier?
As late as 2000, this was the single biggest advantage and single biggest impediment to new programmers in MIT's 6.001 lab: a bunch of nonvisible state, mutated by every C-x C-e. The student has tweaked two dozen points trying to fix a small program, and re-evaluated definitions after many of them, but maybe not all. The most straightforward help from a teacher is to get the buffer into a form such that M-x eval-region paves over all that, sets a known environment of top level definitions, and---more than half the time---the student's code now works.
I have similar concerns about much of Victor's work, for the same reason. Managing a mental model of complex state is an important skill for programming, but it's best learned incrementally over long experience with more complex programs. These very interactive environments front-load the need for that skill without giving any obvious structure to help the student learn it.
Contrast Excel and HyperCard, which have no invisible state: you can click and see everything.
But you cannot recalculate if your calculation has trashed your inputs. And if it hasn't, then the snapshot does not impose a cost. If you are willing to forego the opportunity to replay in order to save memory, just put the producer and consumer in the same cell.
Minor point: What you describe isn't usually called working memory. Working memory is what you can "keep in mind" at any one point. It lasts for a few seconds and then has to be refreshed, e.g. by repeatedly saying a phone number to yourself in your head. Working memory is more or less synonymous with short-term memory.
What you describe is long-term memory (everything beyond a few seconds is considered long-term).
Edit: Too slow. Some more justification: Very broadly, one hypothesis is that working/short-term memory is stored in the currently present activity patterns of neurons, which fade/decorrelate after a few seconds. Anything longer is thought to be stored in the weights of the synapses between neurons (there are alternative theories, but I like this one).
Sending before deducting the money seems like an obvious design flaw that should have raised red flags. Is there any explanation for why it was implemented this way, and why it wasn't spotted by the developers?