I am always very impressed when I see these demos and how much can be done with so little. If you are like me, you just jumped to YouTube[1] to see it in action.
While trying to make my significant other understand what was happening, I wanted to run it myself. I was amazed how simple that was!
- Install the assembler[2]
- Install dosbox[3]
- Get the source[4] and put it into c:\temp\demo\memories.asm
- Open a command prompt and enter:
cd c:\temp\demo
nasm.exe memories.asm -fbin -o memories.com
- Start dosbox and enter:
mount d c:\temp\demo
d:
dir
memories
- Press [ALT][ENTER] for fullscreen
The DOSBox config is not optimized, but it runs with sound using the default settings!
For me this is somehow much more impressive than simply watching the video.
PC 64k is the main size-constrained demo format, where people do seriously impressive things. 256b is the masochists' category, where doing anything at all is hard. 4k intros are in between.
It's interesting that the demo scene is very Windows/DOS focused, unlike other hacker scenes. Linux or Mac demos are basically not a thing. You're far more likely to see C64 or Amiga demos.
Well, I guess it makes sense after all, because in DOS you kinda have an "API": using the default interrupts you can select video modes, the video memory is mapped at a fixed offset, and so forth. In Linux, due to API fragmentation, it would be hard to agree on something that keeps working in the future, and even now more boilerplate setup code would likely be needed.
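For illustration, that whole DOS "API" for getting pixels on screen boils down to something like this in NASM (a minimal sketch of the usual mode 13h setup, not code from the demo):

    org 0x100               ; .COM program, loaded at CS:0x100
        mov ax, 0x0013      ; BIOS int 10h: set mode 13h (320x200, 256 colors)
        int 0x10
        push 0xA000         ; the framebuffer always sits at segment 0xA000
        pop es
        mov byte [es:0], 15 ; write one white pixel in the top-left corner
        xor ah, ah
        int 0x16            ; BIOS int 16h: wait for a keypress
        ret                 ; back to DOS

Assemble it with "nasm -fbin" exactly as in the steps above and you have a working graphics program in a handful of bytes.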
Agreed. Just as a heads up regarding "future safety", the guys at NVIDIA - for now - seem to keep the door open for Dos/Bios with very high possible resolutions https://www.pouet.net/prod.php?which=63522#c858522 and even without the need for going the VESA way. Nothing i would really rely on in a business case ;) but neat anyway. (Mode List for several current GPUs : https://www.pouet.net/topic.php?which=11672&page=1 )
(author here) All right, but at least the 1k category for Mac/Intel had some cool productions recently. https://www.pouet.net/prodlist.php?type%5B0%5D=1k&platform%5... Not my field exactly, but it seemed like the boilerplate was a bit shorter than for other platforms, so there was decent interest in it.
> In 320x200 mode, instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimation by multiplying the screen pointer with 0xCCCD and reading X and Y from the 8-bit registers DH (+DL as 16-bit value) and DL (+AH as 16-bit value). The idea is to interpret DI as a kind of 16-bit float in the range [0,1], from start to end. Multiplying this number in [0,1] with 65536 / 320 = 204.8 results in the row before the comma and, again as a kind of float, the column after the comma. The representation 0xCCCD is the nearest rounding of 204.8 * 256 (= 52428.8 ≈ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.
Taking the top byte is equivalent to dividing by 0x1000000. So that gives you Y. The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320). And the lower two bytes are noise.
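In NASM the whole trick is just two instructions (a sketch, assuming DI holds the current mode 13h screen offset):

        mov ax, 0xCCCD
        mul di              ; DX:AX = DI * 0xCCCD
        ; now DH = Y (0..199) and DL ~ X * 256/320 (X rescaled to 0..255);
        ; AH holds further fractional bits, AL is mostly noise

Compare that to the "honest" version, which needs a DIV by 320 and costs more bytes and cycles.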
(author here)
you're right (about it being confusing), i wasn't expecting more than a few people to actually read this ;) at least i quickly repaired the float/fixed thing.
Trying, and not succeeding, to keep the language simple, as evidenced by the sibling comment by pjc50. Fixed-point numbers are at least as approachable as floating-point numbers, in my opinion.
(author here) I didn't know about this fast inverse trick, but i find it VERY funny that in "Memories" i use almost the same technique to create the "ocean" effect ;) http://www.sizecoding.org/wiki/Memories#Ocean_night_to_day_2 Maybe i was inverse square rooting all the time without knowing it ^^
No, go fire up DOSBox and watch it in realtime! Watching demos on YouTube takes away a lot from the spirit of what makes productions like this one special.
There's something beautiful about the fact that the video is FAR larger in size than the program that initially generated the output. Almost worth watching for that fact alone.
I’ve pondered before the idea of a video codec that works like RAR, where the video embeds an arbitrary user-specified virtual machine that can be used to decode the video frames. (How is this not just a program binary? Because it still would have the semantics of a video stream, with no random access to frame data, only tape-head-like access.)
Seems like this would be perfect for videos that are just e.g. gameplay of games made of tiles+sprites: the video could just store one copy of the assets, and the frames could just be tile maps + sprite position information.
It would also work well for “videos” that are really just a single static image. Or videos that are visualizations of the audio stream: the VM could actually take the audio frames as input and output the respective video frame.
Sure, but the various machinima/demo file-formats of games are just application state-data formats, not video formats per se. The difference comes in what can decode them.
An application state-data format can only be decoded by the original application, because necessary context—in this case, the game engine that translates user input to game-state and then to displayed frames, and also the library of visual assets the game uses to render those frames—is in the application, rather than in the video.
A video format is self-contained, and usually not domain-specific. Many encoders and many decoders can be written to target a video format, and the decoders should not have to ship with an asset library (let alone a game engine) in order to properly render specific videos.
A format like I'm talking about—one that doesn't know anything about application state, but does understand that it's compositing and placing a set of embedded assets each frame, rather than only knowing about pixels/gradels—seems like something generically useful to me. (Heck, we're close to support for such a format already, since many video players already understand the idea of compositing arbitrary stuff with placement instructions on the screen each frame, care of support for the https://en.wikipedia.org/wiki/SubStation_Alpha subtitle format. That format is exactly the kind of "vector video" I'm talking about, except the only primitives it can position and style are text elements. Add RGBA-textured rectangles as another primitive type to it, and you'd get a video format!)
And yes, I'm basically talking about the visual equivalent of a https://en.wikipedia.org/wiki/Module_file (embedded samples/synth patches + sequencing information); or, if you prefer another analogy, "what Flash movies are if you exclude the ability to execute ActionScript."
It always stings when I pull a website/app through all the optimizers and compression algorithms, and then the content people fuck it all up by adding 10MB of images :/.
If I had experienced this on my dad's humble little C64 back in the '80s, I think I would have passed out. That music is incredible, especially considering how concisely it is stored.
There is a JavaScript implementation of this parallax checkerboards effect with just 140 characters of code, including 3D animated perspective:
https://www.dwitter.net/top/all
On that page you can also find an implementation of Pouet's tunnel effect.
I immediately thought of the dwitter crowd when seeing the video; I think I recognized several patterns. It seems logical that, when chasing the smallest size, everyone ends up using the same classes of functions to generate maximal impact with minimal bytes.
Can anyone describe at a high level for a complete noob how this kind of thing works? Someone who is not going to be able to read a bunch of ASM and interpret it? I'm guessing that it is something along the lines of:
- the graphics "driver" reads values out of certain registers (AL and AH?) at a set interrupt (maybe every X clock cycles?) and writes one pixel to the screen of whatever color those registers had in them
- by writing values into those registers and aligning the number of operations the program does with the frequency of the interrupts, you can get animation?
Even achieving any sort of flow control so you can switch between the effects is mind-boggling to me.
It gets much simpler when you realise that in the original PC there's no "driver" in the way but bits of hardware are wired directly to various processor buses.
This is sixteen-bit assembly, so you have the famous 640kb of RAM available to the user and a 64k chunk of video memory beyond that (see "0xa000" in the program). The graphics hardware is continuously rendering frames out of there at 320x200, one pixel per byte, using the default system palette.
The rendering is rather like a pixel shader. There is a big for loop over all the pixels, and at each point it computes a pixel value. First it decides which frame number it is on (stored in BP register I think), then calls an "effect" for that pixel.
It then jumps three pixels. This gives that nice "dissolve" transition between effects.
The keyboard controller is wired directly to the bus, so you can read the keyboard with a single instruction.
A MIDI controller is wired directly to I/O port 0x330 (not standard equipment; back in the day this required a Roland card or a SoundBlaster 32?), so you can just write MIDI bytes to that.
There is a system timer interrupt configured for the music. The graphics appear to just run continuously; I can't see a link to the timer or vertical sync in the graphics code.
(author here)
The "three pixel jump" is just for the looks, and it smoothes the animation for more calculation heavy effects (f.e. raycast tunnel). The transition effect is not bound to this, it is rather using the "noise" (as you described it) from the coordinate calculation to offset the time (desribed in the writeup). The graphic output is linked to the timer via register BP, which is modified in the interrupt routine.
> - the graphics "driver" reads values out of certain registers (AL and AH?) at a set interrupt (maybe every X clock cycles?) and writes one pixel to the screen of whatever color those registers had in them
It's actually much simpler than that. After you set the right graphics mode (which for most simple DOS demos is usually mode 13h, 256 colors at 320x200), there's an area of memory that you can write to, and each byte shows up as a pixel.
The "flow control" is usually just that you run your effect n times in these simple demos. Which means it will run faster on a faster CPU, but you usually wouldn't bother implement any form of timing in 256 byte.
MS-DOS programming was overall a pain in the... byte, but what I miss most about it is the simplicity of graphics.
Wanna draw? Just write to memory. Setting a mode was a single interrupt call.
(Wanna play sound? Fumble with two levels of IRQ controllers and a DMA controller, then sob uncontrollably. Or use Allegro. Wanna do multithreading? What's that?)
That one is really amazing! I still don't understand how this didn't win the "Meteoriks" award (my "hypnoteye" did https://www.pouet.net/awards.php#2015tiny-intro )
Sadly, Baudsurfer has not been "around" for quite a while now ...
The best demo I've found, also 256 bytes, is Pyrit by Řrřola (Jan Kadlec, a Czech developer). It's frankly incredible, something I wouldn't have believed was possible:
How do you handle time with such small code size? I see a timer interrupt for the music, but what about the animation? Is it dependent on the speed of the underlying CPU?
It depends on the speed of the CPU - if you look at the archive at https://www.pouet.net/prod.php?which=85227, you'll find a DOSBox config specifically for this demo. If you run it in DOSBox you can fiddle with the emulation speed by pressing Ctrl-F11 and Ctrl-F12, and you'll notice the speed of the animation change.
Later: your question made me wonder what the performance of the virtual 'target CPU' is - the 'cycles' setting in the config is 20000, and there's a rough estimate of what these numbers translate to here
(author here)
"you'll notice the speed of the animation change"
that might be, but the demo is designed to run at equal speed on all systems (it hooks into the timer)
if you experience animation speed differences, that means your system cannot handle what DOSBox (on high cycles) demands. It should be noted that DOSBox is far slower than people expect it to be, and also that in actual competitions in the demoscene, real modern hardware is booted into FreeDOS, but has no sound. So if you want sound (with MIDI) in a competition, you have to stick to the rather slow DOSBox, and even optimize against an emulator, which can be really really weird. (https://www.pouet.net/topic.php?which=11881) I wouldn't claim the demo runs fine on a real 486, but a Pentium should do, as a variation of the raycast tunnel part indicates (https://www.youtube.com/watch?v=5_3CU6shKlY)
> How do you handle time with such small code size? I see a timer interrupt for the music, but what about the animation? Is it dependent on the speed of the underlying CPU?
Usually you don't "handle" it in very small demos of the 256b/64b kind; you just run your effect. And yes, this means the speed will depend on the CPU.
There is an OS-provided timer interrupt that you can hook into, making sure your interrupt handler is called every X time units. I think it was something like 18 times per second by default, but changeable.
I used this to slow down my computer so that old games were playable. Hooked into the interrupt, wasted cycles, and could enjoy the game. :)
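For the curious, hooking that timer from a .COM program looks something like this (an untested sketch of the common pattern, not the intro's actual code; note that speeding up the PIT like this also makes the DOS clock run fast, and a real program would save and restore the old vector):

        mov ax, 0x251C      ; DOS int 21h, AH=25h: set interrupt vector 1Ch
        mov dx, tick        ; DS:DX -> new handler (DS = CS in a .COM)
        int 0x21
        mov al, 0x36        ; PIT channel 0: lobyte/hibyte, square wave
        out 0x43, al
        mov ax, 0x8000      ; divisor 32768: 1193182 / 32768 ~ 36 ticks/s
        out 0x40, al        ; (the default ~18.2/s uses divisor 65536)
        mov al, ah
        out 0x40, al
    main:
        jmp main            ; the effect would run here, reading [frames]
    tick:
        inc word [cs:frames] ; called on every timer tick by the BIOS
        iret
    frames: dw 0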
(author here)
it doesn't.
it sets the timer to about 35 FPS and installs a callback routine that is called repeatedly as an interrupt. Smoothing is rather done implicitly by "triple diagonal interlacing".
I didn't know about sizecoding.org; it seems to be a very valuable and interesting resource explaining the "black art" of tiny demos, thanks! I haven't checked all the pages yet, but the "Memories" entry in particular is very well written and explained.
Since these are so small I don't see why we couldn't have a "demoscene launcher" with a "mailto:" style protocol handler and just let people click on base64 encoded links to start the demo.
Then I agree completely, and must have misunderstood what was proposed by a handler. Typically a handler will launch an external application such as mailto, ftp, magnet, etc.
If we want to run code in the browser, there is WASM.
So is the proposal that it would be beneficial to have a DOS-like OS or x86 emulator in WASM for running COM files?
Yes, that would be better and more sandboxed than dosbox running outside the browser.
(author here) I didn't know about that one. I use http://twt86.co/ (no music there either, sadly). On both websites the performance is rather bad, but that is something that time will solve for us :D
I tried something similar with a 4K C64 demo recently in my emulator, basically percent-encoding the program to run right into the URL instead of hosting it somewhere. It works, but only up to about 2.5 KBytes (good enough for 256 byte demos though).
Fun to watch this. I like seeing the odd demo pop up on HN once in a while, and that code/techniques breakdown is incredible. Back in the day we rarely had that kind of insight into the mastery that went into a demo unless we really got into a discussion with the creator about the code.
I've always thought the demo scene looked cool. Problem is, I don't really care about graphics and sound and am not an especially creative person. Are there competitions that are purely objective? As in, the judging criteria are quantitative?
You can always challenge yourself to create "something" in 32 bytes or 16 bytes. That is so small that sounds and graphics are rather abstract, and it comes down to: does it produce something non-random and noticeable? For example, here is a paint program in 16 bytes: https://www.pouet.net/prod.php?which=62025 (the objective would be: create a program that allows painting on a canvas with the mouse)
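To give a feel for what such a paint program involves before any size optimization, here is a rough, hypothetical sketch using the DOS mouse driver's int 33h services (not the 16-byte production above, which is far more clever):

    org 0x100
        mov ax, 0x0013      ; mode 13h (320x200, 256 colors)
        int 0x10
        push 0xA000
        pop es
        xor ax, ax
        int 0x33            ; int 33h, AX=0: reset/init the mouse driver
    paint:
        mov ax, 3           ; int 33h, AX=3: get position and buttons
        int 0x33            ; -> BX = buttons, CX = X (0..639), DX = Y
        test bl, 1          ; left button held?
        jz paint
        shr cx, 1           ; the driver usually reports X doubled in mode 13h
        mov ax, 320
        mul dx              ; AX = Y * 320
        add ax, cx          ; AX = Y * 320 + X
        mov di, ax
        mov byte [es:di], 15 ; plot a white pixel
        jmp paint

Getting from a screenful like that down to 16 bytes is where the black art starts.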
There have been ASM competitions with clear objectives in the past (http://www.hugi.scene.org/compo/), but these are long gone and seem to have been replaced by something like https://codegolf.stackexchange.com/ now.
The Advent of Code challenges are really interesting as well, you can often code golf those by a lot. I never get that far though :/.
But what I did last year was visualize one of the assignments. The assignment was about overlapping areas on a field; the naive solution (for me) was to create an x-by-y bitmap and just add up the overlaps, which could then easily be converted into a visible image. That helped me visualize the problem and my solution.
256 bytes is in the "let's try every combination" range, I think. So, write a program that tries all of them and determines if any do something interesting enough to forward to a human for review.
8^512: hello from the other side of the quantum dimension
8^8 sounds interesting. 16 million reboots of a real {PC,C64,ST,Amiga,Mac,Z80,...} sounds like a collectively highly entertaining kind of hilarious. The issues only begin when you start wondering if any of the programs wedges the hardware into "interesting" states that are preserved across reboots - or at least the what if of that dimension of entropy... then the problem space becomes 8^8^8...
I decided to compute 8^8^8. The result is apparently 15 million digits long. (`echo 8^8^8 | bc -ql | wc` -> `222814 222814 15596963`)
That's a weird "reference". Why 8 as the base? Nobody works with 3-bit bytes. 8^8 == 2^24 == 2^(8*3) == 256^3, i.e. the combination space of three bytes.
The smallest category on pouet, for reference, is 32b (or 256 bits), so 2^256 combinations to brute force. For comparison, 128-bit encryption is usually considered "safe" and infeasible to brute force.
You might be able to constrain the search space to only valid IA-32 instructions, but realistically I don't see it helping that much.
You could further constrain it to exclude a lot of instructions and instruction pairs that make no sense given the context, e.g. any pair where the second instruction makes the first one redundant, such as the second instruction clobbering the same register the first one modified. Or a "ret" in the first few instructions...
It'd probably not constrain the search space nearly enough though.
But even if it did and you'd somehow manage to even generate every combination, you'd still face the second problem of how to evaluate if they do something "interesting enough" to be worthwhile reviewing.
My original comment ran the numbers against the OP's "wouldn't that be 256^256?", but I got tripped up by the reply refuting that and saying it was 8^256 instead.
For as-yet unknown reasons my brain has always had a hard time mapping between the real world and the mathematical vacuum, so it was honestly less stressful to risk trusting that comment than try and [figure out how to] figure it out on my own. So I just substituted calculations for 8^n.
Hopefully I can figure out those mapping problems one day. I think neurological damage may be involved, or something - I had to resort to button-mashing on my calculator while trying to figure out how many vegetables I could buy for $X given that they were $Y/kg one day at the supermarket. I'm 29. </rant>
I think of it just slightly differently -- 256 bytes, 2048 bits -- so 2^2048 (same result as your 256^256).
To give people a sense of scale (for those who don't spend time with these numbers all the time; I do because of cryptography): 2^256 is on the order of the number of atoms in the entire universe: every star, moon, comet, black hole, galaxy, etc. across the entire known universe.
Now consider this: 2^512. Take every single atom in the universe, and imagine that each atom contains a universe of atoms. Congratulations, you're only at 2^512.
(author here)
it is not, as others explained.
But if you're interested in bruteforcing, you can try to find a short code for the 7-byte (yes, seven bytes) version of my program "m8trix" (https://www.pouet.net/prod.php?which=63126, in the comments), that should be a tad easier ;)
Yes, that would give some peace of mind, right? Unfortunately for us, that's not the case. The only platform-specific code is the 8 instructions on top of "Code of framework" on http://www.sizecoding.org/wiki/Memories
First to set the video mode, and then to set up a timer used to progress time.
A common complaint about tiny demos. Here the OS is only used for setting the graphics mode and setting up a timer. Plus all the boot code, of course. Not much really; with 512 bytes you could probably do it on the bare metal, if someone hasn't already.
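For reference, the bare-metal route is mostly a matter of packaging (a hypothetical skeleton, assuming the standard BIOS boot protocol; not an existing production):

    org 0x7C00              ; the BIOS loads the boot sector here and jumps in
    bits 16
        mov ax, 0x0013      ; the BIOS is still around, so mode 13h works as in DOS
        int 0x10
        push 0xA000
        pop es
    effect:
        ; ... demo goes here, writing pixels via ES ...
        jmp effect
    times 510 - ($ - $$) db 0   ; pad to 510 bytes
    dw 0xAA55               ; boot signature in the last two bytes

No DOS services, though: no int 21h, and you set up your own timer if you want one.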
There is even more debate in 4k. After all, most rely on graphics drivers that take hundreds of megabytes. But the thing to understand is that, in any case, the intro ships all the code that produces the sound and image. The OS is just an abstraction layer. The exception would be fonts and MIDI instruments, which can be stored in the hardware or OS.
But not all intros have text; "Memories" doesn't. And many intros do their own sound synthesis, though in a 256-byte PC intro you are usually limited to MIDI or that horrible buzzer.
(author here) Not quite, a 256-byte PC intro CAN have decent non-MIDI music, as showcased here: https://www.pouet.net/prod.php?which=79281 (won the "outstanding technical achievement" award). I did some intros in 32 bytes and 16 bytes using that "horrible buzzer"; looks like some ASCII effects and a Dutch gabber bassline is the maximum you can get in this category :D ( https://www.pouet.net/prod.php?which=76093)
That's why I said "usually", the moment someone says something is impossible, someone does it ;)
Anyways, great job. I was there during the compo, it was epic, with everyone double checking the executable size, even the old guys who have seen it all. You got my vote BTW.
#truckbreaker ;) yes, the overall reception was overwhelming, i didn't expect that. i completely agree with your post before, just wanted to point to "ikubun" =)
(author here)
interesting guess, but wrong, as others explained.
for maximum purity you can try to NOT call any DOS functions or interrupts. i gave that a try in the production "noint10h" ;)
https://www.pouet.net/prod.php?which=80769
Normally these demos are filled with all kinds of 'tricks' to make things smaller.
Things like self-modifying code, using bits of the BIOS or video ROM in ways they weren't intended by jumping into the middle of them, saving space by using code as data or vice versa, tiny unpackers which decompress the code at runtime, massive pregenerated buffers used for runtime lookups that generate data in one order but consume it in another, etc.
There's quite a few 'tricks' in the small bits of asm in the article.
Also, the tiny unpackers are generally used from 4096b and upwards. The size of the unpacker takes too much space and doesn't make up for the compression ratio at 256b.
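To make the "code as data" trick concrete, here is a hypothetical toy (not from any real intro) that displays its own machine-code bytes as pixel colors:

    org 0x100
        mov ax, 0x0013      ; mode 13h
        int 0x10
        push 0xA000
        pop es
        mov si, 0x100       ; DS = CS in a .COM, so SI points at our own code
        xor di, di
    copy:
        lodsb               ; AL = next byte of our own instructions
        stosb               ; ... reused as a color and written to the screen
        cmp di, 320*200     ; (past the end of the code it just reads
        jb copy             ;  whatever garbage follows in memory)
        xor ah, ah
        int 0x16            ; wait for a key, then exit
        ret

In real 256b intros the same bytes genuinely do double duty as both instructions and lookup data, which is much harder to pull off.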
(author here)
Well, he is not entirely wrong. There is "m8trix", an 8-byte program (later optimized to seven bytes), that:
- jumps into the middle of its own instructions
- uses 2 of those 7 bytes again as DATA
- uses the FLAGS register content as COLOR
See : http://www.sizecoding.org/wiki/M8trix_8b
But all that doesn't really cut the space down in something as "big" as 256 bytes, it's the approach and the algorithms that do =)
[1] https://www.youtube.com/watch?v=Imquk_3oFf4
[2] https://nasm.us/
[3] https://www.dosbox.com/
[4] http://www.sizecoding.org/wiki/Memories#Original_release_cod...