
There were far fewer abstraction layers than today. Today, when your desktop application draws something, it gets drawn into a context (a "buffer") which holds the picture of the whole window. Then the window manager / compositor simply paints all the windows onto the screen, one on top of the other, in the correct stacking order (I'm simplifying a lot, but that's the idea). So when you are programming your application, you don't care about other applications on the screen; you just draw the contents of your window and you're done.

Back then, there wasn't enough memory to hold a copy of the full contents of all possible windows. In fact, there were effectively zero abstraction layers: each application was responsible for drawing itself directly into the framebuffer (an array of pixels), at its correct position. So how do you handle overlapping windows? How could each application draw itself on the screen, but only on the pixels not covered by other windows?

QuickDraw (the graphics API written by Atkinson) contained a data structure called "region" which basically represents a "set of pixels", like a mask. And QuickDraw drawing primitives (e.g. text) supported clipping to a region. So each application had a region instance representing all visible pixels of its window at any given time; the application would then clip all its drawing to that region, so that only the visible pixels would get updated.

But how was the region implemented? Obviously it could not have been a mask of pixels (as in, a bitmask), as that would use too much RAM and would be slow to update. Bear in mind that the region data structure also had to be fast at operations like intersections, unions, etc., as the operating system had to update the region of every window as windows were dragged around with the mouse.

So the region was implemented as a bounding box plus a list of visible horizontal spans (I think; I don't know the exact details). When you represent a list of spans, a common hack is to simply store the list of coordinates at which the "state" switches between "inside the span" and "outside the span". This representation allows for some nice tricks when doing operations like intersections.
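To make the trick concrete, here is a sketch (mine, in Go; not QuickDraw's actual code or data layout) of how a boolean operation on two such inversion lists reduces to a single merge pass over the sorted coordinates:

    // combine merges two sorted inversion lists (X coordinates at which
    // the inside/outside state flips) and keeps the coordinates where the
    // combined predicate changes: OR gives union, AND gives intersection,
    // XOR gives symmetric difference.
    func combine(a, b []int, op func(inA, inB bool) bool) []int {
        var out []int
        inA, inB := false, false
        prev := op(inA, inB)
        i, j := 0, 0
        for i < len(a) || j < len(b) {
            // Take the smallest unprocessed X from either list.
            var x int
            if j >= len(b) || (i < len(a) && a[i] <= b[j]) {
                x = a[i]
            } else {
                x = b[j]
            }
            if i < len(a) && a[i] == x {
                inA, i = !inA, i+1
            }
            if j < len(b) && b[j] == x {
                inB, j = !inB, j+1
            }
            if cur := op(inA, inB); cur != prev {
                out = append(out, x)
                prev = cur
            }
        }
        return out
    }

For example, intersecting {10, 20} with {15, 30} (spans [10,20) and [15,30)) yields {15, 20}. Note that the cost depends only on the number of inversion points, not on how many pixels wide the spans are.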

Hope this answers the question. I'm fuzzy on many details so there might be several mistakes in this comment (and I apologize in advance), but the overall answer should be good enough to highlight the differences compared to what computers do today.


It's a good description, but I'm going to add a couple of details, since things that are obvious to someone who lived through that era may not be obvious to those who came after.

> Obviously it could not have been a mask of pixels

To be more specific about your explanation of too much memory: many early GUIs were 1 bit-per-pixel, so the bitmask would use the same amount of memory as the window contents.

There was another advantage to the complexity of only drawing regions: the OS could tell the application when a region was exposed, so you only had to redraw a region when it was newly exposed or its contents needed an update. Unless you were doing something complex and could justify buffering the results, you were probably re-rendering it. (At least that is my recollection from making a Mandelbrot fractal program for a compact Mac, several decades back.)


And even ignoring memory requirements, an uncompressed bitmap mask would have taken a lot of time to process (especially when combining regions where one was shifted by a non-multiple of 8 pixels with respect to the other). With just the horizontal coordinates of the inversions, it takes the same amount of time for a region 8 pixels wide as for one 800 pixels wide, given the same shape complexity.


> But how was the region implemented?

The source code describes it as "an unpacked array of sorted inversion points". If you can read 68k assembly, here's the implementation of PtInRgn:

https://github.com/historicalsource/supermario/blob/9dd3c4be...


Yeah, those are the horizontal spans I was referring to.

It’s a sorted list of X coordinates (left to right). If you group them in pairs, they are begin/end intervals of pixels within the region (the visible ones), but it’s actually more useful to manipulate them as a flat array, as I described.

I studied the code a bit: each scanline is prefixed by its Y coordinate, and uses an out-of-bounds terminator (32767).


It's a bit more than that. The list of X coordinates is cumulative - once an X coordinate has been marked as an inversion, it continues to be treated as an inversion on all Y coordinates below that, not just until the next Y coordinate shows up. (This manifests in the code as D3 never being reset within the NOTRECT loop.) This makes it easier to perform operations like taking the union of two disjoint regions - the sets of points are simply sorted and combined.


Uhm, can you explain that better? I don’t get it. D3 doesn’t get reset because it’s guaranteed to be 0 at the beginning of each scanline, and the code needs to go through all “scanline blocks” until it finds the one whose Y contains the one specified as argument. It seems to me that each scanline is still self-contained and begins logically at X=0 in the “outside” state?


> D3 doesn’t get reset because it’s guaranteed to be 0 at the beginning of each scanline

There's no such guarantee. The NEXTHOR loop only inverts for points which are to the absolute left of the point being tested ("IS HORIZ <= PT.H ? \\ NO, IGNORE THIS POINT").

Imagine that, for every point, there's a line of inversion that goes all the way down to the bottom of the bounding box. For a typical rectangular region, there are going to be four inversion points, one for each corner of the rectangle. The ones on the bottom cancel out the ones on the top. To add a second disjoint rectangle to the region, you'd simply include its four points as well; so long as the regions don't actually overlap, there's no need to keep track of whether they share any scan lines.
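To make this concrete, here is a Go sketch of a point-in-region test under this scheme, assuming the layout pieced together in this thread (Y-prefixed scanline blocks, 32767 terminators, cumulative inversions); the real 68k code of course differs in the details:

    const rgnStop = 32767 // out-of-bounds terminator

    // ptInRgn walks the region body: each block is a Y coordinate
    // followed by sorted X inversion points, closed by rgnStop.
    // Because inversions are cumulative, every block with Y <= py
    // contributes, and we flip state once per inversion point lying
    // at or to the left of px.
    func ptInRgn(body []int16, px, py int16) bool {
        inside := false
        for i := 0; i < len(body); {
            y := body[i]
            i++
            if y == rgnStop || y > py {
                break // remaining blocks start below py and can't affect it
            }
            for ; i < len(body) && body[i] != rgnStop; i++ {
                if body[i] <= px {
                    inside = !inside // this inversion column covers px
                }
            }
            i++ // skip the block terminator
        }
        return inside
    }

A single rectangle is two blocks of two points each (the top corners, then the bottom corners); a second disjoint rectangle just contributes its own four points, with no need to merge spans on shared scanlines.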


The Nintendo 64 homebrew scene uses libdragon, which is 100% clean-room, 100% based on reverse engineering, fully open source, and allows you to create ROMs with no proprietary libraries.


Of course there was. You can clean-room reverse-engineer the hardware. This is what the Libdragon maintainers do daily to supply an open source SDK for the Nintendo 64 with zero proprietary code in it.


Way back in the before times... open source projects went to great lengths to make sure they didn't use anything that could 'taint' the code (e.g. Samba).

I think the DeCSS stuff wasn't used till it had been publicly leaked and was considered 'common knowledge' or some such, to prevent lawsuits.


Not quite. There was nobody holding back on sharing for legal reasons, and it didn't prevent lawsuits.

The LiViD mailing list was full of people trying to get DVDs working with Linux, and they were already quite far into it. Derek Fawcus had already written the drive authentication code (so the drive would allow the host to read most disc sectors).

A piracy group, DrinkOrDie, reverse engineered the Xing DVD player for Windows and released DoD DVD Speed Ripper (no source code).

MoRE (Masters of Reverse Engineering) also reverse engineered the Xing DVD player and released DeCSS (no source code).

MoRE (Masters of Reverse Engineering) consisted of "mdx", "the nomad" and Jon Lech Johansen. "the nomad" reverse engineered the Xing DVD player. "mdx" used the results to write a decrypter. Jon made a GUI frontend.

Prior to DeCSS's release, someone sent Derek Fawcus the decryption code. And he got around to playing with it, and was going to publish it on the LiViD list.

But before he did, DeCSS came out, and also its source code leaked, and Fawcus noticed his own code was in it (the drive authentication code), stripped of his credit. He complained about this and Johansen got in touch, and ultimately he allowed DeCSS to use his code under a non-GPL license.

Then, famously, Norway's "economic crime" unit brought criminal charges against Johansen. Ultimately, they concluded that Johansen himself hadn't infringed anything, because it was Derek Fawcus, "the nomad" and "mdx" who did that, and they're not Norwegian.

So, with that in mind:

- the LiViD mailing list would almost certainly have developed a DVD solution for Linux, not caring about clean room implementation, if DeCSS had not beaten them to the punch

- the fame DeCSS got also brought the angry litigators (though eventually justice prevailed)

I'll end on a quote from Derek Fawcus:

https://web.archive.org/web/20001202051300/http://livid.on.o...

> Something that may be of interest to people in the states is that I've had an offer of help to produce a specification of the algorithm - from which a third party could produce an implementation. i.e. proper clean room approach. This doesn't really matter from my point of view (or in my opinion most Europeans) but may be of use to the Yanks.


How could one ever prove that a solution was clean-room? For example, I would consider the oman leak to taint all N64 development in existence. Even if someone didn't personally look at it, they most certainly got information from someone else who did.


I don’t understand if this question is legal or moral/technical. I will answer the latter, from the point of view of a prospective user of the library who wants to make up their own mind about this.

It’s quite easy to prove that libdragon was fully clean-roomed. There is abundant evidence: the git history showing incremental evolution and discovery, the various hardware test suites developed in parallel with it, and the Ares emulator improving its accuracy as things were discovered over the past 4-5 years. At the same time, the n64brew wiki has also evolved to provide a source of independently verified, trustworthy hardware details.

Plus there are tens of thousands of Discord log messages where development has incrementally happened.

This is completely different from e.g. romhack-related efforts like the Nintendo microcode evolutions, where the authors explicitly acknowledge having used the leaks to study and understand the original commented source code.

Instead, libdragon's microcode has evolved from scratch, as is clearly visible from the git history: discovering things a bit at a time, writing fuzz tests to observe corner-case behaviors, down to even creating a custom RSP programming language.

I believe all of this will be apparent to anybody approaching the codebase and studying it.


Lawyers, discovery, and a courtroom. The reason clean room works is that various lawsuits have settled the topic as a matter of law.

The Wikipedia article on clean room reverse engineering has all the examples that came to my mind and then some. https://en.wikipedia.org/wiki/Clean-room_design


> Lawyers, discovery, and a courtroom

In other words, money that these people don't have. The legal system is not a solution for these kinds of problems, nor is it an affirmative defense. Anything that makes the defendant bear the burden of raising and proving that their actions didn't run afoul of any legal requirement basically kills any project, even when using your "solution".


To me this still means "there IS no way". You can get sued and convince a judge you didn't do it, sure, but that's not necessarily 100% accurate, and also probably extremely unlikely to happen anyway in most cases. And you'd be surprised how easy it is to fake evidence with no way to prove otherwise. Plus all that still requires going to court.


Generally one has two sets of developers, one doing the RE work, and one doing the new implementation, and the only way you allow them to communicate is through documentation of the reverse engineered implementation. Should this go to court, you can walk each member of each group in to testify, and show off the stacks of documentation produced in the process.


That might convince a judge, sure, although it's still possible to fake the evidence... but I would argue the vast majority of people who claim to have clean-room RE'd something absolutely did not go through anything close to this process.


I don't know anything about the majority of developers, and I think sweeping claims about any group require strong evidence, which seems lacking. But there are plenty of examples of companies that followed such a process, and succeeded in court: https://en.wikipedia.org/wiki/Clean-room_design#Examples


> I would argue

With what evidence?


I think that when someone claims a clean-room process but fails to produce any evidence of one being followed, that's probably a good indicator. Yeah, you can call that "trust me bro" if you want, I won't be upset.


Google Docs runs a lot of algorithms over the data you put in. For instance, it paginates them and shows a page count. That is an algorithm processing your data, exactly like Gemini does. There is no option in Google Docs to prevent the pagination algorithm from reading and processing my data.

Another example: Google Docs indexes the contents of your document. That is, it stores all the words in a big database that you don't see and don't have access to, so that you can search for "tax" in the Google Docs search bar and bring up all documents that contain the word "tax". There is no option in Google Docs to avoid indexing the contents of a document for the purpose of searching for it.

When you decide to put your data into Google Docs, you are OK with Google processing your data in several ways (which should hopefully be documented). The fact that you seem so upset that a specific algorithm is processing your data just because it has the "AI" buzzword attached to it seems like an overreaction prompted by the general panic we're living through.

I agree Google should be clear (and it is clear) about whether Gemini is being trained on your data or not, because that is something that can have side effects you have the right to be informed about. But Gemini merely processing your data to provide feature N+1 among the other 2 billion available is really not noteworthy.


> For instance, it paginates them and shows a page count.

Do you think this information Google is gathering can then be used in the future to paginate some other document? Do you think paginating my doc will help their algorithm paginate documents better in the future? I see what you're trying to say, but putting everything in the "algorithm" bucket doesn't help move the whole conversation around AI forward.

> The fact that you seem so upset

Your upset detector is clearly wrong. I don't use google docs. I don't care about google docs. I'm just adding my 2c to a conversation around this type of practices google and co are using.

Isn't this why we're here on HN? To exchange ideas?


Google is pretty good at separating inference from training. If they wish to train on your data, they do that by just training on your data; running the model on that data to give you info is totally separate.


https://support.google.com/gemini/answer/13594961

“Google collects your Gemini Apps conversations, related product usage information, info about your location, and your feedback. Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies, including Google’s enterprise products such as Google Cloud.”

“To help with quality and improve our products (such as generative machine-learning models that power Gemini Apps), human reviewers read, annotate, and process your Gemini Apps conversations. We take steps to protect your privacy as part of this process. This includes disconnecting your conversations with Gemini Apps from your Google Account before reviewers see or annotate them. Please don’t enter confidential information in your conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies.” [italics was bold in the original]

Seems pretty clear to me.


You can opt out of that. It's explained right after the part you quoted.

> To stop future conversations from being reviewed or used to improve Google machine-learning technologies, turn off Gemini Apps activity. You can review your prompts or delete your conversations from your Gemini Apps activity at myactivity.google.com/product/gemini.


Should be tagged 2021


There's also a technical reason, which is that the build system is written in the language it targets. So the cool tool is written in coolang. That's obviously not required; you could use any programming language for the cool tool. It just happens that all the people who care about the cool tool, understand the needs of the ecosystem, have issues with missing features, etc. already share a non-zero intersection of languages they know: they all know coolang.

If coolang decided to try to add coolang support to Bazel instead, they would probably have to learn Java[1]. Current maintainers or contributors to Bazel don't know coolang, and they don't care much about it, especially in the early stage. And maybe coolang developers don't know Java, or even actively hate it with a passion (that's why they were in the market for a new language). And even if some coolang developer decided to contribute to Bazel, the barrier would be much higher: being a mature build system with so many features and different needs, working in it is surely going to be complex; there will be many different concepts, layers, compromises, and APIs to work with. So it just makes more sense for them to use coolang, so that all the coolang developers with a real need for the cool tool to improve can contribute to it.

[1] I know nothing of Bazel. So just bear with the example even if it's technically not correct.


A nit (hopefully a welcome one, given that it supports your statement) is that Bazel's rules are written in a language called Starlark, which has Python syntax, just without classes and with a bunch of limitations around switch statements and loops.

The core of your point is correct: who wants to both support an additional tool chain and an additional language for building things? Terrible sell.

Go itself is a little bit of an edge case because they recommend leaning on Make, but ironically they do not use Make for its intended purpose and all the (actually good) functionality that Make gives you is reimplemented from within the go compiler.


That will still require the compiler to serialize the three registers to the stack, to be able to pass the pointer to the structure to the callee. It seems like the described benefit is avoiding any serialization from registers to stack, which cannot be avoided with pass-by-reference.
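To illustrate with a minimal Go sketch (assuming Go's register-based calling convention from 1.17 onward; the Vec3 type and both functions are made up for illustration, not taken from the article):

    type Vec3 struct{ X, Y, Z float64 }

    // By value: with a register-based ABI the three floats can travel
    // in floating-point registers; nothing is written to memory.
    func addValue(a, b Vec3) Vec3 {
        return Vec3{a.X + b.X, a.Y + b.Y, a.Z + b.Z}
    }

    // By pointer: the struct must exist at an address the callee can
    // dereference, so a caller holding X, Y, Z in registers first has
    // to spill them to a stack slot just to take that address.
    func addPointer(dst, a, b *Vec3) {
        dst.X = a.X + b.X
        dst.Y = a.Y + b.Y
        dst.Z = a.Z + b.Z
    }

The spill before the call, plus the reloads through the pointer inside the callee, is exactly the serialization in question.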


But that would only be possible if you write out all the struct fields you are accessing into function parameters. If the struct is complex with a lot of fields, then you would end up with a messy function signature. Also, I am sure there is a limit to how many parameters you can have before this optimization stops working.


> But that would only be possible if you write out all the struct fields you are accessing into function parameters. If the struct is complex with a lot of fields, then you would end up with a messy function signature.

This is the approach the article's author took.


> That will still require the compiler to serialize the three registers to the stack, to be able to pass the pointer to the structure to the callee.

why can't it simply pass a pointer to the struct (it's probably already on the stack) without rewriting the struct to the stack? isn't that what a reference is?


But it often isn't on the stack. This is a vector type, so it is often modified and used as part of math operations. Each vector field is probably in some register because it was just used to calculate something, and would have to be stored back to the stack so the reference sees valid data.


If the current values of the source-code struct's fields are in registers, there are two options: either the calling function and struct are so small and so thoroughly optimized that there is no memory allocation for the struct at all, or there is an allocation and it's just dirty and not updated. Which means: update it and call, or spill and call.

You want to call a function that is not expecting its arguments to be in registers, and you don't have unlimited registers on this hardware at this time, so I don't understand all the hand-wringing about either option. I guess what I'm saying is that this is all being treated like "because we assume optimization and we know how optimization works, we're entitled to have what's important in registers all the time so things will go faster, so this must be a bug and we have to fix it."

The actual solution is to inline the callee and rely on the compiler, switch to asm and guarantee it by hand, or create a new language with register-calling or data-flow semantics different from what you have now. The conversation taking place here sounds to me like relying on undefined behaviors, something we used to do because we knew we could rely on them, but you can't any more.


to be fair, PyPy predates PyPI


Now all we need is a PiPy and we'll have all the pies


PiPi would complete the set but that would be (doubly) irrational…


I think you mean transcendental.


That works too. Perhaps that would be how the pie would taste: truly transcendental, but I don't think it could ever be finished.


How so? PyPI launched in 2003, PyPy's first release was in 2007. https://www.pypa.io/en/latest/history/#before-2013


PyPy was started in early 2003 too; the first release just took a while. PyPI was branded as 'The Cheeseshop' in the early years.


I think “least surprise” also depends on your background. In Go, files don’t buffer by default either, contrary to many languages including C. If you call Write() 100 times, you run exactly 100 syscalls. Intermediate Go programmers learn this, and that they must explicitly manage buffering (e.g. via bufio).

I don’t think it’s wrong that sockets follow the same design. It surprises me less.
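A small sketch of the difference (file name and counts are arbitrary):

    package main

    import (
        "bufio"
        "os"
    )

    func main() {
        f, _ := os.Create("out.txt") // error handling elided
        defer f.Close()

        // Unbuffered: each Write goes straight to the kernel,
        // so this loop performs 100 write(2) syscalls.
        for i := 0; i < 100; i++ {
            f.Write([]byte("x"))
        }

        // Buffered: writes accumulate in a userspace buffer (4 KiB by
        // default) and hit the kernel only when it fills or on Flush.
        w := bufio.NewWriter(f)
        for i := 0; i < 100; i++ {
            w.Write([]byte("x"))
        }
        w.Flush() // unflushed bytes never reach the kernel
    }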


that Write() doesn't call fsync() though, does it?

so there's no buffering going on in the application, but the bytes almost certainly don't hit the disk before Write() returns

they've just been staged into an OS buffer, with the OS promising to write them out to the disk at a later time (probably, maybe...? hopefully!)

which is exactly the same as a regular TCP socket (with Nagle disabled, i.e. the default, non Go way)


For userland programming, what matters is the syscall level, as that is what's expensive (and it is also the API you have to the kernel). Whether the kernel then does internal buffering is irrelevant and uncontrollable, beyond whatever other syscalls may or may not be implemented (maybe you're running on a custom kernel that doesn't buffer disk writes?).

One write == one syscall, easy. If you want buffering, you add it.


> For userland programming, what matters is the syscall level, as that is expensive

which is why pretty much every programming language buffers file output by default

even C

(other than Go, obviously)

> Whether the kernel then does internal buffering is irrelevant

everyone who's attempted to write reliable software that cares about what ends up on disk, or on the other side of the socket, will disagree


I think my C is getting rusty, but "write" operates on a file descriptor, doesn't it? It's unbuffered. The buffered versions are things like printf and puts.


That's POSIX; C's equivalent is a FILE, which generally is buffered.


I thought Linux does in-kernel buffering with `write`


The difference is that Facebook is in the business of mining your soul, so whatever store they created would be targeted at that, and the rules and policies would make sure they are able to reach that goal.

Apple is in the business of selling phones and has decided that a good strategy for them is to protect the privacy of users against data miners. So their store and payment system protect users against data collection practices by default.

> Having app stores compete for developers and users would be amazing for everyone except the current app store owners

It would be good for developers, not for users. I don’t think a single non-developer user is longing for multiple ways to download and install an app, or for having to search across multiple stores with multiple payment systems to get one piece of software. For users, the “iPhone” lets them download apps; none of them gives a thought to the fact that this happens via a single “store”.


Apple is in the business of selling your eyes. They just also happen to sell the glasses, too.

