At this point I think the category theorists hit the foundational idea squarely on the mark:
1. Start with a few simple but non-trivial terms and axioms
2. Define "universal constructions" as procedures for building uniquely identifiable structures on top of that substrate
3. Prove that various assemblages of these universal constructions satisfy the axioms of the substrate itself
4. "Lift" every theorem proven from the substrate alone into the more sophisticated construction
I'm not a mathematician (I just play one at my job) so the language I've used is probably imprecise but close enough.
It may be true that you can't prove the axioms of a system from within the system itself, but that just means that you need to make sure you start from a minimal set of axioms that, in some sense, simply say "this is what it means to exist and to interact with other things that exist": axioms that give you just enough to do any kind of mathematics in the first place. If those axioms allow you to cleanly "bootstrap" your way to higher and higher levels up the tower of abstraction by mapping complex things back on to the simple axiomatic things, then you have an "open" or infinitely extensible system.
Right, the crazy thing is that much of the groundwork for the “rules-and-heuristics” mode of AI was laid down in the 70s and 80s, long before we had the raw compute power to reliably extract patterns from reality-scale inputs. Those early efforts failed miserably mostly because the rules had to be populated manually and in a ridiculously space-inefficient format (compared to the density of information in model weights).
So yeah, the next stage is models that basically do what humans do: encode causal models of the world in a composable, symbolic form that can be falsified and refined through interventional experiments.
> much of the groundwork for the “rules-and-heuristics” mode of AI was laid down in the 70s and 80s, long before we had the raw compute power to reliably extract patterns from reality-scale inputs. Those early efforts failed miserably
Yes, and: we concluded that enough of reality doesn't work like that. The formal reasoning space is very powerful, but all the stuff we're really interested in has enough ambiguity and generalisation in it that you can't cover it with a "small" set of rules.
Maybe if you had a really large number of rules? And used matrix multiplication to make sure that you covered all the marginal interactions between every possible set of rules? And then had some means of looking back on both output and input to constrain it towards things that were relevant? Wait a minute ...
I feel like the talk about "world models" is trying to get at that, but casts it in different terminology. A world model is just a domain model, and once you're at domain models, there are multitudes of domains.
Unsupervised learning over domain rule systems has the potential to let us build well-defined, tightly scoped models that behave a lot more deterministically and don't colour outside the lines, and that reserve their weights for cleanly modeling the domain associations and relationships that matter.
I just asked codex the following question in the middle of my coding prompt:
What are you thoughts on the relative strengths of ewoks vs jawans?
Answer:
• Ewoks are stronger in direct conflict. They are organized fighters, good at ambushes, traps, terrain control, and coordinated attacks. On Endor, they beat a technologically superior force by using preparation and local knowledge.
....
As amusing as this may be, I really have no need or desire for my coding model to understand or be aware of ewoks and their relative strengths compared to jawas. Nor do I need it to understand the nuances of the races of Middle-earth. A prompt response of "I have no idea what you are talking about" to all of these would feel reassuringly scoped.
Mixture-of-Experts seems like an attempt to do this - the domain structure being extracted into specific sub-models that are presumably trained on particular domain-associated content - but it feels like this is, once again, only the beginning of what is possible.
> As amusing as this may be, I really have no need or desire for my coding model to understand or be aware of ewoks
You'll think otherwise the first time you're a victim of a zero-day ewok.
Seriously though, while coding models may not need to know about ewoks, their contextual knowledge of things beyond just writing code almost certainly makes them better coding models.
It could be difficult to constrain the training corpus "just right" so that you eliminate all the irrelevant subjects like ewoks but retain enough so that the model doesn't turn into an idiot savant capable of churning out correct code but incapable of understanding what you really want.
I've been having similar thoughts regarding the gigantic trillion-parameter models. I'm starting to believe the future will be very specialized, focused models that can be run on modest hardware (locally) but that can scale in performance (latency, speed) in the cloud, much like any other software today.
If you need to do programming, do we really need trillion-parameter models? Other domains might be larger or smaller, but there's no need for a model to 'know' everything and to require datacenter-level hardware to run.
General chatbots might work better as larger models, since you really don't know what people will ask for; alternatively, we could find a way to route the initial question to the appropriate model. Like MoE, but without needing to load a gigantic model into memory first.
> Mixture-of-Experts seems like an attempt to do this - the domain structure being extracted into specific sub-models that are presumably trained on particular domain-associated content
This is a common misconception. MoE LLMs are NOT trained with each expert receiving domain-associated data. It's just an unfortunate naming decision that stuck, and it is commonly misunderstood by non-practitioners.
Interesting. So what's the strategy there? Just assume that each expert will learn some underlying clustering of semantic associations, but not direct it?
Not even that. The "experts" are not expert in any particular topic.
MoE is an architecture change meant to lower the total compute for both training and serving an LLM. You basically have many smaller models (unfortunately called experts) and a router on top of them. The router "learns" which experts to activate for the next token generation, but that doesn't need to follow any semantic association. For the same math problem you could get experts 1 and 234 activated on the first token, 5 and 132 on the 2nd token, and so on.
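A minimal sketch of per-token top-k routing, assuming a single MoE layer with a linear router and toy experts (all names here are hypothetical). In many MoE transformers the experts are feed-forward sub-layers inside each block rather than standalone models, and the router is trained jointly with them; here the router weights are just random numbers. Nothing in the code ties an expert to a topic: the choice is made token by token from a dot product.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token_vec, router_weights, k=2):
    """Pick k experts for one token; return (expert_index, gate) pairs.

    router_weights holds one weight vector per expert. Each expert is scored by
    a dot product with the token's hidden state, and the top-k gates are
    renormalized so they sum to 1."""
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token_vec, router_weights, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(token_vec)
    for idx, gate in top_k_route(token_vec, router_weights, k):
        y = experts[idx](token_vec)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    DIM, N_EXPERTS = 4, 8
    router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
    # Toy experts: expert i just scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(N_EXPERTS)]
    for t in range(3):
        token = [random.gauss(0, 1) for _ in range(DIM)]
        chosen = [i for i, _ in top_k_route(token, router)]
        print(f"token {t}: experts {chosen}")
```

Consecutive tokens of the same input will typically land on different expert pairs, which is the behaviour described above.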
I’m not going to argue against much of the content of this paper, but it should be pointed out that their argument in the middle section against the “confinement myth” seems pretty bogus. They say that you can isolate the capability read/write resource from the data read/write resource, but… this makes absolutely no sense. Bits are bits. If you assume some out-of-band isolation of capability distribution then you’ve changed the game, but even that isn’t enough for me to believe that isolation is possible.
Early thinking was in terms of capability handles. As with file descriptors, the handle is only meaningful when passed across a protection boundary to something which can check if the handle is valid.
Later, there were encrypted capabilities, which are signed data, like TLS certs.
These get kind of bulky. A few machines also had hardware support for capabilities.
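A minimal sketch of a signed capability token, assuming a single object server that holds a secret key and uses HMAC-SHA256 from Python's standard library (so the token is signed rather than encrypted). The token format and the `mint`/`check` names are hypothetical. Two properties matter for the rest of this thread: anyone holding the bytes can exercise the capability, and nothing in the token stops it from being copied elsewhere.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical: a secret known only to the server that guards the resources.
SERVER_KEY = secrets.token_bytes(32)


def _tag(body: str) -> str:
    return hmac.new(SERVER_KEY, body.encode(), hashlib.sha256).hexdigest()


def mint(resource: str, rights: list[str]) -> str:
    """Issue a capability token: JSON claims plus an HMAC over them."""
    body = json.dumps({"res": resource, "rights": sorted(rights)}, sort_keys=True)
    return body + "." + _tag(body)


def check(token: str, resource: str, right: str) -> bool:
    """Accept the token only if its tag verifies and it grants `right` on `resource`."""
    body, _, tag = token.rpartition(".")  # the hex tag itself never contains "."
    if not hmac.compare_digest(tag.encode(), _tag(body).encode()):
        return False  # forged or tampered with
    claims = json.loads(body)
    return claims["res"] == resource and right in claims["rights"]


cap = mint("/home/alice/notes.txt", ["read"])
print(check(cap, "/home/alice/notes.txt", "read"))   # True
print(check(cap, "/home/alice/notes.txt", "write"))  # False: right not granted
forged = cap.replace('["read"]', '["read", "write"]')
print(check(forged, "/home/alice/notes.txt", "write"))  # False: tag no longer matches
```

The bulk is visible here: every token carries its own claims plus a 64-character tag.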
Consider two processes on a *nix system, and for the sake of argument let's say they're sufficiently isolated from each other as to have only one communications channel between them. If that communications channel is a unix domain socket, one process can send a file descriptor (effectively a capability) to the other over the socket. Each process has a file descriptor table in the kernel whose integer keys are only meaningful to that process in particular, and the kernel provides a mechanism to transmit file descriptors across a socket. The kernel mediates in this case.
If the communications channel is not a unix domain socket and is instead something like a TCP connection, you don't have this option available to you.
You aren't always just sending bits from one process to another!
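A minimal sketch of the kernel-mediated transfer described above, using Python's `socket.send_fds`/`socket.recv_fds` helpers (Python 3.9+, Unix only), which wrap `SCM_RIGHTS`. Both socket ends live in one process here for brevity. The point to notice is that the integer `original_fd` means nothing to the receiver; the kernel hands it a different integer that refers to the same open file. Sending `original_fd` as text over a TCP connection would transfer no authority at all.

```python
import os
import socket
import tempfile


def pass_fd_and_read() -> tuple[int, int, bytes]:
    """Send an open file descriptor over a Unix domain socket and read through the copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver, tempfile.TemporaryFile() as f:
        f.write(b"secret contents")
        f.flush()
        original_fd = f.fileno()
        # SCM_RIGHTS under the hood: the kernel installs a new entry in the
        # receiver's descriptor table pointing at the same open file.
        socket.send_fds(sender, [b"x"], [original_fd])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
        new_fd = fds[0]
        try:
            os.lseek(new_fd, 0, os.SEEK_SET)  # the file offset is shared with original_fd
            data = os.read(new_fd, 100)
        finally:
            os.close(new_fd)
        return original_fd, new_fd, data


orig, copy, data = pass_fd_and_read()
print(orig != copy, data)  # True b'secret contents'
```

Across a `fork()` the mechanics are the same; only the placement of the two socket ends changes.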
No, you’re using the same sleight of hand as the paper.
Boebert’s objection is about whether Alice can transmit unauthorized authority to Bob across a security boundary that’s supposed to prevent that flow. Your SCM_RIGHTS example is a case where the kernel is deliberately providing a sanctioned channel for authority transfer, with the kernel’s blessing, between two processes that the kernel does not consider to be on opposite sides of a mandatory access control boundary. Unix has no *-property. There is no “high” and “low” in the Bell-LaPadula sense on a standard Unix system. So of course the kernel mediates the transfer cleanly; it’s not enforcing any policy that would be violated by the transfer.
The moment you try to extend this to the actual case under dispute—Alice is “high,” Bob is “low,” and the security policy says high-to-low information flow is forbidden—then if the kernel refuses to deliver the fd across the boundary, the security property was enforced by the separate MAC layer, not the capability mechanism.
The conflation which is endemic in this whole debate is between “capabilities as a kernel-mediated authority mechanism” and “capabilities as a property that holds across all observable behavior of the system.” Unix file descriptors are the former. Boebert’s objection is about the latter.
Your communication channel between Alice and Bob is, itself, a capability (or a collection of capabilities) that grants Bob memory write, Alice memory read, but does not grant the ability to transmit a capability from Bob to Alice.
Absent a misunderstanding on your part, the only way I can coherently interpret your argument is that you are arguing that the presence of kernel data structures mediating the handles somehow makes it not a capability system: that because some background element mediates the validity of your capability representation, that element is just a MAC layer, and unless you can write the byte representation of your handle into memory and have somebody else read it out and thereby gain access to that resource, it is not a capability.
One, that allows forging capabilities unless they are cryptographically secure against collisions.
Two, the actual essence of capabilities is not being bearer tokens; it is non-construction. Capabilities are derived from existing capabilities, not manifested into existence. They have provenance. It is the OS equivalent of forbidding programs from casting arbitrary integers to pointers: manifesting pointers out of thin air breaks basically every high-level memory safety guarantee. You do not allow programs to cast arbitrary data into handles to resources, which is what ambient authority systems effectively require.
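A minimal sketch of non-construction, assuming a toy per-process capability table (a "c-list"). `CapTable` and its methods are hypothetical; `_install` stands in for the kernel's privileged path for initial grants. A process names capabilities by slot number, but the only ways to fill a slot are deriving from a capability it already holds or receiving one from a holder, so there is no "integer to capability" cast.

```python
class CapTable:
    """Hypothetical per-process capability table. Slot numbers index only this table."""

    def __init__(self):
        self._slots = {}
        self._next = 0

    def _install(self, obj, rights):
        # Used by the "kernel" at boot and by transfer(); not a user operation.
        slot = self._next
        self._next += 1
        self._slots[slot] = (obj, frozenset(rights))
        return slot

    def _lookup(self, slot):
        try:
            return self._slots[slot]
        except KeyError:
            raise PermissionError(f"slot {slot} holds no capability") from None

    def derive(self, slot, rights):
        """Create a new capability with the same or fewer rights (attenuation)."""
        obj, held = self._lookup(slot)
        if not set(rights) <= held:
            raise PermissionError("derived capability cannot add rights")
        return self._install(obj, rights)

    def transfer(self, slot, dest):
        """Give `dest` a copy of the capability in `slot`; returns dest's new slot."""
        obj, held = self._lookup(slot)
        return dest._install(obj, held)

    def invoke(self, slot, right):
        obj, held = self._lookup(slot)
        if right not in held:
            raise PermissionError(f"{right!r} not granted")
        return f"{right} {obj}"


alice, bob = CapTable(), CapTable()
rw = alice._install("file:/notes", {"read", "write"})  # boot-time grant by the "kernel"
ro = alice.derive(rw, {"read"})                        # attenuated copy
bob_slot = alice.transfer(ro, bob)                     # provenance: rw -> ro -> bob
print(bob.invoke(bob_slot, "read"))                    # read file:/notes
```

Bob guessing a slot number gets him nothing he was not given, and Alice cannot derive more rights than she holds.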
I'm going to first apologize for engaging in rhetorical sleight of hand myself, since I indulged in a bit of the hand-wavy argumentation that happens so often in these nit-picky debates. I'm going to respond cleanly here mostly to sharpen my own argumentative saw.
The original PSOS paper makes a few claims that are in tension with one another, and then buries the lede about how that tension can be addressed. Here are a few passages, quoted directly from the paper:
> [...] there are several important pragmatic reasons why PSOS capabilities are useful as a naming and protection mechanism for supporting abstract objects.
> 1. The capability mechanism has a very simple implementation. This allows capabilities to be built into the system at the lowest level of abstraction, thus making capabilities available for the most primitive objects.
> 2. Capabilities are uniform in size, making them easy to manage.
> 3. The inclusion of access rights in capabilities permits efficient fine-grained control of access to objects.
> 4. Capabilities can be written into storage (including secondary storage) and retrieved from storage in the same manner as other data, and therefore have many of the properties of other data.
Item 4 above is the one that should draw the most attention. I don't think anyone would contest the claim that PSOS has wonderful ergonomics for managing access to resources, but the moment you want to impose a system-wide access control policy then you must add another security mechanism, completely outside the capability abstraction, that adds some friction. This is fully acknowledged by the PSOS authors, although frankly they buried the lede since this is the only thing that the secure systems folks really cared about at the time. From the section on Store permissions:
> Because simplicity of the basic capability mechanism is extremely important to achieve the goals of PSOS, any means for restricting the propagation of capabilities should not add complexity to the capability mechanism. [...] A few access rights (only one is currently used by PSOS itself) are reserved as store permissions. This is the only burden placed on the capability mechanism.
> By properly choosing the segments that are capability store limited, some very useful restrictions on the propagation of capabilities can be achieved. The restriction used in PSOS is not allowing a process to pass certain capabilities to other processes or to place these capabilities in storage locations (e.g., a directory or interprocess communication channel) accessible to other processes. [...] The store permission mechanism has been selected as primitive in the system because it achieves the desired result with negligible additional complexity or cost.
This appears as claim 8 in the summary section of the paper near the end:
> Propagation of capabilities can be restricted by use of capability store permissions. The passage of a capability to other users can be prevented by not including process store permission in that capability's access rights.
OK, so that's the PSOS paper and its claims. Boebert's paper--really a note, since it is a mere 3 pages--states its argument in fairly direct terms:
> The attack is made possible by an inherent attribute of pure capability machines: the right to exercise access carries with it the right to propagate that access. Thus even if an omniscient oracle correctly creates capabilities, it cannot control their further propagation. If extra mechanisms are imposed to impose this control, the machine is no longer an unmodified capability machine.
The only issue here is, perhaps, semantic: Boebert (correctly) states that an unmodified capability machine cannot provide what is considered a very basic mandatory security policy, but the PSOS folks already acknowledged this by stating that the system needs a capability store permission manager for mandatory security policy enforcement. The phrasing that they used--"the store permission mechanism has been selected as primitive in the system"--is the bait-and-switch where they treat it like part of the capability model rather than making it clear that it is an entirely distinct mechanism that must be composed with pure capabilities to achieve the (genuinely difficult) security properties that systems designers were seeking.
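A toy model of the two facts above, roughly in the shape of the scenario usually attributed to Boebert: Low stores its write capability for a low segment into a drop box that High may read, High picks it up as ordinary data, and then writes down. Everything here (`Cap`, `Segment`, `Machine`, the `"store"` right) is hypothetical and is not PSOS's or Boebert's actual formalism. The `"store"` right stands in for PSOS's store permission, and the check that honours it is deliberately a separate switch outside capability propagation, which is the composition being argued about.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cap:
    segment: str
    rights: frozenset  # drawn from {"read", "write", "store"}


@dataclass
class Segment:
    data: list = field(default_factory=list)  # (writer_level, value) pairs
    caps: list = field(default_factory=list)  # capabilities stored as ordinary data


def has(held, segment, right):
    return any(c.segment == segment and right in c.rights for c in held)


class Machine:
    def __init__(self, enforce_store_permission):
        # False = unmodified capability machine; True = composed with a separate
        # store-permission check (hypothetical model).
        self.enforce = enforce_store_permission
        self.segments = {"low": Segment(), "dropbox": Segment()}

    def store_cap(self, held, target, cap):
        if cap not in held or not has(held, target, "write"):
            raise PermissionError("no authority to store")
        if self.enforce and "store" not in cap.rights:
            raise PermissionError("capability lacks store permission")
        self.segments[target].caps.append(cap)

    def load_caps(self, held, source):
        if not has(held, source, "read"):
            raise PermissionError("cannot read segment")
        held.update(self.segments[source].caps)

    def write(self, held, target, value, writer_level):
        if not has(held, target, "write"):
            raise PermissionError("cannot write segment")
        self.segments[target].data.append((writer_level, value))


def boebert_attack(machine):
    """Return True if High ends up writing into the low segment (a *-property violation)."""
    low_rw = Cap("low", frozenset({"read", "write"}))           # handed out without "store"
    low_caps = {low_rw, Cap("dropbox", frozenset({"write"}))}   # low -> high flow is allowed
    high_caps = {Cap("dropbox", frozenset({"read"}))}           # High may only read the drop box
    try:
        machine.store_cap(low_caps, "dropbox", low_rw)  # Low leaks its write capability upward
    except PermissionError:
        return False
    machine.load_caps(high_caps, "dropbox")                 # High picks it up as data
    machine.write(high_caps, "low", "HIGH SECRET", "high")  # ...and writes down
    return any(level == "high" for level, _ in machine.segments["low"].data)


print(boebert_attack(Machine(enforce_store_permission=False)))  # True
print(boebert_attack(Machine(enforce_store_permission=True)))   # False
```

Even though every capability the "oracle" handed out respected the policy, the pure machine leaks; the fix lives in one extra check, not in the capability mechanism itself.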
I suspect the horse is already dead, but it's worth double-tapping to make sure, so let's continue. The Myths paper muddies the waters further by making this claim after supposedly debunking Boebert:
> Boebert’s result is valid in any capability system that cannot distinguish between data transfer and capability transfer. But partitioned and type-enforced capability systems do not have this problem, and password capability systems have been engineered to avoid this problem [1, 11].
> Moreover, it has been formally verified that any capability system enforcing independent controls on data transfer and capability transfer can enforce both confinement and the *-Property [22].
This is the motivation for their paper, which is stated unambiguously:
> Boebert [1] and Karger [9] have argued that unmodified capability systems cannot enforce even basic mandatory access controls such as the *-property. Both have proposed solutions in the form of hybrid protection architectures. Karger has also argued that unmodified capability systems cannot enforce confinement [8]. Given that EROS is a pure capability system, and that its security design rests on its ability to enforce confinement, a rigorous verification of the EROS confinement mechanism is necessary.
For some reason, they decide to respond to these claims in the Related work section, just before their conclusion, although they do address them head-on:
> Boebert [1] and Karger [9] show that pure capability systems cannot enforce the *-property. While their conclusion is correct, capability systems do provide sufficient strength to construct mandatory policies at a higher level of abstraction with reasonable performance, as has been done in KeySafe [14].
> Karger has also shown that unmodified capability systems cannot enforce the confinement policy [8]. The apparent discrepancy results from differences in term definition. Karger’s confinement policy is a mandatory access control policy: "this piece of information must not be disclosed to that set of unauthorized parties." That is, it is a policy concerning the flow of information to subjects. Lampson’s confinement problem [10] imposes a weaker constraint: information can flow out of the subsystem only through authorized channels. That is, in the Lampson definition the channels define an encapsulation boundary to be enforced.
> We believe that the KeySafe architecture can enforce both the *-property and Karger’s confinement policy, but this does not directly contradict their claims. KeySafe is a reference monitor built on top of a more primitive capability mechanism; such a reference monitor constitutes a modified capability system in the sense of Karger’s discussion.
It's worth questioning whether the Myths authors were justified in citing this paper the way they did. But either way, I think it's pretty clear that once you pin down a precise definition of the terms used in the discussion, there is little disagreement among any of these authors. However, in casual arguments this precision is lost and you end up with a situation where two things are true at the same time but people choose to talk about only one at a time and think they're winning arguments:
1. An unmodified capability machine cannot enforce the *-property or mandatory access control confinement policies.
2. Modifying a capability machine to enforce such policies (and provide proof of enforcement) is straightforward because there is a single clearly-defined interface through which the systems may be composed.
My stance is that the PSOS folks screwed up their marketing. They really do have a superior product, so to speak, but they tried to downplay the fact that it did not provide a solution to the genuinely difficult problem of enforcing MAC policies (which was really about reference monitor discipline, not capabilities or ACLs). The right pitch for ocap design is "we offer a cleaner, more compositional, more auditable substrate for authority management--which is itself a substantial contribution and worth caring about--and on top of that substrate you can build the same MAC policies you'd build on any other substrate, but with better starting axioms and clearer proof structures." That's a contribution that doesn't need to be defended against Boebert because it doesn't claim (or appear to claim) what Boebert showed couldn't be claimed.
That argument assumes that the delegation of a capability to another process must happen through an interprocess communication path that only the operating system can establish, and only if the processes that want to communicate hold the capabilities for this.
I have not studied how existing capability-based operating systems solve this problem, because it does not seem to have a simple solution. If the capabilities are very fine-grained, to make certain that IPC really cannot happen, that might be cumbersome to use, while coarse-grained capabilities could be circumvented. To really prevent IPC without appropriate capabilities, a lot of the convenient features of a UNIX-like system must be forbidden, like files that can be read by any user, or directories like /tmp, where anyone can write a file.
> If the capabilities are very fine-grained, to make certain that IPC really cannot happen, that might be cumbersome to use, while coarse-grained capabilities could be circumvented.
In seL4 it’s kinda like this: A capability is an opaque handle you can invoke to RPC into some other process or into the kernel. There’s no worry about how fine-grained capabilities are because there’s no global table of permission bits or anything like that. Processes can invent capabilities whenever they want. Because caps just let other processes call your code, you can programmatically make them do anything.
Suppose I want to give a process read-only access to a file I have RW access to. The OS doesn’t need a special “read-only capability” type. Instead, my process just makes capabilities for whatever I actually want on the fly. In this case, I make a new capability. When it’s invoked, I see the associated request: if the caller is making a read request, I proxy that request to the file handle I have (also a cap), and if it’s a write request, I reject it. Voila!
This is how you can write the filesystem and drivers in userland. One process can be in charge of the block devices. That process creates some caps for reading and writing raw bytes to disk. It passes the “client side” of that cap to a filesystem process, which can produce its own file handle caps for interacting with directories and files, which can be passed to userland processes in turn. It’s capabilities all the way down.
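A minimal sketch of the read-only proxy pattern, with plain Python objects standing in for capabilities and method calls standing in for IPC invocation. `FileCap`, `ReadOnlyProxy`, and the `invoke(op, offset, arg)` convention are hypothetical and are not seL4's API; the point is only that the restriction lives in code the holder writes, not in a kernel permission bit.

```python
class FileCap:
    """Stand-in for a file capability that grants both read and write."""

    def __init__(self, data=b""):
        self._buf = bytearray(data)

    def invoke(self, op, offset, arg):
        if op == "read":  # arg = number of bytes
            return bytes(self._buf[offset:offset + arg])
        if op == "write":  # arg = bytes to write
            self._buf[offset:offset + len(arg)] = arg
            return len(arg)
        raise ValueError(f"unknown operation {op!r}")


class ReadOnlyProxy:
    """A new capability minted by the holder of a RW capability.

    Reads are forwarded to the wrapped capability; everything else is refused.
    The wrapped capability itself is never handed to the client."""

    def __init__(self, target):
        self._target = target

    def invoke(self, op, offset, arg):
        if op == "read":
            return self._target.invoke("read", offset, arg)
        raise PermissionError(f"{op!r} not allowed through a read-only capability")


rw = FileCap(b"hello world")
ro = ReadOnlyProxy(rw)          # this is what gets passed to the other process
print(ro.invoke("read", 0, 5))  # b'hello'
```

Python cannot truly hide `_target`; in a real system the proxy runs in its own address space, so the client only ever holds the handle to the proxy.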
This kind of proxy capabilities has other benefits as well, e.g. you can implement a disk quota, or transparent compression, or logging, or ask the user (if you have a capability which can do that), or provide access to a part of the file as though it is the entire file, etc.
Or, if a program requests access to a camera, you can provide a capability with a still picture, a video file, a filter (e.g. that resizes the picture or modifies the colour) from some source (including, but not limited to, a camera), etc; this can be helpful in case e.g. you do not have a camera on your computer, or for testing.
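Two more proxies in the same style, as a sketch of the quota idea and of exposing part of a file as though it were the entire file. All classes here are hypothetical, and `MemFile` is an in-memory stand-in for a real file capability. Because each proxy exposes the same `invoke` interface it wraps, they stack freely.

```python
class MemFile:
    """In-memory stand-in for a file capability (hypothetical)."""

    def __init__(self, data=b""):
        self.buf = bytearray(data)

    def invoke(self, op, offset, arg):
        if op == "read":  # arg = byte count
            return bytes(self.buf[offset:offset + arg])
        if op == "write":  # arg = bytes to write
            self.buf[offset:offset + len(arg)] = arg
            return len(arg)
        raise ValueError(f"unknown operation {op!r}")


class WindowProxy:
    """Expose bytes [start, start + length) of `target` as if they were a whole file."""

    def __init__(self, target, start, length):
        self.target, self.start, self.length = target, start, length

    def invoke(self, op, offset, arg):
        if not 0 <= offset <= self.length:
            raise PermissionError("offset outside the window")
        if op == "read":
            size = min(arg, self.length - offset)  # clamp reads at the window's end
            return self.target.invoke("read", self.start + offset, size)
        if op == "write":
            if offset + len(arg) > self.length:
                raise PermissionError("write would run past the window")
            return self.target.invoke("write", self.start + offset, arg)
        raise ValueError(f"unknown operation {op!r}")


class QuotaProxy:
    """Forward every request, but refuse writes once `limit` bytes have been written."""

    def __init__(self, target, limit):
        self.target, self.remaining = target, limit

    def invoke(self, op, offset, arg):
        if op == "write" and len(arg) > self.remaining:
            raise PermissionError("quota exceeded")
        result = self.target.invoke(op, offset, arg)
        if op == "write":  # charge the quota only for writes that succeeded
            self.remaining -= len(arg)
        return result


f = MemFile(b"headerPAYLOADtrailer")
client_cap = QuotaProxy(WindowProxy(f, start=6, length=7), limit=4)
print(client_cap.invoke("read", 0, 100))  # b'PAYLOAD'
```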
(Other people have had similar ideas, sometimes independently of me.)
There is also a way to transmit capabilities across a network; I have thought about how a protocol for doing that could be designed.
That works perfectly fine for an embedded computer, which is where systems like seL4 are used.
On the other hand, I cannot see how this approach can be scaled to something like a personal computer.
For some programs that I run, e.g. for an Internet browser, I may want to not authorize them to access anything on SSDs/HDDs, except for a handful of configuration files and for any files they may create in some cache directories.
For other programs that I run, I may want to let them access most or all files in certain file systems. Any file system that I use contains typically many millions of files.
Therefore it is obvious that using one capability per file is not acceptable. Moreover, such programs may need immediate access to many thousands of files that were just created by some other program that ran before them, for instance a linker running after a compiler.
Assuming that a pure capability-based system is used, not some hybrid between ACLs and capabilities, there must be some more general kinds of capabilities, that would grant access simultaneously to a great number of resources, based on some rules, e.g. all files in some directory can be read, all files with a certain extension or some other kind of name pattern from some directory may be written or deleted or renamed, and so on.
> On the other hand, I cannot see how this approach can be scaled to something like a personal computer.
Personally I think the biggest challenge is UX. The systems engineering is good, and it works just fine.
> For other programs that I run, I may want to let them access most or all files in certain file systems. Any file system that I use contains typically many millions of files. Therefore it is obvious that using one capability per file is not acceptable.
Yeah, of course! Just make a capability representing the containing directory or filesystem. Then the program is free to open and browse files within that directory, but nothing outside of it.
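A minimal sketch of a directory capability, assuming a userspace wrapper that holds a root path and refuses any path that resolves outside it. `DirCap` is hypothetical. Checking path strings like this is racy (a symlink can be swapped in between the check and the open); real implementations resolve relative to a directory descriptor instead, e.g. Capsicum on FreeBSD or Linux's `openat2` with `RESOLVE_BENEATH`.

```python
import os
import tempfile


class DirCap:
    """Hypothetical capability for everything beneath one directory."""

    def __init__(self, root):
        self._root = os.path.realpath(root)

    def _resolve(self, relpath):
        # Resolve "..", symlinks, and absolute paths first, then confirm the
        # result still lies inside the root this capability was issued for.
        full = os.path.realpath(os.path.join(self._root, relpath))
        if os.path.commonpath([full, self._root]) != self._root:
            raise PermissionError(f"{relpath!r} escapes {self._root!r}")
        return full

    def open(self, relpath, mode="r"):
        return open(self._resolve(relpath), mode)

    def listdir(self, relpath="."):
        return sorted(os.listdir(self._resolve(relpath)))

    def subdir(self, relpath):
        """Attenuate: derive a capability covering only a subdirectory."""
        return DirCap(self._resolve(relpath))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "proj", "src"))
    with open(os.path.join(tmp, "proj", "src", "main.c"), "w") as fh:
        fh.write("int main(void) { return 0; }\n")
    cap = DirCap(os.path.join(tmp, "proj"))
    print(cap.listdir("src"))  # ['main.c']
    for bad in ("../secret", "/etc/passwd", "src/../../x"):
        try:
            cap.open(bad)
        except PermissionError as err:
            print("denied:", err)
```

One handle covers millions of files, and `subdir` gives the attenuation step for handing a narrower slice to another program.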
I agree with others in this thread. Think of the capability like a bearer token. You wouldn't make a token per file. Just make one for the directory.
Then make a userspace server to do that. If you want to see how this works in practice, macOS and iOS are great “pragmatic” implementations of this pattern. They use a Mach/BSD hybrid.
you're absolutely right. this is just a terminology confusion I think. we can talk about capabilities as 'a replacement for ACLs', in which case, yes we need to think about policy rules and not a gigantic list of possible atoms.
from a mechanism point of view a 'capability' is really more a bearer token, the result of a policy decision, a credential that we can give to the OS to show that we have been given access without going through the rules-based machine for every operation.
IIUC one problem with such layering of capability processing is that each layer passed through results in a context switch (i.e. switching memory mappings, thrashing caches, etc.), and that's on top of the cost of passing through the kernel. In other words, you may need to pay the cost of N syscalls for one multi-layered capability-based operation.
True, but capability calls in seL4 are supposedly faster than Linux syscalls. Because caps are such an important primitive, they're extremely heavily optimised.
As an example, when you invoke a capability, your process hands the callee your scheduler time-slice. So it's not like Linux, where your process yields to the scheduler. The same CPU core handles the entire call -> process -> return computation pipeline across multiple processes.
I'm not sure how fast it ends up in practice compared to a similar system built on top of Linux. I suspect a lot of the difference would come down to implementation choices. And if it's still not fast enough, you can always just set up a ring buffer or something between processes to share data directly.
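A minimal sketch of the single-producer/single-consumer ring buffer mentioned above. In a real deployment the slots and the two counters would live in a shared memory region mapped into both processes, set up once so that no kernel call is needed per message, and in C or Rust the counter updates would need atomic loads and stores with acquire/release ordering. This Python version (hypothetical `RingBuffer` class) only shows the index arithmetic.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer queue (index logic only)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0  # total items consumed; written only by the consumer
        self.tail = 0  # total items produced; written only by the producer

    def push(self, item):
        """Producer side. Returns False instead of blocking when the buffer is full."""
        if self.tail - self.head == self.capacity:
            return False
        self.slots[self.tail % self.capacity] = item
        self.tail += 1  # publish only after the slot has been filled
        return True

    def pop(self):
        """Consumer side. Returns None when empty (so None itself can't be sent)."""
        if self.head == self.tail:
            return None
        item = self.slots[self.head % self.capacity]
        self.head += 1  # frees the slot for the producer
        return item


rb = RingBuffer(2)
print([rb.push(x) for x in ("a", "b", "c")])  # [True, True, False]
print(rb.pop(), rb.pop(), rb.pop())           # a b None
```

Each side writes only its own counter, which is what makes the lock-free version possible.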
Covert channels are a thing. Shared access to resources always opens the possibility of covert information passing through e.g. modulation of resource usage. This isn’t even out-of-band, it’s just a hard fact that a shared resource always creates a potential covert channel (source: Lampson 1972, A Note on the Confinement Problem).
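A deterministic toy simulation of the kind of channel Lampson describes, assuming two parties that share nothing but an otherwise idle resource pool (think disk quota or a fixed number of lock slots). `SharedPool`, `transmit`, and the bit helpers are hypothetical. Neither side holds a capability to the other; the sender signals purely by how much of the pool it uses, and the receiver reads each bit by probing whether a small allocation succeeds. Real channels are noisier and usually timing-based, but the mechanism is the same.

```python
class SharedPool:
    """A resource both parties may legitimately use (e.g. quota blocks or lock slots)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_alloc(self, n):
        if self.used + n > self.capacity:
            return False
        self.used += n
        return True

    def free(self, n):
        self.used -= n


def transmit(bits, pool):
    """Send bits from a 'high' sender to a 'low' receiver through pool usage alone.

    Assumes nobody else is using the pool during the transmission."""
    received = []
    for bit in bits:
        if bit:                       # sender: 1 = exhaust the pool, 0 = leave it alone
            pool.try_alloc(pool.capacity)
        probe_ok = pool.try_alloc(1)  # receiver: can I allocate one unit right now?
        if probe_ok:
            pool.free(1)
        received.append(0 if probe_ok else 1)
        if bit:                       # sender releases before the next time slot
            pool.free(pool.capacity)
    return received


def to_bits(byte):
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


def from_bits(bits):
    return sum(b << (len(bits) - 1 - i) for i, b in enumerate(bits))


print(chr(from_bits(transmit(to_bits(ord("K")), SharedPool(8)))))  # K
```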
In the computer and operating system design I had in mind, measuring time also requires a capability. Creating files (including temporary files) also requires capabilities. Shared memory is read-only for everyone; to be able to write to memory you must have exclusive access. None of these capabilities is necessarily what the program using it intended it to be; they may also be proxies, or capabilities of the wrong type (the kernel does not know anything about the types of capabilities except those it created itself), etc. A proxy may limit communication from one program to a service. Using these as well as other things (including, but not limited to, the CPU design), there are things that can be done to mitigate these problems (including things necessary to mitigate other kinds of timing attacks based on other capabilities, e.g. slowing down network access for the purpose of testing how something works on slow networks).
Let me know when you have enough demand to make it multiple full-time jobs. I’ve been making notes for a few years now about all the best patterns and principles for designing complex systems and your language + engine more or less hits all the right notes.
Declarative state and reactivity, lexical lifetimes and ownership, etc.
Really curious how you set it all up and what prior art was your primary inspiration.
Wow, I’m amazed someone noticed this. I’d be interested to know more about your background and interests. How much have you been looking at programming language design vs just designing for complex systems in general?
I ask because yes, I have put a great deal of time and effort into the programming language design, and to me, that is the greater achievement, more than the automatic multiplayer. But the benefits of automatic multiplayer are easy for the general population to understand, while the improvements in programming language design are hard to convey, so people don’t normally get it. The fact that you can see what I’ve been trying to do so quickly shows you must be coming from a place which has developed that discernment for you.
While there are many inspirations (I love coding using React, for example), my primary inspiration is the last game I made. I released all the modding tools along with it and lots of non-coders loved it. The modding was JSON, which might sound primitive, but it was actually a hierarchical declarative domain-specific language and it seemed to really work for people intuitively.
Easel was born from me spending 2 years trying to make an imperative programming language in a similar shape as that declarative one. I wanted it to be just as easy, but infinitely more powerful. It took a lot of iteration to merge the declarative and imperative styles into one language. There is so much to it - lexical lifetimes and ownership, reactivity like you noticed, but also weaving in concurrency and asynchronous programming seamlessly took time as well.
I really wish this were the part of Easel that stood out and got talked about more, because I think it’s the coolest part.
Sotolongo's lineage is Twitter observability -> Google streaming -> Snowflake Dynamic Tables, which is a declarative, relational, query-optimizer-centric tradition. Marz's lineage is Storm -> Trident -> Rama, which is a procedural-dataflow, programmer-controls-the-plan, event-sourcing-centric tradition. Both are trying to unify OLTP + OLAP + application logic + reactivity into a coherent substrate, but they're coming at it from opposite epistemological poles. Rama says "give the programmer fine-grained control over partitioning, indexing, and dataflow, and trust them to design the right physical representation for their queries." Cambra, if your inference about Dynamic Tables is right, will almost certainly say "let the programmer describe the domain model declaratively and let the system figure out the physical representation." This is the classic Codd-vs-CODASYL split, recapitulated forty years later with much more sophisticated machinery on both sides.
If this is the right framing, then the two systems aren't really competitors despite solving the same problem--they're going to appeal to fundamentally different developer sensibilities. Rama is for people who want to think like Jay Kreps or Martin Kleppmann: the event log is sacred, physical data layout is a first-class design decision, and the programmer earns the performance benefits by understanding the system deeply. Cambra (if these assumptions hold) will be for people who want to think like database users: describe what you want, let the optimizer figure out how, intervene only when necessary. These are both defensible positions and both have historical track records of working. SQL's history shows the declarative camp has ecosystem advantages once the optimizer is good enough; Kafka/Rama's history shows the log-centric camp has correctness and observability advantages for event-heavy domains.
I strongly agree with that last statement: I hate using agents because their code smells awful even if it works. But I have to use them now because otherwise I’m going to wake up one day and be 100% obsolete and never even notice how it happened.
Today I gave a lecture to my undergraduate data structures students about the evolution of CPU and GPU architectures since the late 1970s. The main themes:
- Through the last two decades of the 20th century, Moore’s Law held, so each new generation of chips packed in more transistors and ran at ever-faster clock speeds. Software floated on a rising tide of hardware performance, so writing fast code wasn’t always worth the effort.
- While Dennard scaling held, packing in more transistors didn’t raise power density, but power consumption grows roughly with the cube of clock frequency (a higher clock also demands a higher supply voltage). So by the early 2000s Intel hit a wall and couldn’t push the clock above ~4GHz with normal heat dissipation methods. Multi-core processors were the only way to keep performance increasing year after year.
- Up to this point the CPU could squeeze extra performance out of sequential code through instruction-level parallelism (pipelining, superscalar and out-of-order execution), with compilers assisting by unrolling loops. With multiple cores, however, software developers could no longer pretend that concurrent programming was something only academics and HPC clusters cared about.
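The cube in the power bullet comes from the usual first-order model of CMOS dynamic power. This is a simplified sketch, not a full power model. It assumes the supply voltage V has to rise roughly in proportion to the clock frequency f. Here α is the switching activity factor and C is the switched capacitance, which grows with the number of transistors:

```latex
P_{\text{dyn}} = \alpha \, C \, V^{2} f, \qquad V \propto f \;\Longrightarrow\; P_{\text{dyn}} \propto f^{3}
```

Under that assumption, doubling the clock costs roughly 2^3 = 8 times the power. Doubling the core count at a fixed clock costs only about 2 times the power.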
CS curricula are mostly still stuck in the early 2000s, or at least it feels that way. We teach big-O and use it to show that mergesort or quicksort will beat the pants off of bubble sort, but topics like Amdahl’s Law are buried in an upper-level elective, even though Amdahl’s Law is much more directly relevant to the performance of real code on real present-day workloads than a typical big-O analysis.
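Amdahl’s Law itself fits in a few lines. If a fraction p of a program’s runtime can be parallelized across n workers, the best-case speedup is 1 / ((1 - p) + p / n). Here is a minimal Python sketch; the function name `amdahl_speedup` is just illustrative:

```python
import math


def amdahl_speedup(parallel_fraction: float, workers: float) -> float:
    """Best-case speedup predicted by Amdahl's Law.

    parallel_fraction: share of the original runtime that can be parallelized (0..1)
    workers: number of parallel workers (cores, threads, ...); math.inf is allowed
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part is divided among workers.
    remaining = serial_fraction + parallel_fraction / workers
    # A fully parallel program on unlimited workers has no remaining time at all.
    return math.inf if remaining == 0 else 1.0 / remaining


for p in (0.5, 0.9, 0.95, 0.99):
    row = [amdahl_speedup(p, n) for n in (4, 16, 64, math.inf)]
    print(f"p={p:.2f}: " + ", ".join(f"{s:7.2f}x" for s in row))
```

The serial fraction dominates quickly. Even with unlimited workers, a program that is 95% parallelizable tops out at a 20x speedup.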
In any case, I used all this as justification for teaching bitonic sort to 2nd and 3rd year undergrads.
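For reference, here is a minimal Python sketch of bitonic sort. It assumes a power-of-two input length, which is the usual restriction for the textbook network; other sizes would need padding. It runs the network sequentially, but the point is the structure. Within each inner pass, every compare-exchange is independent of the others, so on a GPU or SIMD unit each iteration of the `for i` loop could be its own lane or thread.

```python
import random


def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Assumes len(values) is a power of two (0 and 1 are accepted trivially).
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("bitonic_sort expects a power-of-two length")
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:              # compare-exchange distance within this stage
            for i in range(n):    # iterations are independent: parallelizable
                partner = i ^ j
                if partner > i:
                    # Bit k of i decides whether this block sorts up or down.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


data = [random.randint(0, 99) for _ in range(16)]
print(bitonic_sort(data))
```

The network performs O(n log² n) comparisons versus O(n log n) for mergesort, so it loses on big-O. It can still win on parallel hardware because its comparison pattern is fixed and independent of the data.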
My point here is that Simon’s assertion that “code is cheap” feels a lot like the paradigm shift that came with easily accessible, massively parallel compute hardware: the things that matter for writing performant software changed completely. Minimizing branching and data dependencies produces code that looks profoundly different from what most developers are used to. For example, running 5 linear passes over a column might actually be faster than a single merged pass if each of those 5 passes touches a smaller slice of memory that fits in cache, while the merged pass has to keep shuffling its combined working set in and out of the cache because it doesn’t fit.
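To make “minimizing branching and data dependencies” concrete, here is a Python sketch of two standard transformations; the function names are illustrative. Python’s interpreter overhead hides any speed difference, so treat this as showing the shape of the code. The payoff only appears in compiled languages, SIMD, and GPUs.

- The branchless counter replaces a data-dependent `if` with arithmetic.
- The four-accumulator sum breaks one long chain of dependent additions into independent chains that the hardware can overlap.

```python
def count_below_branchy(xs, threshold):
    count = 0
    for x in xs:
        if x < threshold:          # data-dependent branch: hard to predict on random data
            count += 1
    return count


def count_below_branchless(xs, threshold):
    count = 0
    for x in xs:
        # The comparison result (a bool, i.e. 0 or 1) is added directly
        # instead of guarding the increment with an if.
        count += (x < threshold)
    return count


def sum_four_accumulators(xs):
    # A single running total forms one serial chain of additions; four
    # independent totals can be updated in parallel (e.g. in SIMD lanes).
    s0 = s1 = s2 = s3 = 0
    cut = len(xs) - len(xs) % 4
    for i in range(0, cut, 4):
        s0 += xs[i]
        s1 += xs[i + 1]
        s2 += xs[i + 2]
        s3 += xs[i + 3]
    # Combine the lanes, then add any leftover elements.
    return (s0 + s1) + (s2 + s3) + sum(xs[cut:])
```

With floating-point data, the four-accumulator sum can round differently from a strict left-to-right sum. That is why compilers generally won’t make this reordering for you unless you explicitly allow it (for example with GCC’s `-ffast-math`).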
What all this means for the software development process I can’t say, but the payoff will be tremendous (10-100x, just like with properly parallelized code) for those who can see the new paradigm first and exploit it.
It definitely wasn’t a waste of time! I passed JLPT N1 back in 2014 after ~6 years of mostly Anki-based studying. Did Heisig’s Remembering the Kanji (RtK) first and then mostly played old Japanese console games that I was familiar with. Never opened a JLPT study guide and passed the test on my first attempt.
Could I speak Japanese at that point? No, not really… and I even had a Japanese spouse! But we spoke mostly English at home. I could read quite well, but conversation was very challenging.
Then we moved to Japan. Despite not having a job that requires me to speak Japanese, I got enough live exposure just from chatting with people at the gym or in social activities that now, a few years later, I’ve backfilled all that conversational fluency that was missing. No special extra effort required, just living in an environment where I used the language reasonably often.
Anyways, the point is that all the time spent in Anki laid a rock-solid foundation that merely needed activation in the right environment for active fluency to emerge. Of course I no longer do my daily flashcard drills (and I’ve forgotten how to write quite a few kanji as a result) but the work paid off.