
jkrejcha · 2026-06-10T19:10:34 1781118634

The main reason is that it is incredibly difficult to do this in the general case because of the grammar of SQL, especially with the very different dialects. In the worst case you can get unintended remote code execution[1].

There's an incidental performance benefit on some database engines as well. When you write a SQL query, the database engine generally has to compile it to a form it can use.

If you use raw string concatenation, "SELECT USERS FROM table WHERE id=1" might compile to something like (pseudocode below):

    def prepstatement1():
        ...

If you use an explicit prepared statement[2] instead, something like "SELECT USERS FROM table WHERE id=?" might compile to something like:

    def prepstatement2(id: int):  # <--- notice the new parameter here
        ...

Some database engines also have the ability to cache a prepared statement, so these are a lil bit faster. Remember, your database still has to compile the string concatenated case, it's just a little bit hidden.
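To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, its rows, and the function names are assumptions made up for illustration; other engines and drivers use different placeholder syntax.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema and data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def find_user_unsafe(conn, user_id):
    # String concatenation: user_id becomes part of the SQL text itself,
    # so whatever it contains gets parsed as SQL.
    query = "SELECT name FROM users WHERE id = " + user_id
    return conn.execute(query).fetchall()


def find_user_safe(conn, user_id):
    # Placeholder: the SQL text is fixed, user_id is bound as a plain value.
    return conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    malicious = "1 OR 1=1"
    print(find_user_unsafe(conn, malicious))  # the OR clause is executed as SQL
    print(find_user_safe(conn, malicious))    # the whole string is compared as a value
```

In the unsafe version the input changes the shape of the query. In the safe version the query shape is fixed before the value ever shows up, which is also what makes caching the compiled statement possible.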

[1]: For example SQL Server has xp_cmdshell: https://learn.microsoft.com/en-us/sql/relational-databases/s...

[2]: https://en.wikipedia.org/wiki/Prepared_statement

jkrejcha · 2026-06-06T22:47:33 1780786053

Iirc Cygwin used to use it, but they moved away from it because they said it was pretty slow

Though actually iirc werfault uses NtCreateUserProcess() to clone processes when writing out crash dumps to this day

jkrejcha · 2026-06-06T21:50:47 1780782647

I kinda disagree, though I do see the usefulness here. While fork/exec can be useful in some cases, it'd be honestly pretty neat if the APIs took a pidfd argument (maybe with 0 meaning current process). Only problem is setuid/setgid binaries I suppose but maybe this case is better handled by special casing `exec`.

For example

    pidfd_t ps = spawn(); // creates a process stopped (kernel does this anyway by default)
    setuid(ps, 33);
    capset(ps, ...);
    socket(ps, ...);
    mmap(ps, ...);
    process_vm_writev(ps, ...);
    exec(ps, ...);
    signal(ps, SIGCONT);
    // error handling elided

I guess this is a little bit me being critical of the usual syscall APIs for not thinking about "what if I want to do this to another process I have access to" but...

It also makes things like thread safety even reasonably doable with fork. I do agree though that stuff like CreateProcess, which takes in a gazillion parameters, doesn't really make for the greatest of userspace APIs

uecker · 2026-06-06T22:04:36 1780783476

Maybe, a few people proposed this. It is a lot better than a single spawn call.

But how often would one actually need this? And what are the semantics? Refer arguments (e.g. file descriptors) to the current process or the other one? How are cross-permissions handled? It seems a lot of complexity...

Someone also proposed a ptrace_syscall which could achieve the same thing.

jkrejcha · 2026-06-06T22:40:18 1780785618

> But how often would one actually need this?

Well, the idea is that it'd probably be close to the default API for spawning processes (and could even be the bedrock for posix_spawn and friends in libc (and potentially even "simple" fork cases[1])). fork/clone would be the special case
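For comparison, this is roughly what the existing posix_spawn route looks like today, sketched with Python's `os.posix_spawn`. Child setup is limited to a fixed list of "file actions" declared up front rather than arbitrary calls against the new process. The helper name `spawn_and_capture` is an assumption for illustration, and the sketch assumes a Unix system and Python 3.9+.

```python
import os
import sys


def spawn_and_capture(argv):
    """Spawn argv[0] with posix_spawn and return (exit_code, stdout_bytes)."""
    read_fd, write_fd = os.pipe()
    # The only child-side setup available: a declarative list of fd actions.
    # Here the pipe's write end is duplicated onto the child's stdout (fd 1).
    # The original pipe fds are close-on-exec, so the child doesn't keep them.
    file_actions = [(os.POSIX_SPAWN_DUP2, write_fd, 1)]
    pid = os.posix_spawn(
        argv[0], argv, dict(os.environ), file_actions=file_actions
    )
    # The parent must close its copy of the write end to see EOF on the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), output


if __name__ == "__main__":
    code, out = spawn_and_capture([sys.executable, "-c", "print('hi')"])
    print(code, out)
```

Anything not covered by the file actions and spawn attributes (capabilities, writing into the child's memory, and so on) has no slot here, which is the gap the pidfd-style calls would fill.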

In most cases, most programs don't need special setup. Something like `ptrace_syscall` would also work for this and would be probably the way to do it with the backwards compat limitations of nowadays

ptrace-ability seems to be how permissions for this sort of thing are generally handled (see also procfs, process_vm_writev, ptrace, etc). The complication is a little bit around setuid programs, but you could either special case execve to imply SIGCONT for setuid binaries or have execve always imply a SIGCONT

[1]: Probably would be rare for a compiler to optimize it though

jkrejcha · 2026-06-03T12:22:18 1780489338

Windows has had native support for always on top windows for over 25 years

jkrejcha · 2026-06-02T00:47:06 1780361226

I've mentioned this elsewhere in the thread, but I think it's a difference of view on what malloc represents. Operating systems do have "reserve this part of the address space" APIs and these reservations don't get charged against your commit because you're simply reserving the space, not committing to using it, and so the operating system doesn't need to back it with anything.

In this worldview, malloc is like me buying a plane ticket at the counter for a specific flight that's going to leave soon. I'd be really annoyed if I were bumped off a flight I just paid for (and would've rather been told "that flight is full, try again later" (malloc returns NULL)). This is, for example what Windows does. Under memory pressure, it'll say to applications, "hey no I'm not in a giving mood for memory right now" (and will sometimes bump the size of the pagefile if configured to do this, but only up to a point).

The thought behind this is that well... applications have to handle malloc returning NULL anyway. Whether that's calling abort and giving up is one matter, another might be to retry the allocation at a later time (maybe after Windows has bumped the pagefile size), another might be to handle an error using some preallocated buffer or whatever.
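A rough sketch of the "retry, then fall back to a preallocated buffer" strategy, written in Python where a failed allocation surfaces as `MemoryError` rather than a NULL return. The function names, the retry count, and the injectable `allocator` parameter are all assumptions for illustration; `refuse` just simulates an OS that says no.

```python
import time


def allocate_with_policy(size, fallback, allocator=bytearray, retries=2, delay=0.0):
    """Try to allocate `size` bytes; retry a few times, then use `fallback`.

    Returns (buffer, fresh), where fresh is False if the preallocated
    fallback buffer had to be used.
    """
    for attempt in range(retries + 1):
        try:
            return allocator(size), True
        except MemoryError:
            # Analogue of malloc returning NULL: wait and try again,
            # in case memory has been freed up in the meantime.
            if attempt < retries:
                time.sleep(delay)
    return fallback, False


def refuse(size):
    # Simulated allocator that always fails, standing in for memory pressure.
    raise MemoryError(f"cannot allocate {size} bytes")


if __name__ == "__main__":
    emergency = bytearray(64)  # reserved up front, before memory gets tight
    print(allocate_with_policy(16, emergency)[1])
    print(allocate_with_policy(16, emergency, allocator=refuse)[1])
```

The point is only that the failure is reported at allocation time, where the program can still choose what to do about it.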

jkrejcha · 2026-06-02T00:36:12 1780360572

malloc can just return NULL (specifically, the mmap syscall fails with ENOMEM and your libc translates that). Applications need to check for success anyway

jkrejcha · 2026-06-02T00:33:10 1780360390

To be 100% fair, it's rare that processes are cloned on Windows, if only because it's part of the Native API that applications generally don't use directly, and CreateProcess is easier and does all the housekeeping stuff, etc, that people writing Windows applications generally come to expect (or don't even know happens)

I do think overcommit was a poor design choice, but I think it probably mostly does logically follow from the fact that fork and friends are the only ways to create a process that are available to userspace. It's quite unfortunate though.

Part of the problem is that some applications wanted to reserve lots of address space but didn't necessarily want to touch it right away (such as when they were using it sparsely). That's something VirtualAlloc(x, MEM_RESERVE) (or mmap(..., MAP_NORESERVE)) would be suited for. But while malloc exists in libc, mreserve doesn't, and I think it was pretty uncommon to reserve address space explicitly.

jkrejcha · 2026-06-01T15:28:58 1780327738

Windows doesn't use fork/exec for process creation in any relevant way today

There are Native APIs for implementing fork (needed for the obsolete POSIX subsystem, primarily), but even on the Native API side, processes are usually spawned through NtCreateProcess or RtlCreateUserProcess (though there is a bunch of setup with regards to the Csr APIs for the Win32 CreateProcess[1]).

[1]: https://stackoverflow.com/a/69605729/2805120

jkrejcha · 2026-06-01T15:24:03 1780327443

It has also created this unfortunate assumption a lot of the time that malloc and friends are (infallible OR crash) and, separately, can sometimes have potentially weird tendencies to force undefined behaviors on otherwise well-defined programs (I think primarily around mmap, although I'm not remembering the details super well).

Agreed though, overcommit is the culprit here. I get why it happened (unfortunate consequences of fork and friends existing as the way to spawn tasks and wanting those to be both performant and not fail in frustrating conditions), but I don't think it was a design that aged particularly well.

I actually like somewhat the notion of how Windows handles these two things

1. For address space reservations, you can reserve address space but in order to touch it you have to commit it. Commits have to be backed by something (RAM, a file, pagefiles if they exist) and if a commit fails, you'll get NULL back from malloc. It allows code to be more correct in the face of low-memory conditions or to try again later (Firefox for example, does this[1] on Windows).

2. Process creation is done with a specific API to create processes. The only problem with this I think is that you have to specify everything at creation time, but you could augment this by creating processes in a stopped state (iirc Linux has to do this anyway to set up some stuff before it can hand over control back to userland) and having the parent send FDs to the child or whatnot. Windows... doesn't do this, it has a couple of kitchen sink APIs for creating processes and setting up stuff like the standard streams... in any case I'm getting off topic.

Don't think there's much about that design that can be changed now though

[1]: https://hacks.mozilla.org/2022/11/improving-firefox-stabilit...

jkrejcha · 2026-06-01T15:07:06 1780326426

> These days you can do oom score adjusting, which is not as strong as a pardon.

Writing -1000 to /proc/<pid>/oom_score_adj will cause the OOM killer not to consider the process at all :)

From the man page proc_pid_oom_score_adj(5):

> The value of oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). [...]. The lowest possible value, -1000, is equivalent to disabling OOM-killing entirely for that task, since it will always report a badness score of 0.
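A small sketch of doing that from Python. The helper names and the `proc_root` parameter (there so it can be pointed at a scratch directory) are assumptions for illustration; on a real system, lowering the value generally needs extra privilege (CAP_SYS_RESOURCE), so expect a permission error otherwise.

```python
import os

# Range documented in proc_pid_oom_score_adj(5).
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def oom_score_adj_path(pid="self", proc_root="/proc"):
    return os.path.join(proc_root, str(pid), "oom_score_adj")


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_score_adj and return the path."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    path = oom_score_adj_path(pid, proc_root)
    with open(path, "w") as f:
        f.write(str(value))
    return path


def get_oom_score_adj(pid="self", proc_root="/proc"):
    """Read the current adjustment back as an integer."""
    with open(oom_score_adj_path(pid, proc_root)) as f:
        return int(f.read().strip())
```

Calling `set_oom_score_adj(-1000)` with the defaults targets the calling process via /proc/self.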