I've been running OmniOS on my home NAS since 2015. In theory you should be able to compile and install any open source software on OmniOS/illumos, but let's be honest: the world assumes you're running Linux. Devs already have enough trouble supporting different Linux distros; sometimes you'll see specific BSD support, which can help, but good luck if your issue is illumos-distro-specific. Still, fixing these issues can be interesting, and can teach you something about the fundamental differences between Unixes.
A debugging tour de force all the way down the stack, from Unix command to reinstall to truss and dtrace to C code to assembly to analysing data structures! Leaky abstractions indeed.
I suppose that's an argument for running the same OS everywhere; you can't accidentally build against OmniOS if you run Helios everywhere. I wonder if it would be possible to mark the package with the distribution that it was built on to avoid this kind of thing? ("You just tried to run an OmniOS program/module on Helios; are you sure you want to do that?")
Spitballing here, but one could imagine a compatibility assurance mechanism that baked in a hash of all the structs and function signatures in the kernel interfaces, computed in some well-defined way that fixes the ordering. Runtime check when loading the module; bail if the hashes don't match.
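Something like this, maybe (a made-up sketch; kernel_iface_hash, module_iface_hash, and module_abi_check are hypothetical names, not any real illumos mechanism):

    #include <stdio.h>
    #include <string.h>

    #define IFACE_HASH_LEN 32

    /* Digest over all struct layouts and function signatures, computed in a
     * canonical order at kernel build time (a constant stands in here). */
    static const unsigned char kernel_iface_hash[IFACE_HASH_LEN] = { 0xab, 0xcd };

    /* The same digest, baked into the module when *it* was built. */
    static const unsigned char module_iface_hash[IFACE_HASH_LEN] = { 0xab, 0xce };

    /* Called at module load time: refuse to load on any mismatch. */
    static int
    module_abi_check(void)
    {
        if (memcmp(kernel_iface_hash, module_iface_hash, IFACE_HASH_LEN) != 0)
            return (-1);
        return (0);
    }

    int
    main(void)
    {
        if (module_abi_check() != 0) {
            fprintf(stderr, "module built against a different kernel interface\n");
            return (1);
        }
        return (0);
    }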
I haven't tracked OpenSolaris/illumos in a while, but it seems kind of bizarre that different "distros" of illumos would have different kernel ABIs. To me that's the definition of a different OS, in the way OpenBSD was a divergent fork of NetBSD. Reminds me (in a negative way, if I'm totally honest) of stories of old proprietary unixes/BSDs from the 80s/90s, with really minor but incompatible changes to a shared BSD/SysV source.
Anyway, I guess there are a lot of practical problems in defining either a new ABI or an ABI version in the ELF headers, so the different distros just sort of pretend to all be "Solaris". If I remember correctly, Solaris (de facto?) disallows static linking, and uses dynamic linking against libc to solve the ABI compatibility problem. That wouldn't help with stuff like kernel modules/FUSE, I reckon.
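For what it's worth, the ELF identification bytes already have slots that could carry this; a toy reader (using glibc's <elf.h> names; Solaris/illumos binaries are tagged ELFOSABI_SOLARIS, but nothing here distinguishes one illumos distro from another):

    #include <elf.h>
    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
        unsigned char ident[EI_NIDENT];
        FILE *f;

        if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL)
            return (1);
        if (fread(ident, 1, EI_NIDENT, f) != EI_NIDENT)
            return (1);
        /* EI_OSABI names the OS; EI_ABIVERSION could in principle version
         * the ABI further, but in practice it's typically left at zero. */
        printf("OSABI=%u ABIVERSION=%u\n",
            ident[EI_OSABI], ident[EI_ABIVERSION]);
        fclose(f);
        return (0);
    }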
The biggest question to me is why Helios is using the FUSE kernel module from OmniOS. If the kernels were strictly identical this would make sense, but if they're not, it seems weird. Since Helios is a secret Oxide thing (?), I guess it's a mystery.
It's not so much a secret as we've been rather busy, and I haven't had time to clean up the repositories to the extent that I'm comfortable making them public. Hopefully real soon now, and absolutely before we ship the hardware to anybody.
> different "distros" of illumos would have different kernel ABIs
We do have strong guarantees around various kernel API/ABIs, which one can absolutely use to produce out of tree modules that will work over the long term and across distributions. We call this the DDI, or Device Driver Interface, and it's documented as public and stable in our manual.
The vnode layer bits that FUSE uses are ostensibly sort-of public, but not documented. This is of questionable utility, to be honest, but there it is. A mistake was likely made in adding support for inotify to a particular fork of illumos: a struct member was added in the (again, wishy-washy) "public" preamble of the vnode struct.
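To make the failure mode concrete, a toy illustration (made-up struct definitions, not the real vnode_t):

    #include <stddef.h>
    #include <stdio.h>

    struct vnode_old {          /* header the module was built against */
        long    v_flag;
        void    *v_data;        /* module bakes in this offset */
    };

    struct vnode_new {          /* header the running kernel was built with */
        long    v_flag;
        long    v_inotify;      /* member inserted into the "public" preamble */
        void    *v_data;        /* ...and every later member silently moves */
    };

    int
    main(void)
    {
        printf("v_data offset: old %zu, new %zu\n",
            offsetof(struct vnode_old, v_data),
            offsetof(struct vnode_new, v_data));
        return (0);
    }

The module still compiles and loads cleanly; it just reads and writes the wrong bytes at runtime.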
> The biggest question to me is why Helios is using the FUSE kernel module from OmniOS
This is my fault. Before a distribution can be self-hosting, it needs to exist at all. I am a big fan of OmniOS, so I bootstrapped the first Helios bits by building modified OmniOS bits on OmniOS, and then massaging the packaging metadata until it looked how I wanted it for Helios. I then installed a Helios machine from those packages, and everything has been built on Helios from then on.
Regrettably I didn't realise the business with the vnode stuff, or even that we had a FUSE package. It has not been rebuilt since that original bootstrap build, so it was built with the wrong vnode.h. In my defence, this is the first time in two years anybody has tried to use FUSE on Helios, so nobody else noticed either!
> I had bootstrapped the first Helios bits by building modified OmniOS bits on OmniOS, and then massaging the packaging metadata until it looked how I wanted it for Helios
What are the differences between OmniOS and Helios? Why did you feel the need to create a new distribution?
> kind of bizarre that different "distros" of illumos would have different kernel ABIs.
I had a similar thought, but on further consideration Linux does much the same. Linux freely breaks internal ABIs at will, at least across versions; you can't use modules from Fedora on Debian in practice, if nothing else because they're never on the same exact minor version.
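Linux's mechanism for catching the mismatch is crude but effective: every module carries a "vermagic" string and the kernel rejects the load if it doesn't match. A simplified sketch of the idea (the real check lives in the kernel's module loader, backed by per-symbol CRCs when CONFIG_MODVERSIONS is enabled; the version strings below are just examples):

    #include <stdio.h>
    #include <string.h>

    /* What the running (say, Debian) kernel was built as: */
    #define KERNEL_VERMAGIC "6.1.0-18-amd64 SMP preempt mod_unload modversions"

    static int
    check_vermagic(const char *module_vermagic)
    {
        if (strcmp(module_vermagic, KERNEL_VERMAGIC) != 0) {
            fprintf(stderr, "version magic '%s' should be '%s'\n",
                module_vermagic, KERNEL_VERMAGIC);
            return -1;  /* insmod fails instead of scribbling on memory */
        }
        return 0;
    }

    int
    main(void)
    {
        /* A module built on a Fedora box carries that kernel's string: */
        return check_vermagic("6.8.9-300.fc40.x86_64 SMP preempt mod_unload") != 0;
    }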
> The biggest question to me is why Helios is using the FUSE kernel module from OmniOS.
It was kinda unclear, but the impression I got was that they don't in general and this was a one-off mistake where someone compiled on the wrong machine by accident.
It's true that Linux does this, although my impression is that FreeBSD and Solaris have comparatively stable ABIs. I think they usually keep the kernel ABI stable between minor patches, and change it with major releases. I've never futzed about with e.g. nvidia drivers on FBSD or Solaris, so I dunno.
Anyway, it was definitely just a weird mistake. I'm more intrigued by how varied the illumos branches/distros are, since I had thought they were mostly just different userlands/package managers, when they seem to have different kernel patch sets and bugfixes...
Oh for sure, it just feels like something that'd have been more common in that era. While you technically could, I doubt any Linux distro out there patches the kernel's VFS in an incompatible way like this, or (more relevantly) that any of the BSD "distros" based on the big three do.
Makes me think of all the arcane gibberish that GNU Autotools checks for, so many minor variants in system headers and APIs...
This honestly isn't that surprising. It's not incompatible per se. Linux has plenty of ABI-incompatible kernels (-rt, -ck, -pf), plus the assortment of forked kernels which never had changes merged back into upstream for various embedded boards, Intel's fork for feature development, the same for networking, and so on.
One of the "big" guarantees Red Hat customers like RHEL for is that it will not break the kernel ABI through a release cycle. Customers like this precisely because it's really hard to do: even very minor dot releases of the upstream kernel provide no assurance that the ABI won't break.
This seems to be normal distro stuff, and the problem broadly seems to be in two parts:
* The illumos team and distros either need to figure out a better mechanism for indicating minor kernel versions than the old "patch release", or they need to actually pay attention to those versions now that they know things diverge sometimes.
* This worked fine when building from source. The assumption that all of the illumos distros can freely share binary packages seems to be a bad one, and they may need to adopt the same process as every other distro and actually... rebuild.
It always blows my mind to see people trying to do something that is Windows-native on Linux.
Would you automate your Linux build using VB Script? Why not?
When in Rome, do as the Romans do!
Citrix actually purchased a company like this and sold their Linux-based virtual appliance that did one thing and one thing only: create Windows disk images for running Windows applications.
It was the stupidest thing I had ever seen that was packed full of clever solutions to problems that didn’t need solving.
Eh, it depends on context. If all your build servers are *nix boxes, there's no point in spinning up an NT machine just for building those images. In a Microsoft shop, controlling *nix boxes with PowerShell is probably a decent idea.
> Would you automate your Linux build using VB Script? Why not?
That is a completely different concept and not at all related to building deployment images. They are not compiling Windows here on Linux.
Creating build images doesn't necessarily require anything that is Windows-specific. An unattended XML file can be added to a disk image without needing to boot into a Windows environment. The challenge with Windows automation, though, is that to get an optimized image you need WIM mounting and modification. Linux even has a WIM tool, but tools like DISM and the Windows ADK are not really there. I'm sure DISM can be reverse-engineered, and then doing the automation on Linux would actually make a lot of sense, since it would be less resource-intensive. Spinning up a container or a Linux distro is usually orders of magnitude faster than a Windows machine, and Windows automation in itself sucks comparatively. I would definitely pick doing automation on Linux whenever possible, as long as it gets the job done well.
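For reference, the Linux WIM tool alluded to above is wimlib, and simple offline edits look something like this (paths, image index, and the file being injected are just examples):

    # Mount image 1 of the WIM read-write via FUSE, drop a file in, commit.
    wimlib-imagex mountrw install.wim 1 /mnt/wim
    cp autounattend.xml /mnt/wim/
    wimlib-imagex unmount /mnt/wim --commit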
This is precisely the attitude that I'm arguing against.
No, Linux is not superior in any way for Windows install image automation!
Why the heck would you reverse engineer and port the DISM tool, when the DISM tool already exists!?
Windows has a built-in set of free tools for automating image updates, integrations, and builds.
E.g.: you can mount a WIM image offline (without booting a VM or needing a container), and inject updates, software, configuration, or even drivers.
This is 100% supported and Will Just Work. It takes minutes, not hours.
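For the curious, the whole flow with the in-box tooling is roughly three commands (illustrative paths; see the DISM documentation for the full set of flags):

    Dism /Mount-Image /ImageFile:D:\install.wim /Index:1 /MountDir:C:\mnt
    Dism /Image:C:\mnt /Add-Driver /Driver:C:\drivers /Recurse
    Dism /Unmount-Image /MountDir:C:\mnt /Commit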
The article is the rough equivalent of trying to build a Linux install image using Windows, via some broken FUSE EXT3 driver that causes blue-screen crashes of the entire host OS, then spending a month trying to fix that instead of just using Linux to solve the task like a normal person.
PS: I do 99.9% of my work with Windows. The one time in my career I had to inject drivers into a Linux install image, I created a Linux VM and just used that. (I had to make a version of XenServer that could boot from an MPIO FC adapter during install time, even if a path was down. Fun stuff.)
I have nice, mature infrastructure (and expertise) for managing Linux boxes. I do not have any desire to build or purchase infrastructure (and expertise) for managing Windows boxes. If I have a task that can be done reasonably on my existing infrastructure, I'm not adding Windows to the mix without good reason.
If you’re building Windows images, you’re already doing serious stuff with Windows, and you’re probably about to deploy it to multiple workstations. Having a Windows server in this scenario makes a whole lot of sense.
End-user workstations are way different than operational hosts. The workstations themselves are not "serious" because they don't necessarily need configuration management, compliance monitoring, centralized user management, operational metrics monitoring, yadda yadda yadda. The service which builds those workstations' custom images is operationalized and does need all this. I'm not a Windows shop. There's no Windows in my datacenter. That's reason enough to use "non-native" processes for building Windows images.
> No, Linux is not superior in any way for Windows install image automation!
Having to configure a Windows environment for programmatic access and subsequent changes is definitely more challenging. Even with the recent SSH access (and WinRM, which I believe is still in preview), there are a lot of authentication challenges that simply are not present on Linux.
> Why the heck would you reverse engineer and port the DISM tool, when the DISM tool already exists!?
Because if you can do it on Linux, it makes sense to use the OS that runs something like 95% of the world's server infrastructure, for performance and operational reasons. I really don't have time to get into an argument about why SteamOS chose Linux and why people use Wine to run Windows games on Linux. However, those are just a few indications that you're wrong.
> Windows has a built-in set of free tools for automating image updates, integrations, and builds.
They have a tool. I'm aware of Windows ADK. Everything else they had is no longer updated as far as I can tell. Most of them are just crappy GUIs wrapped around basic instructions that are as simple as creating partitions and copying files.
> E.g.: you can mount a WIM image offline (without booting a VM or needing a container), and inject updates, software, configuration, or even drivers.
It takes several minutes, which in reality is a subpar experience for developers looking to create multiple images. If someone creates a tool on Linux that does the same thing, but does it better, why does that bother you so much? Are you as bothered that Microsoft essentially created a Linux subsystem for Windows because developers were leaving Windows in droves, having found developing on Linux more pleasant?
> The one time in my career I had to inject drivers into a Linux install image.
You mean essentially installing a package via cloud-init which takes a second to add? Injecting drivers with Windows is still tedious if done through DISM and not the unattended XML.
> You mean essentially installing a package via cloud-init which takes a second to add?
No. I had to rebuild the entire image, from the boot image up, to support MPIO Fibre Channel with the capability to install successfully via PXE network boot even if one of the two fabric paths was down. This was to support large-scale deployments to diskless blade servers using a central SFTP server, where the installation target was a SAN disk array.
This is actually possible to achieve purely via Windows, using various Cygwin tools, but it is incredibly painful.
I used Linux in a virtual machine to do the build, which made this relatively quick and easy[1] to implement. This makes sense, of course! Windows is easy to use to build Windows images, and Linux is easy to use to build Linux images. Similarly, I would use macOS to build macOS images, and so forth.
Note that I basically never use Linux, except for rare cases such as needing to deploy the XenServer hypervisor, or some network appliance. Nonetheless, I did not "reach for Windows" as my preferred option, despite my vastly greater familiarity and skills with it compared to Linux.
That's my point. The original article author has only a hammer and sees everything as a nail, even screws. It doesn't matter if the hammer is great at driving in nails. It can be awesome at that. If you need to drive a screw, pick up a screwdriver, instead of writing clever blog posts about how hammers can be laboriously filed down to enable them to clumsily turn a screw.
[1] The requirement was complex, the actual change steps to get there were not too bad, even for a "non Linux person" like me.
> No. I had to rebuild the entire image, from the boot image up, to support MPIO Fibre Channel.
I've never seen this. At least for the distros I'm familiar with, it is as simple as modifying your initramfs configuration (mkinitrd or the like) to include a driver. Maybe XenServer is different, or whatever it was that you were using.
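On dracut-based distros, for example, forcing extra storage drivers into the initramfs is a one-line config drop (driver names here are just examples; dm_multipath and qla2xxx are the device-mapper multipath and QLogic FC modules):

    # /etc/dracut.conf.d/mpio.conf
    add_drivers+=" dm_multipath qla2xxx "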
> It always blows my mind to see people trying to do something that is Windows native on Linux.
> When in Rome, do as the Romans do!
This seems to be a running trend for people with a deep dislike for anything Windows-native: they tend to force non-Windows paradigms on Windows, and when this breaks, blame Windows for not being 'POSIX-compatible'.
Probably also led to the explosion of tools like Cygwin, Msys2, Mingw, etc.
Tools like Cygwin, etc. exist because back in the day there was absolutely nothing like them on Windows, or the equivalent was locked behind some paid-for editor, or, most often, it relied on obscure hacks that were hard to read.
"find which file contains this string" is a non-windows paradigm.
I think you've missed the point that this sort of thing is often easier on Linux than on Windows.
Many of the leading repair tools for Windows are Linux-based, for example.
If you want to run older Windows software, you'll have an easier time on non-Windows platforms via Wine than on modern Windows. (Or you can run otvdm, Wine on Windows itself.)