It amazes me that Microsoft haven't replaced the Registry with a simple directory structure, not that it would help for this particular bug, but it would surely be an improvement. I maintain a library for accessing the registry from Linux (https://github.com/libguestfs/hivex) and after writing it I also wrote this screed about how it sucks in just about every way possible:
Actually you can use the Windows Projected File System to project the registry into the file system, making registry keys and values appear as files and directories.
Something is still translating the virtual files back and forth to the half-arsed hive format. Would recommend reading the link I posted since I have actually reverse-engineered the hive format.
Certainly there is a lot of legacy with the registry, but how would any of these issues be improved by moving to a file based config? All these issues could still exist under that model, and there would be new issues too.
Like for example, you already point out how the type system in the registry is very limited. But isn't the filesystem even worse? Everything there is binary blobs with no types at all. So how does that improve things?
It seems like your complaints don't really have much to do with the "directory" structure of the registry, so I don't think moving to a file based model would really change anything. You'd just end up with the same legacy issues, but spread across more files.
Finally, AppData wasn't introduced with Vista, but rather it's always been there if applications need to store file-based data rather than individual configuration values. That is not a new or improved way of doing things as you seem to imply in the post.
The problem with hive is that the type has to be set correctly yet several types have no established meaning. For example it's totally random whether a number will be stored in binary with type DWORD or stored as a string (with who knows what type and encoding). However store it in a different way when writing to the registry and Windows or whatever app wants to read that field will break. In a way it's worse than if it wasn't present at all.
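To make the ambiguity concrete, here is a minimal Python sketch (not taken from hivex or any Windows API) of how the same number turns into different bytes depending on the type chosen. It assumes the usual encodings, a REG_DWORD as a 4-byte little-endian integer and a REG_SZ as NUL-terminated UTF-16LE text; the function names are made up for illustration.

```python
import struct

def encode_dword(n: int) -> bytes:
    """Encode n as a REG_DWORD is usually stored: 4 bytes, little-endian."""
    return struct.pack("<I", n)

def encode_sz(text: str) -> bytes:
    """Encode text as a REG_SZ is usually stored: UTF-16LE plus a NUL terminator."""
    return (text + "\0").encode("utf-16-le")

as_dword = encode_dword(1)
as_string = encode_sz("1")

print(as_dword.hex())   # 01000000
print(as_string.hex())  # 31000000

# A reader that expects a DWORD but finds the string bytes gets a different number.
misread = struct.unpack("<I", as_string)[0]
print(misread)          # 49
```

Both encodings happen to be four bytes long here, so a reader that ignores the type field would not even notice the mismatch by length; it would just read 49 instead of 1.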
NTFS specifically has file forks ("Alternate Data Streams") and I guess you could use those to store a type, although whether using forks would be a good idea or not is up for debate.
This seems like the same kind of argument as saying that because JSON doesn't support the number formats I like, it would be better to use a notation that doesn't support types at all. Well, I think the reality has shown that in fact having some support for types sometimes is the more pragmatic way of doing things.
If those types aren't enough for your use case, then you will be forced to roll your own types in some binary/string data anyway, so it seems like it's strictly more work if you just always force everyone to roll their own.
And even then you still end up with the possibility of people using the wrong syntax for your hand-rolled types, like not quoting values that are supposed to be strings or quoting values that are supposed to be numbers.
Besides, wouldn't it be easier to fix this just by adding some more types, or deprecating everything except REG_SZ or something? What's the advantage of moving to a directory based model?
I don't see how this problem would be solved by files. For every setting in the registry, there's a piece of software (and ultimately, a group of people behind it) that claims ownership of it, and determines the correct type for it.
File-based configs on Linux have the same problem anyway. The semantics of any config file you find are defined by the application that's consuming it. Any two config files that superficially seem to be using the same format, may in fact use a completely different one - and you'll never know, up until you edit one and it blows up in your face, because it cannot parse empty lines or #-comments, or escape characters, or negative values, or values larger than 2^16, or...
Whether altering Windows registry or a Linux config file, you cannot make a correct modification without knowing what the owners of the modified settings expect.
> It amazes me that Microsoft haven't replaced the Registry
how does this amaze anyone? how do people think backwards compatibility works?
Microsoft, supporting Windows, promises to make every effort to maintain backwards compatibility wherever possible so that programs compiled for, say, Windows 95 will run unmodified on Windows 10.
Not every program from 20+ years ago runs, but a lot do! That's a very hard thing to do if you wish to continue to advance the technology you use in your operating system. Apple doesn't even try.
Microsoft have taken steps to break backwards compatibility a few times in the name of progress and every time I talk to people during those transition periods, it is a 50/50 split between people who don't know that they've been given a decade of notice and now their "tried & true" software development paradigm doesn't work anymore, and people who are angry because most of the old ways are still supported.
The registry wasn't even supposed to be what it is today. It was a small stop-gap thing to stand in place while a better solution was developed. Developers discovered it, started using it, and now Microsoft has to support it. Of course it's rubbish; metaphorically, it's a piece of a whiteboard used as a doorstop until the real doorstop is delivered, except for some reason people started using it for important stuff and now everyone needs it.
>how does this amaze anyone? how do people think backwards compatibility works?
That doesn't have anything to do with backwards compatibility. Nothing forces MS to stick to the old ad-hoc memory dump format. Nor is there anything that would suggest the registry is deprecated; new Windows components keep using it and adding piles of junk into it.
not everyone uses the APIs to interact with the registry. some manipulate the file itself. it's not supposed to be possible, but it is, and people do it. you have to keep the registry itself to keep backwards compatibility.
if everyone used the API, then yes you're correct. in my opinion, Microsoft should do what you stated, but they don't want to, out of fear of breaking backwards compatibility for people who do the wrong thing. their needs are just as valid as yours or mine.
Most of the actual technical issues you list have more to do with it being extended for the last 30 years in a backwards compatible way than anything to do with it being a hierarchical db instead of a filesystem.
I still see it as a file system, very similar to NTFS (similar in the sense of having similar features). Apart from the (recent) project just mentioned (ProjFS), there existed a file-system-like driver for it, only for the record:
It probably seems similar because file systems are typically classified as a type of hierarchical db themselves. That being said, "I can represent it with a file in a filesystem" is different from "it is a filesystem". In posix (nearly) everything is accessible through the filesystem, even network sockets, but that doesn't mean everything's canonical representation is a filesystem; it just means it's mappable.
Regardless, the point wasn't "a filesystem couldn't represent a rewritten registry". It was that the registry is actually a database today (whether the reader views it as a filesystem-like db or as the hierarchical db it is listed as), and the rest of the technical problems have to do with it being 30 years old and never rewritten, not with it lacking a filesystem representation as its primary view in the first place.
>This misses the point: the Registry is a filesystem. Sure it’s stored in a file, but so is ext3 if you choose to store it in a loopback mount. The Registry binary format has all the aspects of a filesystem: things corresponding to directories, inodes, extended attributes etc.
> The major difference is that this Registry filesystem format is half-arsed. The format is badly constructed, fragile, endian-specific, underspecified and slow.
Anyway, file systems and databases are essentially similar, the point revolves more around the poor implementation of the Registry (whatever it is).
I think everyone is in agreement it's bad, as I said:
> Most of the actual technical issues you list have more to do with it being extended for the last 30 years in a backwards compatible way than anything to do with it being a hierarchical db instead of a filesystem.
My first line about it being a database was about point 7 in the same link:
> Back to point 1, the Registry is a half-assed, poor quality implementation of a filesystem. Importantly, it’s not a database. It should be a database!
Technically a file system is just a special database. I think a better formulation of the author's point would be "the registry is a lot like a file system, even though a more traditional database approach or fully embracing it as a file system would have probably worked out better".
Also, they would have been able to at least improve the on-disk format with a major version; I highly doubt that the registry itself is backwards-compatible anyway and there are probably very few programs that access it directly.
That's a really good take on what the author was going for, I appreciate the take! I still disagree that it starting out as a filesystem or database has anything to do with why it's so crap 30 years later but it gets to the crux of the topic much quicker.
With how tightly the APIs for accessing the registry are coupled with the model and encodings of the registry, particularly the driver APIs for it, I don't think it would have been so easy to just swap out the back end without breaking something though (which Windows avoids like the plague) but maybe doable by someone more optimistic than me :). The real "rewrite" was the push for Universal Windows apps using the .NET platform which stores everything for the app in XML files and shadow directories instead of the registry. Of course that didn't take over quite like they hoped so they ended up back with using the registry they were trying to leave 10 years later.
Yes but a filesystem is also a hierarchical database.
A filesystem solves these issues specifically because it avoids reimplementation. As the registry has been extended as you say it approaches parity with filesystem functionality, but on a parallel track.
At a high level, avoiding multiple implementations of similar metaphors is ideal in terms of security. Reuse what you have.
I'd agree a filesystem is also a type of hierarchical database but the author doesn't think so:
"Back to point 1, the Registry is a half-assed, poor quality implementation of a filesystem. Importantly, it’s not a database. It should be a database!"
These are the kinds of categorizations that people can go nuts over. Rather than get too hung up on words I'd say that whatever this is, it can effectively be represented by a filesystem and therefore it should be as a matter of general architecture and security principle.
I'm actually with the author that if it were going to be rewritten a freshly written columnar database would be way more efficient than representing it as a filesystem but that either would be better than what we have after 30 years. I just don't think "it wasn't a filesystem originally" has much to do with why it's so crap now. Similar case: posix specifies network sockets be accessed as files/filesystems (as most everything in posix is) but nobody actually used that representation because it's inefficient even though it's the standard and easily mappable to files/filesystems. Well I think Solaris actually allows both but the point stands.
Sorry, I'm unfamiliar with what you mean by "network sockets be accessed as files." Do you mean unix domain sockets? These are in fact commonly used and they're certainly no less efficient (more efficient in many ways, in fact).
UDS are interfaced with via the same berkeley sockets api, not via the filesystem api. Have you ever written applications that use them?
I don't mean unix domain sockets, those are known as IPC sockets. The berkeley sockets API you are familiar with is actually exactly what I was talking about. It does offer both types of sockets (the other being network sockets as I originally mentioned) but it uses handles in an abstract namespace, not files in a filesystem (e.g. in Linux it's still a FD but it doesn't map to an actual file on a filesystem, it's just a unique handle in its own namespace).
What I was referring to were things like /dev/tcp/ and /dev/udp you'll find on Solaris (or emulated via bash on most systems), which are actual filesystem paths instead of handles in an abstract namespace. A usage example of this, comparable to binding to a socket with the BSD API to udp://localhost:2048, would be 'echo "example" > /dev/udp/localhost/2048'. The actual I/O is through the standard file/filesystem interface just like /dev/random. It's not the best for network sockets though so they tend to get a raw handle in every modern OS, even if it does mean rebuilding the wheel on some other things.
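For comparison, here is roughly what the same datagram looks like through the BSD-style API, as a small Python sketch. It sends to an ephemeral loopback port rather than port 2048 so that it can receive its own message; the variable names are just for illustration.

```python
import socket

# Receiver: bind a UDP socket to an ephemeral port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
port = receiver.getsockname()[1]

# Sender: no path is opened; the destination is an (address, port) pair
# passed to a socket-specific call instead of a filename passed to open().
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"example\n", ("127.0.0.1", port))

data, _addr = receiver.recvfrom(1024)

sender.close()
receiver.close()
```

The address never appears as a path anywhere, which is the distinction being drawn with the /dev/udp form.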
Network sockets are the canonical example of "not everything in Unix is a file". "Everything is a FD" is true but "everything is a handle" is true on any OS design, the uniqueness that things like ram and disks are just files under / did not hold true with networking.
And yes I have written plenty of apps with ipc sockets and network sockets and raw sockets and even underlying device access (for things like custom Ethernet packets). I'm in networking by profession.
I think there's some confusion about how the sockets api works, let me see if I can clear this up.
Posix does not specify that network sockets should be accessed by file paths. It's possible to do so, but unspecified by the standard.
Sockets produced by socket(2) are regular old file descriptors, just as created by open(2) on a file path, or any other descriptor generating syscall like pipe(2) or epoll_create(2). There is no separate representation among any of these -- they are all just file descriptors. There are many, many ways to create descriptors and many aren't associated with a filesystem. There's no efficiency issue here, nor is there a divergence from a consistent pattern.
If you like, you can use fchmod(2) on a descriptor generated by socket(2) and change its permissions. You can track it by its inode. It doesn't matter that the descriptor is not linked to a filesystem, any more than for a similar descriptor created by pipe(2). They all have the same functionality and fit within the same consistent metaphor. When you run grep | grep, the pipe descriptor has permissions, mtime, ctime, atime and the rest. Everything just works.
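A quick way to see this from user space is to ask the kernel what each descriptor is. The following Python sketch assumes a POSIX system (it was written with Linux in mind); `describe_fd` is a made-up helper, not a standard function.

```python
import os
import socket
import stat

def describe_fd(fd: int) -> str:
    """Classify a descriptor using the mode bits returned by fstat(2)."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
read_end, write_end = os.pipe()

# Both are plain integer descriptors, and both answer fstat like a file would.
sock_kind = describe_fd(sock.fileno())
pipe_kind = describe_fd(read_end)
sock_inode = os.fstat(sock.fileno()).st_ino

sock.close()
os.close(read_end)
os.close(write_end)
```

Neither descriptor was created from a path, yet both go through the same fstat call as a descriptor obtained from open(2).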
It's trivial to write a filesystem to expose descriptors, in fact /proc does this already for all descriptor tables across all processes. There's no rebuilding of any wheel - the point of commonality is the "struct file" in linux/fs.h.
There's no such thing as a "raw handle" here, btw. That phrase has no meaning.
Thanks for giving detailed points instead of just asking if I've used sockets before - I think I see the main divergence as a result. First though I noted a big mistake on my part: I referenced the wrong UNIX standard originally. This is a huge error on my part, I meant to reference TLI (later standardized XTI, the "competitor" to BSD sockets at the time) not POSIX as what defined /dev/udp, /dev/tcp, and the APIs to access it instead of BSD/POSIX style socket APIs e.g. 't_open("/dev/udp", O_RDWR);'. My apologies I'm sure I was completely misdirecting a lot of the conversation and causing a lot of confusion with that error.
For where the main divergence in what we are each talking about though when I originally said:
> Similar case: posix specifies network sockets be accessed as files/filesystems (as most everything in posix is) but nobody actually used that representation because it's inefficient even though it's the standard and easily mappable to files/filesystems.
I was talking about literally exposing networking through the filesystem by mapping the construct to files and paths, as that's what the author's registry tool actually does and what the author was proposing Windows should do in a rewrite - not whether or not sockets can be backed by the preexisting FD handle in an arbitrary namespace using custom socket functions to manage the socket efficiently. As a result, I was trying to explain to you why BSD-type sockets don't use a literal filesystem mapping even if they have an FD, and you were trying to explain to me how they were still backed by an FD even if it's not in the filesystem (i.e. describing the same API from opposite ends). I agree fully with your take that it's a standard FD which has no performance concerns once created and can be treated as such. Looking back, I think I tried quite hard to point out I was talking about a literal files/filesystems mapping, not the FD handle, so I'm not sure where the split came from... perhaps normally "everything is a file" vs "everything is a FD" is not a big distinction, but this case just happened to be about a literal filesystem mapping, not whether or not it would end up using an FD.
Also to note: when I talk about "inefficient" I don't mean "slow to compute" (after all it ends up a FD once opened, as noted heavily at this point); it's the interface which becomes inefficient (which is what the registry article was dealing with). You call it trivial as in "trivial to expose a mapping", and there is no argument that it is, but BSD sockets offered a much more straightforward and simple fit for internet protocol network socket concepts than the filesystem approach of the author or of TLI, which is a big reason BSD sockets won out. The "rebuilding of the wheel" is that the BSD socket API defines functions fit to purpose instead of molding them around traditional file API naming and structure like TLI/XTI did.
In the sentence "It's not the best for network sockets though so they tend to get a raw handle", 'raw' was meant as just another adjective to point out that it was a non-filesystem-mapped reference, per the prior sentence (how so depends on the OS: in *nix it's still a FD, in others not, but it doesn't really matter since it's just a ref). It was not meant to imply that 'raw handle' is a proper noun describing a different type of handle definition you'll find in the source code.
I appreciate the time, at the very least I'm sure I'll never make the error of conflating TLI with POSIX again and at the most I may have solidified some internals I don't get to think about every day!
No thanks: the registry is a truly huge simple key/value store, which is something files-in-dirs are terrible for because almost every single one of them would take up a full block on disk instead of the fraction of a block they actually need.
A better solution would be a simple database (like sqlite3) but then the immediate counter-argument is "okay, so we're done: it's already a simple database", because the registry hive is literally a file-backed database in the same vein as sqlite =)
The Windows registry is not a "simple key/value store" by a long shot. It is hierarchical, there are many different types of value, and there's a complex system of security attributes. These are simple facts.
You're right that a file per value would take a whole block on disk given the way some filesystems are currently implemented, but that's not an immutable feature of all filesystems - some Unix filesystems store small files in the inode. A real database is possible, but also the registry must be available very early in Windows boot (actually it's used by the bootloader, but also by the critical device database) so you'd want something that's at least easy to read with a smallish amount of code.
I feel this mattered much more in the mid 90's when 4GB disks were common in PCs, but with today's modern storage sizes this is trivial. Besides, the NTFS MFT already stores small files directly in the indexes.
I imagine that the registry is optimized for many small values (eg a DWORD - 4 bytes). Most filesystems wouldn't be very efficient with tons of 4 byte files.
Just naively translating the registry into an NTFS directory structure would require 1 KB per value, simply because that's the size of a file record (NTFS already has an optimization to store small files directly in the file record if it fits in next to all the attributes and ACLs).
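As a back-of-the-envelope check on what that costs, here is a tiny Python calculation. The 1 KB per value is the figure quoted above; the count of one million values is purely a hypothetical round number, not a measurement of any real registry.

```python
FILE_RECORD_BYTES = 1024   # per-value cost quoted above (one file record)
DWORD_BYTES = 4            # payload of a single DWORD value
value_count = 1_000_000    # hypothetical number of values, for illustration only

total_bytes = value_count * FILE_RECORD_BYTES
payload_bytes = value_count * DWORD_BYTES
overhead_ratio = FILE_RECORD_BYTES // DWORD_BYTES

print(f"on disk: {total_bytes / 2**20:.1f} MiB")
print(f"payload: {payload_bytes / 2**20:.1f} MiB")
print(f"ratio:   {overhead_ratio}x")   # 256x
```

Under those assumptions the records take up roughly a gigabyte to hold a few megabytes of actual data, which is large as a ratio but small next to a modern disk.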
Also the Windows Filesystem driver stack is not very efficient for accessing many small files. It's built for flexibility and security, not speed.
A registryfs would be. The data structures underpinning access would not need to change.
The importance of using a filesystem interface is reuse of the access control mechanisms and filesystem API. It would avoid the type of bug above, due to nesting a hierarchical permissioned structure inside a file.
Implementing the registry APIs, but backed by a regular filesystem (as Wine does) would be the sensible thing for Windows to do. (I looked at the source of Wine just now and I'm fairly sure nowhere does it process hive files.)
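To illustrate the general idea (this is not how Wine or Windows does it, just a toy), here is a Python sketch of a registry-like API where keys are directories and values are files. The class `DirRegistry` and its methods are invented for this example; it ignores value types, ACLs, case-insensitive key names and path sanitising.

```python
import os
import tempfile

class DirRegistry:
    """Toy key/value API: each key is a directory, each value is a file."""

    def __init__(self, root: str):
        self.root = root

    def _key_dir(self, key_path: str) -> str:
        # Registry-style backslash paths map onto nested directories.
        parts = [p for p in key_path.split("\\") if p]
        return os.path.join(self.root, *parts)

    def set_value(self, key_path: str, name: str, data: bytes) -> None:
        key_dir = self._key_dir(key_path)
        os.makedirs(key_dir, exist_ok=True)
        with open(os.path.join(key_dir, name), "wb") as f:
            f.write(data)

    def get_value(self, key_path: str, name: str) -> bytes:
        with open(os.path.join(self._key_dir(key_path), name), "rb") as f:
            return f.read()

    def subkeys(self, key_path: str) -> list:
        key_dir = self._key_dir(key_path)
        return sorted(
            entry for entry in os.listdir(key_dir)
            if os.path.isdir(os.path.join(key_dir, entry))
        )

with tempfile.TemporaryDirectory() as root:
    reg = DirRegistry(root)
    reg.set_value("Software\\xyz", "abc", b"\x01\x00\x00\x00")
    stored = reg.get_value("Software\\xyz", "abc")
    children = reg.subkeys("Software")
```

Permissions would then simply be whatever the underlying filesystem enforces on those directories and files, which is the reuse argument made above.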
PowerShell exposes the hives as a directory structure, and has for a decade or more. Just type "HKLM:" or whatever hive you want and start using "cd" and "dir" all you want.
The guy who implemented that really did a disservice to the filesystem metaphor, though. Instead of making values analogous to files, they're properties of registry keys, so instead of Get/Set-Content, Get-ChildItem, etc. you need to do some gymnastics with Get/Set-ItemProperty to work with them. For example, if you want to find a registry value with a particular name, you can't just do 'dir -rec SomeValueName' to find it like you can on the filesystem provider.
well, the registry has types. files are raw binary data that is almost entirely untyped. how could they possibly enforce typed data without diverging from the filesystem metaphor?
Cmdlets can have provider-specific parameters, so in this case I would add a registry-provider-specific data type parameter with sensible default behavior to New-Item and Set-Content.
For example...
Set-Content hklm:\software\xyz\abc -value 1
...could create a DWORD value by default based on the type of the value argument, while adding '-DataType String' would enable creating a string value.
Files are raw binary data, but a (BTW unreliable) method to understand what they contain has been in use for years: file extensions.
I can see no reason why in a filesystem-like representation of the Registry you cannot have a value.dwd (which is a DWORD), a value.bin (which is a BINary), value.esz, etc., or at least that is what I would use.
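A rough Python sketch of how a reader could use such extensions follows. The extensions are the ones suggested above; the meaning I attach to each (.dwd as a 4-byte little-endian integer, .esz as NUL-terminated UTF-16LE text, presumably an expandable string, .bin as raw bytes) and the function name `decode_value` are my own assumptions.

```python
import os
import struct

def decode_value(filename: str, raw: bytes):
    """Decode a value file's bytes according to its (assumed) type extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".dwd":
        # Assumed: 32-bit unsigned little-endian integer.
        (number,) = struct.unpack("<I", raw)
        return number
    if ext == ".esz":
        # Assumed: UTF-16LE text with a trailing NUL terminator.
        return raw.decode("utf-16-le").rstrip("\0")
    if ext == ".bin":
        # Raw bytes, returned untouched.
        return raw
    raise ValueError(f"unknown value type extension: {ext!r}")

number = decode_value("value.dwd", b"\x2a\x00\x00\x00")
text = decode_value("value.esz", "%SystemRoot%\0".encode("utf-16-le"))
blob = decode_value("value.bin", b"\xff\x00")
```

As with real file extensions, nothing stops someone from writing string bytes into a .dwd file, so this only documents the intended type rather than enforcing it.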
https://rwmj.wordpress.com/2010/02/18/why-the-windows-regist...