
The key flaw of UNIX philosophy is destructuring deserialization and reserialization based on lines, necessitating all manner of argument escaping and field delimiters, when pipelines should be streams of typed messages that encapsulate data in typed fields. Logs especially: logging to files is a terrible idea, because it creates log rotation headaches, and each program ends up requiring its own log parser because of the loss of structured information. Line-oriented pipeline processing is fundamentally too simple. Settling on a common, simple/universal data format to exchange between programs, robust enough for all purposes (including very large data sets) and just complicated enough to eliminate escaping and delimiter headaches without throwing away flexibility (via a new/refined set of processing tools/commands), is key.


It's not so much a flaw as a tradeoff. Because it's so easy to produce lines of text, virtually every program supports it, even programs written by hackers who aren't interested in following OS communication standards.


No, that's your opinion, not mine. It's a flaw because it creates escaping magic and in-band delimiters (spaces), and every program has to know how to parse the output of every other program, wasting human time and processing time. Structured communication, say an open standard for a schema-less, self-describing, flexible de/serialization format like protobufs/msgpack, would be far superior: more usable, more efficient, and simple but not too simple, with structure and programmability already there for processing streams of data.

Being able to dump structured information out of an exception directly into a log, and then from a log into a database, without any loss of information or extraneous log parsing, is a clear win. Or from a command as simple as listing files ("ls") into a database or into any other tool or program. Outputting only line-oriented strings just throws away type information and creates more work for everyone downstream, more than whatever is saved by sticking with line-processing tools.
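To illustrate (the --json flag below is hypothetical, not a real GNU ls option): compare scraping fields out of 'ls -l' with text tools versus selecting typed fields by name from a structured stream.

    # Line-oriented: count columns out of ls -l and hope the layout never
    # changes and no filename contains a space.
    ls -l | awk 'NR > 1 { print $5, $NF }'

    # Structured (hypothetical --json output): a generic tool like jq can
    # select typed fields by name, with no column counting or escaping.
    ls --json | jq '.[] | {name, size}'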


Sounds like the CLI tools, GNU and otherwise, could benefit from some kind of "--format" switch to define the output in detail. I mean something like ls --format "{ \"filename\": \"%{filename}\", \"size\": %{size} }" for a JSON-style output (or whatever the user wanted).

Or even something like --format-rfc4627 filename,size,permissions-numeric to get only those fields out in a valid JSON format.

This wouldn't remove the "every program has to know how to parse the output of every other program" problem, but I am not convinced that's needed. For instance, how would e.g. grep know which field you want to process? And does e.g. "size" carry universally the same meaning and semantics in every program there is and ever can be? Ditto for "permissions". And what about "blacklist"?

As a completely fictitious toy example:

  (ls --format-rfc4627 filename,size,permissions-numeric | json-shunt -p "filename" --cmd grep "MegaCorp") | json-print -p filename,size
The fictitious "json-shunt" (for lack of a better name) would pass only that one field to its --cmd (in this case grep) as input; the | stage would then run only for the records grep matched, but with all of their other fields intact. So it'd print the filenames and sizes of files whose names contain a case-sensitive "MegaCorp", and output them in JSON.

Yes, I know there are more concise and elegant ways to express the same thing (printing the sizes and names of matching files)... Also, when scripting pipelines the verbosity doesn't matter; IMO it'd actually be a benefit to be able to read the script.
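For comparison, roughly the same filter can be expressed with today's jq, assuming some hypothetical source of per-file JSON records (the 'ls --json' below is made up; in practice you'd generate the records with find/stat or similar):

    # Keep only records whose filename contains "MegaCorp" (case-sensitive),
    # then print just the filename and size of each match.
    ls --json |
      jq -c '.[] | select(.filename | contains("MegaCorp")) | {filename, size}'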

Edit: fix pre-caffeine wonky redirection and brain fart


So, if you're not a fan of the UNIX philosophy, maybe check out Powershell. Or take a look at WMI and DCOM in Windows. Eschew shell scripts in favor of strongly-typed programs that process strongly-typed binary formats, or XML, or whatever. The alternatives are out there.

"Worse is better" isn't for everyone.


Nushell does not seem like a violation of the unix philosophy, or at least the version of it that I like best.

"Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

Perhaps I'm wrong, but isn't nushell simply adding one more dimension?

Instead of a one-dimensional array of lines, the standard is now... a two-dimensional array of tabular data. Perhaps it is not strictly a "text stream" but this does not seem to me to be a violation of the spirit of the message.

Simple line-based text streams are clearly inadequate IMO. 99.99% of all Unix programs output multiple data fields on one line, and to do anything useful at all with them in a programmatic fashion you wind up needing to hack together some way of parsing out those multiple bits of data on every line.
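A minimal example of the per-line field scraping this usually means (assuming the usual 'ps aux' column layout, which the snippet silently depends on):

    # Pull the PID and command name out of ps output by counting columns.
    ps aux | awk 'NR > 1 { print $2, $11 }'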

    maybe check out Powershell
I left Windows right around the time PS became popular, so I never really worked with it.

It seems like overkill for most/all things I'd ever want to do. Powershell objects seem very very powerful, but they seemed like too much.

Nushell seems like a nice compromise. Avoids the overkill functionality of Powershell.


Semantically, a one-dimensional list/array of “things” can be trivially transposed into a series of lines with a linebreak separator, but I don’t think the same holds true for a “list of lists” (2-dimensional data) or a “list of lists of lists” (3D data) etc. At least without a standard begin-end list delimiter that allows nesting.

Just thinking about whether an old tool could perhaps be "wrapped" and tricked into working with 2+-dimensional data by somehow divvying the higher-dimensional input up into concurrent 1-dimensional streams, but this seems to require a way to represent more than one dimension of data without breaking existing utilities (unless there was, like, a wrapper/unwrapper layer that handled this...)
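A quick sketch of the asymmetry (plain shell plus jq, just to show the shapes involved):

    # 1-D: a flat list maps onto lines with no ceremony at all.
    printf '%s\n' alpha beta gamma

    # 2-D: a list of lists needs explicit begin/end delimiters (e.g. JSON
    # arrays) to survive the trip; newlines alone can't express the nesting.
    echo '[["a","b"],["c","d"]]' | jq '.[1][0]'    # -> "c"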


It’s worth noting that Powershell is available on Linux. Objects are pretty cool. https://docs.microsoft.com/en-us/powershell/scripting/instal...


It works pretty well on the Mac now, too.


"Worse is better" is not some absolute truth or philosophy and *NIX has won: Android, MacOS, WSL.

There's no real alternative for many professionals, if they want to be employable.

Maybe we should accept this fact and try to make everyone's life easier, rather than be stuck up and push people away.

There's no law that says that Unix tools can't be extended.

And heck, basic POSIX tools violate so many Unix principles...


When perl came along, it killed the 'Unix one tool philosophy' dead as a doornail. And since then people have just kinda ignored the smell coming off the rotting corpse.

I don't write complex scripts in shell anymore because it's insanity. But ad-hoc loops and crap like that... hell yeah. At least a few a day. Sometimes dozens.

People need to be reminded, I think, that shell isn't a programming language first. It's a user interface. And when I look at Powershell scripts and other things of that nature and think about living in Powershell day in and day out I don't see the big pay-off over something like fish.

'Is this going to make it so I can get my work done faster?'

'Is this going to be more pleasant to use as a primary interface for an OS?'

When I go into fish or zsh and use curl to grab JSON and 'jq' to get an idea of how to use the API in python or node...

versus running 'curl' in powershell against some random API I have never used before..

I get the distinct impression that 'This is fucking garbage', in that it would take me a lot longer to figure out how to use powershell in a useful way than the time I would save by doing so in the long run.
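For what it's worth, the kind of throwaway API exploration being described looks something like this (the URL is just a stand-in, and the response is assumed to be a JSON array):

    # Grab some JSON from an unfamiliar API and poke at its shape with jq.
    curl -s https://api.example.com/v1/items | jq '.[0] | keys'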


The irony is that the very attempt to be one tool for everything caused Perl's own destruction. Perl 5 is still used by some veterans for small scripts but who wants to use Perl 6?

Unix follows the KISS principle, and that is key to its success. Albert Einstein said: "Keep things as simple as possible but not too simple". In that sense Unix and POSIX are well done. However, that doesn't mean that good ideas like Nushell are not welcome.


I think the failure of Perl 6 was caused by a lack of community building and implementation leadership, not by trying to be too many things at once.


Yeah, I tried using Powershell as my shell, and that's when I found out Powershell is more about the scripting language than being a shell optimized for everyday use. I was confronted with this almost immediately, because one of the things I rely on most in Bash is 'dirs -v', pushd and popd. I have dirs aliased to dirs -v and I usually have 15-20 paths on my stack. I'll leave implementing the same functionality in Powershell as an exercise for the user.


I'm confused... Why don't Push-Location (aliased to pushd by default), Pop-Location (aliased to popd by default), and Get-Location -Stack (trivially aliasable to dirs) work? You can even have multiple location stacks and, if you care about the registry, you can also use the same commands to handle registry navigation as well.


‘dirs -v’ shows an enumerated list of directories. If I follow that up with pushd +11 I will be taken to the 11th path listed in the prior dirs command. As far as I know this isn’t implemented out of the box in PS
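For anyone who hasn't used the bash side of this, the behaviour in question looks roughly like the following (the directories are just placeholders):

    $ dirs -v
     0  ~/work/project
     1  ~/src/tools
     2  /etc/nginx
    $ pushd +2    # rotate the stack so entry 2 is on top and cd into it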


In addition to `{Push,Pop}-Location`, `Get-Location -Stack` will show the current directory stack. I don't know offhand of a way to get the equivalent of `pushd +11`, though.


Not out of the box but I wrote a module[0] that gives you `cd+ 11` (and `dirs -v`).

[0] https://github.com/nickcox/cd-extras#navigate-even-faster


What about JSON vs protobufs? Is there a schemaless system that you use?


If GNU decided tomorrow that all utilities need to have a --json output argument then that would make me a very happy person.


Better than nothing, but the problem with that is that you can't construct a deep, lazily evaluated pipeline: JSON can't be output until all the data has been collected.


There's a streaming format which is used for this: JSON Lines [1] AKA newline-delimited JSON [2].

[1] http://jsonlines.org/

[2] http://ndjson.org/
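With this format each line is a self-contained JSON document, so a pipeline can stay lazy and process records as they arrive. A tiny sketch (the field names are made up):

    # One JSON object per line; jq consumes the stream record by record.
    printf '%s\n' '{"name":"a.txt","size":120}' '{"name":"b.txt","size":42}' |
      jq -c 'select(.size > 100)'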


That still limits us too much due to the rather severely limited available data types (although arbitrary types could be serialized into binary/string data, I guess...)


Funny that you should mention this. I just hit that problem yesterday. The lack of a binary type is a problem. This is the same thing that hit XML. Unfortunately (or fortunately), the best solution is still the MIME stuff.


AFAIK protobufs are not schema-less.


Yeah, those sentences are too close together.

I'm trying to elicit more of a response from the comment author. I think they make some good points. I would like to learn more about their ideal system.


I was just looking at MessagePack yesterday (researching best ways to encode an event stream) and was very impressed.


Yes, I agree that it's a tradeoff. But that tradeoff was made when memory was scarce and expensive; a modern shell should certainly be more structured, e.g. like PowerShell or Nushell here.


I can understand your frustration.

However, keeping data unstructured and sticking to a single form of communication is what has kept UNIX around for so long, and why Linux hasn't collapsed under its own weight long ago.

I was voicing your opinions exactly when I was new to the professional computing world. Over time I saw a lot of structured data schemes come and go, and they all fall down the same way: inflexibility and improper implementations.

Where do you think the structure inside the data comes from? You need to come up with a standard. Now you need everyone to stick to that standard. You need to parse all of the data, or at least most of it, to get to the information you need, finding new security footguns on the way. Soon you will realise you need to extend the standard. Now you have not only conflicting implementations (because no one ever gets them completely right) but conflicting versions.

And this needs to be done for every single fucking little filter and every single language they are written in.

Take a look at the various implementations of DER and ASN.1. The standard seems simple at first glance, but I haven't seen a single implementation that wasn't wrong, incomplete or buggy. Most of them are all of that. And DER is a very old standard that people should have come to understand by now.

In order to get at least a bit of sanity in all of this, you need a central source of The Truth wrt. your standard. Who is that going to be for Linux and the BSDs and for macOS? Linus? Microsoft? Apple? The ISO standards board? You?

And all of this is equally true for log data.

I'm okay with tossing backwards compatibility overboard. But not in favour of the horrible nightmare some people here seem to be proposing.


It was a decision that led to success. Before Unix, most filesystems were record-oriented. Unix's philosophy of treating everything as a stream of characters was different and very successful.


> The key flaw of UNIX philosophy is destructuring deserialization and reserialization based on lines

not so sure about your assertion there eugene :o)

interfacing with funky-formatted lines of text is way easier because (imho) of the 'wysiayg' principle, and it encourages, for lack of a better term, 'tinkering'

> Settling on a common, simple/universal data format to exchange between programs, robust enough for all purposes...

output from 'ls, gcc, yacc/bison, lex/flex, grep, ...' should all follow the same format, sure... the good thing about standards is that we don't have enough of them to choose from, e.g. xml, sgml, s-expressions, json, csv, yaml, ...

having said that, when dealing with copious volumes of data, structure doesn't hurt, but in such cases, my guess is, there are only a handful of data producers and data consumers for automatic consumption, and they are all tightly controlled.


I used to think this, and I'm still sympathetic to the motivation behind it. But I changed my mind when I saw the challenges faced in Big Data environments. Structured data is nice at first when you get into it. But the need for data prep stubbornly will not go away, if only because how your stakeholders want the data structured only sometimes aligns with the pre-defined structure. And when you get those numerous requests for a different structure, you're right back to data munging and wrangling lines of text.

In my limited experience, you're either taking the cognitive hit to learn how to grapple with line-oriented text wrangling, or taking the cognitive hit to learn how to grapple with line-oriented text wrangling AND on top of that a query language to fish out what you want from the structure first before passing it to the text wrangling. I'd sure like a general solution though, if you have a way out of this thicket.


I don't think I'd go as far as typed messages as that doesn't strike me as a simple or light-weight format. Something like JSON is almost good enough, but perhaps something even simpler/lighter would be ideal here.


JSON should have included comments from day 1. Big oops.


Comments are great for human-edited config files, but not a clear win for a stdout/stdin format.

The trouble is, comments are normally defined as having no effect; they're not part of the data model. But if the comments contain useful information, how do you pick it out with the next tool? Imagine having to extend `jq` with operators to select "the comment on the line before key Foo", etc... And if it is extractable, is it still a comment?

And how do you even preserve comments from input to output in tools like grep, sort, etc.?

XML did make comments part of its "data set": conforming parsers expose them as part of the data. Similarly for whitespace (though it's only meaningful in rare cases like the pre tag, and that's up to the tool interpreting the data) and some other things that "should not matter", like the abbreviated prefixes used for XML namespaces. This does allow round-tripping, but it complicates all tools, most importantly by disallowing assumptions that some aspect never matters, e.g. that it's safe to re-indent.

I'd argue the deepest reason JSON won over XML is that XML's data set was so damn complicated.


It used to. They were removed later.


How should logging be done?


Nothing prevents a program from having more than one output stream (stdout, stderr, etc.), or from using either for logging.
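For instance, the classic shell-level split (myprog and next-stage are just placeholders):

    # Keep pipeline data on stdout, send diagnostics to stderr, and let the
    # caller decide where the log stream ends up.
    myprog 2>>/var/log/myprog.log | next-stage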


> /dev/null


is "destructuring deserialization and reserialization based on lines" really an aspect of the UNIX philosophy or just an implementation detail? I thought it was more about doing one thing and doing it well [1]. It could be argued that nushell follows the UNIX philosophy.

[1] https://en.wikipedia.org/wiki/Unix_philosophy


“This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”

Doug McIlroy, quoted in The Art of Unix Programming (2003), "Basics of the Unix Philosophy"

That said, I’m looking forward to testing out nushell!



