
The compelling idea here is that they convert the output of common shell commands into tabular data that can be manipulated using common operators, so that you don't have to remember sorting/filtering/grouping flags that may be different for every shell command. So, imagine being able to use the same sorting/filtering logic on the output of `ls` as you might on the output of `ps`, without relying on hacky solutions like pipelines to `grep`, `cut`, and `sort`.

It also means shell command output can be easily converted to JSON or CSV. Actually pretty clever!
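
For example, roughly, in Nushell's own syntax (a sketch; exact command names vary between versions):

    # the same filtering/sorting verbs work regardless of which command produced the table
    ls | where size > 10kb | sort-by modified
    ps | where cpu > 10 | sort-by cpu
    # and the same converters get you JSON or CSV out
    ls | to json
    ps | to csv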



I've been delving into bash scripting a bit more than I'd like lately, and the lack of universally available, consistent, structured output from CLI tools really got to me. Most of the script contents end up being those obfuscating and brittle "hacky solutions" that never should have been necessary. When I thought about trying to fix this myself, the task felt a bit overwhelming. I'm delighted that these developers are working on it!


The key flaw of UNIX philosophy is destructuring deserialization and reserialization based on lines, necessitating all manner of argument escaping and field delimiters, when pipelines should be streams of typed messages that encapsulate data in typed fields. The same goes for logs (logging to plain-text files is a terrible idea: it creates log-rotation headaches, and every program ends up needing its own log parser because the structured information has been thrown away). Line-oriented pipeline processing is fundamentally too simple. The key is settling on a common, simple, universal data format for programs to exchange, robust enough for all purposes including very large data sets: one just complicated enough to eliminate escaping and delimiter headaches without throwing away flexibility, backed by a new/refined set of processing tools/commands.


It's not so much a flaw as a tradeoff. Because it's so easy to produce lines of text, virtually every program supports it, even programs written by hackers who aren't interested in following OS communication standards.


No, that's your opinion, not mine. It's a flaw because it creates escaping magic and in-band delimiters (spaces), and every program has to know how to parse the output of every other program, wasting human time and processing time. Structured communication, say an open standard for a schema-less, self-describing, flexible de/serialization format like protobufs/msgpack, would be far superior: more usable, more efficient, and simple, but not too simple, with structure and programmability already there for processing streams of data.

Being able to dump structured information out of an exception directly into a log, and then from a log into a database, without any loss of information or extraneous log parsing, is a clear win. Or from a command as simple as listing files ("ls") into a database or into any other tool or program. Outputting only line-oriented strings just throws away type information and creates more work for everyone else, even more work than line-processing tools save.
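
To illustrate the logging case (a sketch assuming jq and one JSON object per line, which most tools don't emit today):

    # a structured error record survives the pipeline intact, no ad-hoc log parsing needed
    echo '{"level":"error","msg":"disk full","path":"/var/log","free_bytes":0}' \
      | jq -r 'select(.level == "error") | [.path, .free_bytes] | @tsv'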


Sounds like the CLI tools, GNU and otherwise, could benefit from some kind of "--format" switch to define the output in detail. I mean something like ls --format "{ \"filename\": \"%{filename}\", \"size\": %{size} }" for a JSON-style output (or whatever the user wanted).

Or even something like --format-rfc4627 filename,size,permissions-numeric to get only those fields out in a valid JSON format.

This wouldn't remove the "every program has to know how to parse the output of every other program" problem, but I am not convinced that's needed. For instance, how would e.g. grep know which field you want to process? And does e.g. "size" carry universally the same meaning and semantics in every program there is or ever will be? Ditto for "permissions". And what about "blacklist"?

As a completely fictitious toy example:

  (ls --format-rfc4627 filename,size,permissions-numeric | json-shunt -p "filename" --cmd grep "MegaCorp") | json-print -p filename,size
The fictitious "json-shunt" (for lack of a better name) would pass only that one parameter as input to its --cmd, in this case grep; the rest of the pipeline would then run only for the items grep matched, but with all the other parameters intact. So it would print the filenames and sizes of files whose names contain a case-sensitive "MegaCorp", and output them in JSON.

Yes, I know there are more concise and elegant ways to print the sizes and names of matching files... Also, when scripting pipelines the verbosity doesn't matter; IMO being able to actually read the script would be a benefit.
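
For what it's worth, something close to this can already be faked with GNU stat plus jq, though a filename containing a quote would break the hand-rolled JSON, which is exactly the escaping hazard discussed upthread:

    # emit one JSON object per file, then filter and project with jq
    stat --printf '{"filename":"%n","size":%s,"permissions":"%a"}\n' * \
      | jq -c 'select(.filename | test("MegaCorp")) | {filename, size}'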

Edit: fix pre-caffeine wonky redirection and brain fart


So, if you're not a fan of the UNIX philosophy, maybe check out Powershell. Or take a look at WMI and DCOM in Windows. Eschew shell scripts in favor of strongly-typed programs that process strongly-typed binary formats, or XML, or whatever. The alternatives are out there.

"Worse is better" isn't for everyone.


Nushell does not seem like a violation of the unix philosophy, or at least the version of it that I like best.

"Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

Perhaps I'm wrong, but isn't nushell simply adding one more dimension?

Instead of a one-dimensional array of lines, the standard is now... a two-dimensional array of tabular data. Perhaps it is not strictly a "text stream" but this does not seem to me to be a violation of the spirit of the message.

Simple line-based text streams are clearly inadequate IMO. 99.99% of all Unix programs output multiple data fields on one line, and to do anything useful at all with them in a programmatic fashion you wind up needing to hack together some way of parsing out those multiple bits of data on every line.
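
For instance, pulling two fields out of `ps` already means counting whitespace-separated columns and hoping none of them contains a space:

    # classic hack: PID and command of processes using more than 5% CPU
    ps aux | awk 'NR > 1 && $3 > 5 {print $2, $11}'

With tabular output those columns are already split and named for you.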

    maybe check out Powershell
I left Windows right around the time PS became popular, so I never really worked with it.

It seems like overkill for most/all things I'd ever want to do. Powershell objects seem very very powerful, but they seemed like too much.

Nushell seems like a nice compromise. Avoids the overkill functionality of Powershell.


Semantically, a one-dimensional list/array of “things” can be trivially transposed into a series of lines with a linebreak separator, but I don’t think the same holds true for a “list of lists” (2-dimensional data) or a “list of lists of lists” (3D data) etc. At least without a standard begin-end list delimiter that allows nesting.

I'm just thinking about whether an old tool could be "wrapped", i.e. tricked into working with 2+-dimensional data, by somehow divvying the higher-dimensional input up into concurrent 1-dimensional streams. But this seems to require a way to represent more than one dimension of data without breaking existing utilities (unless there were, like, a wrapper/unwrapper layer that handled this...)


It’s worth noting that Powershell is available on Linux. Objects are pretty cool. https://docs.microsoft.com/en-us/powershell/scripting/instal...


It works pretty well on the Mac now, too.


"Worse is better" is not some absolute truth or philosophy and *NIX has won: Android, MacOS, WSL.

There's no real alternative for many professionals, if they want to be employable.

Maybe we should accept this fact and try to make everyone's life easier, rather than be stuck up and push people away.

There's no law that says that Unix tools can't be extended.

And heck, basic POSIX tools violate so many Unix principles...


When perl came along it killed the 'Unix one tool philosophy' dead as a doornail. And since then people have just kinda ignored the smell coming off the rotting corpse.

I don't write complex scripts in shell anymore because it's insanity. But ad-hoc loops and crap like that... hell yeah. At least a few a day. Sometimes dozens.

People need to be reminded, I think, that shell isn't a programming language first. It's a user interface. And when I look at Powershell scripts and other things of that nature and think about living in Powershell day in and day out I don't see the big pay-off over something like fish.

'Is this going to make it so I can get my work done faster?'

'Is this going to be more pleasant to use as a primary interface for an OS?'

When I go into fish or zsh and use curl to grab JSON and 'jq' to get an idea of how to use the API in python or node...

versus running 'curl' in powershell against some random API I have never used before..

I get the distinct impression that 'This is fucking garbage', in that it would take me a lot longer to figure out how to use powershell in a useful way than the time I would save by doing so in the long run.


The irony is that the very attempt to be one tool for everything caused Perl's own destruction. Perl 5 is still used by some veterans for small scripts but who wants to use Perl 6?

Unix follows the KISS principle, and that is key for success. Albert Einstein said: "Keep things as simple as possible but not too simple". In that sense Unix and Posix are well done. However, that doesn't mean that good ideas like Nushell are not welcome.


I think the failure of Perl 6 was caused by a lack of community building and implementation leadership, not by trying to be too many things at once.


Yeah, I tried using Powershell as my shell and that's when I found out Powershell is more about the scripting language than about being an optimized shell for everyday use. I was confronted with this almost immediately, because one of the things I rely on most in Bash is 'dirs -v', pushd and popd. I have dirs aliased to dirs -v and I usually have 15-20 paths on my stack. I'll leave implementing the same functionality in Powershell as a user exercise.


I'm confused... Why don't Push-Location (aliased to pushd by default), Pop-Location (aliased to popd by default), and Get-Location -Stack (trivially aliasable to dirs) work? You can even have multiple location stacks and, if you care about the registry, you can also use the same commands to handle registry navigation as well.


‘dirs -v’ shows an enumerated list of directories. If I follow that up with pushd +11 I will be taken to the 11th path listed in the prior dirs command. As far as I know this isn’t implemented out of the box in PS


In addition to `{Push,Pop}-Location`, `Get-Location -Stack` will show the current directory stack. I don't know offhand of a way to get the equivalent of `pushd +11` though.


Not out of the box but I wrote a module[0] that gives you `cd+ 11` (and `dirs -v`).

[0] https://github.com/nickcox/cd-extras#navigate-even-faster


What about JSON vs protobufs? Is there a schemaless system that you use?


If GNU decided tomorrow that all utilities need to have a --json output argument then that would make me a very happy person.


Better than nothing, but the problem with that is that you can't construct a deep, lazily evaluated pipeline. JSON can't be outputted until all the data is collected.


There's a streaming format which is used for this: JSON Lines [1] AKA newline-delimited JSON [2].

[1] http://jsonlines.org/

[2] http://ndjson.org/
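
Because each record is a complete JSON value on its own line, downstream tools can consume it lazily, e.g. (assuming a hypothetical app.log with one JSON object per line):

    # filter a growing log as it is written, without waiting for a closing bracket
    tail -f app.log | jq -c 'select(.level == "error")'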


That still limits us too much due to the rather severely limited available data types (although arbitrary types could be serialized into binary/string data, I guess...)


Funny that you should mention this. I just hit that problem yesterday. The lack of a binary type is a problem. This is the same thing that hit XML. Unfortunately (or fortunately), the best solution is still the MIME stuff.


AFAIK protobufs are not schema-less.


Yeah, those sentences are too close together.

I'm trying to elicit more of a response from the comment author. I think they make some good points. I would like to learn more about their ideal system.


I was just looking at MessagePack yesterday (researching best ways to encode an event stream) and was very impressed.


Yes, I agree that it's a tradeoff. Although that tradeoff was made when memory was scarce and expensive, a modern shell should certainly be more structured, e.g. like PowerShell or Nushell here.


I can understand your frustration.

However, unstructured data and a single form of communication are what have kept UNIX around for so long, and why Linux hasn't collapsed under its own weight long ago.

I was voicing your opinions exactly when I was new to the professional computing world. Over time I saw a lot of structured data schemes come and go and they all fall down with this: inflexibility and improper implementations.

Where do you think the structure inside the data comes from? You need to come up with a standard. Now you need everyone to stick to that standard. You need to parse the entire data, or at least most of it, to get to the information you need, finding new security footguns on the way. Soon you will realise you need to extend the standard. Now you have not only conflicting implementations (because no one ever gets them completely right) but conflicting versions.

And this needs to be done for every single fucking little filter and every single language they are written in.

Take a look at the various implementations of DER and ASN.1. The standard seems simple at first glance, but I haven't seen a single implementation that wasn't wrong, incomplete or buggy. Most of them are all of that. And DER is a very old standard that people should have understood in the meantime.

In order to get at least a bit of sanity in all of this, you need a central source of The Truth wrt. your standard. Who is that going to be for Linux and the BSDs and for macOS? Linus? Microsoft? Apple? The ISO standards board? You?

And all of this is equally true for log data.

I'm okay with tossing backwards compatibility overboard. But not in favour of the horrible nightmare some people here seem to propose.


It was a decision that led to success. Before Unix, most filesystems were record oriented. Unix's philosophy of treating everything as a stream of characters was different and very successful.


> The key flaw of UNIX philosophy is destructuring deserialization and reserialization based on lines

not so sure about your assertion there eugene :o)

interfacing with funkily formatted lines of text is waay easier because (imho) of the 'wysiayg' principle, and it encourages, for lack of a better term, 'tinkering'

> The key is settling on a common, simple, universal data format for programs to exchange, robust enough for all purposes...

output from 'ls, gcc, yacc/bison, lex/flex, grep, ...' should all follow the same fmt, sure...the good thing about standards is that we don't have enough of them to choose from, f.e. xml, sgml, s-expressions, json, csv, yaml, ...

having said that, when dealing with copious volumes of data, structure doesn't hurt, but in such cases, my guess is, there are only a handful of data producers and data consumers for automatic consumption, and they are all tightly controlled.


I used to think this, and am still sympathetic to the motivation behind this. But I've since changed my mind when I saw the challenges faced in Big Data environments. Structured data is nice at first when you get into it. But the need for data prep stubbornly will not go away. If only because how your stakeholders want the data structured only sometimes aligns with the pre-defined structure. And when you get those numerous requests for a different structure, you're right back to data munging, and wrangling lines of text.

In my limited experience, you're either taking the cognitive hit to learn how to grapple with line-oriented text wrangling, or taking the cognitive hit to learn how to grapple with line-oriented text wrangling AND on top of that a query language to fish out what you want from the structure first before passing it to the text wrangling. I'd sure like a general solution though, if you have a way out of this thicket.


I don't think I'd go as far as typed messages as that doesn't strike me as a simple or light-weight format. Something like JSON is almost good enough, but perhaps something even simpler/lighter would be ideal here.


JSON should have included comments from day 1. Big oops.


Comments are great for human-edited config files, but not a clear win for a stdout/stdin format.

The trouble is, comments are normally defined as having no effect, not part of the data model. But if the comments contain useful information, how do you pick it out using the next tool? Imagine having to extend `jq` with operators to select "comment on line before key Foo", etc... And if it is extractable, are they still comments?

How do you even preserve comments from input to output in tools like grep, sort, etc.?

XML did make comments part of its "dataset". That is, conforming parsers expose them as part of the data. Similarly for whitespace (though it's only meaningful in rare cases like the pre tag, but that's up to the tool interpreting the data), and some other things that "should not matter", like abbreviated prefixes used for XML namespaces. This does allow round-tripping, but it complicates all tools, most importantly by disallowing assumptions that some aspect never matters, e.g. that it's safe to re-indent.

I'd argue the deepest reason JSON won over XML is that XML's dataset was so damn complicated.


It used to. They were removed later


How should logging be done?


Nothing prevents the output from having more than 1 stream (stdout, stderr, etc), or to use either for logging.


> /dev/null


is "destructuring deserialization and reserialization based on lines" really an aspect of the UNIX philosophy or just an implementation detail? I thought it was more about doing one thing and doing it well [1]. It could be argued that nushell follows the UNIX philosophy.

[1] https://en.wikipedia.org/wiki/Unix_philosophy


“This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”

Doug McIlroy (2003). The Art of Unix Programming: Basics of the Unix Philosophy

That said, I'm looking forward to testing out nushell!


You can also just use PowerShell. Its chief crime is its verbosity in naming, but it has the "universally available consistent structured output for the CLI" you're asking for.


I spent a lot of time in Powershell on Windows. Almost 3-4 years. I simply could not use it as a shell.

I enjoyed converting our many build scripts into Powershell, but using it as a shell was just uncomfortable.

It's not clear to me why. If I had to guess, besides the verbosity, it was the additional "indirection". Nothing was what it appeared on the screen. It was something else.

Bash felt far more comfortable as a daily shell because what you see is what you get.

I want to try Nu because it appears to combine that WYSIWYG simplicity of bash with the Powershell quality of passing data rather than text. Maybe tables are indeed a better data structure to use than objects.


PowerShell is more like interactive Python with very smart tab completion than "bash with objects." Basic conveniences like "grep" require some memorized scripting gymnastics because the commands got generalized a little too much.

For regular shell usage it's a bit painful because the abstractions are a bit too far removed from the file system you're operating on. But for scripting, such as managing and configuring systems, it's remarkably fluid.

The "nothing is as it appears" problem never bothered me much because it's still just data. You just specify the output when you're done crunching it. IMO being able to operate with typed data is really nice because you don't have to worry (nearly as much) about serialization between commands.


This is not true. If you want, you can literally pipe PowerShell output into grep (or select-string, which does the same) and it will filter by the tabular representation of each row. That can work just fine in interactive use cases, and while it is less accurate and performant, it is the same as bash.

The key to success in using PowerShell as an interactive shell is to memorize aliases and not caring about casing and spelling everything out. PowerShell is case-insensitive, has aliases (e.g. ? instead of where-object, or % foo to get the foo property of every object) and will accept abbreviated command and parameter names if the abbreviation is unambiguous. It is also useful to know about the different formatting cmdlets: mostly format-table (ft), format-list (fl), and that you can pick properties with -p (-Property). While PowerShell usually does the right thing, with these you can get the most appropriate formatting for any use case.
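
A couple of concrete examples of that terser interactive style (assuming the stock aliases, where ? is Where-Object and % is ForEach-Object):

    # object-aware filtering, roughly "ps aux | grep chrome | awk '{print $2}'"
    Get-Process | ? Name -like '*chrome*' | % Id
    # or fall back to plain text matching, grep-style
    Get-Process | Out-String -Stream | Select-String chrome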


(Disclaimer: never used powershell, so unadulterated opinion here)

Scripting is programming, so you want data structures. For interaction, you want human-readable syntax.

e.g. programming languages mostly have a syntax: C, python, bash, fortran (exception: lisp), but they operate on data structures.

We can see powershell, nushell and other structured shells (e.g. some use XML) as an attempt to "get out of the syntax business", which XML began and JSON finished (https://blog.mlab.com/2011/03/why-is-json-so-popular-develop...).

This has been taken further into human read/write domains... with mixed reception: ant, XSLT, json as a config format. After all, parsing and "the syntax business" has a practical purpose.


> It's not clear to me why

I commented on this earlier but the defectiveness of PS as a shell hit me almost immediately when I tried to recreate my workflow of 'dirs -v', pushd and popd with a large number of paths on the stack.


Is it actually useful outside Windows? I know the latest version is now portable because of .net core, but I haven't tried it on *nix.


Yeah I'm using it as my primary shell on macOS. You can do everything you do with bash, but with objects, autocompletion etc


The only thing I can't find in PS is fish-like predictability when using it as a shell.


Fascinating. Is there a command for getting processes for example or do you need to use ps?


Naw its chief crime is not having `&&`


I'm speaking from a position of naivete, so don't attack me:

Would an ideal situation be "do + while"?

Whereas one may write:

While x is true, do "blah blah" && if output is still true then end, elif do &&?

Or is that basically how it already works, and we just have shitty interactions with the machines to get them to do the above (unless you're a more senior bash person ((it's one of my career regrets that I didn't spend a ton of time bash scripting)))


There are ways around it but nothing as easy as `&&` in other shells


pwsh 7 is adding it


The chief crime of PowerShell is actually how long it takes for a prompt to appear after starting it - especially on Windows. I still use it as a login shell on MacOS because it offers useful features, but it's (to put it bluntly) slow as hell to start.


I've generally found you can get most of what you need done with head, tail, grep, cut, paste, sed and tr (and maybe a couple others I'm missing). Oh, and test (both bash's and coreutils'). It can be a little hacky, but once you realize the crazy output you sometimes have to parse from third-party programs, you realize that the ability to handle any free-form text is pretty essential.

That said, since I'm a Perl developer, I'll generally just go straight to perl's inlined implicit loop mode (-p or -n) if I need to use more than a couple of those utils, unless I want it to be portable with a minimum of requirements (as even though more binaries are used in the shell version, they're all essentially guaranteed to exist on a Linux/Unix).
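
For example, something like this (assuming ordinary `ps aux` output), where -a autosplits each line into @F and -n wraps the body in a read loop:

    # print PID and command for every process, skipping the header line
    ps aux | perl -ane 'print "$F[1] $F[10]\n" if $. > 1'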


I agree. The idea of Mario is to use Python objects instead of text lines. This means it's much more structured and has a familiar syntax as well.

https://github.com/python-mario/mario


That's when you switch to a real programming language. Relying on output from other programs will always be brittle, as error messages etc. will change with every new distro release.


Yeah, just try iterating through a directory with filenames containing spaces...


I think they took one step in the right direction, where two more were needed.

The UNIX idea of piping text output from one command to the next was fine at the time, but it means that the downstream command has to parse text. With nushell, you are parsing slightly more structured text.

The two steps they could have taken in addition:

1. Instead of generating rows of a table, why not lists of strings? E.g. ("abc", "def", 123) instead of "abc | def | 123". Much easier to deal with in the command receiving this input.

2. And instead of just lists, why not objects? E.g. a process object, a file object, or, if you'd like a list as above.

I have implemented this idea: https://github.com/geophile/osh. For example, if you want to find processes in state 'S' with more than 3 descendents, and print the pid and command line of each, sorted by pid:

    osh ps ^ select 'p: p.state == "S" and len(p.descendents) > 3' ^ f 'p: (p.pid, p.commandline)'  ^ sort $
^ is used to denote piping. ps yields Process objects. Select examines those processes and keeps those with the right characteristics. Then f applies a function, mapping each process to a list of its pid and commandline, and then sort sorts the lists, by pid (the first field of each list). $ does output.


The "abc | def | 123" is just a textual display of the data, not the data itself. Each row of the table is a structured record, like you suggest.


I think in nushell and powershell you are parsing objects.

'where' in pwsh / nush literally looks for keys with particular values.


I agree objects are way better than text for manipulating structured data. Python is great at this, so I use Mario to do it in the shell. https://github.com/python-mario/mario

For example, to sort filenames by length

    $ ls | mario apply 'sorted(x, key=len)' chain
    doc
    src
    bench
    extra
    AUTHORS
    LICENSE


Seems it's only one level deep; no tables of tables are shown. Powershell has a lot of issues, but one cool thing is that you can access output as nested object data. Seems like the logical evolution.


You can have tables of tables. To get to the inner table data, you can use the 'get' command to retrieve it and work with it.
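
For example (a sketch; exact column names vary between Nushell versions):

    # drill into a nested table/record with `get`
    sys | get host
    ls | get name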


They have an example in their video.


This is pretty much what I've idealized for a new shell: every program can just output a standard table of information for each action. No need to have the program itself be queried for it every time, or to find all the right options. You just get all the data, and filter what you want programmatically.


> You just get all the data, and filter what you want programmatically.

There are definitely cases when you're better off with producer-side filtering. For example, `find` allows you to not traverse certain directories with -prune, which might save a lot of resources.


PowerShell has a lot of "corrections" of this kind, to get producer-side filtering. e.g. Getting all Windows Eventlogs, then filtering client side turned out to be quite bad, so Get-EventLog became Get-WinEvent with a lot more filtering controls.

Cmdlets which get data, like Get-AdUser or Get-MessageTrackingLog in Exchange, or Get-CimInstance often have a lot of common filtering parameters as well as a more general -Filter or -Query "build your own".
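
For example, with the event log cmdlet mentioned above (a sketch; Level 2 means Error):

    # filter on the producer side instead of pulling every event over and filtering in the pipeline
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 2 } -MaxEvents 50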

Big source of annoyance where you can't do that, but might want to.


It also generally applies the concept of wildcards/globs not just to file paths, but to many parameters. So a cmdlet like Get-User with a -Name parameter to get a user with a certain name can also be called with John* to get all users whose first name is John. In my experience the filtering options are pretty consistent.


Do they actually convert the output of the existing commands? Or are they reimplementing them one by one? It looks like the latter.

In their examples it looks like 'ls' is built-in to their shell instead of from e.g. coreutils.


I love the core idea of nushell, but reimplementing them seems rather insane, given the incalculable number of programmer-hours that have gone into coreutils.

That seems like a real mistake, rather than simply having nushell's "ls" be a wrapper for coreutils "ls -la" or some such.

I understand the benefit of reimplementing everything from scratch, as that way you have a more consistent nushell experience everywhere, regardless of which version of "native" ls your system has. And allowing "^ls" is a useful and necessary hedge.

But, wow reimplementing nearly everything seems like an enormous undertaking.

(It is of course possible that I'm completely misunderstanding things! Perhaps the devs can comment?)


Check the ls man page and remove anything related to sorting, filtering, or formatting information (as under this system all this functionality is provided in a shared and generic way). There is not much left.


Coreutils has been reimplemented N times: BSD, GNU, BusyBox, the Rust one, countless others I'm forgetting. It's doable.


ls is a shell builtin. When he types ^ls he uses the one from coreutils.


This idea is already present in xonsh.


How do you think it compares to xonsh? I ask because I had been thinking recently about investing some effort into xonsh and now nushell has appeared. I’d be very curious what any serious xonsh users think about this newcomer.


There are already numerous ways to do this without rewriting the world. For instance:

https://github.com/dkogan/vnlog#powershell-style-filtering-o...


> So, imagine being able to use the same sorting/filtering logic on the output of `ls` as you might on the output of `ps`, and without relying on hacky solutions like pipelines to `grep`, `cut`, and `sort`

That would be awksome.


Seems similar to the purpose of `osquery` which is SQL based.



