Hacker News

The GNU Coding Standards have this to say: https://www.gnu.org/prep/standards/standards.html#index-beha...

Please don’t make the behavior of a utility depend on the name used to invoke it. It is useful sometimes to make a link to a utility with a different name, and that should not change what it does.

The next section provides some reasoning:

Providing valid information in argv[0] is a convention, not guaranteed. Well-behaved programs that launch other programs, such as shells, follow the convention; your code should follow it too, when launching other programs. But it is always possible to launch the program and give a nonsensical value in argv[0].



Thanks for the link, but I find that reasoning entirely unconvincing:

The ability to interpret argv[0] as a way to avoid exec is traded for hypothetical linking - but wherever linking is used (such as for idk a mail filter or other configurable executable) that's the place where a wrapper shell script could be used instead. And avoiding improper launching of egrep/fgrep from a program that doesn't bother to build argv[0] properly is also an (unlikely) non-use case that's better handled by fixing that offending program.
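A minimal sketch of the wrapper-script alternative mentioned above, assuming a `grep` that supports `-E` is on the PATH. The directory and variable names (`wrapper_dir`, `matches`) are hypothetical and only for illustration:

```shell
# Hypothetical demo: an "egrep" that is a tiny wrapper script
# instead of a link whose behavior depends on argv[0].
wrapper_dir=$(mktemp -d)

cat > "$wrapper_dir/egrep" <<'EOF'
#!/bin/sh
# Forward every argument unchanged, adding -E for extended regexps.
exec grep -E "$@"
EOF
chmod +x "$wrapper_dir/egrep"

# The alternation only matches because -E is in effect.
matches=$(printf 'cat\ndog\ncow\n' | "$wrapper_dir/egrep" 'cat|cow')
echo "$matches"

rm -rf "$wrapper_dir"
```

The cost is one extra exec per invocation, which is the trade-off being argued about in this thread.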


I find the reasoning convincing. It's the principle of least surprise. And a bunch of other principles. I don't want a program to care what its executable name is, or whether it's being invoked with a relative path, an absolute path, through a symlink or hardlink, or anything else.

Of course, there are exceptions to the rule, like busybox. The point is to have uniform, predictable behavior across the ecosystem and not have every other program have its own weird idiosyncrasies.

That being said, the egrep/fgrep legacy is precisely such a case where it makes sense to make an exception. It's a decades-old legacy and grep isn't just any program, it's part of Using the Shell 101.


What makes the reasoning unconvincing for me is that it's simply wrong: argv[0] is not the name of the program being invoked. If it were, I would agree, but it's not. Rather, it's simply another argument whose default value is the name of the program being invoked, but you can pass something else in place of it if you want. Moreover, what I find surprising is seeing a knob that I can adjust, but which doesn't do anything. It's like having an argv[1] parameter that is always ignored. It's much better to make it do something that logically corresponds to its value, and the same goes when the index is 0.
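As a rough illustration of the "just another argument" point, the POSIX `sh -c command_string command_name` form lets the caller choose what the inner script sees as `$0`. This shows the shell-level `$0`, which is only an analogy for the `argv[0]` of a compiled program; bash additionally offers `exec -a name`, which is not POSIX. The name `any-name-at-all` and the variable `reported_name` are made up for the sketch:

```shell
# The operand after the command string becomes $0 of the inner script;
# the caller can pass anything here.
reported_name=$(sh -c 'printf "%s\n" "$0"' any-name-at-all)
echo "$reported_name"
```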


> argv[0] is not the name of the program being invoked [...] it's simply another argument whose default value is the name of the program being invoked

That's your opinion. Another opinion would be that it's an interface contract with the calling program to pass the path to the program being invoked in argv[0]. This contract has been established by common practice and a corresponding expectation by most programs, even if it wasn't formally specified.

I'm not necessarily taking that position, but I can see convincing reasons to do so, and as such the reasoning in the article is also convincing.


That's not an opinion, that's a fact. The only thing that's an opinion in what I said is what the final decision should be, not the fact that it's a mutable argument. Whereas their stance is based on false premises to begin with. If their opinion was "we realize arg0 can be set explicitly but too many people wrongly assume it can't, so we will follow the crowd" that would make their argument more compelling.


The fact that it can physically be set to anything doesn't mean there isn't an implicit contract.


This isn't just "I can set it to something else if I go out of my way", this is "systems provide well-documented and standardized methods for setting this parameter to something else, and there are some widely used programs using precisely this feature." It's not common but that's exactly what default parameters are for: uncommon-yet-valid use cases. Yet you seem to treat your opinion that this is an implicit contract as somehow sufficient for establishing that it's a contract, despite what I just mentioned indicating otherwise. What contract can you point to that mandates argv[0] be set to the program name?


Established common practice.

The egrep use case, btw., wouldn't deviate from that. In fact, it would rely on argv[0] being truthfully set to the sym/hard link's name.


Not every established common practice implies a contract to follow that practice. It also matters what the reason for that practice is, and whom it's even relevant to. Like just because most people take the highway when driving from SF to LA that doesn't mean you're breaching some sort of contract by opting to take a side road.

There are two practices here. One is passing the program name to arg0. The other is never making some use of arg0. Both of these are established because they're the most convenient things to do by default, and because people rarely have a reason to deviate from them. That's it. Heck, if there was some contract to ignore arg0 then people wouldn't feel any contractual obligation to pass the program name to begin with, since it should be getting ignored anyway - that argument is pretty self defeating. Moreover, I think your position is effectively equivalent to saying "if you don't want a common practice to become a contract, then you must go out of your way and inconvenience yourself to deviate from that practice for absolutely no other reason than to make this very statement true", which is a rather bizarre (and inefficient) expectation from everyone around you.


Different behavior based on argv[0] was first brought to my attention when I discovered that /bin/sh was a symlink on some Linux systems. Bash has a Bourne shell compatibility mode.


Busybox is the foremost example of this in my mind. Busybox is a single binary that provides a ton of POSIX utilities (including sh, awk, ls, cp, nc, rm, gzip, and about 300 other common ones), and these can be invoked with `busybox $command`, or a symlink may be made from the command name to the `busybox` binary to automatically do the same. Many embedded Linux systems just use busybox and a ton of symlinks to act as the majority of the userland.
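A toy sketch of that multi-call pattern, written as a shell script rather than a C binary. This is not how busybox itself is implemented; the `multicall` script, its `hello` and `bye` applets, and the variable names are all hypothetical:

```shell
# Hypothetical multi-call script: one file, behavior chosen by invoked name.
demo_dir=$(mktemp -d)

cat > "$demo_dir/multicall" <<'EOF'
#!/bin/sh
# Use the basename of $0 as the applet name.
applet=${0##*/}
# When run under its own name, take the applet from the first argument.
if [ "$applet" = multicall ] && [ "$#" -gt 0 ]; then
  applet=$1
  shift
fi
case $applet in
  hello) echo "hello applet" ;;
  bye)   echo "bye applet" ;;
  *)     echo "unknown applet: $applet" >&2; exit 127 ;;
esac
EOF
chmod +x "$demo_dir/multicall"

# Symlinks named after the applets point back at the single script.
ln -s multicall "$demo_dir/hello"
ln -s multicall "$demo_dir/bye"

via_link=$("$demo_dir/hello")          # dispatch on the link name
via_arg=$("$demo_dir/multicall" bye)   # dispatch on the first argument
echo "$via_link"
echo "$via_arg"

rm -rf "$demo_dir"
```

Both invocation styles reach the same `case` statement, which is why a caller that passes a misleading name gets the "unknown applet" branch.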


GNU does it their way, and Busybox does it their own way too. Both have valid reasons for how things are set up. For user-friendliness, it's important that each of them is as consistent as possible about it.


Bash actually has a POSIX compatibility mode.

There is a great deal in POSIX that was not in Bourne, native arithmetic expressions being the first to come to mind, then new-style command substitution.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

All of the POSIX utility standards, including the grep variants, can be found in the URL's parent directory:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/

If this new GNU grep functionality becomes widely distasteful, I think that OpenBSD's grep tries to emulate GNU in a few ways, and could supplant it, becoming the system grep while "gnugrep" is relegated to the shadows.

It is also extremely expensive on Windows for a version of grep to be implemented as a shell script. Launched by xargs, the penalty will be severe.

The commercial platforms based upon GNU are in a marriage of convenience, and can easily pick and choose.


> Bash actually has a POSIX compatibility mode.

It's partial. A short example is `date&>FILE`.

[edited typo]

On a POSIX shell, `date` will run in the background and the date will be written to stdout (`date&`) and `FILE` will be created or truncated (`>FILE`). Using `bash --posix`, the date will be written to `FILE`, since the incompatible bashism `&>` still takes priority.
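A small probe, offered as a sketch, for checking which of the two parses the local `sh` uses. Which branch runs depends on what `sh` is on the system, so no particular output is claimed here. The names `probe_file` and `parse_style` are hypothetical:

```shell
# Ask the system's sh to run the ambiguous construct against a temp file.
probe_file=$(mktemp)

# Inside the inner script, $1 is the probe file path.
# Stdout is discarded so a backgrounded echo does not clutter the terminal.
sh -c 'echo probe&>"$1"; wait' sh "$probe_file" >/dev/null

# A non-empty file means the text was redirected into it (the &> operator).
# An empty file means echo ran in the background and >FILE only truncated.
if [ -s "$probe_file" ]; then
  parse_style=redirect
else
  parse_style=background
fi
echo "$parse_style"

rm -f "$probe_file"
```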


I agree that is ambiguous.

If I were writing such a script where I wanted to launch a background process, and then create/truncate a file, I myself would separate them:

  date &
  >FILE

Bash wouldn't mistake that, but a lot of shell scripts look like line noise and there are situations where bad form is required (quoted in an ssh, for example).

Obviously, people wanting the bash functionality in POSIX would:

  date > file 2>&1

Bash discourages the latter form in my man page, but anyone interested in portability knows how profoundly bad that advice is.
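A short sketch of the portable `> file 2>&1` form in action, capturing both streams from a command group. The file and variable names (`log_file`, `captured_lines`) are illustrative only:

```shell
log_file=$(mktemp)

# Redirect stdout to the file first, then point stderr at the same place.
{ echo "to stdout"; echo "to stderr" >&2; } > "$log_file" 2>&1

# Both lines should have landed in the file.
captured_lines=$(grep -c 'to std' "$log_file")
echo "$captured_lines"

rm -f "$log_file"
```

The order matters: writing `2>&1 > file` instead would duplicate stderr onto the original stdout before stdout is redirected.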


Yikes. This is why I do not use bash. Too complicated.

For Linux, I added tab autocomplete to dash and use that for both interactive and non-interactive shell instead of bash.

Saves keystrokes and bytes. No need to keep typing "#!/bin/sh" into every script.


You can compile dash with a line editing library, but it does not come in most packages.


I use libedit. I am used to NetBSD sh; dash is similar, derived from the same source. I wanted dash to behave more like NetBSD sh, so I changed it. Anyway, I am not a huge fan of "packages". (The problems described in this thread are one reason why.) Of course I use packaged binaries in some instances, but normally I only install them when needed, use them and then uninstall them when I am done. I prefer to compile the kernel and userland myself. That includes the shell. Simply compiling dash with libedit does not provide tab completion; it needs some small changes.


I just checked OpenBSD, and I find that there are 6 total links in /usr/bin: egrep, fgrep, grep, zegrep, zfgrep, and zgrep.

The OpenBSD package is superior.

The GNU gzip package includes a zgrep shell script that is adapted from work by "Charles Levert <charles@comm.polymtl.ca>" - this is similarly adapted for bzgrep and xzgrep.

The OpenBSD implementation will have superior performance for zgrep, because it is native C.

  rebel$ ls -li /usr/bin/*grep 
  466612 -r-xr-xr-x  6 root  bin  31520 Apr 11  2022 /usr/bin/egrep
  466612 -r-xr-xr-x  6 root  bin  31520 Apr 11  2022 /usr/bin/fgrep
  466612 -r-xr-xr-x  6 root  bin  31520 Apr 11  2022 /usr/bin/grep
  466711 -r-xr-xr-x  2 root  bin  15288 Apr 11  2022 /usr/bin/pgrep
  466612 -r-xr-xr-x  6 root  bin  31520 Apr 11  2022 /usr/bin/zegrep
  466612 -r-xr-xr-x  6 root  bin  31520 Apr 11  2022 /usr/bin/zfgrep
  466612 -r-xr-xr-x  6 root  bin  31520 Apr 11  2022 /usr/bin/zgrep
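The shared inode number in that listing is what marks those names as hard links to one file. A minimal sketch that reproduces the effect in a scratch directory, with made-up file names (`grep-demo`, `egrep-demo`):

```shell
link_dir=$(mktemp -d)
echo data > "$link_dir/grep-demo"

# A hard link is a second directory entry for the same inode.
ln "$link_dir/grep-demo" "$link_dir/egrep-demo"

# ls -i prints the inode number as the first field.
inode_a=$(ls -i "$link_dir/grep-demo" | awk '{print $1}')
inode_b=$(ls -i "$link_dir/egrep-demo" | awk '{print $1}')
echo "$inode_a $inode_b"

rm -rf "$link_dir"
```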


> But it is always possible to launch the program and give a nonsensical value in argv[0].

Well, if you're being a smart@ss and doing that, don't expect it not to break later.

Putting a weird value there and hoping it won't break is as stupid as calling the program with wrong options and hoping it will work.


Exactly.


the name of an executable is like a global variable the instant it is used as data; it can be changed at any time by someone else and then behavior changes.

names are not data. don't make them data.

names are names.


Using argv[0] for dispatch has worked for 4+ decades. There is zero non-contrived evidence that it ever fails to work.

Further, there's 4+ decades of legacy here, and it's very rude to break backwards compatibility with something like this where there's no need, no security vulnerability that must be fixed by breaking backwards compatibility.


I'm not saying it doesn't work, I'm saying that relying on it being a certain value, or one of a set of values, is like relying on the value of a global variable.

we avoid that in software development-land whenever possible, and for good reason.


Your point is moot given that this works and has for decades, and you concede that point.

> we avoid that in software development-land whenever possible, and for good reason.

We also avoid breaking backwards compatibility unless it's really necessary. In this case, granting your premise for the sake of argument, the harm caused by this change exceeds the actual non-harm of the thing you're objecting to regardless of the theoretical badness of that thing.


So if you symlink grep to some other name, you have to pass `-E` yourself; for the 99.999999% use case in the rest of the world, egrep being a hardlink to grep would have worked just fine.



