> When using fread, some implementations use the entire buffer as a temporary work space, even if the returns a length less than the entire buffer. So the following won’t work reliably:
This doesn't work reliably because the arguments are wrong. It's "fread(buf, size_of_item, number_of_items, file_ptr)". You're telling it to read one item of size N-1. If there's not enough data to read for a complete item, the rest of the "item" might be written with garbage. If you do it the right way around - reading N-1 items of size 1, you actually get the result you expect.
Is this an unnecessary footgun in the standard C library API? Probably yes. But if you complain about it, I really rather you complain about the actual problem.
Other than this, there's a few things I would agree with, and then a few other things that are issues with _specific_ standard C libraries, e.g. assert just exiting the process. Mixing C library API / POSIX problems with specific C library problems isn't particularly great for an article like this.
Anyway, the article registers appropriately "uncooked" for someone reinventing pkg-config; as far as my personal "reputation database" is concerned this is strike 2 for the author.
Think calloc()---it takes two size parameters, one the size of the item, and one the number of items. In theory, this could avoid an overflow:
items = fread(array,sizeof(array[0]),count,fp);
vs:
items = read(fh,array,sizeof(array[0]) * count);
That multiple could overflow. fread() (and fwrite()) return the number of items read, not the number of bytes (or characters). The standard also points out that if size is 0, or nmemb is 0, nothing is done (and 0 is returned).
I agree it's a weird API. Not sure where it comes from, maybe some '70s thing with reading blocks?
However, this functions as a red flag for the article itself, as in: in an article dismissing something with very little nuance, finding one of the arguments exhibit a lack of good understanding of that thing raises an interrupt for me that its other arguments may also be flawed.
The article overall just rubs me as hubris. If you go about dismissing decades of engineering history, you better get your facts right. It's easy to write things like this when you're at the wrong end of the Dunning-Kruger bathtub, so you have to show you are in fact at the right end of it. (While in the middle, people tend to not write articles like this in my experience.)
> I agree it's a weird API. Not sure where it comes from, maybe some '70s thing with reading blocks?
Probably, yes. In the earliest versions of C, stdio hadn’t been invented yet, and all IO was done using Unix file descriptors - which was fine, because C only ran on Unix. But, they were keen to prove the language was cross-platform, and Bell Labs used both IBM and Honeywell mainframes, so they started porting C to those two platforms, as a proof-of-concept of its portability. And there they ran into a big problem-mainframe IO was radically different from Unix, and the file descriptor API was rather ill-suited to it. So, Mike Lesk invented a “portable IO package”, which abstracted over the differences between mainframe and Unix IO-and that package was the ancestor of stdio.
And I think that explains why fread() is designed the way it is. IBM mainframe IO is fundamentally record-oriented [0] (and probably Honeywell was too, although I know far less about it). To Unix, reading 10 records of 80 bytes each, 80 records of 10 bytes each, or 1 record of 800 bytes each - the three are largely equivalent. Not so under MVS-if, when you create a file, you declare it as being composed of 80 byte fixed length records, the OS will force all reads/writes to be multiples of 80 bytes-hence why fread() makes you declare the record size, since it is much more essential information on that platform.
[0] Historically speaking; nowadays z/OS, the successor to MVS, supports Unix-style byte-oriented IO as well as classic mainframe record-oriented IO; but back in the 70s, those Unix compatibility features were still 20 years away. And classic mainframe apps still predominantly rely on record-based IO
> as far as my personal "reputation database" is concerned this is strike 2 for the author.
Wow, so cocky.
> If you go about dismissing decades of engineering history, you better get your facts right.
I find most of the opinions on that page are well explained, and they aren't spectacularly bold anyway. You will be hard pressed finding people that think locales are usable and most of string.h is not historical baggage.
I also learned something new, for example the thing about isXXX() taking unsigned values.
> finding one of the arguments exhibit a lack of good understanding of that thing raises an interrupt for me that its other arguments may also be flawed.
You could try the thing about reading in a benevolent way. Don't choose the interpretation that would most upset you.
From what I can see you found one nit where the author might not have a complete understanding on the issue, or maybe they just swapped the two arguments. (I'm not quite sure what that explaination was about, and I didn't bother to research. You're right of course about how fread should be used). And then you went back to HN to tear the article in pieces (btw. I find the post to be well above average quality).
As far as I'm concerned, this is strike 1 on my list.
> It's easy to write things like this when you're at the wrong end of the Dunning-Kruger bathtub
Amusingly, only the most simplistic graphs of the Dunning-Kruger efect would portray it as matching a bathtub curve. In particular the rather slow slope of confidence rising as competence improves is dramatically (and significantly) different from the sharp drop-off of false confidence. Rather than two steep sides, there is a marked asymmetry.
I suspect that you may have misunderstood the author's point though. Even if you swap the arguments as you suggest, you still run into this problem described in the link that the author provides:
When used on a text mode stream, if the amount of data requested (that is, size \* count)
is greater than or equal to the internal FILE \* buffer size (by default the size is 4096 bytes,
configurable by using setvbuf), stream data is copied directly into the user-provided buffer,
and newline conversion is done in that buffer. Since the converted data may be shorter than the
stream data copied into the buffer, data past buffer[return_value \* size]
(where return_value is the return value from fread) may contain unconverted data from the file.
For this reason, we recommend you null-terminate character data at buffer[return_value \* size]
if the intent of the buffer is to act as a C-style string.
i.e. his complaint is that if he initialises the buffer to zeros and reads N bytes into it there is the possibility that zeros after the N bytes are overwritten.
This is distinct from the problem of not getting enough information back if you swap the argument order, and as it matches his decription in the text before the link I would assume that the swapped arguments are actually a typo rather than a misunderstanding of the API.
I have heard from somewhere that there once was an idea of making computers work in terms of objects, instead of streams. Ie, the storage layer would store individual objects, not bytes. And that the sizeof object, count of objects, params in traditional read and write functions comes from that.
Would it even matter? It's regularly used to describe a common phenomenon. You understood what the Parent Comment was describing, as did I. For that reason alone, it's effective communication.
I think the author does want fread() to overwrite N-1 bytes, iff there are N-1 bytes to read.
The objection is to overwriting N-1 bytes when only N-2 bytes were read.
Which... doesn't sound completely unreasonable?
On the other hand (especially with SIMD CPUs, but maybe also with simpler instruction sets) I can understand that copying blocks of e.g. 8 bytes at a time might be faster than the special case at the end of a read for copying only 7 bytes. (Because you need to follow an "unlikely" branch to get to the 7-byte code which stalls the CPU, maybe.) So if the output buffer has space for it, it might be more performant to copy 8 bytes including a junk byte and just say you copied 7, than copying only 7 bytes.
And the author does seem to object to a number of other design decisions which could hurt performance.
The article is indeed rather fuzzy about what "libc" is. It seems to exclude "things provided by the compiler" while talking about varargs, but later calls out atomics with no such note.
The subject is messy. For example; traditionally memset() and memcpy() are clib functions, but ofcourse nowdays any good compiler will intrisic or optimize those. Same for atomics.
> This doesn't work reliably because the arguments are wrong. It's "fread(buf, size_of_item, number_of_items, file_ptr)".
No it isn't. What C standard library are you using? Above code is correct in my C standard library ... man fread:
SYNOPSIS
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
....
The function fread() reads nmemb items of data, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
How does the man page's "size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);" disagree with the GP's "fread(buf, size_of_item, number_of_items, file_ptr)"? Both seem to support that "fread(buf, N-1, 1, f);" is "telling it to read one item of size N-1".
"The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. […] If a partial element is read, its value is unspecified."
It doesn't say you stop when fgetc() returns EOF (or an error), so by a strict reading the remainder of partial items is filled with fgetc()'s "EOF" return value. But the more relevant thing is that it says the value of a partial element is unspecified.
Is this an unnecessary footgun in the standard C library API? Probably yes. But if you complain about it, I really rather you complain about the actual problem.
Other than this, there's a few things I would agree with, and then a few other things that are issues with _specific_ standard C libraries, e.g. assert just exiting the process. Mixing C library API / POSIX problems with specific C library problems isn't particularly great for an article like this.
Anyway, the article registers appropriately "uncooked" for someone reinventing pkg-config; as far as my personal "reputation database" is concerned this is strike 2 for the author.