> When using fread, some implementations use the entire buffer as a temporary wo...

matheusmoreira · on Feb 11, 2023

Honestly I don't blame him. Perhaps he's used to read where the number of bytes to read follows the buffer:

https://man7.org/linux/man-pages/man2/read.2.html

  ssize_t read(int fd, void *buf, size_t count);

That's how it works in every sane I/O API I've seen. What's even the point of the size_of_item argument?

spc476 · on Feb 12, 2023

Think calloc()---it takes two size parameters, one the size of the item, and one the number of items. In theory, this could avoid an overflow:

    items = fread(array,sizeof(array[0]),count,fp);

vs:

    items = read(fh,array,sizeof(array[0]) * count);

That multiple could overflow. fread() (and fwrite()) return the number of items read, not the number of bytes (or characters). The standard also points out that if size is 0, or nmemb is 0, nothing is done (and 0 is returned).

eqvinox · on Feb 11, 2023

I agree it's a weird API. Not sure where it comes from, maybe some '70s thing with reading blocks?

However, this functions as a red flag for the article itself, as in: in an article dismissing something with very little nuance, finding one of the arguments exhibit a lack of good understanding of that thing raises an interrupt for me that its other arguments may also be flawed.

The article overall just rubs me as hubris. If you go about dismissing decades of engineering history, you better get your facts right. It's easy to write things like this when you're at the wrong end of the Dunning-Kruger bathtub, so you have to show you are in fact at the right end of it. (While in the middle, people tend to not write articles like this in my experience.)

skissane · on Feb 13, 2023

> I agree it's a weird API. Not sure where it comes from, maybe some '70s thing with reading blocks?

Probably, yes. In the earliest versions of C, stdio hadn’t been invented yet, and all IO was done using Unix file descriptors - which was fine, because C only ran on Unix. But, they were keen to prove the language was cross-platform, and Bell Labs used both IBM and Honeywell mainframes, so they started porting C to those two platforms, as a proof-of-concept of its portability. And there they ran into a big problem-mainframe IO was radically different from Unix, and the file descriptor API was rather ill-suited to it. So, Mike Lesk invented a “portable IO package”, which abstracted over the differences between mainframe and Unix IO-and that package was the ancestor of stdio.

And I think that explains why fread() is designed the way it is. IBM mainframe IO is fundamentally record-oriented [0] (and probably Honeywell was too, although I know far less about it). To Unix, reading 10 records of 80 bytes each, 80 records of 10 bytes each, or 1 record of 800 bytes each - the three are largely equivalent. Not so under MVS-if, when you create a file, you declare it as being composed of 80 byte fixed length records, the OS will force all reads/writes to be multiples of 80 bytes-hence why fread() makes you declare the record size, since it is much more essential information on that platform.

[0] Historically speaking; nowadays z/OS, the successor to MVS, supports Unix-style byte-oriented IO as well as classic mainframe record-oriented IO; but back in the 70s, those Unix compatibility features were still 20 years away. And classic mainframe apps still predominantly rely on record-based IO

jstimpfle · on Feb 11, 2023

> as far as my personal "reputation database" is concerned this is strike 2 for the author.

Wow, so cocky.

> If you go about dismissing decades of engineering history, you better get your facts right.

I find most of the opinions on that page are well explained, and they aren't spectacularly bold anyway. You will be hard pressed finding people that think locales are usable and most of string.h is not historical baggage.

I also learned something new, for example the thing about isXXX() taking unsigned values.

> finding one of the arguments exhibit a lack of good understanding of that thing raises an interrupt for me that its other arguments may also be flawed.

You could try the thing about reading in a benevolent way. Don't choose the interpretation that would most upset you.

From what I can see you found one nit where the author might not have a complete understanding on the issue, or maybe they just swapped the two arguments. (I'm not quite sure what that explaination was about, and I didn't bother to research. You're right of course about how fread should be used). And then you went back to HN to tear the article in pieces (btw. I find the post to be well above average quality).

As far as I'm concerned, this is strike 1 on my list.

eqvinox · on Feb 12, 2023

> Wow, so cocky.

> You could try the thing about reading in a benevolent way. Don't choose the interpretation that would most upset you.

Et tu, Brute?

314 · on Feb 13, 2023

> It's easy to write things like this when you're at the wrong end of the Dunning-Kruger bathtub

Amusingly, only the most simplistic graphs of the Dunning-Kruger efect would portray it as matching a bathtub curve. In particular the rather slow slope of confidence rising as competence improves is dramatically (and significantly) different from the sharp drop-off of false confidence. Rather than two steep sides, there is a marked asymmetry.

I suspect that you may have misunderstood the author's point though. Even if you swap the arguments as you suggest, you still run into this problem described in the link that the author provides:

  When used on a text mode stream, if the amount of data requested (that is, size \* count)
  is greater than or equal to the internal FILE \* buffer size (by default the size is 4096 bytes,
  configurable by using setvbuf), stream data is copied directly into the user-provided buffer,
  and newline conversion is done in that buffer. Since the converted data may be shorter than the
  stream data copied into the buffer, data past buffer[return_value \* size]
  (where return_value is the return value from fread) may contain unconverted data from the file.
  For this reason, we recommend you null-terminate character data at buffer[return_value \* size]
  if the intent of the buffer is to act as a C-style string.

i.e. his complaint is that if he initialises the buffer to zeros and reads N bytes into it there is the possibility that zeros after the N bytes are overwritten.

This is distinct from the problem of not getting enough information back if you swap the argument order, and as it matches his decription in the text before the link I would assume that the swapped arguments are actually a typo rather than a misunderstanding of the API.

arka2147483647 · on Feb 11, 2023

I have heard from somewhere that there once was an idea of making computers work in terms of objects, instead of streams. Ie, the storage layer would store individual objects, not bytes. And that the sizeof object, count of objects, params in traditional read and write functions comes from that.

TristanBall · on Feb 13, 2023

As i understand it, IBM I does this, amongst others. I can't speak to what impact, if any, I had on c tho.

_jpys · on Feb 11, 2023

"wrong end of the Dunning-Kruger bathtub, so you have to show you are in fact at the right end of it."

It seems so simple. Yet certainty eludes me.

"Dunning-Kruger bathtub"

This phrasing is gold btw.

astrange · on Feb 13, 2023

Doesn't exist though.

(Not only did Dunning-Kruger not even claim that incompetent people believed they were confident, but the entire study doesn't reproduce.)

_jpys · on Feb 13, 2023

I learned the term in grad school. Wikipedia seems fairly accurate.

"Many criticisms of the Dunning–Kruger effect have the metacognitive account as their main focus but agree with the empirical findings themselves."

https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect#...

Would it even matter? It's regularly used to describe a common phenomenon. You understood what the Parent Comment was describing, as did I. For that reason alone, it's effective communication.

tom_ · on Feb 11, 2023

Another way of looking at it: if you don't want fread to overwrite N-1 bytes, don't tell it it can overwrite N-1 bytes!

(Also, check the return value...)

Karellen · on Feb 11, 2023

I think the author does want fread() to overwrite N-1 bytes, iff there are N-1 bytes to read.

The objection is to overwriting N-1 bytes when only N-2 bytes were read.

Which... doesn't sound completely unreasonable?

On the other hand (especially with SIMD CPUs, but maybe also with simpler instruction sets) I can understand that copying blocks of e.g. 8 bytes at a time might be faster than the special case at the end of a read for copying only 7 bytes. (Because you need to follow an "unlikely" branch to get to the 7-byte code which stalls the CPU, maybe.) So if the output buffer has space for it, it might be more performant to copy 8 bytes including a junk byte and just say you copied 7, than copying only 7 bytes.

And the author does seem to object to a number of other design decisions which could hurt performance.

jdefr89 · on Feb 11, 2023

I too am surprised. The author generally writes clean code, most of it seems to rely heavily on libC… I am confused…

spc476 · on Feb 12, 2023

If you follow the links, most of the complaints stem from the Microsoft Windows implementation of the C Standard.

eqvinox · on Feb 11, 2023

The article is indeed rather fuzzy about what "libc" is. It seems to exclude "things provided by the compiler" while talking about varargs, but later calls out atomics with no such note.

arka2147483647 · on Feb 11, 2023

The subject is messy. For example; traditionally memset() and memcpy() are clib functions, but ofcourse nowdays any good compiler will intrisic or optimize those. Same for atomics.

theteapot · on Feb 11, 2023

> This doesn't work reliably because the arguments are wrong. It's "fread(buf, size_of_item, number_of_items, file_ptr)".

No it isn't. What C standard library are you using? Above code is correct in my C standard library ... man fread:

  SYNOPSIS
        #include <stdio.h>
        size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
        ....
        The function fread() reads nmemb items of data, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.

ntrz · on Feb 11, 2023

How does the man page's "size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);" disagree with the GP's "fread(buf, size_of_item, number_of_items, file_ptr)"? Both seem to support that "fread(buf, N-1, 1, f);" is "telling it to read one item of size N-1".

theteapot · on Feb 11, 2023

Oh it doesn't. Sorry. Too early not enough sleep to be commenting ...

eqvinox · on Feb 11, 2023

https://pubs.opengroup.org/onlinepubs/007904975/functions/fr...

"The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. […] If a partial element is read, its value is unspecified."

It doesn't say you stop when fgetc() returns EOF (or an error), so by a strict reading the remainder of partial items is filled with fgetc()'s "EOF" return value. But the more relevant thing is that it says the value of a partial element is unspecified.

(Edited to point out the "unspecified" aspect.)