Hacker News | david2ndaccount's comments

Use the new standards-conformant preprocessor with `/Zc:preprocessor`

https://learn.microsoft.com/en-us/cpp/build/reference/zc-pre...


Pascal strings might be the only string design worse than C strings. C Strings at least let you take a zero copy substring of the tail. Pascal strings require a copy for any substring! Strings should be two machine words - length + pointer (aka what is commonly called a string view). This is no different than any other array view. Strings are not a special case.
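To make the "length + pointer" idea concrete, here is a minimal sketch in C. The names `str_view`, `sv_from_cstr`, `sv_substr`, and `sv_eq` are made up for illustration and do not come from any particular library. The point is only that taking a substring touches the two words of the view and never the underlying bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string view: two machine words, does not own its bytes. */
typedef struct {
    const char *ptr;
    size_t      len;
} str_view;

/* View over a NUL-terminated C string (the terminator is not counted). */
static str_view sv_from_cstr(const char *s) {
    str_view v = { s, strlen(s) };
    return v;
}

/* Zero-copy substring [start, end): only the pointer and length change. */
static str_view sv_substr(str_view v, size_t start, size_t end) {
    assert(start <= end && end <= v.len);
    str_view out = { v.ptr + start, end - start };
    return out;
}

/* Byte-wise equality of two views. */
static int sv_eq(str_view a, str_view b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

A view produced this way is generally not NUL-terminated, so it cannot be handed to functions that expect a C string without copying it first.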


Yeah, I too feel that storing the array's length glued to the array's data is not that good of an idea, it should be stored next to the pointer to the array aka in the array view. But the thrall of having to pass around only a single pointer is quite a strong one.


> I too feel that storing the array's length glued to the array's data is not that good of an idea, it should be stored next to the pointer to the array aka in the array view.

That’s not cache-friendly, though. I think the short string optimization may be the best option: keep short strings alongside the string length, but allocate a separate buffer for longer strings. See https://devblogs.microsoft.com/oldnewthing/20240510-00/?p=10... for how various C++ compilers implement that.


> That’s not cache-friendly, though.

How so? The string implementations in that post are pretty much that:

    struct string
    {
        char* ptr;
        size_t size;
        union {
            size_t capacity;
            char buf[16];
        };
    };

The pointer and the size are stored together, and they may optionally be located right next to the string's actual data, but only for very small, locally-allocated, short-lived strings; in normal usage, that pointer points somewhere into the heap.
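For illustration only, here is a rough C sketch of how a layout like that could be used. It is loosely modeled on the struct quoted above, not on any real standard library, and the names `sso_string`, `sso_init`, `sso_is_inline`, and `sso_free` are hypothetical. It assumes C11 for the anonymous union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { SSO_INLINE_CAP = 16 };  /* inline buffer size, including the NUL */

typedef struct {
    char  *ptr;    /* points at buf for short strings, at the heap otherwise */
    size_t size;   /* length in bytes, excluding the NUL */
    union {
        size_t capacity;             /* meaningful only for heap strings */
        char   buf[SSO_INLINE_CAP];  /* meaningful only when ptr == buf */
    };
} sso_string;

/* Initialise *s with a copy of src. Returns 0 on success, -1 if malloc fails. */
static int sso_init(sso_string *s, const char *src) {
    size_t n;
    assert(s && src);
    n = strlen(src);
    if (n < SSO_INLINE_CAP) {
        s->ptr = s->buf;            /* short: bytes live inside the struct */
    } else {
        s->ptr = malloc(n + 1);     /* long: bytes live on the heap */
        if (!s->ptr) return -1;
        s->capacity = n;
    }
    memcpy(s->ptr, src, n + 1);
    s->size = n;
    return 0;
}

static int sso_is_inline(const sso_string *s) {
    return s->ptr == s->buf;
}

static void sso_free(sso_string *s) {
    if (!sso_is_inline(s)) free(s->ptr);
}
```

One consequence of this particular layout is that a short string cannot be copied with plain struct assignment, because the copy's `ptr` would still point into the original's `buf`.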


> they may optionally be located right next to the string's actual data, but only for very small, locally-allocated, short-lived strings

Only for small strings. Locally allocated and short-lived aren’t required for short string optimization to take an effect.

Also, I can’t find a good reference, but “only for small strings” in many programs means “for most strings”.


Is there a reason for the string not to be a struct, so that you're still just passing around a pointer to that struct (or even just passing it by value)?


I might guess that GP is referring not to interface ergonomics (for which a struct is a perfectly satisfactory solution, as you describe), but to implementation efficiency. A pointer is one word. A slice / string view is two words: a length and a pointer. A pointer to a slice is one word, but requires an additional indirection. I personally agree that slices are probably the best all-around choice, but taking double the memory (and incurring double the register pressure, etc.) is a trade-off that's fair to mention.


> C Strings at least let you take a zero copy substring of the tail

This is a special-case optimisation that I'm happy to lose in favour of the massive performance and security benefits otherwise.

Isn't length + pointer... Basically a Pascal string? Unless I am mistaken.

I think what was unsaid in your second point is that we really need to type-differentiate constant strings, dynamic strings, and string 'views', which Rust does in-language, and C++ does with the standard library. I prefer Rust's approach.


If I recall correctly, a Pascal string has the length before the string, i.e. to get the length you dereference the pointer and look backwards N bytes. A Pascal string is still a single pointer.

You cannot cheaply take an arbitrary view of the interior string - you can only truncate cheaply (and oob checks are easier to automate). That’s why pointer + length is important because it’s a generic view. For arrays it’s more complicated because you can have a stride which is important for multidimensional arrays.


> Isn't length + pointer... Basically a Pascal string? Unless I am mistaken.

Length + pointer is a record string. A Pascal string has the length at the head of the buffer, behind the pointer.


Many years ago when reading Redis code I saw the same pattern: they pass around simple pointer to data, but there is a fixed length metadata just before that.


I assume it’s either Antirez’s sds or a variant / ancestor thereof, yes. It stores a control block at the head of the string, but the pointer points past that block, so it has metadata but “is” a C string.
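The pattern described here can be sketched roughly as follows. This is not the actual sds code or API, just a minimal illustration with hypothetical names (`hstr_header`, `hstr_new`, `hstr_len`, `hstr_free`): the allocation starts with a header, and callers only ever see a pointer to the bytes after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
typedef struct {
    size_t len;
    char   data[];   /* flexible array member (C99) */
} hstr_header;

/* Allocate header + bytes + NUL and return a pointer to the bytes. */
static char *hstr_new(const char *src) {
    size_t n;
    hstr_header *h;
    assert(src);
    n = strlen(src);
    h = malloc(sizeof *h + n + 1);
    if (!h) return NULL;
    h->len = n;
    memcpy(h->data, src, n + 1);
    return h->data;
}

/* Recover the header by stepping back from the data pointer. */
static hstr_header *hstr_hdr(const char *s) {
    return (hstr_header *)(s - offsetof(hstr_header, data));
}

/* O(1) length lookup, valid only for pointers returned by hstr_new. */
static size_t hstr_len(const char *s) {
    return hstr_hdr(s)->len;
}

static void hstr_free(char *s) {
    if (s) free(hstr_hdr(s));
}
```

The returned pointer works with `strlen`, `strcmp`, and friends, but the metadata is only reachable from the start of the data: calling `hstr_len(s + 2)` would read garbage, which is the substring limitation discussed above.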


Pascal strings store the string's length by its data, whereas fat pointers store the length by the address of the data.

The main difference is that if a string's length is by its data, you can't easily construct a pointer to part of that data without copying it into a new string, whereas if instead the length is by the data's address, you can cheaply construct pointers to any substring (by coming up with new length+address pairs) without having to construct entire new strings.


C strings also allow you to do a zero-copy split by replacing all instances of the delimiter with null (although you need to keep track of the end of the list separately).
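A minimal sketch of that in-place split, assuming the caller owns a writable buffer. The function name `split_in_place` and its signature are made up for this example. The returned count is the "end of list" bookkeeping mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split buf in place by overwriting each delim with '\0'.
 * Stores up to max_parts pointers into parts and returns how many were stored.
 * If there are more pieces than slots, the last slot keeps the unsplit remainder.
 */
static size_t split_in_place(char *buf, char delim, char **parts, size_t max_parts) {
    size_t n = 0;
    char *p;
    assert(buf && parts);
    if (max_parts == 0) return 0;
    parts[n++] = buf;
    for (p = buf; *p; ++p) {
        if (*p == delim) {
            if (n == max_parts) break;   /* out of slots: stop modifying buf */
            *p = '\0';
            parts[n++] = p + 1;
        }
    }
    return n;
}
```

Every pointer in `parts` aims into the original buffer, so nothing is copied, but the buffer is permanently modified and must outlive the pieces.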


You also need to own the buffer otherwise you’re corrupting someone else’s data, or straight up segfaulting.


As long as you clearly document that the incoming data is going to be modified, it's not a problem. And in a lot of cases, the data either comes from the network or is read from the file - so the buffer is going to be discarded at the end anyway... why not reuse it?

And yes, today it would be easier to make a copy of the data... but remember we are talking about the '90s, when RAM was measured in megabytes and your L1 cache might be only 8 KB or so.


The "zero copy substring" in C is in general not a valid C string since it is not guaranteed to be zero-terminated. For both languages one could define a string view as a struct with a pointer plus size information. So, I do not see why Pascal is worse in this regard than C.


x86 had 6 general-purpose working registers total. Using length + pointers would have caused a lot of extra spills.


“Sure your software crashes and your machines get owned, but at least they’re not-working very fast!”


Right. This is so often the excuse for terrible designs in C and C++: it's wrong, "but it's faster". No, it's just wrong. Speed only matters for correct answers; if just any answer were fine, there would be no need to write any of this software.


The world won’t allow a dependence on a single geopolitically threatened entity in the long run, so either they defuse that risk themselves or risk a competitor filling that role. This move is better for TSMC itself.


If you want to remain portable, write your code in the intersection of the big 3 - GCC, Clang and MSVC - and you’ll be good enough. Other implementations will either be weird enough that many things you’d expect to work won’t or are forced to copy what those 3 do anyway.


This is what I have been doing for years. Works well for me.

Sometimes it is annoying but realistically it is a good strategy.


You can apply `#` to __VA_ARGS__, which won’t preserve the exact whitespace, but for many languages it’s good enough. The biggest issue is that you can’t have `#` in the text.
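As a small sketch of the `#__VA_ARGS__` trick (the macro name `EMBED_TEXT` and the SQL text are invented for this example), the preprocessor collapses each run of whitespace between tokens, including newlines, into a single space:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro name: turn the argument tokens into one string literal. */
#define EMBED_TEXT(...) #__VA_ARGS__

/* Commas are fine because they are part of __VA_ARGS__. */
static const char *const query = EMBED_TEXT(
    SELECT name, age
    FROM   users
    WHERE  age > 21
);

/* The line breaks and column alignment from the source are not preserved. */
static void check_query(void) {
    assert(strcmp(query, "SELECT name, age FROM users WHERE age > 21") == 0);
}
```

This works for text that tokenizes cleanly as C preprocessing tokens; unbalanced quotes or parentheses in the embedded text will also cause trouble, not just `#`.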


I wrote a similar article in the past: https://www.davidpriver.com/ctemplates.html

I use this technique in my hobby projects as it has worked out the best for me of the options I’ve explored.


Works with MSVC if you add /Zc:preprocessor (to get a standard compliant preprocessor instead of the legacy one).


The author relies on the compiler fitting the bitfield in unused padding bytes after “speed”, but that is implementation-defined behavior (almost everything about bitfields is implementation defined). MSVC will align the unsigned bitfield to alignof(unsigned), whereas GCC and clang will use the padding bytes. To portably pack the struct, use unsigned integers and use flags (monster.flags & CAN_FLY instead of monster.can_fly).

See https://c.godbolt.org/z/K7oY77KGj
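A rough sketch of the flags approach, for comparison. Only `speed`, `flags`, and `CAN_FLY` come from the comment above; the other flag names, the helper functions, and the rest of the struct layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; only CAN_FLY is named in the comment above. */
enum {
    CAN_FLY    = 1u << 0,
    CAN_SWIM   = 1u << 1,
    IS_HOSTILE = 1u << 2
};

struct monster {
    float   speed;
    uint8_t flags;   /* plain integer member: no bit-field layout rules apply */
};

/* Set or clear one flag bit. */
static void monster_set(struct monster *m, unsigned flag, bool on) {
    assert(m && flag <= UINT8_MAX);
    if (on) m->flags |= (uint8_t)flag;
    else    m->flags &= (uint8_t)~flag;
}

/* Test one flag bit, the equivalent of monster.flags & CAN_FLY. */
static bool monster_has(const struct monster *m, unsigned flag) {
    return (m->flags & flag) != 0;
}
```

Because `flags` is an ordinary `uint8_t`, its size and alignment are the same across the compilers mentioned, which is the portability benefit being argued for.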


Bitfields have other problems. Say you have two bitfields each of `uint8_t` type and totaling 16 bits: well, you might think that's _two_ fields, but the compiler is allowed to treat them as _one_ whenever it wants to, and _that_ can be a problem if you're accessing one with atomics and the other without because the other can yield unsynchronized accesses to the one that needs synchronization.

Bitfields in C leave way too much behavior to the compiler or undefined. It's really intolerable.


Even worse: bit fields can only be applied to int, signed int and unsigned int (maybe char as well, but I don't remember).

Even crazier is the fact that an int bitfield's signedness is implementation defined


> Even crazier is the fact that an int bitfield's signedness is implementation defined

Easy fix: just make them always unsigned. But the other problems are much more serious.


Why can't there be a standard defined for bitfields in future C releases? This is a long-discussed drawback of a feature that I really really want to be able to use in production code.


There is, but it's part of the platform ABI and not the C language standard. The latter specifies syntax and behavior, the former is what's concerned with interoperability details like memory layout.

I happen to have a copy of AAPCS64 in front of me, and you can find the specification of bit packing in section 10.1.8. The sysv ABI for x86/x86_64 has its own wording (but I'm pretty sure is compatible). MSVC does something else I believe, etc...


There is. I'm not sure when it was added, but looking at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf, page 106, note 13

> An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

In practice, I've been using this feature for ages (along with __attribute__((packed))). There comes a point where you can start depending on compiler specific features instead of relying solely on the standard.


The portion you quoted derives word-for-word from the original ANSI standard (C89/C90): https://port70.net/~nsz/c/c89/c89-draft.html#3.5.2.1


Can this be standardized? It seems it would have to accommodate the most limited architecture, in terms of memory access, probably making it practically useless. Seems best left to the compiler.


The quoted text is from the C standard. The popular wisdom is that anything goes wrt bit-fields, but it's an exaggeration.

The original (?) DNS library used bit-fields to read and write the packet header status and flags fields, and that original code is still used today: https://github.com/openbsd/src/blob/master/include/arpa/name... It does rely on implementation-defined behavior--it swaps the order of fields based on whether the machine is little-, big-, or middle-endian--but even that behavior has remained remarkably consistent, at least on Unix ABIs.

I think the wisdom comes from dealing with niche compilers, especially from years ago, where bit-fields were one of the areas where standard adherence was less consistent; and in embedded and kernel contexts, where the implementation-defined and (implicit) undefined behavior, such as wrt atomicity or the intersection of bit-fields and (extension) packed pragmas, counseled against any usage of bit-fields.


I've been working on an open source library that, while it doesn't solve bitfields, can provide convenient ergonomic access to bit-granular types:

https://github.com/PeterCDMcLean/BitLib

You can inherit from bit_array and create a pseudo bitfield:

  class mybits_t : bit::bit_array<17> {
    public:
      // slice a 1 bit sized field
      // returns a proxy reference to the single bit
      auto field1() {
        return (*this)(0, 1);
      }

      // slice a 3 bit sized field 
      // returns a proxy reference to the 3 bits
      auto field2() {
        return (*this)(1, 4);
      } 
  };
  mybits_t mybits(7);
  assert(mybits.field1() == 1);
  mybits.field1() = 0;
  assert(static_cast<int>(mybits) == 6); //no implicit cast to integers, but explicit supported

There is minimal overhead depending on how aggressively the compiler inlines or optimizes.



different/more than std::bitset<SIZE> ?


Only similar by a little bit. Bitset misses the mark in so many ways.

bit_array can be compile time storage (ala bitset) or dynamic at construction time. There is also bit_vector which is entirely dynamic

Critically, though, the type/library is built on a proxy iterator/pointer that lets you do certain things with STL or the bit algorithms that you cannot otherwise do with bitset.

But back to the bitfields idea. The bit array backing storage is configurable through templates so you will be able to know for certain exactly how much stack or heap your type will take up


C23 has _Float16


Summer reading programs are a band-aid on the problem that children shouldn’t have such a long summer break now that air conditioning is common. Spread the breaks out throughout the year if you want to maintain the same number of days off. All evidence shows the summer break is bad for children’s academic achievement (especially poor children), but it is viewed as a perk for the teachers so the teacher’s unions fight against questioning it.


Let’s say summer break is basically 3 months. I as a parent need to figure out childcare for that 3 month period at the beginning of summer. This is a much more time consuming endeavor than most would expect (or at least more than I expected). If you distribute those months throughout the year I need to repeat this process 3 different times, adding a bunch of overhead that could be spent on activities more beneficial to my family and kids.

Edit: Adding that I realize the summer slowdown absolutely exists and has a disproportionate effect on those that don’t need another wrench thrown in their life. But just wanted to add a perspective that isn’t “teacher union boogeyman”.


Not every school has air-conditioning however.

And there are schools that do year-round schedules, but the total time off is about the same. They will typically get a longer winter break, longer spring break, an additional fall break, and then a much shortened summer break, but those add up to about the same time off overall. I know many teachers who prefer that system, some because it means they get paychecks more consistently throughout the year, and also it gives you more spread out breaks and flexibility in taking trips instead of being locked in to summer/Christmas/one week in the spring.

The strongest pushback to this schedule is in fact from parents. The primary issue is that once their kids are in different schools (high school / middle school / elementary) with different schedules, the kids are no longer on break at the same times. In addition, summer camp programs are tied to the traditional schedule, leaving kids on the year-round schedule with fewer or no options.

In order to change it, you also need neighboring districts/communities/private schools/programming to all shift as well, otherwise it becomes too much of a hassle for parents & teachers.


Air conditioning is common, but at least in some regions, it would be a tremendous expense for the school to condition their buildings for occupation during the summer. And many buildings were designed around the summer break, so they may not have capacity to condition the buildings for occupation during the summer; this is not without its problems as some buildings end up being unfit for occupation during the school year, especially as the climate gets less consistent. There's probably some opportunity for savings in places where increasing hours during the summer could result in decreasing hours in the winter, though.

I think there's some cultural value in having a shared experience of summer vacation. But I agree, breaking up the breaks throughout the year, where possible, would make a lot of sense. There's a benefit of less crowding when school districts have different weeks off; although it's harder for extended families to meet up when their school schedules are drastically different.


That assumes academic achievement should be the primary aim of childhood. What I learned in school was incredibly important—don’t get me wrong—but what I learned over the summer was arguably more important.

As a child of divorce, I cherished 6 straight weeks at my mom’s house (we only visited every other weekend during school). As a working class kid, I earned probably half my annual spending money over the summer.

My wife and I now have kids, and we’ve always loved to travel (and needed to just to visit family). Summer is the only time available for extended family trips (2+ weeks).


In other words, any time spent outside of school is time wasted?

We've cut the music and art in schools too. I guess the end state is one long endless math class. I'm sure those kids will be well adjusted.


It is the only vacation most teachers get, so of course they fight against shortening it


The argument wasn't for shortening it, but for distributing it through the year. I have never in my adult life taken 10 consecutive weeks off, and 5 two-week breaks would still be very generous.


Maybe summer break also has some value for the joy it brings to children? Their lives shouldn't just be preparation for adulthood, it's worth making childhood enjoyable too.

