I think the library is great, and don't have a whole lot to say about it, but wanted to mention one tangentially related thing:
The README on this repo is awesome. Opening up with advantages and disadvantages? Awesome. Plenty of code examples covering all of the major use cases? Awesome. Quick overview of the internals? AWESOME! Quick two line note about how to use the library in your project? Awesome.
I'm tempted to rant about how I wish documentation was taken more seriously, and that programmers seem to make it a point of pride that spending the first half hour with a library figuring out how to actually use it is just something we have to deal with as programmers, but I won't do so aside from this single sentence.
man pages used to be treated with the same reverence, but nowadays nobody seems to bother. I have pretty extensive man pages for some of my open source stuff, but it doesn't seem to help with users downstream.
Don't get discouraged; people who greatly appreciate a well written man page are still around. They're the users you might not hear about because the fine documentation actually helped them. :-)
I used to be quite okay with command line help for small tools, but ever since I discovered http://explainshell.com/ I have a newfound appreciation for man pages.
They almost never have examples (is that some sort of rule?), they have no "quick summary of the shit you use 95% of the time", and they are generally written as a novel, seemingly from the perspective of the developer of the util rather than the consumer.
Mind you, I have used man pages many times before, but only because they were the best source of information, not because they were a particularly efficient one.
The Markdown README of this string library is a thousand times better than any man page I have seen.
man pages start with the function prototypes, that's your "quick summary".
Near the end there can be an examples section, and many man pages have one. If I run "man 3 open" I see a reasonable examples section.
I like manpages because I'm already editing or running programs in the terminal, and I can pop open the man page in the terminal quite quickly and conveniently, without a search online or even a single network request, without reaching for the mouse, etc.
I'm not sure it was ever meant to be intuitive or easy to use. As far as I can tell, INFO pages (and much of UNIX) were intended to be a stop-gap solution until something better and more permanent was written atop them. Unfortunately that dream was never realized, and Linux's accidental popularity standardized what was supposed to be a bunch of building blocks. I could be wrong, but almost every utility in UNIX screams this to me.
INFO is a Richard Stallman thing, it's a GNU thing, it is emphatically NOT a Unix thing.
Stallman wanted to replace the Unix thing (which was always man pages) with INFO, and for many years deprecated man pages, which is part of why many GNU man pages are sub-par.
Not that non-GNU man pages are perfect, but still.
As for Unix being intended to be a stop-gap, your impression is simply historically incorrect, aside from philosophical issues like the claims made in the infamous Gabriel essay "Worse is Better".
> I could be wrong though, but almost every utility in UNIX screams this to me.
Unix/Linux is certainly not perfect, but this simply reflects the truth of Henry Spencer's aphorism, "Those who don't understand Unix are condemned to reinvent it, poorly."
People who think Unix got it all wrong, as opposed to merely having assorted warts, should read Raymond's "Art of Unix Programming".
I was more than a little startled that Raymond captured a lot of the truth of the subject; it's a good read, and can potentially make anyone a better programmer.
I don't have the same understanding. Quite the opposite. GNU texinfo was designed (in 1986) to both generate manuals and be used as a hypertext system.
I know of nothing in its history to suggest that it was a stop-gap system.
Everything in UNIX is supposed to be building blocks, but you're supposed to be able to quickly assemble what you need from them in the shell. I don't think INFO fits this mold particularly well.
I'm still looking for a Linux desktop with decent man-page integration. The worst thing about man pages (and a lot of *nix documentation) is that the means of accessing it seem to be stuck in 1970.
"They almost never have examples(Is that some sort of rule?)"
They often have examples, though this is certainly better represented in some areas than others. One I looked at just the other day, man 7 aio, is 334 lines. Starts out describing the various aio functions and structures under DESCRIPTION, and has an EXAMPLE section: http://man7.org/linux/man-pages/man7/aio.7.html
Which is unfortunate, because the majority are either extremely terse, and/or merely consist of installation instructions, so that the information content is nowhere near as high as an average man page.
At least in my experience; maybe I've just been repeatedly unlucky.
I doubt it, though, because there isn't any standard template for them, unlike the situation for man pages, which have a number of standard sections that people generally copy.
Not doing trivial overflow checking before arithmetic.
Using results from said arithmetic as indices or starting points for memory writes.
My gut says that any application using this library to process untrusted input is exploitable.
Of course, it was never advertised as being secure :-)
Which is unusual, as many "better strings" libraries make claims about the security of the traditional way of string handling (and then go on to do it wrong).
Edit: To give a concrete example, someone here recommended BString just a few days ago. That library has a "security statement" in which it claims to prevent overflow-type attacks by using signed integers instead of size_t, and then checking whether the result is negative after arithmetic. But signed overflow is undefined behavior in C, and these values are not guaranteed to wrap around. And yes, such things have been exploited. I don't know how likely or easy bstring is to "own", but in any case it's doing it wrong.
And yes, I checked the code; it does what it claims to do.
Hello clarry, I don't think there are APIs here that will lead to security issues unless grossly misused. There is one issue I'm aware of that is perhaps not easily exploitable but is surely unexpected behavior: overflow when you reach a 2GB string. That requires a check in sdsMakeRoomFor() for sure.
Note that this could be fixed by using uint64_t type for example in the header, however in the original incarnation of SDS inside Redis this was not possible for memory usage concerns. In the standalone library I believe it is instead a great idea, since 2GB is no longer an acceptable limit.
From the point of view of the security the most concerning function looks like sdsrange(), there are sanity checks in place to avoid that indexes received from the outside can lead to issues, but I'll do a security review in the future in order to formally check for issues.
If it's any reassurance, I had a look at it, and although there is unnecessary signed/unsigned arithmetic, I could not find anything exploitable. Change the types of variables representing an object size, length, array index, and so on to size_t (from int) and it will be fine.
Might be acceptable for an individual project where you can make statements about the sizes that will be used ahead of time, but for a general purpose library your statement is absolutely false. The issue is that your size requirement can overflow size_t, malloc gets passed a smaller value than you really intended, and then if you try to use what you think you allocated (remember, the allocation succeeded, just with a smaller size), it's a dereference outside the bounds of the allocation. Changing the size or type of the length variable won't solve that. This is a common and well known vulnerability pattern - it is disappointing to see folks so dismissive of it.
I'm aware of arithmetic overflow, thanks. It was the subject of my research (which will eventually be open source, watch www.peripetylabs.com/publications/ if you're interested). See my sizeop library in the meantime: https://bitbucket.org/eliteraspberries/sizeop
I am not dismissive of the problem. I just know from experience that the chances of code like that being fixed at all, let alone correctly, is near zero.
I think when doing malloc(n * m) or similar the most cautious thing to do would be to check for overflow even if you don't think it's exploitable. Especially for a library. Witness for example that OpenBSD's calloc does an overflow check.
I often leave this out of my own code, but not without feeling somewhat guilty about it.
In general, just because you cannot exploit something doesn't mean nobody else can. What to you seems at worst "merely" unexpected behavior is a very good starting point for someone who actually writes exploits.
Consider what happens if an attacker controls initlen passed to sdsnewlen() (also via sdsnew() if he controls the initial string). He can overflow the arithmetic on malloc/calloc. Now either the memcpy or the NUL-byte assignment can write out of bounds, which is potentially exploitable, especially if the attacker has a nearly unbounded number of attempts (which they often do as far as online-facing services go) or if they can do things to affect the program's memory layout -- which they likely can if they're feeding the program with data.
But if these two aren't enough for a successful exploit (I wouldn't make any bets!), the initlen is assigned to sh->len (whoops, conversion to signed... and possible overflow), which then can completely throw off the arithmetic in sdsMakeRoomFor. Here I would actually be quite willing to bet an attacker can arrange these numbers so that he can do a more controlled out-of-bounds write in one of the functions that rely on sdsMakeRoomFor.
These two functions aren't necessarily the only problematic ones; they're just the ones I noticed after a minute of scanning.
And I don't really agree with what eliteraspberrie said; using the right types is a good start but it's absolutely not fine until after you actually do all the overflow checking; for an example of this, see the OpenBSD man page for malloc: http://www.openbsd.org/cgi-bin/man.cgi?query=malloc
Rule of thumb: whenever you do any arithmetic, ask yourself, can it overflow (how do you know it doesn't?) and who's in charge of making sure it doesn't? In this case it's your API's responsibility. This is exactly the same thing you do whenever you call a function: you need to know if it can fail, and if it can, you need to check the return value and do the right thing.
In the years I've spent reading CVEs (and the broken code that caused the alert), proof-of-concept exploits as well as real world exploits, I've learned that some incredibly subtle and seemingly insignificant things can be exploited. As a general rule, don't worry about exploitability, it's usually not worth your time to prove one way or another. Just fix the code whenever you see something that could go wrong. Make it easy for the next guy who audits your code; so he too can tell that your code is secure, just by seeing the right checks in the right place.
It's a good read for anyone doing C these days, and covers a good deal of problems, including the one discussed here. It comes with snippets of known vulnerable real world code too.
I forked sds.c from Redis given that it was very useful for the project for years, so I guess it may be useful in other contexts as well. There are a few changes at API level compared to the Redis sds.c file, however most of the work went into documenting it.
Since the entire idea is that the pointer on the left-hand side is to the type whose size should be subtracted, I think it's better not to repeat the type but to "lock it" to the pointer instead. This also (of course) means we can drop the parentheses with sizeof, since those are only needed when its argument is a type name.
#include <stddef.h>
size_t buf_offset = offsetof(struct sdshdr, buf);
struct sdshdr *sh = (struct sdshdr *)(s - buf_offset);
This is because the compiler might insert padding between your struct elements and the flexible array member. In your case, you're using only int types in the header, so padding shouldn't be an issue on most architectures, but consider the following:
struct header {
int i;
char c;
char data[];
};
sizeof(struct header) on my machine is 8, but "data" starts at an offset of 5 from the beginning of the struct. So, to go from "data" pointer to the pointer representing the beginning of the struct, you will need to subtract 5, not 8. Here is a test program:
That surprised me. I thought (and read somewhere) that the flexible array member comes right after the entire struct, which includes possible padding.
OP got away with it since he always allocates the size of the entire struct, which is 8, plus the string size. And since char doesn't have any alignment requirements, it doesn't matter: he always gets back to the correct offset. So the data member is basically unused, and if you check the code you will see that it is in fact never used!
The standard actually specifies (or specified?) that the flexible array member must come after all the members, including the padding. But, from my reading, that wasn't their intent, and is certainly not how compilers implement it. The link in my original post is an acknowledgement from the committee about this and that the standard needs to be updated.
I think the main reason the OP got away with it is because in his structure, "sizeof(struct sdshdr)" is equal to the offset of "buf" in the struct. This is not necessarily true.
Very true of course. I guess I assumed that antirez had thought of that; I should have noticed that there were no packing attributes on the struct. I love offsetof() for code like this.
Heh, you're the first person I meet who advocates dropping the parens around the sizeof argument in any situation. That does look terribly alien to me.
Really? I always do. If you don't, it makes the variable look like a type. It's the same reason why I dislike extra parens around the value in the return statement. Don't make it look like something it isn't.
This is neat, but please add a pointer to a deallocation function to the sds header and use it in sdsfree. This will allow returning sds strings from functions in shared libraries. A common problem is that the user of the library might use a different version of libc than the shared library (this is especially typical on Windows), and when it calls sdsfree on an sds string allocated with a different libc, something awful will happen (in the best case, it will just leak memory). By storing a pointer to the deallocation function in the code which allocated the string, you can make sure that it is always released by the same libc version that allocated it.
The biggest missing thing in C imho is destructors, because you need to manually clean up everything whenever you leave a scope or function. That means calling "free" at every exit point.
Or alternatively, putting one "cleanup:" label at the end of the function and using "goto cleanup;" instead of break or return everywhere.
But goto is considered harmful, so that handy pattern seems unclean.
My question is: do you consider the "goto cleanup" pattern clean or not, and if not, what are better alternatives?
Many people are correctly citing the usage of goto in the Linux kernel. One thing I would like to add is what Greg Kroah-Hartman says about goto in one of his talks: only jump forward, never backwards. This is precisely what you do when you goto cleanup.
I, personally, think the pattern is clean. A labelled break is basically a goto anyway, yet not always available in your language. I never liked needing a flag to exit some nested structure early. I don't find reading that clean at all.
Also, a minor rant here: Dijkstra's "Go To Statement Considered Harmful" is often mentioned, sometime wordlessly, when goto is brought up, but it seems many of those people misunderstood what was actually being argued. Although he mentions being "convinced that the go to statement should be abolished from all 'higher level' programming languages," his main gripe was that the "unbridled use of the go to statement has an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress." He felt it was "as it stands is just too primitive; it is too much an invitation to make a mess of one's program." So, as others have pointed out already, his main issue was with goto being used in a way that resulted in unstructured, hard to understand programs.
I had points taken off from a C programming assignment in college once because I used goto for cleanup, and the grader blindly took off points for using goto. It took me quite a few back & forths with him to convince him that my usage was fine. His words were: "the use of goto is rather dangerous and usually leads to spaghetti code and reduction in performance (because of prediction + cache reasons), wherever you used goto you could've used functions".
I have heard about this happening to others. In the kernel I developed as part of an operating systems course, I pre-emptively commented my use of goto with a short justification of its usage and cited its use in the linux kernel. I didn't get any response, so I have no idea how the T.A. felt about it. I didn't lose any points, at least.
Agreed. Gotos are not bad at all, particularly when dealing with error scenarios. They avoid hacky if-then error logic (sort of like try/catch/finally).
If you are constructing arguments in order, you can destroy them in the reverse order at the exit and this lets you cleanly handle a failure in the middle of allocations:
Notice the cascade after the 'out' label; each label after it is for a different level of cleanup needed, depending on how deep into the system call the function encountered the error. Also notice that this is basically the kind of code one would generate for exceptions.
In performance critical nested loops, goto is the only way to exit early. A typical example here is performing a matrix multiply. In cases like this, it is absolutely necessary, because C doesn't provide a more granular 'break'.
You could leave nested loops with a condition flag, which could produce something semantically equivalent to the goto. It seems like the logic here might be simple enough that the actual check could be eliminated in the generated code by a Sufficiently Smart Compiler (that could actually exist), if there was sufficient demand for it. I'm not sure it's actually more readable, though.
I'd argue than a condition flag in this case would introduce code bloat (declare, initialize, check in the loop construct) and decrease readability compared to a goto Label.
I agree that it typically would decrease readability, and thus wouldn't be advisable. My point was that readability is what we should be arguing, for the few places Go To is in fact appropriate, and not that we need it - even with performance constraints we basically don't.
Yeah, of course I'm aware, it's just that there is a time/space trade-off, so I think folks who argue that goto should never be used or has no legitimate uses, are just taking too much of a hard-line.
And I think it just highlights a lack of understanding about what might make goto "bad" in programming--IMHO it's "bad" when it makes control flow more complex, which undermines maintainability. Common "cleanup" idioms that only jump forward are not that bad, because they don't make control flow particularly more difficult to follow. (Whereas jumping backwards, especially across many state changes, can be very hard to follow.)
I don't see a time-space tradeoff here. Compiled naively, I expect the version with flags to be slower and take more space. Compiled sufficiently smartly, I expect them to be equivalent. I expect real compilers to be close enough to the latter for most but not all purposes, but I think if the flag version was more readable and maintainable the place to focus (collectively, medium term) would be on making compilers smart enough. Obviously, in the short term on specific projects you do what you need to.
"I think folks who argue that goto should never be used or has no legitimate uses, are just taking too much of a hard-line."
I agree, but that's because there are places where use of Go To makes things more readable and maintainable, not - primarily - because they are actually needed, per se, even with performance constraints (in the overwhelming majority of cases).
'And I think it just highlights a lack of understanding about what might make goto "bad" in programming--IMHO it's "bad" when it makes control flow more complex, which undermines maintainability.'
Yea, I agree. I love C and was a C purist and flat-out refused to learn C++ well and use it in practice. This recently changed and now I quite enjoy C++. Destructors are one reason why. The whole idea of resource-managing classes and RAII is clever and really useful.
The hard part is that it takes a lot to be a good C++ programmer. C is such a smaller language, you can master a handful of best practices, gain experience, and become a competent C programmer. In contrast, C++ is such a large language you have to be active in learning everything — read Meyers, Modern C++, GoF's Design Patterns, etc. Anything short of this, and you're going to accidentally reinvent the wheel or do something stupid. But still, C++(11) is a terrific language that I'm learning to love.
Dijkstra never considered the kind of goto you describe as harmful. This is a distinction that's lost on most programmers now that the kind of languages that did have harmful gotos are dead and forgotten.
What he argued against was gotos that jump across procedure or scope boundaries.
I think part of the problem is that everyone has heard that goto is harmful, but no one ever reads the original letter, even though it reads like a short blog post (assuming they'd had blogs back then).
According to Dijkstra, the really evil gotos are the ones that force you to keep track of the whole execution path in order to figure out the current state of the program. It's much easier when you can look at a line of code and statically know what the program state will be like at that point.
A for loop is better than goto because you can look at a line inside it and instantly know that it will run a number of times, with the index changing in increasing order.
gotos for resource cleanup are good because you can look at a given line of code and know what resources still need to be freed.
Code that gets rid of gotos and break statements by blindly replacing them with tons of flags is just as bad as gotos because you still need to look at the whole execution path to figure out the state in the flags.
Relevant quote from "GO TO statement considered harmful": "The remark about the undesirability of the go to statement is far from new. I remember having read the explicit recommendation to restrict the use of the go to statement to alarm exits, but I have not been able to trace it; presumably, it has been made by C. A. R. Hoare."
He doesn't explicitly say it, but it seems he agreed with that use of go to.
In fact, his original title was "a case against the goto statement". It was the editor of CACM that gave it the link-baity (for the time) title "considered harmful".
GOTO _was_ considered harmful back in 1968(!) when structured programming was still in its infancy. Code of that time was usually peppered with GOTOs and therefore a pain to read.
Using GOTO for cleanup tasks is perfectly fine because it increases readability.
If you read much of the original discussion around the "Go To Statement Considered Harmful", this pattern is frequently pointed out as a reasonable use case absent proper exception handling.
Indeed, from the original editorial: "I remember having read the explicit recommendation to restrict the use of the go to statement to alarm exits, but I have not been able to trace it[.]"
It is the right way to handle exceptional conditions in modern C.
Well here is one alternative: allocate temporary strings on a stack basis. Have some higher level functions reset the stack pointer. So now things are allocated temporarily by default, and only in the (hopefully) rare case where you need a long-lived item do you promote it to long lived. Promoting could mean copying the string to the heap, or it could just mean marking it so that the stack cleanup mechanism skips it (so perhaps you have a linked list of temporary items or something similar).
Well I did not mean that the strings are literally on the call stack. You can make your own stack (and it doesn't have to be a stack, just have the same semantics).
You need to free heap-allocated objects, but stack allocated objects are important and useful (particularly with RAII) and cleanly reversing such allocations eliminates a class of bugs.
... mostly. Much like in a garbage collected language, you need to make sure you are no longer referencing things when you want resources to be freed, but it does make the worst case much harder to run into.
Recently I've been getting into some C and having programmed in a number of other languages know enough to be sure I wanted to have a solid understanding of properly handling encodings and strings in general. I've obtained a good understanding, but now I am looking to abstract some of the low-level mechanics. Unfortunately most of the libraries I have seen (such as bstring) use structs. It was nice to see this approach taken. Thanks for extracting it and making it easy to find and learn about!
On a related note - does anyone have recommendations for a similar small library for dealing with conversions from char * to wchar_t * and basic encoding duties? I'm working cross-platform, and so far I've stitched together some functions wrapping stuff like wcsrtombs() and WideCharToMultiByte().
Excellent news - the Redis source code is a really good source of high quality dependency-free C code and tends to be the first place I look.
Some time ago I forked the SDS code and added a set of additional utility functions around this for a project I was working on at the time (basic file reading, regex, LZF compression, Blowfish encrypt/decrypt, SHA256 etc).
This was from a fairly old SDS version so this looks like a good opportunity to sync up with the library version.
Disadvantage #1 can be simply solved by using a pointer to the "sds" type instead.
Then you can be sure that
sdscat(&s, "Some more data")
updates s to always point at the right memory address, and you can't introduce hard-to-find bugs by forgetting to assign to s, which the compiler won't warn you about.
If you'd pass just "s" instead of "&s" as the first parameter, the compiler would error out.
So all functions modifying the string should take a pointer to it.
My biggest gripe with C's string functions is that it's really easy to make off-by-one errors when trying to do anything with two strings, due to NUL and the way they seem to handle it inconsistently. I'm constantly having to check the docs for any given string function to find out how it handles NUL (which isn't always in an obvious spot thanks to the design of man pages). And once I've found it, I have to come up with some contrived example that usually needs to be written in a comment just to make sure I've used its intended algorithm correctly and didn't cause an off-by-one error. If this library hides all that for me, I'm sold.
Sigh, another string library. And this is the reason why I prefer C++ over C. You don't end up writing another string library for the 300th time. There aren't that many differences between string libraries anyway. Most of them are just structs with a pointer to a memory block, plus a length field.
> Sigh, another string library. And this is the reason why I prefer C++ over C. You don't end up writing another string library for the 300th time.
You say that, and yet every C++ project I've ever touched in my life has had its own string class with various levels of horror attached. My favorite was the one that stored everything internally as 32-bit characters to be Unicode safe, and was never used in a codebase that had to deal with Unicode.
Yep, but this one doesn't have the usual struct+pointer layout (that's the whole point), and I'm not sure it qualifies as "yet another", since it was written in 2006.
It appears to me to be "yet another" because the length before the string approach is usually referred to as a b-string.
Windows (for example) has had a comprehensive b-string library (the type is called BSTR) for about 20 years; due to its age and provenance it has the downside of thinking a character is a 16-bit value...
You're right, sorry. I only scanned the document, saw the struct, and closed the tab. The way you use a header before the data is similar to how Ruby represents its strings internally.
> And this is the reason why I prefer C++ over C. You don't end up writing another string library for the 300th time.
Ironically, in almost all of my uses of C++, I did end up writing or using a custom string library there too. (I was doing mostly console games or language interpreters.)
I'm not so happy with C++ string either. Main complaint is that I like the use of C strings as "semi-predicates"- you can test for NULL to indicate a failure. I wrote my own String class at one point to provide this feature:
// These provide tests for assigned/not assigned
inline operator void*() const
{
return (void *)s;
}
inline bool operator!() const
{
return s == 0;
}
Eh, I like my C idioms. With C++ it's easy to make it safe so that accessing an unassigned string doesn't cause a crash (make NULL equivalent to empty strings except for tests).
I have the same gripe with the way the STL is designed. Too tedious to test for empty first before reading an item.
Making sds a typedef for char* is very convenient. But it makes it very easy to pass an sds to a function that expects a C string without checking for null bytes.
Ruby, Java, Perl, PHP have all had security problems when interacting with C because they failed to properly distinguish binary-safe strings and C strings.
Disadvantage #1 seems unnecessary. Instead of this:
s = sdscat(s,"Some more data");
Why not do this?
sdscat(&s,"Some more data");
The latter would make the use-after-free error they're describing impossible. (Disadvantage #2, changing one reference but not others, would remain. And callers would still need to check for NULL if they intend to handle ENOMEM gracefully.)
I assert there's no meaningful performance difference between the two.
This is a common practice in relatively new interfaces like pthread_create.
> And you still want to access the old value if reallocation failed.
If you want to provide commit-or-rollback semantics, you could signal error via return value rather than by replacing s with NULL:
if (sdscat(&s,"Some more data") != SDS_SUCCESS) {
/* failure path; s is unchanged */
} else {
/* s now has "Some more data" in it */
}
but this may be completely useless, depending on the environment(s) in which the library or program is intended to be used. On 64-bit Linux systems with memory overcommit enabled (the default), this failure path essentially shouldn't ever happen. Instead, some process (maybe yours, maybe not) will be picked by the kernel OOM killer. Many programs just use an allocate-or-die interface, as the sds README mentions.
You're right, I wasn't saying that it's the only way to write allocation/assignment.
I was saying that in that case sdscat() matches with sdsnew() style.
The CString C++ class from Microsoft's MFC (and now also ATL) has used, since forever, the trick of keeping both the length and the reference count in a single allocation together with the characters themselves, as well as an additional 0-termination character even when it was initialized from a non-zero-terminated buffer.
In the case of disadvantage #1: s = sdscat(s,"Some more data");
You could fix that by adding a "remote pointer" header field. Inside of sdscat, you would allocate a new sds struct and set the remote pointer header field in `s` to the new sds struct's location. You could also try to do a realloc, and maybe you'll get the same starting pointer back again.
How much better is this dynamic approach than something like immutable strings? (not that char* is immutable)
Since with the sds library, you could potentially be getting back a new pointer for each operation, you could just as easily be working from an immutable string library. Do immutable strings have poor performance? I've never really considered it.
I strongly disagree. It is a tradeoff between that and advantages #1 and #3; with the bstring library, for example, you pass in a mutable string and the function mutates it.
In no world is passing in a mutable value, having the function mutate it, but then still having to reassign your variable, superior.
Passing in an immutable value, and then assigning a fresh value is reasonable, but that's not what SDS is doing, AFAICT.
I think GCC has an attribute (warn_unused_result) you can use to tell it to warn whenever a function's return value is ignored.
I'm curious why
s = sdscat(s,"Some more data");
didn't end up like:
sdscat(&s,"Some more data");
It's basically a malloc()/free() implementation with printf, some formatting, and strcat bolted onto it. (Strictly speaking, it may or may not use free lists or whatever, but the use of the header and returned pointer is quite similar.)