BCHS: OpenBSD, C, httpd and SQLite web stack (learnbchs.org)
217 points by davikrr on Jan 19, 2022 | 149 comments


I'd be fine with this, even totally on-board, if C weren't so awful with respect to text. You don't even have to worry too much about free()ing your malloc()s if you design around short-lived processes. But this is just asking for security concerns among the tangled web of string and input processing your bespoke C routines are likely to develop into.

Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.


> Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.

I love how trollish it is not to talk about Rust in that context.


Maybe they didn’t want to bring forth the Rust stans


Your message looks like a perfect example of trolling to me.


This entire thread is a troll, or a demonstration of Poe's law.


If we'd just rewrite all of the things in Rust we could solve computer bugs forever, and world hunger too.


This having to re-write things is obviously Rust’s fatal flaw.

I cannot wait for the next great language, the one that brings all Rust’s advantages which is a pure superset so that it can still compile existing code. Surely something like that ought to end all these petty language rivalries forever!


This is what the Rust Evangelism Strike Force™ told me and by God, I believe them.


And end mortality!


Twice


Ah, the ol' double-free bug!


The last time someone promised to free the Men from the gift of Ilúvatar, it did not go well for them. A good reason never to use Rust!


BRHS does not sound as cool as BGHS. This is the only reason to exclude Rust and prefer Go. :p


I just wish there were better tools for navigating C codebases.

There’s been more than one time where I’m in some large autotools-based project trying to figure something out, and there’s a call out to some dependency I know nothing about.

Also, many of the projects lack any sort of documentation or source code commenting. These aren’t someone's pet projects, either. One of them was from a notable name in the open source community and the other was a de-facto driver in a certain hardware space.


Using an IDE like VS/CLion/CDT/... would already help.

Then there are tools like SourceGraph, CppDepend among others.


Use clang to generate a compilation db. Most IDEs support this format out of the box; otherwise it's available via plugin or routed through YouCompleteMe.

https://clang.llvm.org/docs/JSONCompilationDatabase.html
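For anyone who hasn't set this up before, here's the usual way to produce a compile_commands.json for a C codebase (the tool names below are just the common options, not necessarily what you'd use for any particular project):

```shell
# CMake projects: ask CMake to emit compile_commands.json in the build dir
cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# Plain make / autotools projects: the Bear tool wraps the build and
# records every compiler invocation into compile_commands.json
bear -- make

# clangd, ccls, and most editors/IDEs then pick the file up automatically
```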


Have you tried ctags for navigation? Or clang-lsp?


Try cscope.


It does seem sometimes that a lot of folks use C for philosophical rather than practical reasons.

That being said, I love seeing a push for simple stacks like this.


You don't even have to worry too much about free()ing your malloc()s

*gasp!* Such lack of symmetry... it disturbs something deep in my soul.


It's just the well-known arena allocator pattern, implemented at the OS level.


Is there a good string-manipulation C library?


The strbuf library that's part of git.git is a pleasure to work with. It's C-string compatible (just a char */size_t pair), guarantees that the buffer is always NUL-terminated, but can also be used for binary data. It also guarantees that the "buf" member is never NULL: https://github.com/git/git/blob/v2.34.0/strbuf.h#L6-L70


This looks really well thought out - bookmarked.


Yes, SDS from Redis project.

https://github.com/antirez/sds

However, the moment you call into other C libraries, they naturally only expect a char *.


I got tired of running into this problem and decided to simply eat the cost of using `char *` in my string library.


And that is why most such efforts eventually die.

WG14 could naturally work something like SDS for strings and arrays into the standard, but of course doing that is outside their goals.


> WG14 could naturally work into something like SDS for strings and arrays, but of course that is out of their goals to ever do that.

Maybe it is, but even if it weren't, sds strings would be a poor choice. I used them extensively in a private project.

1. Typedef'ing `sds` to a pointer type leaves no indication to the reader that any `sds`-typed variable needs an `sdsfree`. IOW, for every other standard type it is clear when the data object needs a `free`, `fclose`, etc. This is a big deal, and it's difficult to change the typedef due to the way the API returns pointers.

2. Not compatible with current string functions, strike 1: storing binary data, like the NUL character, in the strings makes it silently lose data when used with current string functions that accept `const char *`. This is a very big deal!

3. Not compatible with current string functions, strike 2: an sds string is only compatible with current string functions that take a `const char *`. This isn't such a big deal (for example, it provides a replacement for `strtok`, since the standard `sds` type won't work with `strtok`), but it's unnecessarily incompatible.

4. With the current way it's exposed to a caller, you cannot usefully use `const sds` variables anywhere, which removes a lot of compiler checking: because `sds` is a pointer typedef, `const sds` makes the pointer constant rather than the data, so you get none of the error checking.

While sds solves many problems with raw C strings, those problems can be solved by adding standard library functions that work with existing C strings. In addition, it adds a few more problems of its own.


"C strings" really aren't anything worth talking about. People take them way too seriously and then complain that they are "unsafe" or "hard to use". Look, C gives you memory to work with and the rest is up to you. Almost the only thing you want from C with regards to strings is string literals.

It should be obvious that most "string" APIs from libc like strcat, strcpy, but especially strtok are ridiculously bad and are only in the libc because of history. Don't use them.

Even strlen() is rarely a good idea to use, and you can (should?) replace strlen("abc") by sizeof "abc" - 1.


My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.

When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.


> My point regarding WG14 wasn't to add SDS as they are, rather vocabulary types for strings and arrays in the same spirit as SDS.

Well, yes, I'd love to see some proper string support too, so at least we're in agreement about that :-)

But, overhauling C with additional (memory-safe) array types and string types that are nonetheless still compatible with legacy uses is probably a non-starter anyway. The only way forward would be to add a new type that isn't compatible, which is unpalatable to a lot of people (myself included).

Adding memory-safe functions and/or semantics is easier, but will probably not cover 100% of the memory-safety desired.

> When they exist as vocabulary types, the ecosystem can rely on their existence and slowly adopt their use, similarly to threads support introduction in C11, for example.

Threads, I feel, are a poor example for two reasons: 1) Hardly any code uses the `thread_t` type for a variety of reasons, and 2) There was no need for a `thread_t` type to be backward compatible with anything.


For full memory safety with C the only option is hardware memory tagging ("C machines").

It has already been in use for a decade on Solaris SPARC, and will eventually be mainstream across all variations of ARM CPUs.

Unfortunately Intel botched their MPX implementation and now it is gone.


Apart from plain old fixed buffers, which C supports just fine and which cover 99% of string processing needs in the areas C as a language is suited for anyway, there are 14 known ways of doing "strings" depending on circumstance, so I don't think it would be a good idea to introduce one mandatory version of them into the C standard. There is already C++, which has std::string, and there are a lot of GC'ed and scripting languages that are more suited for quick-and-dirty string processing.


The fact that C++ was able to eventually standardize on a single string type (despite the same mess of many dozens of incompatible implementations) shows that it is possible and desirable. It's not like raw buffers will go anywhere if you add a higher-level type. Nor does it have to be perfect - only "good enough" for use across the API surfaces of various libraries.


Just because it's possible to standardize on a string type in C it doesn't mean it's desirable. Also consider that it's not possible to copy C++'s string type because its ergonomics build heavily on RAII.

'const char *' arguments work just fine as parameters in libraries, and I don't see much of a use case (and instead more hazards) for a library that "resizes" a string argument destructively (like std::string does). The typical way to go about this is for the library to make a copy of the input string. On API boundaries, for memory that is needed longer than the function call's lifetime, it is almost always an excellent idea to simply copy it. For data that doesn't make sense to copy (be it because of size or because only one side really needs it), the data should instead simply be created on the right side of the fence from the beginning.

I don't see myself needing a standardized string type because I'm not passing around string "objects", or concatenating them, like it would be done in quick and dirty scripts. I honestly can't recall where that kind of thing would have been a good idea for my work in the last couple of years, and I'm much in favour of not growing the standard out of proportion. As said, if you desire C++-style ergonomics and want to solve more scripting-like tasks, there is already C++ and a ton of other languages.

What I can recall is skimming through a lot of C projects over the years that try to do object-oriented and scripting-style programming in C (often the code wanted to be C++ or Java or even Python, but it had to be C for some external reason), and that code is always, invariably, an unmaintainable mess where it's impossible to have any confidence that there are no memory errors and leaks. C is simply not suited for that style of programming.


That's not really a problem if the only thing they need is direct access to a read-only view of the buffer (i.e. const char*) - then it's no different than C++ and std::string.


> Is there a good string-manipulation C library?

You will have to define "good". My string library[1][2] is "good" for me because:

1. It's compatible with all the usual string functions (doesn't define a new type `string_t` or similar, uses existing `char *`).

2. It does what I want: a) Works on multiple strings so repeated operations are easy, and b) Allocates as necessary so that the caller only has to free, and not calculate how much memory is needed beforehand.

The combination of the above means that many common string operations that I want to do in my programs are both easy to do and easy to visually inspect for correctness in the caller.

Others will say that this is not good, because it still uses and exposes `char *`.

[1] https://github.com/lelanthran/libds/blob/master/src/ds_str.h

[2] Currently the only bug I know of is the quadratic runtime in many of the functions. I intend to fix this at some point.


No? Asking for code nav you get three answers; asking for this you get crickets. In the 90s I worked at a place where we embedded Tcl into all the apps and rolled our own templating systems. I had to do a little string stuff in C after a few years of Go, and it sucked. Ugh. buf[len] = '\0';

Using Go, I thought I was getting back to low-level stuff, but this C experience made me appreciate strings in Go. Web servers in C are a crazy bad idea, especially if they are spitting out HTML. Lisp would be better. Node would be better. Go would be better.


> buf[len] = '\0';

So why didn't you use one of the bazillion library functions or third party libraries that terminate strings for you?

I feel like most of the criticism is coming from people who punish themselves by rejecting library functions and then complaining that strings are hard. Doh.


I like to see the terminating NUL in there, just in case my math was off earlier. I had strdup and so on. It was just way more work than Go. And I was writing an LD_PRELOAD shim and didn't want to drag in any extra dependencies besides libc. Trust me, it is faster and safer and more fun in Go than C.


Alright, yea, if you're working under constraints like that, C sucks more than it has to.

It just doesn't feel fair to criticize C without mentioning that your experience comes from working under such unusual constraints. "Strings are hard, C sucks" is quite different from "strings are hard, C sucks when you can't rely on libraries." It also feels unfair to say you get crickets when you ask for a lib when actually you weren't even willing to use one. (There are tons of string libraries for C; it's impossible to miss them if you look around.)

Even then, if you're stuck working with only libc and your own code, there's a very high chance you're doing something wrong if you're doing math on strings and terminating them manually. There's a fair selection of libc functions that do all the math for you and will always output a properly terminated string if your inputs are a buffer with a valid size and valid strings.


Can't vouch for any in particular, but they do exist. https://github.com/oz123/awesome-c#string-manipulation


Considering that it's a stack that uses OpenBSD, my first thought would be Perl, although it's not a language that one could call “modern”, heh. It's included into the base system and has rich libraries for text processing, (Fast)?CGI, HTML, and all that.


If using C is a must, having static analysis as part of CI/CD pipeline and using libraries like SDS should be considered a requirement.

Otherwise, yes, using anything safer, where lack of bounds checking isn't considered a feature, is a much better option.


I wrote my first web app in 2000 using C and MySQL. It was insanely fast but very awkward to implement. I used C because it was (and still is) the only language I knew well.

At least if you are going to use C, you (should) know to be extremely paranoid about how you process anything received from the user. That doesn't remove the risk but at least you are focused on it.


This is the same in any language. You can cause security issues in other languages as well if you trust the user/attacker.


The same guy wrote a rad tool that will generate server code, schema, and frontend using a markup language called ORT.

It will generate Rust and TypeScript if ya want.


One of the most rewarding parts of back-end web application development is re-writing the same database routines and same JSON export routines over and over. Then changing the requirements and starting over. It's what makes web application developers such well-balanced folks, right?

Unfortunately, in the flux of user requirements—each addition or modification of a table column changing select routines, insertions, validation, exporting, regression tests, and even (especially?) front-end JavaScript—we make mistakes. What BCHS tools beyond the usual can help in this perennial test of our patience?

;)

https://learnbchs.org/kwebapp.html


Well, there is still a lot of large software written in C, e.g. nginx, lighttpd, even the Linux kernel.

I checked BCHS out a few years back; the key piece is that it's OpenBSD. If it were Linux it might have caught on, due to Linux's popularity, good or bad. This could be useful for embedded devices, for example, but not many embedded devices run OpenBSD, if any at all.


The C library used (https://github.com/kristapsdz/kcgi) is portable and works on Linux as well. Putting this behind nginx as FastCGI seems entirely doable.


There are also qdecoder, cgic, fcgi, etc., which are totally Linux-ready for CGI and which, by the way, all work well. CGI is still perfect for a deeply embedded Linux device.


Never heard of Carp, looks cool!


Why would you ever choose C anymore? The killer feature of C++ is “you don’t pay for what you don’t use”. There’s virtually no reason ever not to use C++.


>Why would you ever choose C anymore?

I can't speak to why you'd want to use C in a web stack, but I can weigh in in the more general sense:

A while ago I thought I'd try my hand at the Cryptopals challenges, and I figured, hey all the security guys know C (and python, but ugh) so I'll use this as an opportunity to really learn C. Prior to starting that project, I "knew" C, in the sense that I took CS036 which was taught in that small subset of C++ that might as well be C.

So I jumped in and it felt really liberating, at first. You want to compare chars? It's just a number, use >= and <= ! Cast it back and forth to different types! Implement base64 by just using a character array of the encoding alphabet and just walk through it like an array and do magic casts! No stupid boilerplate like namespaces or static class main{}!

Then by about the 2nd set where you have to start attacking AES ECB I realized I was spending more time debugging my mallocs/frees and my fucking off-by-one errors than I was spending on actually learning crypto. I stuck with it until I think part way through the third set but by that point I couldn't take it any more.

So I bailed out of C and never looked back. I can see how a certain type of programmer (who is more practiced with malloc/fastidious with their array indices and loop bounds than I am) can really enjoy C for a certain type of work. But I can actually say now, hand on heart, that I know C; and I don't like it.


Brilliant summary of the path that a lot of C-programmers have taken.


Isn’t “C++ is too complex for me” a decent reason?


It totally is, as long as you don't use C instead. There are plenty of good, less complex languages than C++ out there: Java is quite close and way, way less complex, for example.

But C is none of them. Pointers are more complex in C than in C++ (pointer provenance). Casts are more complicated in C than in C++, as C++'s named casts are less powerful and therefore give you less opportunity to shoot yourself in the foot. Strings (I'd guess quite an important part of web programming) are a minefield in C, and they are so much better in C++. Resource management, memory ownership, object lifetimes: all very hairy in C++, all good reasons not to use C++, but also reasons not to use C. Types are a mess in C++. C is even more weakly typed, making it worse.

If you like procedural programming and dislike the complexities of C++, I suggest you write C++ without member functions, without inheritance, and with exceptions disabled in the compiler. It will feel similar, but it will be better. I suggest using a C++ compiler to write C with enum classes, std::span, either a typed std::span wrapper for malloc or std::vector<char8_t>, std::string, std::string_view, using, std::variant, std::unique_ptr, std::format, std::optional, templates, and std::vector for your types, in that order. And don't use C-style var-args, C-style enums, or C-style unions. You can have very C-like code that is way less complex than C.


> There are plenty of good, less complex languages than C++ out there […] But C is non of them.

To this day I cannot understand how anybody can claim that. C++ is so ridiculously complex that it’s not a superset of C mostly because of added keywords and rules, some of them implicit.

> I suggest using a c++ compiler to write C with enum classes, std::span, either a typed std::span wrapper for malloc or std::vector<std::char8_t>, std::string, std::string_view,using, std::variant, std::unique_ptr, std::fmt, std::optional, templates, std::vector for your types, in that order.

How on earth do you write that and expect me to believe it’s simpler than C? You’ve just listed a number of options on how to implement an *enum* of all things. Each one will have different pros and cons, and I’m supposed to weigh them before coding?

> Pointers are more complex in C than in C++

That’s impossible from you own statement. Pointers in C can be more dangerous because the compiler won’t stop you from doing crazy shit, I’d buy that, but complex?

C++ feels like the Mikrotik version of C, there’s just too many options and buttons and switches, and I won’t use half of this in my life! I guess some people find it interesting but I’m just trying to get my job done.


You can think of simplicity in various ways - some subset of which are valid. For example, you could say that using std::unique_ptr is simpler than manual memory management, because it automates much of the process, making it less likely that a developer will make a mistake. Or you could say that the 1000s of lines of C++ std code that you will pull in represents an increase of complexity, because the mechanics are hidden under layers of abstraction.

I'm open to both views, and most importantly to trading one off against the other according to other constraints and context. For me, using C++ allows me to do that, because it's (mostly) a superset of C. Acknowledging, of course, that this brings a trade-off of its own, because decisions must continually be made (and enforced) about what std/external libraries to use and when.


I’m willing to buy that abstractions reduce complexity, but to do that they can’t be leaky abstractions. They should “just work”, and it should be easy to choose among them.

If you need to consider what’s underneath, how is that a reduction in complexity?

C++ abstractions are the opposite. You need to ponder the several options, normally tagged by their implementation details.


> Mikrotik version of C

?

Reference to the Latvian network kit manufacturer?


Yes, have you used any of their devices? They’re amazing in what they let you do, but their UI is a wall of settings.

Doing something as simple as adding an access point will take you a full day if you aren’t familiar with it.


Never used it, but your description makes me want to try it ...


If you like to tinker with network devices heavy recommendation.

If you want something plug and play, I’d look elsewhere.


>added keywords and rules, some of them implicit.

Yes, and I agree that that's a problem. But to a much larger degree it removes implicitness from C. If you don't write member functions and don't use inheritance (and I'm serious about this), I don't see any added implicitness.

>How on earth do you write that and expect me to believe it’s simpler than C?

Because it is. Let me walk you through:

- enum classes: work just like enums, minus the implicit conversions.

- std::span: incredibly simple: a struct with a pointer and a size, all wrapped up in a struct to signify intent: you don't own this. This is very simple and removes implicit assumptions from C APIs. Putting data that belongs together in one struct is not alien to C; AFAIK it's considered good design.

- either a typed std::span wrapper for malloc: this is just a malloc that replaces void * with a proper type. Again, adding simplicity and removing implicitness. And a facility to bounds-check in a way that is harder to mix up. Again, placing data that belongs together in a struct. However, if you do this, you give up the convention that spans are non-owning. In that case you might consider a std::vector<T> (same, but can also realloc), or just write an owning_span template that does the same. The name is just very convenient documentation. By the way, the classes suggested so far can reasonably be written by yourself in a few lines of code.

- std::string: this one is the first that is non-trivial to implement yourself, but don't be picky, just use the one from the standard. It's very easy to use. And C strings just suck.

- std::string_view: this is a struct { const char *; size_t; }, plus a few member functions, without any magic. While I suggest you don't write your own member functions, using this one is just fine. And it's significantly harder to make mistakes with than plain C.

- std::variant: this is just a union, but with most footguns removed. If you don't use unions in C, don't use this.

- std::unique_ptr: this is the first bit of code that is somewhat magic, as it cleans up after itself: just always-correctly-placed free()s. It also signifies ownership. I see how this is somewhat controversial, but for me it's a massive simplification. But see, this list was ordered, and this was towards the end.

- std::format: enjoy how C-style var-args don't have any type safety? Love the good old printf exploits? Then this is not for you. Otherwise it's Python's easy and well-loved string interpolation.

- std::optional, templates, std::vector: this is definitely getting more C++-ish, but not complicated. std::optional is a struct with a bool and one additional member; templates are a way to do generic programming that is far easier than preprocessor macros. But yes, maybe too much C++ for some people's liking. That's why these are last.

>That’s impossible from you own statement.

C does some, IMHO, pretty insane stuff called pointer provenance. AFAIK C++ doesn't. Also named casts: harder to remove const by accident.

>You’ve just listed a number of options on how to implement an enum of all things.

Uhm.. What?

>but I’m just trying to get my job done.

If there is one pragmatic language out there, it's C++ (or maybe Perl). In fact, I'd say that's why it is so complex compared to more academic languages like Scheme, Haskell, Prolog, and Pascal.


Yeah, I'm just going to keep writing C.

Thanks.


Object Pascal/Free Pascal then is the obvious choice, it is The better C.


I do use Lazarus/FreePascal for multiplatform GUI apps. It is also very easy to write servers in. Unfortunately performance is worse than C/C++, and frankly the "batteries" included in C++ are way more powerful and simpler to use, so I choose C++ for backends. As for all that FUD around C++: I am not a C++ expert at all, but I find modern C++ to be very productive, fast for backend development, and pretty safe unless one's goal is to purposely shoot themselves in the foot.


What batteries are included in C++? There's the STL, but most of the rest is a mess to the point it's actively avoided in favor of alternatives (iostreams, locale).

Now if you do something like C++ with Qt, that's a very different proposition - but that applies both to batteries being included, and to the overall coding style.


The last time I looked at c++ (mid-1990's) it generated hugely bloated binaries and was generally slower at runtime than plain C. Is that still true today?

Is there any language (other than assembly) that is faster at runtime than C today?


> The killer feature of C++ is “you don’t pay for what you don’t use”

In what way C makes you pay for what you don't use?


SQLite author is an avid Tcl user and he recently introduced a small, secure and modern CGI based web application called wapp [1],[2].

[1] Wapp - A Web-Application Framework for TCL:

https://wapp.tcl.tk/home

[2] EuroTcl2019: Wapp - A framework for web applications in Tcl (Richard Hipp):

https://www.youtube.com/watch?v=nmgOlizq-Ms


this is very cool. I only have a passing familiarity with Tcl, but I've been building my own toy web framework and this is a fantastic reference! they made a lot of the same choices I made API-wise but the way they went about it is worth studying.


I'd like to point out that Wapp doesn't necessarily need to be run as a plain-old CGI application; I've had success running it with its own built-in web server behind NGINX, for example.


It seems pretty crazy to write web-facing apps in C, with no memory safety at all.

(They do have "pledge" but even in the most restricted case, this still leaves full access to database)


It seems like the database libraries they recommend for security, ksql and sqlbox, mitigate the risk with process separation and RBAC, so the CGI process doesn't have full access to the database.

It's definitely contrary to modern assumptions about web app security, but it's interesting to see web apps that are secure because they use OS security features as they were designed to be used, rather than web apps that do things that are insecure from an OS-perspective, like handling requests from multiple users in the same process, but are secure because they do it with safe programming languages.


ksql exports "ksql_exec", while sqlbox exports "sqlbox_exec" -- both of those allow execution of arbitrary SQL.

So no, the web apps cannot be made secure via OS support alone, because the OS security features are not adequate for high-level problems. Any sort of code exploit allows attacker to trivially access the entire database -- either to read anything, or to overwrite anything.

"pledge" and "unveil" can prevent new processes from being spawned, but they cannot prevent authentication bypass, database dumpling or database deletion.


How is the overhead of creating a process per-request in this type of system?


Process-per-request is just infeasible with any significant amount of load.


Though the majority of running web servers, load balancers, protocol proxies like php-fpm, etc, are probably written in C :)


Yes, but they are... better built than your quick social-network-poll application thingy with the customer's special sauce that you had 5 days to specify, develop, and deploy.

C is a tremendous tool, but I don't think it's the best for customer facing web apps.


Not to mention databases.


Funny to reflect that there was a time not so long ago when writing web apps (CGI usually) in C wasn't at all unusual (shortly before Perl became much more popular for this). And today, it is indeed kind of crazy.


Depends on your definition of "not so long ago" - it's certainly most of the history of the web. The point when Perl, PHP, and Java started to become the dominant web app technologies is about as far from the present day as that point was from the moon landing.


The moon landing was 1969, 25 years before PHP was created in 1994. And that was just 28 years ago.

Oops, math checks out. I’m old.


I remember writing CGI scripts in Perl in 1993 ( the year before Netscape ). I am not sure when CGI even became a thing but it could not have been long before that.

Not only was “not so long ago” kind of at the very beginning of meaningful web history but it was also for a very brief moment in time ( if we are talking pre-Perl ). Pre-Perl CGI may have never been a thing though as Perl is older than CGI.

I recall PHP being the next wave after Perl. One could argue it never lost its place even if it now has many neighbours.

Not a Perl advocate by the way though it did generate some pretty magical “dynamic” web pages from text files and directory listings back in the day. Similar story with PHP.


It is true the time when it might have been sane to write CGI in C was very brief. Perl took over almost immediately (and to my chagrin, eventually PHP ate Perl's lunch). I remember reading CGI books that would explain how to do it in either Perl or C, the justification being "in case you need C for performance" but in reality I don't think a ton of C CGI was written. There was definitely some though; I recall poking around in cgi-bin directories and finding some compiled executables (could have been another compiled language like C++) and being disappointed I couldn't view the source like with .pl files.

It really takes you back to a very specific point in time though. A magical time when every year or month software and internet technology would take big leaps and bounds. When you might do things in a way that is very manual and slow compared to today and yet it was amazing at the time.


I remember learning CGI to write a web app in late 90s. Most resources at the time seemed to focus on it as a Perl thing, with all code examples etc being in Perl, other languages mentioned briefly if at all (usually at the beginning, when explaining the "it's just a process" model).


You mean 1997?

By 1999 I was already using our own version of mod_tcl and unfortunely fixing exploits every now and then in our native libs called by Tcl.


How about a web-facing shell that allows arbitrary code execution? [0]

There's nothing fundamentally insecure about allowing C or any arbitrary code to execute on behalf of a user -- this is basically what cloud computing (especially "serverless") is.

As you identify, though, you need a Controlled Interface (CI) which accounts for this model for all resources and all kinds of resources and many tools do not (yet) allow for it.

[0] https://rkeene.dev/js-repl/?arg=bash


The big difference is that with bash (python, perl, php etc..) exploits, all you need is to upgrade a package, and you are secure. No need to touch any of the application code.

Compare it with C, where the bugs are likely unique per app, and require non-trivial effort to detect and fix.

Execution of user-specific code by serverless services requires non-trivial isolation, and is predicated on "each user has its own separated area" to work. This is not the case with most websites. Take HN for example -- there is a shared state (list of posts) and app-specific logic of who can edit the posts (original owner or moderator). No OS-based service can enforce this for you.


Writing C might be challenging for some, but as others have mentioned, one can use some other language which gives a statically linked binary to place in the httpd chroot. It won’t be BCHS then.

For uptime.is I’ve used a stack which I’ve started calling BLAH because of LISP instead of C.


People love to talk all sorts of trash on this kind of stack but it's really quite solid for what it does. If anyone was ever curious what a sizeable codebase in this kind of code would even look like, check out the source code for undeadly.org [1]. Yeah these people may be crazy but they're also OpenBSD developers and we really love to see what we can get away with using nothing other than what's available in the base distribution. I think a lot of what you see being written for production ends up being very similar to this kind of approach, maybe just utilizing rust or golang as the web application backend language if that's what is the more comfortable thing. Nothing but the base system and a single binary, not relying on an entire interpreter stack, sure can be smooth.

There's other examples of this kind of approach, too, writing straight C Common Gateway Interface web applications in public-facing production use - What comes to mind is the version control system web frontend that the people who write wireguard use, cgit [2] - If it's really so crazy then how come the openbsd and wireguard people - presumably better hackers than you - are just out there doing it?

Other places you see C web application interfaces include in embedded devices (SCADA, etc) and even the web interfaces for routers, which unfortunately ARE crazy because check out all the security problems! Good thing people at our favorite good old research operating system have done the whole pledge(2)[3] syscall to try and mitigate things when those applications go awry - understanding this part of the whole stack is probably key to seeing how any of it makes any sense at all in 2022. It sure would be nicer if those programs just crashed instead of opening up wider holes. Maybe we can hope these mitigations and a higher code quality for limited-resource device constraints all become more widespread.

[1] http://undeadly.org/src/ [2] https://git.zx2c4.com/cgit/ [3] https://learnbchs.org/pledge.html
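For readers unfamiliar with pledge(2): the process declares up front the only classes of syscalls it will ever make, and anything else kills it. A minimal sketch — the harden() wrapper name is mine, not from any of the linked projects, and on non-OpenBSD systems it compiles to a no-op:

```c
#include <stdio.h>
#include <unistd.h>

/* Restrict this process to stdio and reading files. Call once,
 * early in main(), before handling any untrusted input. */
int harden(void) {
#ifdef __OpenBSD__
    /* After this, a hijacked process that tries to open a socket
     * or exec a shell is killed instead of being exploited. */
    if (pledge("stdio rpath", NULL) == -1) {
        perror("pledge");
        return -1;
    }
#endif
    return 0;   /* no-op on non-OpenBSD systems */
}
```

The promise string is the whole interface: a CGI program that also needs its SQLite database would pledge something like "stdio rpath wpath cpath flock" instead.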


> If it's really so crazy then how come the openbsd and wireguard people - presumably better hackers than you - are just out there doing it?

Probably precisely because they're better? I can see why people who are struggling with malloc and off-by-ones (https://news.ycombinator.com/item?id=29990985) would think it's crazy.


we really love to see what we can get away with using nothing other than what's available in the base distribution

pkg_add sqlite3

Can't get away.


#include <db.h>

Berkeley DB with a header date of 1994 :) In base, and of course it still works.

Sqlite was removed from base, again, in 6.1 (2017) -- https://www.openbsd.org/faq/upgrade61.html

with this BSDCAN '18 pdf briefly explaining the issues (unmaintainable) -- https://www.openbsd.org/papers/bsdcan18-mandoc.pdf


I believe Sqlite was in base when BCHS was first presented. That, and you can just grab the big single-C-file amalgamation version of SQLite, no need for a package.


OK, to be honest, let me amend that, because you make a valid, if snarky, point!

We like seeing what we can get away with using what's available in the base distribution and a few well-chosen, well-audited packages


The Dunning-Kruger effect is stronger in people who spend a lot of time alone, e.g. programmers, which we will now see unfold below.


I propose an amendment to Godwin's Law to include "Dunning-Kruger" , "Dunning-Kruger-effect" and "Dunning-Kruger effect".


omg it's like I'm Nostradamus!


Another great stack for writing C (or now Python) is https://kore.io which offers quite a few helper features, and it's easy to get started


> How do I pronounce BCHS?

I think the correct pronunciation is “Breaches”. Using C in this place, as others have mentioned, is very, very likely to lead to security issues. Even C++, with its better string handling, would be a step up.


I remember writing a lot of early web stuff in Perl/CGI. The "servers" I wrote were fast. Perl had most things you could desire built in already.

Database stuff took a good deal of doing, but with little in terms of abstraction, it was also quite fast.

I would like to see a renaissance of using different protocols than HTTP and different content markup than HTML.


Interesting CGI content linked on there.

I've been reading about / hacking on CGI recently, and it's been kinda fun!

Question: One thing I keep reading is how inefficient it is to start a new process for each incoming connection. Could someone explain to me why that's such a bottleneck? I imagine it being an issue back when CGI was used everywhere, people moving away from CGI, and forgetting about it. But haven't there been improvements in the meantime? Computers from today can run circles around those from a few decades back. Has everything improved except the speed / efficiency of starting a new process?

(I don't have a computer science background, but I guess you could already tell from the above.)


> Interesting CGI content linked on there.

> I've been reading about / hacking on CGI recently, and it's been kinda fun!

> Question: One thing I keep reading is how inefficient it is to start a new process for each incoming connection. Could someone explain to me why that's such a bottleneck? I imagine it being an issue back when CGI was used everywhere, people moving away from CGI, and forgetting about it. But haven't there been improvements in the meantime? Computers from today can run circles around those from a few decades back. Has everything improved except the speed / efficiency of starting a new process?

It's not as bad as you think it is; just change the webserver to pre-fork. From this link[1], and the nice summary table in this link[2] - I note the following:

1. pre-forked servers perform very consistently (the variation before being overwhelmed) and appears at a glance to only be less consistent than epoll.

2. For up to 2000 concurrent requests, the pre-forked server performed either within a negligible margin against the best performer, or was the best performer itself.

3. The threaded solution had the best graceful degradation; if a script was monitoring the ratio of successful responses, it would know well beforehand that an imminent failure was coming.

4. The epoll solution is objectively the best, providing both graceful degradation as well as managing to keep up with 15k concurrent requests without complete failure.

With all of the above said, it seems that using CGI with a pre-forked server is the second best option you can choose.

I suppose that you then only have to factor in the execution of the CGI program (don't use Java, C#, Perl, Python, Ruby, etc - very slow startup times).

[1] https://unixism.net/2019/04/linux-applications-performance-i...

[2] https://unixism.net/2019/04/linux-applications-performance-p...


Careful: "pre-fork" as described in the given link has worker processes each handling many requests. This result therefore does not answer the question about the cost of one process per request. The one that does seems to be "fork", which is way less efficient (~460 seems like a low number of processes spawned per second though, can we really not do more?).


I'll read those articles you shared, thanks!

Currently the CGI stuff I'm working on is to run stuff on a cheap shared host, so I'll have to check which category of servers Apache falls into.

Once an application I'm running on a shared host becomes successful enough, I'm probably going to want to move to a different environment, but I'm still interested in what that would mean for performance :)


> Once an application I'm running on a shared host becomes successful enough, I'm probably going to want to move to a different environment, but I'm still interested in what that would mean for performance :)

Depending on what you are doing and what language you are using, a $5/m DO droplet might be sufficient. I once ran a single multi-threaded server, serving a simple binary protocol, and over a 2 day period it handled sustained loads of up to 30k concurrent connections.

To get it that high, I had to up the file descriptor limit on that host.


It's not just the start-up and shut-down costs. A CGI process might need to attain connections to databases or other resources that could be pooled and re-used if the process didn't completely terminate.

You might want to look at using FastCGI:

https://en.wikipedia.org/wiki/FastCGI

Basically, the CGI processes stay alive and the servers supporting FastCGI ( like Apache and nginx ) communicate with an existing FastCGI process that's waiting for more work, if available.


Thanks! That's a good point, about re-using connections.

For my current use-case* that wouldn't be an issue, so CGI could probably be OK there, then!

* A side project that uses SQLite (1 file per user), and no other external resources.


I’m smiling at your question!

Yes, it’s less efficient than having a persistent server, but as all things are, it exists in a spectrum.

The load time for one of these processes is going to be almost trivial. I’m on mobile right now, but I would guess that it would be in a handful of milliseconds, especially when the binary is already in cache (due to other requests).

But if you want to compare this against a lot of the prevailing systems, it’ll still probably win on single request efficiency. Network hops, for example, are frequently quite slow and, if efficiency is your primary metric, should be avoided as much as possible. Things like Serverless go the opposite way and route both your incoming request and your backend database requests through a complex set of hops.


Thanks for your response!

I guess I should do some benchmarks comparing different technologies.

> Things like Serverless go the opposite way and route both your incoming request and your backend database requests through a complex set of hops.

I didn't know about that, thanks. If you know some good resources on the topic, feel free to put them in a reply to this message!


https://www.johndcook.com/blog/2011/01/12/how-long-computer-... is a decent place to start for thinking about how different timings work for things. It's a bit on the stale side, some things have gotten much faster (e.g. disk "seeks" are dramatically different with NVMe), but a lot of it has stayed similar, and some will never change (packet timing to Europe has a speed-of-light limit for now)


Time a python program that imports a few things and then immediately exits. It's significantly more CPU time than you might think. If you use a language with fast startup times, preforking CGI servers can be quite fast.


Lots of opinions but few facts in the comments. I'd love to see an experiment with people using that and their preferred web stack. Is this really slower to develop? By how much? Is this really insecure? Is this really simpler, faster?


I’d wager a good portion of my salary that a skilled BCHS developer is slower than a skilled Django/RoR developer to build a usual web app (with auth, payment gateways, admin panels, etc). Not to say BCHS doesn’t look like a laugh to use.


I would too, but I'd like some hard data on this. For example, how much slower? 2 times? 10 times? 100 times? Is this an initial cost or a cost paid on all features? Is maintenance easier? Harder?


I’d like to love man pages but

- I feel that they are Linux-only. On my MacOS system I can’t rely on man x being the man page for the right version of x. I know that in principle there are environment variables that make sure I’m getting the GNU coreutils version or the base Homebrew version rather than the system BSD version, but it’s too many moving parts. Furthermore, even if I get it right, I can’t expect people I’m working with or mentoring to get it right, hence I can’t recommend man to them for documentation. God knows about man pages on Windows.

- I feel that a small amount of plain text documentation should be stored in the executable, not separately. Isn’t it a holdover from the vastly more constrained computing environments of the 70s and 80s that we’re keeping man pages separate from the executable? It’s just asking to get out of sync / incorrectly paired up.


OpenBSD pages (https://man.openbsd.org/) absolutely rock and other, proper BSDs do quite well.

Also, man pages are for more than just system utilities (man(1)). Which binary should hold pledge(2) (https://man.openbsd.org/pledge), exactly?

Your man pages should be updated when the associated tool is updated.

You are describing a MacOS issue, with its terrible package management, and frustrating toolchains.


You seem to be missing my point which is that, as a maintainer of a command line tool, I need to and want to cater to users of all OSs. And in fact, I will allocate my efforts more towards popular OSs. I genuinely am sure that your BSDs are a nice environment, but surely you understand how fringe they are? The majority of my users are MacOS+Windows, with substantial Linux also.

In fact MacOS has an excellent package manager -- it's called homebrew. I don't really want to argue about it but you're the one who made an unjustified assertion about an OS which I bet you don't use. People like you insist that it's bad but no-one who uses it knows why. I maintained my own Linux laptop for 10 years, and for the last 10 years I've used homebrew on a Mac. It has literally never given me any problems! I've never even searched the issues on Github for a problem as far as I can remember.

Honestly I think that the thought processes of most Linux/Unix enthusiasts like you who criticize homebrew are

1. We hate MacOS because childish anti-capitalist ideologies

2. Therefore we will not admit that a nice command-line development environment can be created on MacOS

3. Therefore homebrew is bad


> - I feel that they are linux only.

They're actually better on FreeBSD/OpenBSD in my experience. As stupid as it makes me sound, I often struggle to parse Linux man pages, but I've had no trouble with the BSDs' for a variety of topics.

> - I feel that a small amount of plain text documentation should be stored in the executable,

Isn't this how --help usually works? I would also rather have more documentation embedded, at least for some executables.


FreeBSD has pretty good documentation.

Comparatively, I’ve found NetBSD documentation to be lacking, although NetBSD seems to take the cake on code quality and legacy architectures (an area I find myself delving into right now).

On the wider discussion of docs, I’ve found Linux kernel documentation to be a pain in the ass, and sometimes even worse than Windows kernel documentation (which I won’t even bother to get into)


> On my MacOS system I can’t rely on man x being the man page for the right version of x.

But isn't that an issue with macOS, not an issue with man pages?


Yes, it is. But MacOS (and Windows) are popular OSs for laptop users. I'm the maintainer of a command line tool that is reasonably popular and I believe that the majority of my users are MacOS. So the question is quite concrete for me -- should I provide documentation in the form of a man page? I do not currently, for the reasons I gave above (although I made a mistake in saying Linux when I meant Linux and *BSD for which I deserve what I get!) But I'd appreciate it being pointed out if my thinking is wrong here.


> should I provide documentation in the form of a man page?

Yes, and this may be a much smaller effort than you suspect. Just by writing the output of --help in a certain order, you can use the "help2man" tool to generate a beautiful manpage automatically. Notice that your users do not need to have help2man installed; you run it yourself as part of your build process, to create the manpage from the latest source code.

It is very likely that if your tool already has a --help option, you don't really need to do anything to have a manpage. Just call help2man from your makefile.


It would be cool if help2man worked perfectly. In fact the output needs a few fixups -- it looks like it has problems with non-ASCII characters?

But if those fixups can be made automatically it's not bad and I agree might be worth adding to the build.


Yes, the output of help2man is not perfect. My main gripe is that it imposes a rather strict format for the manpage, and not all manpage markup is available. But it is better than not having a manpage at all.

The problem with non-ascii characters can be solved, though. Just add an option like "-L en_US.UTF-8" to the help2man call.


Thanks. However, now help2man is outputting the unicode character correctly, but `man $help2manOutput` is displaying things like `a(R)` and `a` instead of the unicode character. I've also tried with the `LOCALE=en_US.UTF-8` env var set.


This sounds like a problem with your pager, that is used to scroll the output of man pages in the terminal. Do accents appear correctly on other manpages in your system? If I recall correctly, on macos catalina I had to set my locale variable LC_CTYPE (and not LOCALE) so that the less pager recognized the accents.


I don't think it's a pager problem for the following reasons:

- If I do `man help2manOutput | cat` then it does not page, but I still see `a` and `a(R)` in place of the desired unicode characters

- `less help2manOutput` renders them correctly

- In general I have no problem viewing unicode characters in less: I'm familiar with env vars such as LESS and PAGER.


This is not the most appropriate place for this sort of fine-grained debugging. I'd suggest to try your generated manpage on another computer to see if it displays correctly. The only thing that I can say is that "it works for me" but of course this answer is not useful :)

The main site of development for help2man seems to be this:

https://salsa.debian.org/bod/help2man

I'm sure they'll accept pull requests that fix problems. Happy hacking!


s/C/Nim/

Why don’t more folks use Nim for web development? Seems like the perfect blend of performance, ergonomics, and productivity.


How is Nim doing? When I first read your comment I thought you were talking about Zig ( since that is the language that seems to pop up a lot these days ). It took me a second to catch myself. It feels like I have not heard about Nim in ages.

I am sure the D and V guys are asking themselves the same question.


Last time I looked at and played around with Nim, what I felt was missing was a good way of doing templating. Beyond that I honestly enjoyed working in Nim more than both Go and Rust, which were the other two languages I attempted to learn last year.


Because there's already Java/C#/Go/Rust/C++ in that space.


Nim/Zig/Crystal - these three languages look the same to me for some reason


I have written web applications in a lot of languages, including C. C was the worst.


What a coincidence! Lovely topic, even registered an account for this :-)

I _just_finished_ my own comparative benchmarks to (re)check my projects from ~7 years ago, all in a similar stack.

Back then I wrote the logic as Apache modules, in C. It was using Cairo to draw charts (surprisingly, the traces of trigonometry knowledge were enough for me to code that :-), and I had absolutely crazy "hybrids" of bubble charts with bars, alpha channel overlays etc. It was extremely useful for my projects back then and I've never seen any library able to produce what I "tailored" ...)

The 7-years-ago end-to-end page generation time was ~300 µs (microseconds), with graphics, data store IO and request processing, preparing the "bucket brigade" and passing it down the Apache chain.

This Jan I re-visited my code and implemented logic for OpenBSD httpd as:

1) OpenBSD httpd "patch" to hijack the request processing internally, do necessary data and graph ops and push the result into the bufferevent buffer directly, before httpd serves it up to the client.

2) FCGI responder app, talking to httpd over a unix socket. BTW: this is the most secure version I know of; I could chroot / pledge / unveil and, IMO, it beats SELinux and anything else.

3) CGI script in ksh<=>slowcgi<=>FCGI=>httpd

4) CGI program (statically linked) in pure C<=>slowcgi<=>FCGI=>httpd

5) PHP :-) page (no frameworks)<=>php-fpm (with OpCache)<=>FCGI=>httpd

To my extreme surprise, the outcome was clear: it did not matter what I wrote my logic in, _anything today_ (including a CGI shell script) is so fast that 90% of the time was spent on network communication between the web server and the browser. (And with TLS it is like a 2x penalty ...)

All options above gave me end-to-end page generation time about 1-1.5 ms.

Guess what? Beyond "Hello World", with page sizes of 500 KB+, PHP was faster than anything else, including the native "httpd patch" in C.

As a side effect, I also confirmed that the libevent-based, absolutely gorgeous OpenBSD httpd is slightly slower than the standard pre-fork Apache httpd from pkg_add. (It gave me sub-ms times, just like 7 years ago.)

Who would say ...

What also happened is that any framework (in PHP, or Node.js which I also tried) or writing CGI in Python increased my end-to-end page generation time 10x, to double-digit ms.

I remember last week someone here was talking about writing business applications / servers for clients in C++, delivering them as single executable file.

I would be very interested to hear how that person's observations correlate with mine above.

G'day everyone!


Is anyone using this for anything? I'd love to know!


For this old environment, why C and not Perl?


Parsing untrusted input in C never hurt anyone, did it?


If you're going to promote a stack, try at least to showcase all its components in the first example you give. Where is the SQLite part in your "BSD, C, httpd, SQLite"? https://learnbchs.org/easy.html

Hello world apps don't mean much.


This is really hilarious... I just followed the third given example (https://kristaps.bsd.lv/absdcon2017/database-conclusions.htm...) and that's how it goes: "the simplicity of SQLite is a lie". Ok, thanks, I'll pass on the "BCHS" stack then. I now consider this website satire.


This feels like an unreasonable eschewing of all the advancements in programmer ergonomics & tooling that have been made over the course of decades.

"Just because you can, doesn't mean you should."


Imagine if you were a C developer who needed to create some web do-dads, this is probably a fantastic stack. If there was 1 right solution for the perfect stack we'd all be using it.


I get that languages are just tools, but each tool has problems it is better at solving. My original comment was pointing out that this stack is ill-suited for the task for which it has been built.


> "Just because you can, doesn't mean you should."

Ironically I could say the same about the JS ecosystem.


Not comparable. Easy dig though, I guess.



