It's worth looking at the scale of the swear count. Linux has about 16629976 lines of code, and I'd estimate from the graph that it has about 370* swear words (excluding penguin). If you look at the second graph, that is less than 1 swear in 300000 lines.
I checked this on the source tree for 3.8.0. The numbers appear to be inflated by allowing the swear words to be part of other words.
For example, "shit" appears in 121 lines, but " shit " only appears in 10 lines. Looking at the offending lines, there is only one swearword that is missed by excluding spaces.
"fuck" appears 29 times, all of which are some conjugation of the verb (and some lines have duplicates I'm not counting).
"crap" appears 161 times, 20 of which are part of "scrap"
"bastard" appears 17 times, 6 of which belong to email addresses hosted at "lazybastard.org" and "you-bastards.com"
"penguin" appears 99 times, two of which are jokes.
If you want to check the various words in isolation, surrounding spaces might cost you some matches, e.g. at the end of a sentence ("It's a piece of shit.") or when followed by a comma. Also, did you ignore case ("Shit happens.")?
How about trying \b[Ss][Hh][Ii][Tt]\b and the like?
There were few enough curse words that I manually checked the output of not requiring spaces. Regarding the case sensitivity, it looks like I missed 12 instances of swearing because of that. Also, grep has a "-i" parameter, which makes it case insensitive.
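For anyone who wants to redo the counts, here is a rough sketch of the three matching modes discussed above (plain substring, space-delimited, and word-boundary with case ignored). The helper name `count_matching_lines` and the sample lines are made up for illustration; they are not taken from the kernel tree, and this is not the exact command I ran.

```python
import re

def count_matching_lines(lines, word, whole_word=False, ignore_case=False):
    """Count lines containing `word` (hypothetical helper, similar in spirit to grep -c).

    whole_word wraps the word in \\b anchors; ignore_case mirrors grep's -i.
    """
    pattern = re.escape(word)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return sum(1 for line in lines if regex.search(line))

# Invented sample lines, NOT real kernel source.
sample = [
    "/* It's a piece of shit. */",
    "/* Shit happens. */",
    "scrap_list(head);",
    "/* this is crap */",
    "/* crap, this leaks */",
]

substring_shit = count_matching_lines(sample, "shit")        # case-sensitive substring
spaced_shit = count_matching_lines(sample, " shit ")         # requires surrounding spaces
word_shit = count_matching_lines(sample, "shit", whole_word=True, ignore_case=True)
substring_crap = count_matching_lines(sample, "crap")        # also catches "scrap"
word_crap = count_matching_lines(sample, "crap", whole_word=True)

print(substring_shit, spaced_shit, word_shit, substring_crap, word_crap)
```

On this toy input the space-delimited search misses the sentence-ending "shit." entirely, while the word-boundary search catches both it and the capitalized "Shit" and still skips "scrap".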
I'm curious about the motivations for coding profanity. I've occasionally included comments like "don't f* with this unless you understand x, y, and z" in an attempt to protect fragile sections of code from careless collaborators. Nearly identical comments without profanity seemed ineffective. Within common conversations I know that profanity often carries an implication of violence, usually for the purpose of intimidation. I also find that profanity is frequently used for comedic relief.
Hopefully these counts don't indicate increasing fragility or violent disagreements within the kernel. Does anyone with kernel experience have any insight into common purposes for kernel profanity?
Looking through this, the two biggest categories seem to be complaints about buggy/weird hardware (that driver writers have to work around) and complaints about compiler quirks. I can imagine other projects having similarly-motivated cursing at annoying library weirdnesses.
The motive seems to be to acknowledge to the reader that, yes, this code is ugly, and it's not the writer's fault; it's the product of some bugginess external to the codebase that really isn't possible to fix. Blame "fucking gcc", or Sun for having the nerve to "take such nice parts and fuck up the programming interface" (to quote two examples from the Linux code). The target of the anger is most definitely not the intended audience.
From a quick grep of the source, it looks like a contributing factor is Matsushita Electric Industrial. If you are interested, I posted the output of grep to pastebin [0].
I doubt they retroactively change code/comments because the original organization changed its name. The Matsushita references will probably stay there until they bitrot and get removed/rewritten.
If you look at the grep output that gizmo686 posted to pastebin (see below), you can search for penguin. There are a lot of web addresses, email addresses, maintainer roles (chief penguin), and logo references. There are a few instances of variable names containing penguin as well.
A casual search of code in Debian [1] shows that this is not limited to the Linux kernel. Thankfully there's no equivalent to the Parents Television Council [2] or Focus on the Family [3] for open source projects.
It seems as if a lot of lines of code were removed or not measured at that particular measurement; despite the drop in the usage of 'shit', its occurrence per line in fact jumps up at that moment.
There is an area around 2.4.36.5 where the fall and rise of "fuck" and "shit" closely resemble each other. I'm guessing it's a macro which concatenates the two expletives for use in the code.
That's what I'm thinking: they used the ## preprocessor operator to stick "fuck" and "shit" together. Oh, if this is Linux, then the identifier is more likely to be "fuck_shit".
That's a little harsh. Yes he has a tendency to be - erm, 'vehement' about the quality of code, but there is no one else anyone would rather have at the helm of the kernel project.