Linux kernel swear counts

gizmo686 · on Sept 1, 2013

It's worth looking at the scale of the swear count. Linux has about 16629976 lines of code, and I'd estimate from the graph that it has about 370* swear words (excluding penguin). If you look at the second graph, that is less than 1 swear in 300000 lines.

I checked this on the source tree for 3.8.0. The numbers appear to be inflated by allowing the swear words to be part of other words.

For example, "shit" appears in 121 lines, but " shit " only appears in 10 lines. Looking at the offending lines, only one genuine swearword is missed by requiring the surrounding spaces.

"fuck" appears 29 times, all of which are some conjugation of the verb (and some lines have duplicates I'm not counting).

"crap" appears 161 times, 20 of which are part of "scrap".

"bastard" appears 17 times, 6 of which go to email addresses hosted at "lazybastard.org" and "you-bastards.com".

"penguin" appears 99 times, two of which are jokes.
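To illustrate the inflation described above, here is a minimal Python sketch that counts matching lines two ways: plain substring matching (which counts "scrap" as "crap") and the space-padded search. The sample lines are invented for illustration and are not taken from the kernel tree.

```python
# Invented sample lines (not real kernel source) showing how substring
# matching over-counts compared with requiring surrounding spaces.
SAMPLE = [
    "/* scrap the old buffer */",
    "/* this is crap but it works */",
    "/* hardware returns crap. */",
    "Bob <bob@lazybastard.org>",
]


def count_substring(lines, word):
    """Count lines containing `word` anywhere, even inside other words."""
    return sum(1 for line in lines if word in line)


def count_space_padded(lines, word):
    """Count lines containing `word` with a space on both sides."""
    return sum(1 for line in lines if f" {word} " in line)


if __name__ == "__main__":
    for word in ("crap", "bastard"):
        print(word, count_substring(SAMPLE, word), count_space_padded(SAMPLE, word))
    # crap 3 1
    # bastard 1 0
```

Note that the padded search also drops the third line ("crap." at the end of a sentence), which is the kind of miss raised in the next comment.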

kleiba · on Sept 1, 2013

If you want to check the various words in isolation, surrounding spaces might cost you some matches, e.g. at the end of a sentence ("It's a piece of shit.") or when followed by a comma. Also, did you ignore case ("Shit happens.")?

How about trying \b[Ss][Hh][Ii][Tt]\b and the like?
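A minimal Python sketch of the word-boundary idea, using invented sample lines. With the re.ASCII and re.IGNORECASE flags, the pattern \bshit\b behaves like \b[Ss][Hh][Ii][Tt]\b on ASCII text: it catches the end-of-sentence, capitalized, and comma cases while still rejecting the word when it is embedded in a longer one.

```python
import re

# Case-insensitive whole-word pattern; for ASCII input this is
# equivalent to \b[Ss][Hh][Ii][Tt]\b.
SHIT_RE = re.compile(r"\bshit\b", re.IGNORECASE | re.ASCII)

# Invented sample lines, not kernel source.
SAMPLE = [
    "It's a piece of shit.",    # end of sentence: missed by " shit "
    "Shit happens.",            # capitalized: missed by a case-sensitive search
    "shit, that hurt",          # followed by a comma
    "/* bullshit hardware */",  # inside another word: not a whole word
]


def count_word_lines(lines, pattern):
    """Count lines where `pattern` matches at least once."""
    return sum(1 for line in lines if pattern.search(line))


if __name__ == "__main__":
    print(count_word_lines(SAMPLE, SHIT_RE))              # 3
    print(sum(1 for line in SAMPLE if " shit " in line))  # 0
```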

gizmo686 · on Sept 1, 2013

There were few enough curse words that I manually checked the output of the search without spaces. Regarding case sensitivity, it looks like I missed 12 instances of swearing because of that. Also, grep has a "-i" parameter, which makes it case insensitive.

archangel_one · on Sept 1, 2013

Also a -w parameter, to match whole words only, which is generally better than adding spaces :)
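For anyone who wants to reproduce the counts without chaining grep invocations, here is a rough Python sketch of a recursive, case-insensitive, whole-word line count (similar in spirit to grep -r -i -w, summed over files). The word list and the function names are illustrative assumptions. Unlike grep, it does not skip binary files; undecodable bytes are simply replaced.

```python
import os
import re

# Illustrative word list, taken from the words discussed in this thread.
WORDS = ["fuck", "shit", "crap", "bastard", "penguin"]


def build_patterns(words):
    """One case-insensitive whole-word regex per word (like grep -i -w)."""
    return {
        w: re.compile(rf"\b{re.escape(w)}\b", re.IGNORECASE | re.ASCII)
        for w in words
    }


def count_tree(root, words=WORDS):
    """Count matching lines per word in every file under `root`."""
    patterns = build_patterns(words)
    counts = {w: 0 for w in words}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for line in f:
                        for w, pat in patterns.items():
                            if pat.search(line):
                                counts[w] += 1
            except OSError:
                continue  # unreadable file, broken symlink, etc.
    return counts


if __name__ == "__main__":
    import sys

    print(count_tree(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Usage: run it with the path to an unpacked source tree as the only argument.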

DarMontou · on Sept 1, 2013

I'm curious about the motivations for coding profanity. I've occasionally included comments like "don't f* with this unless you understand x, y, and z" in an attempt to protect fragile sections of code from careless collaborators. Nearly identical comments without profanity seemed ineffective. In everyday conversation, I know that profanity often carries an implication of violence, usually for the purpose of intimidation. I also find that profanity is frequently used for comedic relief.

Hopefully these counts don't indicate increasing fragility or violent disagreements within the kernel. Does anyone with kernel experience have any insight into common purposes for kernel profanity?

azernik · on Sept 1, 2013

Looking through this, the two biggest categories seem to be complaints about buggy/weird hardware (that driver writers have to work around) and complaints about compiler quirks. I can imagine other projects having similarly-motivated cursing at annoying library weirdnesses.

The motive seems to be to acknowledge to the reader that, yes, this code is ugly, and it's not the writer's fault; it's the product of some bugginess external to the codebase that really isn't possible to fix. Blame "fucking gcc", or Sun for having the nerve to "take such nice parts and fuck up the programming interface" (to quote two examples from the Linux code). The target of the anger is most definitely not the intended audience.

gizmo686 · on Sept 1, 2013

From a quick grep of the source, it looks like a contributing factor is Matsushita Electric Industrial. If you are interested, I posted the output of grep to pastebin [0].

[0]http://pastebin.com/MNZF1Vz0

EDIT: This is against linux-3.8.0 from Mint's repository.

anabis · on Sept 1, 2013

The group's name has changed from Matsushita to Panasonic, so this would bring the count down in the future.

gizmo686 · on Sept 1, 2013

I doubt they retroactively change code/comments because the original organization changed their name. The Matsushita references will probably stay there until they bitrot and get removed/rewritten.

shaggyfrog · on Sept 1, 2013

Can anyone explain the inclusion of "penguin"? I know Tux is the mascot and all, but is it some kind of inside-joke-swear-thing?

DarMontou · on Sept 1, 2013

If you look at the grep output that gizmo686 posted to pastebin (see below), you can search for penguin. There are a lot of web addresses, email addresses, maintainer roles (chief penguin), and logo references. There are a few instances of variable names containing penguin as well.

http://pastebin.com/MNZF1Vz0

foobarbazqux · on Sept 1, 2013

I like jwz's classic post about swear words in Mozilla:

http://www.jwz.org/doc/censorzilla.html

valleyer · on Sept 1, 2013

I was laughing until I saw the swears in variable names. Now I'm mad.

ColinWright · on Sept 1, 2013

When this was submitted some years ago there was some discussion. It might be worth comparing that with the comments here:

https://news.ycombinator.com/item?id=850761

It has been submitted a few more times, none with comments:

https://news.ycombinator.com/item?id=2070056

https://news.ycombinator.com/item?id=4045103

https://news.ycombinator.com/item?id=6307849

The last of these was just 14 hours ago; the trailing slash defeated the HN dup detector.
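How HN's duplicate detector actually works isn't described here. As a hypothetical sketch, a checker that compares raw URL strings treats ".../page" and ".../page/" as different submissions, whereas normalizing URLs before comparing them would catch this case. The normalization rules and function names below are assumptions for illustration, using only Python's standard urllib.parse.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Hypothetical canonical form for duplicate detection: lowercase the
    scheme and host, drop a trailing slash from the path, drop the fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def is_duplicate(url, seen):
    """True if `url` normalizes to something already in `seen`."""
    return normalize_url(url) in seen


if __name__ == "__main__":
    seen = {normalize_url("http://example.com/swears")}
    print(is_duplicate("http://example.com/swears/", seen))  # True
```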

m_ram · on Sept 1, 2013

A casual search of code in Debian [1] shows that this is not limited to the Linux kernel. Thankfully there's no equivalent to the Parents Television Council [2] or Focus on the Family [3] for open source projects.

[1] http://codesearch.debian.net/search?q=fuck

[2] https://en.wikipedia.org/wiki/Parent%27s_Television_Council

[3] https://en.wikipedia.org/wiki/Focus_on_the_Family

aylons · on Sept 1, 2013

I wonder what caused the peak of shit just before 3.0.9, and what reversed it.

The spike just before 3.2.17 must be a glitch, but if it isn't, it's very intriguing.

ucarion · on Sept 1, 2013

It seems as if a lot of lines of code were removed, or not measured, at that particular data point; despite the drop in the usage of 'shit', its occurrence per line actually jumps up at that moment.
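The effect described above is just a ratio whose denominator fell faster than its numerator. A tiny Python illustration with made-up numbers (not kernel measurements):

```python
def per_line_rate(swear_lines, total_lines):
    """Occurrences per line of code."""
    return swear_lines / total_lines


# Made-up numbers for illustration only, not kernel measurements.
before = per_line_rate(100, 10_000_000)  # about 1.0e-05
after = per_line_rate(90, 5_000_000)     # about 1.8e-05
print(after > before)  # True: fewer swears, but a higher rate
```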

forkrulassail · on Sept 1, 2013

I love how penguin is halfway between bastard and crap, I'll update my swear word dictionary.

NAFV_P · on Sept 1, 2013

There is an area around 2.4.36.5 where the fall and rise of "fuck" and "shit" closely resemble each other. I'm guessing it's a macro which concatenates the two expletives for use in the code.

ajkjk · on Sept 1, 2013

or a variable named 'fuckshit'

NAFV_P · on Sept 1, 2013

That's what I'm thinking: they used the ## preprocessor operator to paste "fuck" and "shit" together. Oh, if this is Linux, then the identifier is more likely to be "fuck_shit".
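If, as gizmo686's check suggests, the graph counted substrings, then every line containing a combined identifier such as the hypothetical "fuck_shit" would bump both curves at once, making them rise and fall together. A whole-word search would count neither, because the underscore is a word character. A minimal Python check on an invented line:

```python
import re

# Invented line using the hypothetical identifier guessed above.
LINE = "static int fuck_shit = 0;"

for word in ("fuck", "shit"):
    substring_hit = word in LINE
    # With re.ASCII, \w is [A-Za-z0-9_], so "_" counts as a word character
    # and neither \bfuck\b nor \bshit\b matches inside "fuck_shit".
    word_hit = re.search(rf"\b{word}\b", LINE, re.ASCII) is not None
    print(word, substring_hit, word_hit)
# fuck True False
# shit True False
```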

chatman · on Sept 1, 2013

With people like Linus Torvalds at the helm of the project, what else can be expected?

primelens · on Sept 1, 2013

That's a little harsh. Yes he has a tendency to be - erm, 'vehement' about the quality of code, but there is no one else anyone would rather have at the helm of the kernel project.

gizmo686 · on Sept 1, 2013

An increase in the rate of swear words.

frozenport · on Sept 1, 2013

Not sure what you mean. Are there a lot or a little?