Show HN: Bullshit-detector – quickly detect amount of bullshit in any text

luke-stanley · on Aug 22, 2024

Presumably, the identified phrase list could be used to fine-tune a BERT model or similar as a binary classifier, which could catch more cases. But presumably some actual semantically meaningful words would be needed too. That would be straightforward to do too, though. Someone has probably already done it. The advantage would be that you could get probability metrics on a broader set of text. Good data is the key thing, though.
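One way to read the idea above is to use the phrase list to weakly label text, then train a classifier on those labels. The following is only a sketch of the labeling step under that assumption: the phrase list, the `min_hits` threshold, and the function names are all hypothetical, and the fine-tuning itself is left out.

```python
# Hypothetical phrase list, for illustration only; not the project's real list.
PHRASES = ["synergy", "game changer"]


def weak_label(text, phrases=PHRASES, min_hits=1):
    """Return 1 if the text contains at least min_hits phrase matches, else 0."""
    lowered = text.lower()
    # Simple substring counting; a real pipeline would likely tokenize first.
    hits = sum(lowered.count(phrase) for phrase in phrases)
    return 1 if hits >= min_hits else 0


def build_dataset(texts, phrases=PHRASES):
    """Pair each text with its weak label, as input for a later training step."""
    return [(text, weak_label(text, phrases)) for text in texts]
```

Labels produced this way only encode the phrase list itself, so a model trained on them would still need the "semantically meaningful" examples mentioned above to generalize beyond it.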

BrandoElFollito · on Aug 27, 2024

The fact that OP felt the need to add a disclaimer suggests that they expect people who write such abominations to search for detectors :)

Who knows, though. Maybe there is a marketing dude who once thought "maybe that's too much?". Naah.

ilaksh · on Aug 22, 2024

Hm. I wonder how well this works versus a large LLM. Seems like something a very strong LLM should be able to handle well with the right prompting.

If you can handle it with just phrases that would save a lot of time and money though.

luke-stanley · on Aug 22, 2024

Yeah, presumably words with vague meanings and low semantic quality are easy to find. Still, a BERT model for it would be fun.

delichon · on Aug 22, 2024

It counts these phrases that the author doesn't like:

https://github.com/pilotpirxie/bullshit-detector/blob/main/s...

By including this file this project should therefore correctly give itself a very high bullshit score. It's performance art really.
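For anyone curious what phrase counting of this kind might look like, here is a minimal sketch. It is not the project's actual code: the phrase list, the function names, and the hits-per-word score are assumptions made for illustration.

```python
import re

# Hypothetical phrase list; the project's real list lives in the linked file.
PHRASES = ["synergy", "game changer", "best in class"]


def count_phrases(text, phrases=PHRASES):
    """Count case-insensitive, whole-word occurrences of each phrase."""
    counts = {}
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[phrase] = hits
    return counts


def score(text, phrases=PHRASES):
    """Phrase hits per whitespace-separated word; 0.0 for empty text."""
    words = text.split()
    if not words:
        return 0.0
    return sum(count_phrases(text, phrases).values()) / len(words)
```

Running `count_phrases` over a file that is itself the phrase list would match every entry, which is the self-referential effect described above.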

dotancohen · on Aug 22, 2024

That is equivalent to noticing that antivirus applications would flag themselves as viruses if they scanned their own virus samples.

In applications I write, I store all example files separately for this reason and others. Even LLMs store the training data separately.

spacebacon · on Aug 22, 2024

Many duplicates

bravetraveler · on Aug 23, 2024

Haven't looked at the workings exactly, but this can sometimes be a deliberate choice to weight the options. A number property could work, but alas. Easier to yank and paste.
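To illustrate the weighting point, here is a small sketch showing that repeating a phrase in the list behaves like giving it a numeric weight. The list contents and names are hypothetical, and it assumes the detector counts matches once per list entry, which has not been checked against the project.

```python
from collections import Counter

# Hypothetical list in which "synergy" appears twice so that it counts double.
PHRASES_WITH_DUPLICATES = ["synergy", "leverage", "synergy"]

# Collapsing the duplicates yields an explicit weight per phrase.
WEIGHTS = Counter(PHRASES_WITH_DUPLICATES)


def hits_from_duplicates(text, phrases=PHRASES_WITH_DUPLICATES):
    """Count matches once per list entry, so repeated entries count repeatedly."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in phrases)


def hits_from_weights(text, weights=WEIGHTS):
    """Count matches once per unique phrase, scaled by its weight."""
    lowered = text.lower()
    return sum(lowered.count(phrase) * weight for phrase, weight in weights.items())
```

Both functions return the same total for any input; the weighted form just makes the intent visible instead of leaving it implicit in copy-pasted entries.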

spacebacon · on Aug 23, 2024

I considered that. AI also tends to list duplicates when generating an exhaustive list.

bravetraveler · on Aug 23, 2024

Totally fair, just spittin' in the wind

spacebacon · on Aug 23, 2024

Same here lol

pacifika · on Aug 22, 2024

Love this

r00tanon · on Aug 22, 2024

Bullshit.