Hi, I've been reading marketing and business books recently and found that plenty of them are filled with meaningless corporate jargon. Many could be a third of their length and much more straightforward. I wrote a tiny library to measure the amount of meaningless jargon in any text, originally for myself, and later open-sourced it in case someone else needs it.
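The core idea (counting matches against a curated phrase list and reporting a density) can be sketched in a few lines. This is a minimal sketch, not the library's actual implementation; `JARGON_PHRASES` and `jargon_density` are hypothetical names, and the real library presumably ships a much larger curated list.

```python
import re

# Hypothetical phrase list; the real library would ship its own curated list.
JARGON_PHRASES = [
    "synergy", "leverage", "paradigm shift", "move the needle",
    "circle back", "low-hanging fruit", "core competency",
]

def jargon_density(text: str) -> float:
    """Fraction of words in `text` that belong to a matched jargon phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    lowered = " ".join(words)
    jargon_words = sum(
        lowered.count(phrase) * len(phrase.split())
        for phrase in JARGON_PHRASES
    )
    return jargon_words / len(words)

print(jargon_density("We must leverage synergy to move the needle"))  # 5 of 8 words
```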
Presumably, the identified phrase list could be used to fine-tune a BERT model (or similar) as a binary classifier that catches more cases. You'd presumably also need some semantically meaningful text as negative examples, but that would be straightforward to add too. Someone has probably already done it.
The advantage would be that you could get probability scores over a broader set of text. Good data is the key thing, though.
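To make the "probability metrics" idea concrete without the weight of a BERT fine-tune, here is a toy stand-in: a hand-rolled logistic regression over bag-of-words features, trained on a few made-up labeled snippets (everything here is an assumption for illustration; a real version would fine-tune a pretrained model on a proper corpus).

```python
import math
import re

# Toy labeled corpus standing in for real jargon/plain training data.
DATA = [
    ("leverage our synergies to move the needle", 1),
    ("circle back on the paradigm shift offline", 1),
    ("the cat sat on the mat", 0),
    ("we measured latency before and after the change", 0),
]

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

vocab = sorted({w for text, _ in DATA for w in tokens(text)})
index = {w: i for i, w in enumerate(vocab)}

def features(text):
    vec = [0.0] * len(vocab)
    for w in tokens(text):
        if w in index:
            vec[index[w]] += 1.0
    return vec

# Plain logistic regression trained by gradient descent.
weights = [0.0] * len(vocab)
bias = 0.0
for _ in range(200):
    for text, label in DATA:
        x = features(text)
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - label
        bias -= 0.1 * err
        weights = [w - 0.1 * err * xi for w, xi in zip(weights, x)]

def jargon_probability(text):
    """Probability that `text` is corporate jargon, per the toy model."""
    x = features(text)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

The point is the interface, not the model: swapping the bag-of-words scorer for a fine-tuned BERT keeps the same shape, a function from text to a probability, which is what lets you rank or threshold arbitrary passages.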
Haven't looked at the workings exactly, but this can sometimes be a deliberate choice to weight the options. A number property could work, but alas, it's easier to yank and paste.