Hacker News

If it was that easy, why did their human judges fail at it?


Does the experimental setup for the human judges sound fair to you?

For example, the naive Bayes classifier knows the a priori distribution of review spam (which appears to be fixed at 50%), but do the undergraduate human judges? It would appear not, given that one judge labeled only 12% of reviews deceptive.
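The base-rate point can be made concrete with Bayes' rule. Below is a minimal sketch (the likelihood ratio of 2.0 is a hypothetical number, not from the paper) showing that the same evidence flips the label depending on the assumed prior: a classifier calibrated to a 50% base rate calls the review deceptive, while a judge who believes only ~12% of reviews are fake does not.

```python
# Hypothetical: a review whose word likelihoods favor "deceptive"
# by a modest ratio under some trained model.
likelihood_ratio = 2.0  # P(review | deceptive) / P(review | truthful)

def posterior_deceptive(prior_deceptive, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_deceptive / (1 - prior_deceptive)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Classifier trained on a balanced corpus effectively assumes a 50% base rate.
balanced = posterior_deceptive(0.50, likelihood_ratio)   # ~0.67 -> "deceptive"
# A judge operating with a ~12% prior needs much stronger evidence.
skeptical = posterior_deceptive(0.12, likelihood_ratio)  # ~0.21 -> "truthful"
```

Same likelihood ratio, opposite decisions, purely because of the prior.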

Likewise, were the human judges able to see examples of truthful and deceptive reviews before beginning the task? (In other words, are the human judges solving a different problem, e.g., "deception detection", than the classifier, e.g., "similarity to prior deceptive reviews written by Turkers"?)

If these are genuine differences between the human and machine setups, are they major ones? Can you spot any other significant differences between the two experimental setups?



