
I did a really simple analysis in an AI class using several thousand posts from a forum I post at frequently, tagged "me" and "not me" (I figured that way it would be minimally invasive to people not participating -- plus, hey, easy naive Bayes). Got fairly decent results with pretty trivial sample inputs. The numbers elude me; it has been years.

Everyone thinks "Aha, you have some catchphrases" (I do) or "Aha, you were one of the only Republicans and thus someone saying 'death tax' was more likely you" (true) or "You cited nationalreview.com more than the rest of the forum together" (true), but it turns out the distribution of really stupid stuff (stopwords, essentially) works better.
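For the curious, here is a minimal sketch of what a stopword-only naive Bayes classifier could look like. It is not the original class project: the stopword list, the function names, and the whitespace tokenizer are all assumptions made for illustration, and the model uses plain Laplace smoothing.

```python
import math
from collections import Counter

# Hypothetical stopword list; the comment above does not say which words were used.
STOPWORDS = ["the", "a", "of", "and", "to", "in", "i", "that", "it", "is"]

def stopword_counts(text):
    """Count only the stopwords in a post (naive whitespace tokenizer, no punctuation handling)."""
    words = text.lower().split()
    return Counter(w for w in words if w in STOPWORDS)

def train(posts_by_label):
    """posts_by_label maps a label ("me" / "not me") to a non-empty list of posts.
    Returns, per label, a log prior and Laplace-smoothed log likelihoods per stopword."""
    total_posts = sum(len(posts) for posts in posts_by_label.values())
    model = {}
    for label, posts in posts_by_label.items():
        counts = Counter()
        for post in posts:
            counts.update(stopword_counts(post))
        total = sum(counts.values())
        log_likelihood = {
            w: math.log((counts[w] + 1) / (total + len(STOPWORDS)))
            for w in STOPWORDS
        }
        model[label] = (math.log(len(posts) / total_posts), log_likelihood)
    return model

def classify(model, text):
    """Return the label with the highest posterior log score for this text."""
    counts = stopword_counts(text)
    scores = {
        label: log_prior + sum(n * ll[w] for w, n in counts.items())
        for label, (log_prior, ll) in model.items()
    }
    return max(scores, key=scores.get)

# Toy training data, invented purely to exercise the code.
toy_posts = {
    "me": ["i think that it is the case", "i know that it is"],
    "not me": ["the cat of the house and the dog", "a bird in the tree of the park"],
}
toy_model = train(toy_posts)
guess_first = classify(toy_model, "i say that it is")
guess_second = classify(toy_model, "the lord of the rings")
```

Note that catchphrases, "death tax", and cited domains never enter the model; only the relative frequencies of the boring words do.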

Ironically, this is the same thing they've discovered for making female/male authorship decisions, although I never took the next step and asked "So what relationship does my distribution have with the average guy's distribution?"

Incidentally, here's the reason you'll never have to worry about this in the context of "Google the Internet for everything Patrick McKenzie has ever written": imagine I have a 99.9% effective filter for you, and I dragnet an Internet filled with 5 billion documents of which you've written 1,000. The filter is wrong on 0.1% of what it sees, so I then identify about 5 million documents as written by you... but you only wrote 1,000 of them.
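The arithmetic can be spelled out in a few lines. This sketch assumes "99.9% effective" means the filter is right 99.9% of the time on both your documents and everyone else's; the function and variable names are made up for illustration.

```python
def dragnet(total_docs, authored, sensitivity, specificity):
    """Expected results of running an authorship filter over every document.

    sensitivity: chance the filter flags a document you really wrote.
    specificity: chance the filter clears a document you did not write.
    """
    true_positives = authored * sensitivity
    false_positives = (total_docs - authored) * (1 - specificity)
    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "flagged": flagged,
        # Fraction of flagged documents that are actually yours.
        "precision": true_positives / flagged,
    }

# Numbers from the comment; the symmetric 99.9% rates are an assumption.
result = dragnet(
    total_docs=5_000_000_000,
    authored=1_000,
    sensitivity=0.999,
    specificity=0.999,
)
```

Under those assumptions the filter flags roughly 5 million documents, of which only about 999 are really yours, so around 1 flagged document in 5,000 is a true hit.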

This sort of "don't search the haystack unless you're bloody sure it is packed full of needles" thing is why you never want to test a population not known to be at risk for the disease, etc. (Or why you retest in the event of a positive using a different test.)
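The retest point can be sketched as a repeated Bayes update. This assumes each follow-up test is also 99.9% accurate and, crucially, that its errors are independent of the earlier tests (which is the reason for using a different test); the names below are hypothetical.

```python
def update(prior, sensitivity, specificity):
    """Posterior probability of a true match after one positive result (Bayes' rule)."""
    hit = sensitivity * prior                 # really a match, and flagged
    false_alarm = (1 - specificity) * (1 - prior)  # not a match, but flagged anyway
    return hit / (hit + false_alarm)

# Prior: 1,000 of your documents among 5 billion.
prior = 1_000 / 5_000_000_000

after_one = update(prior, 0.999, 0.999)
after_two = update(after_one, 0.999, 0.999)    # assumes an independent second test
after_three = update(after_two, 0.999, 0.999)  # assumes an independent third test
```

With these numbers a single positive leaves the probability at about 0.02%, a second independent positive raises it to roughly 17%, and a third pushes it above 99%. If the second test makes the same mistakes as the first, none of that improvement materializes.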



I down-voted you for the use of the term "death tax". The correct term for it is the "trust fund baby tax". This is not merely a matter of rhetoric, but of accuracy. The dead person isn't being taxed. Being deceased and presumably having no use for wealth past that point, he or she doesn't care. The living heirs are being taxed.

To quote Salute Your Shorts, get it right or pay the price.

It cost you 2 karma points, because I'd have voted you up for an otherwise high-quality post.


That was an odd thing to do... If he had used your "correct" term in his post, it would not have been a good indicator of Republican behavior and would therefore ruin the example. His post uses the term in a description of other posts -- just as your post does.


You make a good point. I just have a knee-jerk negative response to phrases like "death tax" when not used with bitter irony. (I have Republican relatives.) I probably deserve my -5 for being obnoxious.

Rash, knee-jerk reactions are often a social liability, which reminds me of a funny bar anecdote:

Guy: So, you're waiting for someone?

Girl: Yeah, some college friends. They're an hour late. Oh my... Boy, friends can be really rude sometimes.

Guy: Tell me about your sister. How old is she?

Girl: What?

Guy: Sorry, I heard the words "my boyfriend" and--

(Girl walks away.)


Yaaaouch! Seafood soup is NOT on the menu!


"The dead person isn't being taxed. Being deceased and presumably having no use for wealth past that point, he or she doesn't care."

That's not true. Estate tax rules affect gifts during one's own lifetime as well.


I think such nitty-gritty comments about karma aren't very insightful. But while we're in those nitty-gritty details, I'll say there could very well be some people who upvote him just to counterbalance your downvote; he might even make a profit.


I down-voted your description to register my disapproval of marking down someone's post for a single, non-offensive word. That's obnoxious and pedantic.



