If all you want to know is what fraction of all twitter accounts are spam accoun... | Hacker News


justinpombrio on May 16, 2022 | on: Nearly 20% of active Twitter accounts likely to be...

If all you want to know is what fraction of all twitter accounts are spam accounts, it should be really easy:

1. Select 1000 accounts uniformly at random. Either from among all twitter accounts, or from active twitter accounts for whatever definition of "active".
2. Classify these 1000 by hand. Do as much investigation into them as you need to classify them accurately; no need to use heuristics here.

You will (with very high probability) get an estimate accurate to within a percent or so. If you do statistics you could find the actual bounds.
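For anyone who wants the actual bounds, here is a minimal sketch of the arithmetic, assuming the 1000 accounts really are a uniform random sample and the hand labels are correct. The function names `normal_margin` and `wilson_interval` are illustrative, not from any Twitter tooling. With 1000 accounts and a spam rate near 20%, the 95% margin comes out to roughly ±2.5 percentage points, so somewhat wider than "a percent or so".

```python
import math


def normal_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical outcome: 200 of the 1000 sampled accounts labeled as spam.
    low, high = wilson_interval(200, 1000)
    print(f"margin: +/-{normal_margin(0.2, 1000):.4f}")
    print(f"95% interval: [{low:.4f}, {high:.4f}]")
```

The margin shrinks with the square root of the sample size, so getting to about ±1 point at a 20% rate would take on the order of 6000 hand-classified accounts rather than 1000.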

alaricus on May 16, 2022

How do you get 1000 accounts at random? Does Twitter have an API for it?

ianbooker on May 16, 2022

The stream API can sample, but then you see currently active accounts only.

Users are identified by numerical ID, so you can sample using that.
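A rough sketch of what sampling by numerical ID could look like: draw IDs uniformly from a range and keep the ones that resolve to a real account (rejection sampling). Everything here is an assumption for illustration: `lookup` is a hypothetical callable standing in for whatever account-lookup call you have access to, and `max_id` is an upper bound on IDs that you would have to supply yourself.

```python
import random
from typing import Callable, List, Optional


def sample_accounts_by_id(
    max_id: int,
    lookup: Callable[[int], bool],  # hypothetical: True if the ID is a real account
    k: int,
    rng: Optional[random.Random] = None,
    max_attempts: int = 1_000_000,
) -> List[int]:
    """Draw up to k distinct existing account IDs uniformly via rejection sampling."""
    rng = rng or random.Random()
    tried = set()
    found: List[int] = []
    attempts = 0
    while len(found) < k and attempts < max_attempts:
        attempts += 1
        candidate = rng.randrange(1, max_id + 1)  # uniform over 1..max_id
        if candidate in tried:
            continue  # never test (or count) the same ID twice
        tried.add(candidate)
        if lookup(candidate):
            found.append(candidate)
    return found
```

Because every ID in the range is equally likely to be drawn, the accounts that survive the lookup are a uniform sample of existing accounts. The catch is cost: if the ID space is sparse, most draws miss, and each miss is a wasted lookup against whatever rate limit applies.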
