
Does the Census Bureau need personal data to perform its functions? A "hash" of the name would be enough.


The census is used to determine the allotment of House representatives for each state, and distribute funds to various federal programs by state or region. This probably requires more accurate and detailed demographic information than hashes would provide.

Also, names are not unique, therefore hashes of names would also not be unique. And how would you verify the hashes belong to actual people?


Hash multiple pieces of information. While a name isn't unique, an SSN is. Or a name tied to an address and age is (age is needed because a shared address also raises the chance of a name clash if something like "Jr" or "Sr" is missed). You can hash more than a single thing.
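To make the "hash more than a single thing" idea concrete, here is a minimal sketch, assuming SHA-256 and a name/address/age triple. The function name `record_hash`, the normalization rules, and the separator are my own choices for illustration, not anything the Census Bureau does.

```python
import hashlib

def record_hash(name: str, address: str, age: int) -> str:
    """Hash several fields together (hypothetical helper, for illustration)."""
    # Normalize case and surrounding whitespace so trivial formatting
    # differences do not produce different digests.
    parts = [name.strip().lower(), address.strip().lower(), str(age)]
    # Join with the ASCII unit separator so that ("ab", "c") and ("a", "bc")
    # cannot collapse into the same input string.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Same name and address, different age: the two digests differ,
# which is the "Jr"/"Sr" case where the suffix was dropped.
senior = record_hash("John Smith", "1 Main St", 60)
junior = record_hash("John Smith", "1 Main St", 30)
```

Including age is what separates the two records here; with only name and address they would hash to the same value.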


With enough data you can often "de-anonymize" to discover actual identities.


Whenever the topic of anonymizing, deanonymizing, and PII comes up, I have to use myself as an example of how little data it takes for something to count as PII. Despite living in a city of 300,000 people, through a quirk in street numbering my house wound up with a unique postal code. Given the postal code, almost any additional piece of data narrows the candidates from two people (me and my wife) to just one of us.

Hopefully that thought sends shivers down the spines of anyone who is trying to come up with a data anonymization scheme.
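The postal code example can be checked mechanically: count how many records share each combination of supposedly harmless fields, and flag any combination that matches exactly one person. This is a rough sketch; the field names and the sample records are made up for illustration.

```python
from collections import Counter

def group_sizes(records, keys):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[k] for k in keys) for r in records)

def unique_records(records, keys):
    """Return the records that are the only one with their field combination."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[tuple(r[k] for k in keys)] == 1]

# Hypothetical data: postal code "X1" covers a single two-person household,
# while "Y2" covers a larger area.
records = [
    {"postal": "X1", "sex": "F", "birth_year": 1980},
    {"postal": "X1", "sex": "M", "birth_year": 1979},
    {"postal": "Y2", "sex": "F", "birth_year": 1980},
    {"postal": "Y2", "sex": "F", "birth_year": 1991},
    {"postal": "Y2", "sex": "M", "birth_year": 1980},
    {"postal": "Y2", "sex": "M", "birth_year": 1975},
]

by_postal = unique_records(records, ("postal",))
by_postal_and_sex = unique_records(records, ("postal", "sex"))
```

In this sample, postal code alone singles out nobody, but postal code plus one more field singles out both residents of "X1" while leaving everyone in "Y2" in a group of two.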



