From the article: However, under the laws of the People's Republic, government agencies can more or less search any machine at any time in the Middle Kingdom, meaning profiles on 56.5 million American residents appear to be at the fingertips of China, thanks to CheckPeople – we assume Beijing has files on all of us, though, to be fair.
And that is exactly how the Europeans feel when their health records are handled by Google and the TSA wants to know their Facebook handle: what initially appeared as a nice-to-have is now all of a sudden a government source of data no one anticipated...
Edit - oh darn, I forgot Ancestry.com and 23andMe, which are even worse examples: US police have full access to all DNA samples provided by anyone in the past. That is 20 million DNA samples. Not name or age - full genetic info...
I’m starting to think the Facebook / Google way of harvesting data may really be overcomplicated. These kinds of services show people are willing to pay to give personal data that literally defines them as individuals to a commercial entity.
A DNA test was really helpful in finding all my half siblings and other family members because my biological dad is a sperm donor.
The fact that my DNA can somehow turn up at the scene of a crime somewhere and the police can query the sample I sent to Ancestry x years ago, well, let's say that I already knew that risk going into it and it will have to be what it will have to be.
In our lifetimes, I'm not seeing a way for us to dismantle the forces that are pushing for such a big surveillance state.
Therefore, by definition, you can either find a way to cope within that surveillance state, or you can move to somewhere so remote and hidden that you can't be caught doing what they don't like.
>Therefore, by definition, you can either find a way to cope within that surveillance state, or you can move to somewhere so remote and hidden that you can't be caught doing what they don't like.
The problem with this is that by submitting your DNA, you're not handwaving away your own privacy - you're also making the decision for your relatives as well to handwave away their privacy.
>In our lifetimes, I'm not seeing a way for us to dismantle the forces that are pushing for such a big surveillance state.
"It's hard so why even bother trying". Yeah, fuck this milquetoast line of thought, to be frank. The cost of liberty is eternal vigilance. Don't engage in the sort of activity that lays the groundwork for totalitarian surveillance.
No, the only problem is that the government has access to the records, not that I gave my data to some private company.
It's like saying we shouldn't use cell phones and GPS because if our phones somehow interact with each other, your location data is given to my phone, and then my phone is less secure than yours and leaks both of our info to the government.
The only fault occurring here is the government's decision to gain access to and use this data
I don't think so. It reads like this: the surveillance state is so pervasive, the risk of leaking data to it so ever-present, and the possible future uses of that data so hard to guess, that it's largely a collective-action problem that likely can only be solved with laws, rather than by individuals opting in or out of everything.
The commenter completely missed the fact that they are contributing to building the mass surveillance that covers OTHERS that don't opt in. You can't opt out of 23andMe since your relatives opt in for you.
The US does not collect everyone's DNA at birth, and there would be some pushback against that. But there isn't against 23andMe because few care about others.
Roughly... Crime is committed, police find DNA sample at scene, police compare sample with online DNA registry, hits as a near-match for personX, police now know that a sibling/cousin of personX is the culprit. That sibling/cousin never allowed their own DNA to be collected.
I may have a story saved on an old SD card somewhere.. but I think it was Italian? A major crime was solved by pattern matching DNA to a close relative, which brought up a few people in the town whose DNA was 90(?)% similar or something.. they did find the killer/rapist, whatever they were looking for..
but the process of investigating and questioning led investigators and family members, and then everyone else through rumor, whispering and news articles, to discover that one of the grandparents (of a well known family) had cheated and birthed a love child that was assumed to be of whatever family name.. and then they had kids - and they all had positions in the town.. but now the truth was known that none of those grandkids, their families, etc. were actually part of the whatever-family-name dynasty.
A whole group of people and a town and some industries changed forever because DNA is similar (and much DNA is not as well) - and the use of this technique, roping in and affecting others that had nothing to do with the alleged crime, certainly can have other real world consequences.
"The repository's contents are likely scraped from public records"
Seems like a non-story to me. If you put it out there, someone is collecting, cleaning, and selling it. The fix in this particular scenario is to put less online.
Almost every city in the US has a GIS system which you can use to look up tax record ownership of property parcels. Sales history too, even. This information is also available via FOIA requests. There are probably easy ways to scrape it, considering almost every city uses the same 2-3 GIS software systems. In the case of home ownership, the only choice you have for privacy is to register an LLC under a paid attorney as a proxy, and have the LLC buy the home. And even then you may run into issues trying to file for the homestead exemption on property taxes. You're not able to tell the city to unlist your property info in most cases.
Reading the Letters to the Editor and similar features in older magazines and newspapers reveals something that would be unthinkable today: it was common for the home address of the person writing to the periodical to be published along with their letter. As far as I can tell, no one had a problem with this. When I was young, driver's licenses included your Social Security number and it was common to have it printed on your personal checks. This no longer happens.
Is it the information that's the problem or is the problem what others are willing to do with that information?
Knowing the address of three strangers isn't very valuable, and thus less likely to be abused. Knowing the address of millions of people can be very valuable, and thus very likely to be abused.
In cases like this the issue lies not in the specific datatypes, but the aggregation thereof.
I think most reactionary pro-privacy responses (the HN default perspective) really stem from a complicated internalization of data gathering/use capabilities that has been adjusted over time combined with a lizard brain feeling of invasiveness. Because almost always no one particularly gives a crap about YOU and your specific data, but they may profit from and misuse your information in passing or in aggregate.
It is a lot like the feeling of having your car broken into or house robbed. You feel personally violated but more than likely you are a victim of circumstance. It can be hard to distinguish between faceless identification (for ad network data gathering, for the most part) and the risks of general data availability that makes anyone as capable as an old P.I. (especially when data leaks conflate the two).
"almost always no one particularly gives a crap about YOU and your specific data"
It's really amazing how much time people spend these days trying to gather information on people, considering how useless it is. I don't mean mass surveillance, I mean like people that you want to do business with individually.
> In the case of home ownership, the only choice you have for privacy is to register an LLC under a paid attorney as a proxy, and have the LLC buy the home.
Note you can achieve a lot of this privacy benefit much more easily by transferring ownership of your house to a living trust (where your name isn't in the name of the trust). For companies that just scrape and correlate info (as opposed to doing a targeted search), that is good enough.
Small nit, public records aren't things you put out there. They're records of your interactions with the government (directly or indirectly) that the government makes available as part of their transparency and FOIA efforts.
I’ve always wondered how those people finder sites get all this info. Do they just send a bunch of FOIAs to court houses and cities? I kinda figured they paid to be wired in with the DMV and police computers maybe... I guess background check companies would be similar, however they get the data.
I was looking at some before, just curious, and I noticed some information was inaccurate. Like I looked up an old address, and it said a dead relative used to live with us, when they never did. Then another site said one of our neighbors was a sex offender, which isn't true.
Then there are companies like LexisNexis that have massive databases on people too. I think they have a way to run people’s credit without it actually showing up on people’s reports, as I heard car dealers can get info on people's credit without it showing up as a pull, so I'm not sure if maybe it’s like a cached version of a credit report that gets sold and traded.
I remember watching some clips on YouTube years ago about all these big data brokers. Last I heard, some of these companies won’t even delete your information unless you are a police officer who feels your life is in danger. They seem to still have a similar policy. https://www.lexisnexis.com/en-us/privacy/for-consumers/opt-o...
Where I live court records and property records are completely public and available for anyone to view online. So just simple web scraping. Voter records aren't online as far as I'm aware, but they're easy to request.
I know some of it makes it to the paper too, like bankruptcies or divorces if you don't know where they currently live. But I know some of that stuff is under a paywall, or you have to pay to make copies at the court house, though it probably depends on the area.
I was looking at one site about red light camera tickets since there's been debates over them, and some states have even outlawed them. But looking at one of the examples, some county didn't even have a secure website for entering license and credit card info. Chrome even put a warning next to the address bar. Not sure why they are allowed to process credit cards, since a private website would have to be compliant with PCI. But I guess some areas are more technical than others.
For the entire state? I figured it was a county by county thing. But I do think more legal case law and publicly funded research should be open to the public. I know people feel that way about PACER, so someone created a plugin called RECAP.
It's articles like this that make me realize many people really haven't come to terms with how the Internet has changed the definition of "privacy".
As mentioned in the article, this is all public data that probably any script kiddie with enough time could write scrapers for. It's a non-story. What's different is that (some) people somehow expect that this public data is as hard to collect and correlate as it was 30 years ago. Those days are long gone, and people should realize it.
General question to HN because I don't know the answer.
Has anyone ever tried de-duplicating and matching up records between multiple leaks to build user/person profiles?
I guess with enough unique keys, compute power and time you could build a reasonably accurate profile of a person by matching up email addresses, phone numbers or SSNs?
Would probably make identity theft and fraud a lot easier going forward.
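To make the question concrete, here's a rough sketch of what matching records on shared keys could look like. It's a toy, not how any real broker does it: the field names (`email`, `phone`) and the normalization rules are my own assumptions, and it only links records on exact matches after normalization, grouping them with a small union-find.

```python
from collections import defaultdict


def normalize_email(email):
    # Assumption: emails match if equal after trimming and lowercasing.
    return email.strip().lower()


def normalize_phone(phone):
    # Assumption: keep digits only; country codes are not handled.
    return "".join(ch for ch in phone if ch.isdigit())


def link_records(records):
    """Group record indexes that share a normalized email or phone."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (key type, normalized value) -> first record index
    for idx, rec in enumerate(records):
        keys = []
        if rec.get("email"):
            keys.append(("email", normalize_email(rec["email"])))
        if rec.get("phone"):
            keys.append(("phone", normalize_phone(rec["phone"])))
        for key in keys:
            if not key[1]:
                continue  # skip values that normalize to nothing
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical rows from three different "leaks".
leaks = [
    {"email": "A@example.com", "phone": "555-0100"},
    {"email": "a@example.com ", "name": "A. Person"},
    {"phone": "(555) 0100", "city": "Springfield"},
    {"email": "b@example.com"},
]
print(link_records(leaks))
```

The first three rows end up in one group because they are chained together by a shared email and a shared phone number, even though rows two and three have no field in common. That transitive chaining is what makes cross-leak matching work, and also what makes a single wrong match pollute a whole profile.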
Data broker-type companies do this already and sell it to any company that wants to buy it. They even include stuff like your probable religion, probable income/net worth, probable hobbies/interests, etc.
For example:
>Each record contains entries that go far beyond contact information and public records to include more than 400 variables on a vast range of specific characteristics: whether the person smokes, their religion, whether they have dogs or cats, and interests as varied as scuba diving and plus-size apparel.
Probably not much more than what was in the Equifax breach.
I think one of the bigger issues in generating profiles on ~325 million people would be merging tons of incomplete data and lots of old/outdated data.
Not to say that you couldn't use some machine learning to fill in the blanks, but I'm sure most of the big data/analytics companies are already doing that.
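Even without machine learning, the simplest version of merging incomplete and outdated data is "newest non-empty value wins". A minimal sketch, assuming each record carries a hypothetical `updated` field holding an ISO date string (so plain string sorting gives chronological order):

```python
def merge_profile(records):
    """Merge records for one person; newer non-empty values win."""
    merged = {}
    # Oldest first, so later (newer) records overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r.get("updated", "")):
        for field, value in rec.items():
            if value not in (None, ""):
                merged[field] = value
    return merged


# Hypothetical records already linked to the same person.
records = [
    {"updated": "2019-05-01", "email": "new@example.com", "phone": ""},
    {"updated": "2012-01-01", "email": "old@example.com", "phone": "555-0100"},
]
print(merge_profile(records))
```

Here the newer email replaces the old one, while the phone number survives from the 2012 record because the newer record left it blank. That also shows the problem: there's no way to tell whether that old phone number is still valid.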