You could write a very similar article about any AV company.
I've personally been automatically scanning data shared by various¹ AV companies for sensitive documents. I only store files that no scanner detects as malware, and I filter out large, usually boring files (e.g. video, audio). I also make an effort to filter out publicly available documents by googling them. I've gathered over 20TB of data so far.
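A minimal sketch of that kind of filter, assuming each sample arrives with a detection count, a size, and a MIME type. The `Sample` fields, the size cutoff, and the MIME prefixes are hypothetical and not taken from any real feed's schema; the manual check for publicly available documents is left out.

```python
from dataclasses import dataclass

# Hypothetical metadata record; field names are assumptions for illustration.
@dataclass
class Sample:
    name: str
    size_bytes: int
    mime_type: str
    detections: int  # number of scanners that flagged the file as malware

MAX_SIZE_BYTES = 50 * 1024 * 1024          # arbitrary cutoff, an assumption
BORING_PREFIXES = ("video/", "audio/")     # large, usually boring media

def worth_keeping(sample: Sample) -> bool:
    """Keep only clean, non-media files under the size cutoff."""
    if sample.detections > 0:
        return False  # flagged by at least one scanner
    if sample.mime_type.startswith(BORING_PREFIXES):
        return False  # video or audio
    if sample.size_bytes > MAX_SIZE_BYTES:
        return False  # too large to be worth storing
    return True

samples = [
    Sample("report.pdf", 120_000, "application/pdf", 0),
    Sample("dropper.exe", 80_000, "application/x-dosexec", 12),
    Sample("holiday.mp4", 900_000_000, "video/mp4", 0),
]
kept = [s.name for s in samples if worth_keeping(s)]
```

Here only `report.pdf` survives: the executable is flagged by scanners and the video is excluded as media.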
So far I've found:
Entire customer database of a Swedish bank
Thousands of git repositories
Hundreds of passport scans
Tens of gigabytes of sensitive medical documents
Tens of gigabytes of internal government documents
Payroll data for hundreds of thousands of people
Entire email inboxes, both government and corporate
Hundreds of website databases in various formats
This is just off the top of my head; I could keep going for a while. I've got a searchable archive of tens of millions of clean PDFs, Word docs, Excel spreadsheets, PowerPoints and so on, all pulled from users' computers by their AV software.
¹ Not including Kaspersky, but including many US firms.
It's readily shared among researchers and "researchers". I'm scraping multiple feeds, but VirusTotal, for example, is one of the easiest public ones to access.
edit: This question was far more reasonable before the parent comment was edited.