Our three year saga to release 13M pages of CIA secrets (muckrock.com)
167 points by danso on Jan 21, 2017 | 33 comments


This part made me laugh a bit:

> The declaration also says that CIA cannot release these TIFF files in electronic form because they can be so easily altered by the mere act of a CIA FOIA analyst looking at them, and that the security measures they must take to remove this accidental metadata for an electronic release (involving editing each file separately by hand) would take 28 years and 1,200 CDs.

Aside from the issue of the CIA apparently not being able to develop (or download) a batch metadata editing tool, a common excuse by agencies for releasing data as PDFs or image files, instead of Excel/CSV, is that the latter is too easy for the public to manipulate:

http://www.nytimes.com/2010/04/13/business/13docpay.html

> Among the four leading drug companies making physician payment disclosures, Mr. Coukell said, Eli Lilly, which was the first to disclose, presents data as an Adobe Flash image, which he said was impossible to download or to sort. “They’ve gone out their way, I think, to present it as a Flash document,” Mr. Coukell said.

> Mr. Dunston said Obsidian had to retype all the Lilly data.

> Carole Puls, a Lilly spokeswoman, said the company purposely made its report impossible to download "to protect the integrity of the data." Lilly was concerned someone could change numbers and create a false report outside the company’s Web site, Ms. Puls said.


That's pretty sad, especially for an agency that must have a certain amount of technical competence for its day-to-day work.

This could be easily solved by releasing all documents and a list of their hashes simultaneously.
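A minimal sketch of that idea, assuming the release is a directory tree of files and that SHA-256 is an acceptable hash. The function names and the manifest layout are hypothetical, chosen only for illustration; nothing here describes what the CIA or MuckRock actually do.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large scans never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_altered(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are missing or whose hash differs."""
    return [
        name
        for name, expected in manifest.items()
        if not (root / name).is_file() or sha256_of(root / name) != expected
    ]
```

The publisher would post the manifest alongside the files, and anyone holding a copy could run find_altered against it to spot a changed document. The manifest itself still has to come from a trusted channel.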


I think the CIA views easily accessible info as the problem, not the creation of the data. I don't think they are looking for a solution to make things more convenient.


Yeah, but they have to disseminate data and documents internally, to sibling agencies (FBI, NSA, DoD), and in response to legislative and executive requests (Congressional panels, inspectors general). So, ideally they would have best practices for sending secure documents in a reproducible/batch fashion, and could repurpose them to meet their legal obligations under FOIA.


They probably do have such guidelines internally, and totally different and arbitrary ones for FOIA.


Clearly; yet it is encouraging that CREST finally ended up online.


People really give these agencies too much misplaced respect.

The defense and intelligence communities hire from Tier 3 schools in the DC, Maryland and Virginia area.

People who, you imagine, would get laughed at in every VC pitch meeting for not being from Stanford or the Ivy League. People you would think weren't the "best of the best" hired by all the tech companies that brag about doing so.

Of course, their engineers are just as educated and ambitious. I'm just pointing out how warped people's perception has become.


You're totally right. Your industry definitely contains the world's smartest people by nature and everyone else is second-rate and would be laughed at if they tried to work in Silicon Valley.

I mean, do you actually hear yourself? Really listen to yourself and describe what you hear.


The comment you're replying to appears to be explicit sarcasm, albeit oddly put.


Just echoing perception. People think intelligence agencies' engineers are geniuses, but they are candidates who wouldn't be considered geniuses in the private sector, solely because of their pedigree.


Having worked in government, I'd say at least half of the people I met there would be considered too incompetent for the job at a normal company. So yes, it would be easy for me to believe that government workers would be laughed at in Silicon Valley.


Clearly the CIA's 'public facing' departments do not have the resources, nor the incentives (from a technical standpoint) that are deployed regularly in the 'intelligence' departments of the Agency.


I'd presume that's not a bug, that's a feature.


And PDF format has a built-in digital signature system.


The CIA keeps secrets for a reason. All this armchair talk from people wanting to get CIA material into the open is pure idiocy.

Stupid and dangerous. You know these are the people who protect you from all manner of sabotage and manipulation from foreign nations, right?


Declassified CIA material holds immense historical value. Each document is reviewed prior to release to ensure there's no harm done to national security.


Declassified is fine, and that's what this is about ... I just see too many people with no respect or knowledge thinking that secrets are by nature some sort of problem, and wanting classified things to be in the public domain.


Just classify everything then. Problem solved.


CIA material is extremely time-sensitive. Anything a year or two stale is probably fine to expose. In fact, the attempt to hide every CIA activity from public scrutiny is so repressive and dangerous to a free society that some risk is certainly warranted. Release the documents.


It's good to see satire and parody thriving in the age of Trump


> 1,200 CDs

I like how they try to make it sound like it's a bunch of data. Sure, it isn't a small amount, but it's only roughly 840GB.

So it's more like

> It would take someone a few minutes to set up a batch metadata editing program (shit, or only an hour or two if one had to be made) and an external hard drive from Office Depot.
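For anyone checking the 840GB figure, here is the arithmetic as a small sketch. The 700 MB per disc capacity is my assumption (a standard CD-R); the declaration only gives the disc count. The function name is hypothetical.

```python
def total_gigabytes(cd_count: int, mb_per_cd: int = 700) -> float:
    """Total capacity in decimal gigabytes (1 GB = 1,000 MB).

    mb_per_cd defaults to 700, an assumed standard CD-R capacity.
    """
    return cd_count * mb_per_cd / 1_000


print(total_gigabytes(1_200))  # 840.0
```

At that size the whole release fits on a single consumer external drive.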


As someone who has actually written a SWF to PDF converter, I am also laughing a bit.


Two thoughts

1. Screenshots of data can be manipulated.

2. Digital signatures exist.


In the ediscovery world we can scrub that much metadata in less than a week.


In the vast majority* of cases, agencies of the US government should have all non-classified data OCRed, indexed, hosted on their site and additionally available on BitTorrent. Put together a reasonable "Open Access Data" budget and make it available under the Library of Congress.

* The exceptions I have in mind are things like astronomical data or other petabyte+ data sets.


Indeed, I couldn't agree more. Though I'd be happy for them to be coerced into providing the petabyte+ datasets too. If they can afford to store them they should serve them.

I believe the UK signed up along with the US as part of the G8 to an Open Data Charter [1] in 2013. In theory it aims to ensure that all government data be published openly by default, but in practice, as far as I can work out it has no teeth. You can point government departments to it, tell them they should release data, ask why they wouldn't and just get stonewalled.

I pushed particularly hard to get data from speed cameras released (without personal details; just time, location, speed, etc.). I even offered to aggregate the data, normalise it, run the hosting, etc., all at my own cost. Nothing. As far as I can tell there's no one to go to who has the power and influence to make it happen.

  [1] https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex


I'm surprised nobody has scraped 100% of this and done OCR to make it all clearly documented and tagged.

Seems like a good grad school research project.


Thanks for your efforts Team Muckrock. Keep fighting the good FOIA transparency fight!


This is a little unrelated, but does anyone know where I can find articles or reports that delve into the substance of the Podesta emails rather than the circumstances surrounding them? I've been looking all day and I can literally count on one hand the number of articles that report on the contents of the emails. And those don't even go very deep!


Pretend you're someone who conducts a lot of daily life via email, e.g. messages related to your day job, setting up appointments, chatting with friends, etc, and imagine that someone has leaked 2 months of those daily emails as a public download.

Now imagine you are someone else tasked with writing something that captures "the substance" of those emails. You've basically been asked to capture "the substance" of someone's life over a 2-month period. The reason you don't see many "deep" articles about the content of the emails is that John Podesta is interesting to the public for a specific role: Clinton campaign chairman. So the contents of his emails are analyzed in the context of the presidential race, not for any other kind of deep contextual analysis.


Is this available via torrent yet? Preferably OCR'd?



beautiful, thank you



