CSAM Detection Technical Summary [pdf]

mrtksn · on Aug 6, 2021

The final step of this is going to be detection of terrorist materials, then we will have privacy that is based on our jurisdictions definition of terrorism.

Your device provider(not necessarily Apple) will notify authorities when you score high enough on a threat model.

In just in a decade, for the US citizens that model could include belief in fair elections(depending on who wins, you can be a terrorist of the deep state or the far right), you can be a Chinese terrorist in the USA if the friction intensifies between countries, you can be financial terrorist if the relationship between Cryptocurrencies and the state sours.

You can be traitor terrorist of corporations, because they detected inappropriate use of corporate materials on your device.

You can be cultural terrorist in a culture obsessed with self preservation.

You can be infidel terrorist a muslim state with large enough market.

There is no need for hypothetical situations, you can be a terrorist or a freedom fighter today depending on your coordinates.

It is so easy to be a terrorist these days.

This is going to get very interesting when every authority starts demanding detection of their boogyman. Why bother with anti-espionage when you can tell apple to check who does have the pictures of your new shiny thing that was supposed to be a secret?

Apple seems to provide nice technological solution that looks promising to work fine as long as the authorities do good job in not becoming totalitarian crazy nutjobs because the technology can be used for any content, does not have a capability to ensure that it works only on child porn.

Apple is giving governments a content control tool that at this time is promised to work only for specific materials.

cwkoss · on Aug 6, 2021

We should hope detection of terrorism is the final step.

We should be extremely concerned that detection of political activism or investigative journalism will actually be the final step.

Prometheous08 · on Aug 6, 2021

'terrorism' is defined as whatever a governing body in charge says it is.

https://www.reuters.com/article/us-usa-iran/u-s-officially-d...

https://www.aa.com.tr/en/middle-east/iran-parliament-blackli...

cassianoleal · on Aug 6, 2021

Political activism of many forms is already considered terrorism in many parts of the world. Investigative journalism is not hard to frame as such either in many cases.

If terrorism is the final step, all that's needed to expand it is to redefine what constitutes it.

pintxo · on Aug 6, 2021

Nelson Mandela was once considered a terrorist!

yosamino · on Aug 6, 2021

> The final step of this is going to be detection of terrorist materials,

I don't think so. So many surveillance laws were passed with the stated intent of wanting to use them in cases of the most serious crimes and terrorism.

Now they are being used against your run of the mill criminals and innocent citizens.

phkahler · on Aug 6, 2021

In the US, one would hope Apple could be forced to reveal which images were reported in court. That would be a good check on misuse.

Like a lot of things, this sounds fairly reasonable but is likely to be corrupted over time. In other countries you may not be allowed to see the evidence against you. In the US there would certainly be pressure to suppress the evidence but that probably won't fly in a normal criminal court. IANAL.

donmcronald · on Aug 6, 2021

> Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

Is that a typo? What use is something that’s unreadable? I’m only half done, but it sounds like a bunch of hand wavy, AI, ML bullshit to me.

How do you match similar images without reducing them down to something that’s less than the original?

> Users can’t identify which images were flagged as CSAM by the system.

> If a user feels their account has been mistakenly flagged they can file an appeal to have their account reinstated.

So the user can’t have information about anything, but they can appeal the decision if their account is locked? And locking the account might cause people to do all kinds of stupid things like resetting their device to get the account back or because they panicked. Then the only evidence is some opaque system that claims the user had illegal content. That’s scary.

Many years ago I knew a person that took a perfectly good PC to the garbage dump because of a scareware warning saying it was involved in illegal activity. Poor guy wasn’t very computer savvy. This will open up some excellent phishing opportunities because of the seriousness/consequences of the type of accusation and the legitimacy of everyone knowing it’s something that’s done now.

Edit: I finished reading it. If the photos are synced to iCloud I guess locking the account preserves the data. I still don’t like how cloak and dagger the whole thing is. No one’s allowed to know anything and all the average person is going to understand is that Apple’s system says this person is guilty, so they’re guilty. I think stuff like this would get a lot more scrutiny and less traction if it wasn’t “for the children.”

What if there are bugs? No one is going to be allowed to audit the system or see how it works because of the sensitive nature of the content. It’s pretty scary IMO.

> The threshold is selected to provide an extremely low (1 in 1 trillion) probability of incorrectly flagging a given account.

How was that calculated? Prove it.

> The neural network is taught to generate descriptors that are close to one another for the original/perturbed pair.

So is that the same kind of crappy AI that (incorrectly) detects suspicious activity on email accounts?

salawat · on Aug 6, 2021

This is why AI is complete snake oil. They're trying to encode a recognizer, but they can't even define what it is they are actually feature extracting. Only that it's "typical ML techniques" that the rest of the industry doesn't want to call bullshit on because no one wants to give someone the idea that they shouldn't be allowed to use something if they can't explicitly articulate the function.

There's some things where screw it, why not. This is not one of them.

rantwasp · on Aug 6, 2021

> So the user can’t have information about anything, but they can appeal the decision if their account is locked? And locking the account might cause people to do all kinds of stupid things like resetting their device to get the account back or because they panicked. Then the only evidence is some opaque system that claims the user had illegal content. That’s scary.

At that point you have uploaded the pictures to icloud. The evidence is in icloud. You cannot destroy your device to get rid of this.

omoikane · on Aug 6, 2021

Having an appeals process doesn't inspire confidence since we rarely hear stories of those appeals processes working.

What we usually hear instead are people being unsuccessful in their attempts to go through an appeals process, often because the system doesn't reveal any information on why they were flagged. The real remediation appears to try to gain some publicity in order to attract the attention of some real human to actually look at their case.

ec109685 · on Aug 8, 2021

The low resolution version of the images are available (decryptable once the threshold has been met), so there’s evidence whether it is or not illegal material.

gojomo · on Aug 6, 2021

Many of these steps are spun as being for the users' privacy.

But they also prevent the operators of the system – Apple Inc, quasi-governmental non-profits (like NCMEC), & government law-enforcement agencies – from facing accountability for their choices. No one else can confirm what content they're scanning for, or what 'threshold' or other review applies, before serious enforcement action is taken against an account.

Also, Apple is quite vague about whether the claim "Apple manually reviews all reports" means Apple employees can view all triggering content, even if it was never uploaded to Apple servers. (How would that work?)

It would be trivial to extend this system to look for specific phrases or names in all local communication content.

How will Apple say 'no' when China & Russia want that capability?

In fact, might this initial domestic rollout simply be the 1st step of a crafty multi-year process to make that sort of eventual request seem ho-hum, "we're just extending American law-enforcement tech to follow local laws"?

sirn · on Aug 6, 2021

> Also, Apple is quite vague about whether the claim "Apple manually reviews all reports" means Apple employees can view all triggering content, even if it was never uploaded to Apple servers. (How would that work?)

From a quick read of the PDF, my impression is this Safety Voucher is more of an additional metadata generated before a photo is uploaded to iCloud Photo (as oppose to scan photo library to see if it matches and notify Apple narrative). The Safety Voucher is designed in a way that they could decide on the server-side if a photo is CSAM without requiring iCloud to have an ability to decrypt all photos. Safety Voucher comes in three layers:

- The first layer is generating a NeuralHash for a photo and derive a cryptographic header from it based on blind hash provisioned by Apple from a known CSAM NeuralHash (e.g. Apple pre-compute this table for each device). First layer is decryptable if the generated NeuralHash is a known CSAM. (Is it going to be a random data if NeuralHash doesn't exists in the table?)

- The second layer uses Secret Sharing[1] scheme (where a content is decryptable only if >n keys are present) using NeuralHash of a visual derivative as a secret share. Multiple visual derivative is generated for each photo, and certain amount of shares is required to decrypt the layer.

- After the second layer is decrypted, the file is readable.

One thing that doesn't makes sense is iCloud Photos are not currently End-to-End Encrypted, so Apple already have access to all photos. This scheme would only makes sense if they plan to introduce End-to-End Encryption for iCloud Photos (& iCloud Backups?) in the future.

In my opinion, if they were to enable E2EE for photos, this scheme would be slightly better for privacy than Apple having an ability to scan all photos on the cloud (e.g. they can't retroactively decrypt a photo that already has Safety Voucher uploaded to iCloud if the matching NeuralHash doesn't exists at the time of blind hash table generation), but significantly less than having E2EE and not having this system in the first place.

(As someone from a country that is known to abuse this sort of power, I'm not happy about this system existing.)

[1]: https://en.wikipedia.org/wiki/Secret_sharing

gojomo · on Aug 6, 2021

Your interpretation might be right, but Apple's vagueness might be strategically hiding things they know would be even more unpopular, if revealed.

Still, there's the question of what additional assurance this "manual review" is going to provide, if the Apple personnel only have exactly the same metadata/scrambled-representations as the software has, and not the raw images.

They couldn't detect a false positive with that info.

Perhaps it's just an option for Apple executives to exercise an extra level of courtesy (or leverage) over the famous & powerful, after seeing the alarm plus the account identity? "We've got a Senator here, we better consult Tim before reporting this through normal channels."

ec109685 · on Aug 8, 2021

The voucher has a low resolution version of the image included with it encrypted within two layers, so Apple, once the threshold has been breached, can view that version of the photo for false positives.

gojomo · on Aug 8, 2021

Thanks! So, this process sends the user's photo to Apple without the user's knowledge or approval.

If it is determined to be a false-positive, is the user informed that their data was exfiltrated?

ec109685 · on Aug 8, 2021

Good point. That is a flaw with all these scanning schemes.

ec109685 · on Aug 8, 2021

Are you sure the photos themselves aren’t encrypted when uploading? I think only the iCloud backup has the decryption keys.

Regardless, it’s way different building a system that has online access to decryption keys with the requirement to scan every photo uploaded versus reserving that access to more locked down servers that have the decryption keys and are only accessed during law enforcement actions.

sirn · on Aug 8, 2021

You're right, the iCloud Photo is actually encrypted on transit and on the server, but the server has the decryption key[1]. However they've been doing server-side scan for iCloud since quite some times ago[2], which is why I think this technique in the article plus End-to-End Encryption could be better for privacy than the status quo.

[1]: https://support.apple.com/en-us/HT202303

[2]: https://www.forbes.com/sites/thomasbrewster/2020/02/11/how-a...

sirn · on Aug 6, 2021

I can't edit anymore but I misunderstood the second layer. The second layer contains a single Secret Share (that can be used to reconstruct a decryption key after certain threshold) and an encrypted image information. User must upload multiple photos matching the initial CSAM database past certain threshold for any of them to be decryptable at all.

phkahler · on Aug 6, 2021

>> It would be trivial to extend this system to look for specific phrases or names in all local communication content.

How? They're comparing image hashes.

gojomo · on Aug 6, 2021

Perceptual image hashes are a hard problem. Searching for subversive text strings is easy.

And, a bunch of the obscuring-of-surveillance-intent achieved here is oblivious as to the types of detection-events - not at all image-specific. (That is, the private-set-intersection, cryptographic 'blinding', & threshold sharing schemes will work just fine for text/text-shingle/text-vector snitchlists as for image-perceptual-hash snitchlists. They're generic tricks for hiding what searches are being performed, & easy to repurpose.)

Once Apple's shown a willingness to install iOS software that does a crime-grep on private data, & snitches on users without their assent, adding text-triggers is technically simple. A iOS point-release.

_8j7i · on Aug 6, 2021

It is unacceptable for this process to execute on my devices. I am very glad I own zero Apple products, and this has solidified my resolve to never change that.

I'll be watching for proliferation of this technique in other devices, and will promptly trash any that implement it. (I already avoid uploading unencrypted content to cloud services.)

In this imperfect world, privacy keeps innocent people free. Long live AES-256.

donohoe · on Aug 6, 2021

Are you so sure this isn't already happening on other services you use?

_8j7i · on Aug 6, 2021

I believe it is happening, which is why I do not upload any unencrypted content to those services.

dagmx · on Aug 6, 2021

In which case, this new change wouldn't affect you anyway, because it only applies to photos uploaded to iCloud.

cucumb3rrelish · on Aug 6, 2021

can you explain why? if you upload a file selectively, where is the issue if its hash is compared to a CSAM database?

osmarks · on Aug 6, 2021

You can't check what's in the database. Someone could extend it to other things.

cucumb3rrelish · on Aug 6, 2021

Isn't this far fetched? There is few things I can think of that would get you into legal trouble but are morally acceptable. Like whistleblowing

osmarks · on Aug 7, 2021

You might be safe now in relatively okay Western countries. Consider the possibility of places like China using these features, or just future Western governments which are worse.

r00fus · on Aug 6, 2021

I'm sure this is happening on all other cloud photo services already.

cwizou · on Aug 6, 2021

As a reminder for context, Apple already did do a server side check for iCloud photos [1]. I think that's missing from the debate, what's changing is moving the check to on device which many object to (and I and others have written enough about that).

To add to this though : I think they built an extremely complex - and hopefully perfectly robust - system to make the check on device. On top of the other issues, it's massively important said process is not exploitable.

The content of the NCMEC database is not public, one can argue for good reason, and if that effort to move on device lead to the database being leaked (and thus bad actors being able to "mark" content that they shouldn't share), that would be incredibly counter productive. Maybe only knowing that the database was updated is too much information to leak about this ?

I think that's an additional concern to the ones already expressed many times, and I sincerly wish Apple had kept the checks server side even more now, as I only see downside to not doing so.

[1] : https://nakedsecurity.sophos.com/2020/01/09/apples-scanning-...

rantwasp · on Aug 6, 2021

The Big Brain Scheme Apple uses for their CSAM stuff. Summary for the lazy:

- this is going to be out in ios15

- Apple will build and push a database of hashes with every update

- these hashes will be used to verify against the pics before they are uploaded to icloud and generate a "voucher"

- once a certain threshold of vouchers is passed, Apple will be able to decrypt the pictures in questions (all this discussion seems to assume Apple does not have access to icloud pictures - so maybe they are preparing to rollout full encrypted backups???)

- the crypto stuff is interesting

- the scary part is that there is nothing preventing Apple to do the same thing with other thing on your phone (assuming there is a way to generate a good enough matching hash - I can see how this could be easily extended to basically anything that's stored on the phone)

simondotau · on Aug 7, 2021

Your last point is a slippery slope that began the moment you allowed any software on any device to update itself. Literally any software that updates itself could be converted into a treacherous surveillance system at any time.

And literally any cloud storage service could betray you at any time without pushing new binaries to your hardware.

And nobody ever has to tell you shit.

rantwasp · on Aug 7, 2021

for all that i know the software doesn’t even need to update itself. it may have the backdoor built in.

so now what? go back to landlines? stop using a “smart”phone?

jamessb · on Aug 6, 2021

This page [1] provides some more information and links to PDFs of several other related documents, including "Apple PSI System — Security Protocol and Analysis" [2] (which has lots more technical details) and some Technical Assesments.

[1]: https://www.apple.com/child-safety/

[2]: https://www.apple.com/child-safety/pdf/Apple_PSI_System_Secu...

h2odragon · on Aug 6, 2021

There's gonna be loads of fun picking this apart.

> Only another image that appears nearly identical can produce the same number; forexample, images that differ in size or transcoded quality will still have the same NeuralHash value

cf all the "calculated collisions" arguments already made

> Before an image is stored in iCloud Photos, an on-device matching process is performed for that image against the database of known CSAM hashes.

cf all the questions about that database of "official Badness"

> First, Apple receives the NeuralHashes corresponding to known CSAM from the above child-safety organizations.

sharing that authority out to who exactly i didnt catch the names in this document...

and also cf the already raised points about "won't $Nation want their own version of this with their own database of Bad".

Nations or religions, could Muslim countries keep their "no images of the Prophet" rules more strictly using these tools? would they not try?

cm2187 · on Aug 6, 2021

On the neural hash algorithm, I presume it must be some sort of a standard algorithm since the hashes will come from third party organisations. Is it available as a library anywhere?

jeffbee · on Aug 6, 2021

If the hashes are coming from NCMEC, presumably they must either be https://www.microsoft.com/en-us/photodna or Apple have contributed a new scheme.

pornel · on Aug 6, 2021

The PhotoDNA algorithm is kept as a secret, so whatever Apple is describing can't be it. I wonder if they've managed to get NCMEC to re-hash images for them using the new algorithm, or are they using PhotoDNA data as the input for their algorithm.

jeffbee · on Aug 6, 2021

You're right, the details in this paper are not consistent with PhotoDNA. I wonder if these new hashes and the models to generate them are going to be made available to other anti-abuse organizations.

nsxwolf · on Aug 6, 2021

So are people really going to go to prison because of a flag that says they did something illegal? With no opportunity for discovery or to cross examine? No way to audit the software and the chain of events that led to the detection?

ubermonkey · on Aug 6, 2021

That would indeed be bad.

What Apple is doing here is also bad. But jail-by-algorithm is not part of what Apple is doing.

r00fus · on Aug 6, 2021

No it's just "introduction to law enforcement by algorithm" which can be more fatal than jail.

takenpilot · on Aug 6, 2021

Just a question: if they turn these images into hashes, does that mean it only finds very specific images?

That is, if someone has some well-known image on their phone, then it can find it, but anything taken themselves is safe?

Doesn't that make this a bit pointless since I don't know anyone that saves images from the internet to their phone. That's weird, right?

yosamino · on Aug 6, 2021

> Doesn't that make this a bit pointless since I don't know anyone that [...]

There is a very large variety of people out there. It's unlikely that your social bubble contains all the different ways to use a phone. if you search the net the phrase "phone clearing dump" you will find example of people having filled their phones with so much "funny images" that they are now saving them to an online image hosting service.

You probably also don't know anyone who has depictions of sexual abuse of children on their phones, yet it is a thing that exists, and that apple seems to be trying to address.

The world is big and, as you say, weird ineed.

takenpilot · on Aug 6, 2021

And a sub-question: doesn't that make this ideal for checking for copyright infringement instead?

rantwasp · on Aug 6, 2021

copyright enforcement, detecting terrorists, sussing out dissidents. The applications are endless!!! Welcome to the future.

rantwasp · on Aug 6, 2021

yes. it's specific images. It's pointless and looks half-assed, but hey, they have the future to perfect it. This is only the watered down/tame version to get the foot in the door and cause a big commotion while still being able to keep plausible deniability. Once they do this a couple of times people will get tires and stop caring. That's when the scary stuff slips in.

pornel · on Aug 6, 2021

It finds specific images, but the hash is based on pixels, not raw file bytes. The hash is insensitive to small image changes (brightness, saturation, rotation, compression artifacts), so slightly modified images still hash to the same value.

k1rcher · on Aug 6, 2021

For reasons unbeknownst to me at the time of writing (for it requires much more introspection), I think this would be OK if it were a service that ran only on icloud-stored photos— rather than apple devices across the board.

I’m not sure why this is the line crossed for me. Perhaps it is only because I willfully choose _not_ to store _any_ data on iCloud (or at the very least a minimum amount as possible). Even moreso, I’m a bit unsure of what that line even is.

Is it a software vs hardware thing? No, because one could argue the service is not Apple hardware-specific, but rather specific to the iOS software.

So perhaps the line crossed is this being an OS feature rather than a SAAS feature. It is unfortunate that the precedent has been set where I am OK with potentially privacy invasive features on SAAS products, but I suppose it is what it is.

rantwasp · on Aug 6, 2021

the line is crossed because if you store data in the cloud you kind of get that the provider might be force to give LE to it. It may or may not happen but if it's in the cloud technically it's out of your control.

Stuff running on your device OTOH, you have the illusion that you control it. It's YOUR physical device and technically you should be able to do whatever you want with it. Now, enter closed source software + walled gardens => you don't really control your device. You don't get a say on what's going on in it. Up until know we believed Apple would do the right thing. Well... Myth busted I guess.

simondotau · on Aug 7, 2021

Turn off iCloud Photos and it won't run the surveillance. Seems reasonable that if you want to store your photos on someone else's hardware, they should get some say whether their hardware is used to store child porn.

rantwasp · on Aug 7, 2021

seems reasonable? here’s the thing: you start with a clear cut case like cp and after that you add different things that, as a government, bother you. you’re in a totalitarian state faster than you can blink.

The turn off icloud argument also does not work. Who tells you they’re not going to keep pulling stunts like this? So the solution is: stop using Apple. Now what?

simondotau · on Aug 8, 2021

> The turn off icloud argument also does not work.

Doesn't it? It worked on Android for the past seven years. Haven't heard anyone complain about the fact that Google has being doing pretty much the exact same thing to all photos uploaded to their cloud.

> So the solution is: stop using Apple.

Apple, Google, Microsoft, Facebook, and probably a dozen other major companies who didn't even consider it necessary to tell us.

rantwasp · on Aug 8, 2021

stopped using Google, M$ and Facebook a long time ago. Guess it’s time for Apple to go.

noduerme · on Aug 7, 2021

From the document

>> The neural network is taught to generate descriptors that are close to one another for the original/perturbed pair. Similarly, the network is also taught to generate descriptors that are farther away from one another for an original/distractor pair. A distractor is any image that is not considered identical to the original. Descriptors are considered to be close to one another if the cosine of the angle between descriptors is close to 1.

Can someone explain what this means? Am I to understand that the algorithm plots image descriptors or hashes of them along a 2-dimensional circle, and somehow iterates until the way it places them on that circle results in similar images being closer to each other?

ec109685 · on Aug 8, 2021

It’s not a two dimensional plot. This provides some background (though not sure if this is their scheme): https://towardsdatascience.com/importance-of-distance-metric...

aethertron · on Aug 6, 2021

Seems if you don't use iCloud, you can opt out. There are alternative backup services available on iOS devices, like Backblaze. But does the OS let them properly integrate with the system?

rantwasp · on Aug 6, 2021

to me it seems like if you don't use Apple you're totally protected. Hmm... What is the alternative again? Is there a phone that does straight linux and everything is open source?

Hackbraten · on Aug 6, 2021

There is: https://puri.sm/products/librem-5/

applthrowaway86 · on Aug 6, 2021

This is a consequence of the composition of Apple's security engineering organization. If you look through the LinkedIn profiles of the people in the org that delivered this feature, an interesting pattern emerges: a significant number of them are ex-IC. Back in the golden age of early iPhone development, the org used to have almost full autonomy and was run by a bunch of Mac fanatics and UNIX nerds. Today the org is full of people who were spies in their previous job. It shows.

cft · on Aug 6, 2021

I was going to buy an M1 Macbook but I don't want it to execute any spyware processes. Is it likely that Macbooks will get this too?

rantwasp · on Aug 6, 2021

yes, this is coming to the mac also. surveillance across the board!

m3kw9 · on Aug 6, 2021

A lot of that sounds like magic, most of us will take their word for it till it comes out and ppl like CitizenLabs takes a look

jeffbee · on Aug 6, 2021

The bottom line here is Apple can look at your iCloud data whenever they want. They've written down a vague description of something that supposedly keeps them from doing it all the time, but they can unilaterally change the parameters whenever they choose to do so. They have thoroughly torpedoed their privacy story.

r00fus · on Aug 6, 2021

Assuming the user trusts Apple, that may be palatable to some. But the crux is, the images they're matching against are in a database maintained by an unaccountable nonprofit (NCMEC).

Apple did this so their staff wouldn't have to see the offending images, until/unless it was actually confirmed.

However, the matching is all algorithmic and who knows what's in the CSAM database. It could be images of police brutality so law enforcement can ferret out protesters, for all we know. Want to round up all communists? This is an easy way to do so.

jeffbee · on Aug 6, 2021

Sure, the actual situation has always been that the iOS privacy story relies on you to trust Apple. But their marketing, which has always been bullshit, has implied that iOS privacy is a matter of the laws of physics.

aaomidi · on Aug 6, 2021

If you're an employee at Apple how do you even raise your voice of concern about this?

They're already calling people against this a screeching minority. Apple has created a toxic work environment and silences voices of concern by applying labels.

tibbydudeza · on Aug 6, 2021

I am in favor of them checking messaging content for general nudity for minors but I seriously have my doubts about this CSAM system.

Your life will be destroyed by a false flag.

ec109685 · on Aug 8, 2021

The flag includes an uploaded copy of the image, so that too will be verified before your life is destroyed.