
The article opens by saying "To become SOC2 compliant, we needed to remove global access and fine-tune who has access to what schemas and tables."

I've had to go through this SOC2 certification process as well, and I think a much better approach (with a lot of other benefits) is to use client-side encryption to encrypt sensitive data like PII or PHI (personal health info) before you insert it into the DB. That way it's easy to give all of your developers read-only access to essentially the entire DB for things like debugging support while still maintaining SOC2 and other compliance (e.g. HIPAA).

Not saying there aren't also good use cases for roles and privileges (and it's a lot harder to add client-side encryption after the fact), but using client-side encryption/decryption is a better approach to this issue IMO (you get more security benefits, and the compliance benefits really just are a consequence of that).
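A minimal sketch of the encrypt-before-insert pattern described above. All names here are hypothetical, and the SHA-256 counter-mode keystream is only an illustrative stand-in so the example stays stdlib-only; a real implementation would use an authenticated cipher like AES-GCM (e.g. via the `cryptography` package) with keys held in a KMS:

```python
import hashlib
import secrets

# Illustrative only: a SHA-256 counter-mode keystream standing in for a
# real authenticated cipher such as AES-GCM.
KEY = secrets.token_bytes(32)  # in practice, fetched from a KMS/secret store

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_field(plaintext: str) -> bytes:
    # Fresh random nonce per value, prepended to the ciphertext.
    nonce = secrets.token_bytes(16)
    data = plaintext.encode()
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(KEY, nonce, len(data))))

def decrypt_field(blob: bytes) -> str:
    nonce, data = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(data, _keystream(KEY, nonce, len(data)))).decode()

# The DB row stores only ciphertext; read-only users see opaque bytes.
row = {"user_id": 42, "ssn_enc": encrypt_field("123-45-6789")}
assert decrypt_field(row["ssn_enc"]) == "123-45-6789"
```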



Doesn't encrypting your data before insertion make your data unable to be indexed/searched easily?


For indexing/searching on encrypted fields we use a blind index (lots of good resources if you search for that term).
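A stdlib-only sketch of the blind-index idea: store a keyed hash (HMAC) of the normalized value next to the ciphertext, so equality lookups work without decrypting anything. Function and column names are mine, not from any particular library:

```python
import hashlib
import hmac
import secrets

# The index key is secret and lives outside the DB (e.g. in a KMS);
# without it the stored digests cannot be brute-forced by dictionary attack
# the way an unkeyed hash could.
INDEX_KEY = secrets.token_bytes(32)

def blind_index(value: str) -> str:
    # Normalize so trivially different inputs index to the same value.
    normalized = value.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode(), hashlib.sha256).hexdigest()

# To look a user up by email, compute the blind index of the query term and
# match it against the stored column; a plain B-tree index on that column works.
stored = blind_index("Alice@example.com")
assert blind_index("  alice@example.com") == stored
```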

On the other hand, sorting on encrypted fields has proven to be a difficult challenge. There are some possible approaches but they lower the security of your encryption.


Blind indexes are useless when working with limited address spaces like Social Security Numbers, and even US Addresses[1]. It would take under an hour to reverse these on my current home PC.

Your advice isn't simply security theater; it's wrong and dangerous. It leads to companies treating this data, which is still sensitive, as nonsensitive and storing it insecurely, particularly when data teams export it to third-party tools.

[1] https://www.transportation.gov/gis/national-address-database


> It would take under an hour to reverse these on my current home PC.

The indexes are created with a secure salt. They're only crackable if you know the salt.


For some kinds of data and queries, it doesn't matter if the data in the index is encrypted. For other kinds of data, you could build the index on an expression that produces decrypted or anonymized values. Sadly postgres doesn't have per-index permissions, so you can't prevent a user with access to the table from using all of its indexes.


That's fine if you want a bucket of bits instead of a database. You can even make it easier by making one big table with an ID and blob, and just serialize | encrypt state to the DB. Easy-peasy.

If you want to use the "R" in RDBMS, though, or report on your data, or use indexes, or anything else that makes it worth running complex DBs instead of a file system, you're stuck using a database as a database.


This is wrong and unnecessarily snarky. I don't pre-encrypt all data, just PII/PHI. Doing this, or tokenization with a vaulting service, is pretty much standard recommended practice for storing sensitive data.


I agree.

I do think there’s an argument to be made for the idea that, to put it colloquially, “somewhere the Social Security Administration needs a database that just has every SSN in plaintext”, but that’s not _exactly_ an honest everyday use case.


We are big proponents of app-layer encryption as well. We wrote extensively about how we do it for our specific use case: https://www.slashid.dev/blog/app-layer-encryption/


We've done SOC2 type 1 and 2, and with a few exceptions, you only have to do what you say you do. First you claim you have controls on X, Y, Z, and then your auditors check that. You can just not claim X if you don't want to implement it. If the claim is vague, you have a lot of flexibility for implementation too.

This is a huge reason why SOC2 isn't a very useful certification. Your SOC2 and my SOC2 can be wildly different.


I would qualify this statement. For a competent auditing firm there are non-negotiables when attesting to your own firm's compliance, and a discerning (prospective) customer who pays attention and knows how to read those reports can spot places where "you're trying to get away with it".

I’d much sooner agree that the flexibility is in implementation. As long as you can hit a control in a reasonable and articulable manner that can be measured and evidenced, you have much flexibility. I see that as the benefit of SOC2. Others see it as an issue.

To your point, last time I led a company through a SOC2 Type 1-2 engagement, we had some standards sourced from NIST that were ahead of industry for the time, and published NIST standards were an authority that the auditing firm was comfortable accepting as compensatory for a control that otherwise would have been absent or out of compliance. So that control was ultimately accepted as “No exceptions during the audit period, but see our notes annex”.


How can you do client side encryption with web apps though? While keeping the key on the client, I assume, and allowing multiple browser sessions for the same user?


When I say "client side encryption", I'm referring to the database client, which in most web apps is actually code running on a private server (i.e. browser code makes API calls to a server running something like Python, Node or Java, and that server code makes calls to the DB - it's on the server where PII is encrypted).

That said, you can also use the SubtleCrypto API in the browser to encrypt data before it is even sent to the server.


Envelope encryption, where you encrypt a data encryption key (typically symmetric with AES) with other keys (typically asymmetric with RSA). This is how most password safes like bitwarden work.
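A stdlib-only sketch of the envelope pattern described above. Real systems wrap the data key with AES key wrapping or an RSA/KMS public key; here both layers use an illustrative SHA-256 counter-mode keystream, and all names are hypothetical:

```python
import hashlib
import secrets

def _stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Illustrative XOR keystream (its own inverse); a stand-in for AES.
    ks = b""
    i = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, ks))

def seal(master_key: bytes, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(32)  # per-item data encryption key
    n1, n2 = secrets.token_bytes(16), secrets.token_bytes(16)
    return {
        # Only the master key (KEK) can unwrap the DEK...
        "wrapped_dek": n1 + _stream_xor(master_key, n1, dek),
        # ...and only the DEK decrypts the payload.
        "ciphertext": n2 + _stream_xor(dek, n2, plaintext),
    }

def open_envelope(master_key: bytes, env: dict) -> bytes:
    w, c = env["wrapped_dek"], env["ciphertext"]
    dek = _stream_xor(master_key, w[:16], w[16:])
    return _stream_xor(dek, c[:16], c[16:])

master = secrets.token_bytes(32)
env = seal(master, b"vault item")
assert open_envelope(master, env) == b"vault item"
```

Rotating the master key then only requires re-wrapping the small DEKs, not re-encrypting every payload.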


This model reminds me of sealed boxes, so I wanted to add that to this discussion.

Send a public key to the client (say in a secrets input page), your browser encrypts field content with that key, and you receive the ciphertext on the server. You can then decrypt it, discard the sealed box keys, and persist the data however you need. (Presumably something that sensitive would get encrypted with a different key before going into the database, but you could keep the keys around and have each piece of data protected by a different key. This has pros and cons.)

Github Actions secrets are protected in transit to Github using sealed boxes.


I believe by client side they mean the database client, which would be the application backend/server.



