
The article opens by saying "To become SOC2 compliant, we needed to remove global access and fine-tune who has access to what schemas and tables."

I've had to go through this SOC2 certification process as well, and I think a much better approach (with a lot of other benefits) is to use client-side encryption to encrypt sensitive data like PII or PHI (personal health info) before you insert it into the DB. That way it's easy to give all of your developers read-only access to essentially the entire DB for things like debugging support while still maintaining SOC2 and other compliance (e.g. HIPAA).

Not saying there aren't also good use cases for roles and privileges (and it's a lot harder to add client-side encryption after the fact), but using client-side encryption/decryption is a better approach to this issue IMO (you get more security benefits, and the compliance benefits really just are a consequence of that).
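A minimal sketch of the encrypt-before-insert pattern described above. All names here are hypothetical, and the SHA-256 counter-mode keystream is only an illustrative stand-in so the example stays stdlib-only; a real implementation would use an authenticated cipher like AES-GCM (e.g. via the `cryptography` package) with keys held in a KMS:

```python
import hashlib
import secrets

# Illustrative only: a SHA-256 counter-mode keystream standing in for a
# real authenticated cipher such as AES-GCM.
KEY = secrets.token_bytes(32)  # in practice, fetched from a KMS/secret store

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_field(plaintext: str) -> bytes:
    # Fresh random nonce per value, prepended to the ciphertext.
    nonce = secrets.token_bytes(16)
    data = plaintext.encode()
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(KEY, nonce, len(data))))

def decrypt_field(blob: bytes) -> str:
    nonce, data = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(data, _keystream(KEY, nonce, len(data)))).decode()

# The DB row stores only ciphertext; read-only users see opaque bytes.
row = {"user_id": 42, "ssn_enc": encrypt_field("123-45-6789")}
assert decrypt_field(row["ssn_enc"]) == "123-45-6789"
```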



Doesn't encrypting your data before insertion make your data unable to be indexed/searched easily?


For indexing/searching on encrypted fields we use a blind index (lots of good resources if you search for that term).
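A stdlib-only sketch of the blind-index idea: store a keyed hash (HMAC) of the normalized value next to the ciphertext, so equality lookups work without decrypting anything. Function and column names are mine, not from any particular library:

```python
import hashlib
import hmac
import secrets

# The index key is secret and lives outside the DB (e.g. in a KMS);
# without it the stored digests cannot be brute-forced by dictionary attack
# the way an unkeyed hash could.
INDEX_KEY = secrets.token_bytes(32)

def blind_index(value: str) -> str:
    # Normalize so trivially different inputs index to the same value.
    normalized = value.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode(), hashlib.sha256).hexdigest()

# To look a user up by email, compute the blind index of the query term and
# match it against the stored column; a plain B-tree index on that column works.
stored = blind_index("Alice@example.com")
assert blind_index("  alice@example.com") == stored
```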

On the other hand, sorting on encrypted fields has proven to be a difficult challenge. There are some possible approaches but they lower the security of your encryption.


Blind indexes are useless when working with limited address spaces like Social Security Numbers, and even US Addresses[1]. It would take under an hour to reverse these on my current home PC.

Your advice isn't simply security theater; it's wrong and dangerous. It leads to companies treating this data, which is still sensitive, as nonsensitive and storing it insecurely, particularly when data teams export it to third-party tools.

[1] https://www.transportation.gov/gis/national-address-database


> It would take under an hour to reverse these on my current home PC.

The indexes are created with a secure salt. They're only crackable if you know the salt.


For some kinds of data and queries, it doesn't matter if the data in the index is encrypted. For other kinds of data, you could build the index on an expression that produces decrypted or anonymized values. Sadly postgres doesn't have per-index permissions, so you can't prevent a user with access to the table from using all of its indexes.


That's fine if you want a bucket of bits instead of a database. You can even make it easier by making one big table with an ID and blob, and just serialize | encrypt state to the DB. Easy-peasy.

If you want to use the "R" in RDBMS, though, or report on your data, or use indexes, or anything else that makes it worth running complex DBs instead of a file system, you're stuck using a database as a database.


This is wrong and unnecessarily snarky. I don't pre-encrypt all data, just PII/PHI. Doing this, or tokenization with a vaulting service, is pretty much standard recommended practice for storing sensitive data.


I agree.

I do think there’s an argument to be made for the idea that, to put it colloquially, “somewhere the Social Security Administration needs a database that just has every SSN in plaintext”, but that’s not _exactly_ an honest everyday use case.


We are big proponents of app-layer encryption as well. We wrote extensively about how we do it for our specific use case: https://www.slashid.dev/blog/app-layer-encryption/


We've done SOC2 type 1 and 2, and with a few exceptions, you only have to do what you say you do. First you claim you have controls on X, Y, Z, and then your auditors check that. You can just not claim X if you don't want to implement it. If the claim is vague, you have a lot of flexibility for implementation too.

This is a huge reason why SOC2 isn't a very useful certification. Your SOC2 and my SOC2 can be wildly different.


I would qualify this statement. For a competent auditing firm there are non-negotiables when attesting to your own firm's compliance, and a discerning (prospective) customer who pays attention and knows how to read those reports can spot places where "you're trying to get away with it".

I’d much sooner agree that the flexibility is in implementation. As long as you can hit a control in a reasonable and articulable manner that can be measured and evidenced, you have much flexibility. I see that as the benefit of SOC2. Others see it as an issue.

To your point, last time I led a company through a SOC2 Type 1-2 engagement, we had some standards sourced from NIST that were ahead of industry for the time, and published NIST standards were an authority that the auditing firm was comfortable accepting as compensatory for a control that otherwise would have been absent or out of compliance. So that control was ultimately accepted as “No exceptions during the audit period, but see our notes annex”.


How can you do client side encryption with web apps though? While keeping the key on the client, I assume, and allowing multiple browser sessions for the same user?


When I say "client side encryption", I'm referring to the database client, which in most web apps is actually code running on a private server (i.e. browser code makes API calls to a server running something like Python, Node or Java, and that server code makes calls to the DB - it's on the server where PII is encrypted).

That said, you can also use the SubtleCrypto API in the browser to encrypt data before it is even sent to the server.


Envelope encryption, where you encrypt a data encryption key (typically symmetric with AES) with other keys (typically asymmetric with RSA). This is how most password safes like bitwarden work.
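A stdlib-only sketch of the envelope pattern described above. Real systems wrap the data key with AES key wrapping or an RSA/KMS public key; here both layers use an illustrative SHA-256 counter-mode keystream, and all names are hypothetical:

```python
import hashlib
import secrets

def _stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Illustrative XOR keystream (its own inverse); a stand-in for AES.
    ks = b""
    i = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, ks))

def seal(master_key: bytes, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(32)  # per-item data encryption key
    n1, n2 = secrets.token_bytes(16), secrets.token_bytes(16)
    return {
        # Only the master key (KEK) can unwrap the DEK...
        "wrapped_dek": n1 + _stream_xor(master_key, n1, dek),
        # ...and only the DEK decrypts the payload.
        "ciphertext": n2 + _stream_xor(dek, n2, plaintext),
    }

def open_envelope(master_key: bytes, env: dict) -> bytes:
    w, c = env["wrapped_dek"], env["ciphertext"]
    dek = _stream_xor(master_key, w[:16], w[16:])
    return _stream_xor(dek, c[:16], c[16:])

master = secrets.token_bytes(32)
env = seal(master, b"vault item")
assert open_envelope(master, env) == b"vault item"
```

Rotating the master key then only requires re-wrapping the small DEKs, not re-encrypting every payload.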


This model reminds me of sealed boxes, so I wanted to add that to this discussion.

Send a public key to the client (say in a secrets input page), your browser encrypts field content with that key, and you receive the ciphertext on the server. You can then decrypt it, discard the sealed box keys, and persist the data however you need. (Presumably something that sensitive would get encrypted with a different key before going into the database, but you could keep the keys around and have each piece of data protected by a different key. This has pros and cons.)

Github Actions secrets are protected in transit to Github using sealed boxes.


I believe by client side they mean the database client, which would be the application backend/server.



