reza_n's comments

You can use `explicit_bzero()` to bypass DCE (dead code elimination). Otherwise, simply initializing your memory before using it is enough to trigger magic failures when you use-after-free. C programs barely function if they do not initialize memory. For context, I work on Varnish, which the OP referenced for this.
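To illustrate the DCE point, here is a minimal sketch (names are mine, not from Varnish): a plain `memset()` on a buffer that is never read again may be removed by the optimizer, while `explicit_bzero()` is guaranteed to survive.

```c
#define _DEFAULT_SOURCE  /* for explicit_bzero() on glibc */
#include <string.h>

/* Zero a key buffer after its last use. A memset() here is a
   candidate for dead code elimination, because the compiler can
   see the buffer is never read afterwards; explicit_bzero() is
   defined to never be optimized away. */
void wipe(char *buf, size_t len) {
    explicit_bzero(buf, len);
}
```

`explicit_bzero()` is available on glibc 2.25+ and the BSDs; C23 standardizes the same idea as `memset_explicit()`.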


Varnish Software | DevOps | NYC onsite or US-based remote | Full-time

Varnish Software is the company behind Varnish Cache, a hugely popular open source caching solution installed in front of millions of websites globally. We build high-performance caching, CDN, and edge logic solutions. We are looking for a talented DevOps engineer to join our growing US-based engineering team. You will be responsible for building, managing, and monitoring our cloud, on-premise, and hosted Varnish solutions. These are Linux-based architectures with 100+ Gbit/s of network capacity and hundreds of TB of disk capacity. Skills desired:

* Experience managing Linux-based systems (bonus points for high-performance systems experience)

* Experience troubleshooting and solving Linux-based issues

* Experience with Ansible, Bash, Git, Docker, Kubernetes, and Terraform

* Knowledge of at least one programming language

* Knowledge of how websites and HTTP work

* Knowledge of security best practices

We are considering candidates of all skill levels; if you are new to the field but passionate about Linux and DevOps, please apply! Varnish Software provides an open, laid-back work culture with competitive salaries, full benefits, and generous vacation time. You will be working alongside some of the best and brightest in the industry!

Please apply at: nyc@varnish-software.com


Bummer, he pretty much proved he was on the right path with his translation theories. Maybe this was done to protect his work? As in, he's now working to complete the translation and did not want others to beat him to it. One can hope...


I write C full time and I love it (Varnish Cache). In our team of about 10 full time C engineers, we spend less than an hour a month dealing with things like “memory safety”. When you are writing C at a professional level, your time is spent on things like performance, algorithms, accuracy, hitting requirements, and delivering software. We have numerous safety and fuzzing systems humming in the background checking our work 24/7. The tooling ecosystem (Linux) is top notch.

(If you want to write C full time professionally too, contact me!)


C was my first love, and I still find myself using it often, especially in library code when I want versatility.

I've gotten into the habit of pushing memory allocations as far back into the user program as possible (the user asks for the size of the internal struct, allocates and casts it, and deals with ownership themselves) to allow more flexibility for the users of my libraries, and also to remove entire classes of memory issues from the libraries themselves: https://github.com/kstenerud/c-cbe/blob/master/tests/src/rea...

As a bonus, it avoids the headaches of mixing multiple allocators (malloc, new, [NSObject alloc], JNI, etc) when used in cross-language codebases.

It does, unfortunately, complicate the API a little bit, but I find the tradeoff to be worth it in terms of safety.
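A minimal sketch of the pattern (names are illustrative, not from the linked project): the library never allocates; the caller asks for the required size, allocates however it likes, and the library initializes in place.

```c
#include <stddef.h>
#include <string.h>

/* The struct would normally be opaque to callers; it is shown
   here so the example is self-contained. */
typedef struct {
    int counter;
    char name[32];
} widget;

/* Library side: report how much memory a widget needs, so the
   caller can allocate it with malloc, on the stack, or via a
   foreign runtime's allocator. */
size_t widget_size(void) {
    return sizeof(widget);
}

/* Library side: initialize caller-provided memory in place.
   The library never allocates and never frees. */
void widget_init(void *mem, const char *name) {
    widget *w = (widget *)mem;
    w->counter = 0;
    strncpy(w->name, name, sizeof(w->name) - 1);
    w->name[sizeof(w->name) - 1] = '\0';
}
```

Because ownership stays entirely with the caller, the library cannot leak, double-free, or mix allocators, which is what removes those classes of bugs from the library itself.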


What does it take to become a professional C engineer? How did you practice all that? I've recently started looking into it, but I'm still just getting started (books/PDFs, exercises).


Time and experience. Learning the syntax and wrapping your head around pointers and memory is the first step. After that, just try to write as much C as possible. Help out with projects, start projects, write tools and APIs. Most importantly, study existing code bases to learn real-world techniques. Best case, get a C job where you have to write a diverse amount of C code day after day.


I can't recommend this book enough: Code Reading: The Open Source Perspective by Diomidis Spinellis. See also https://news.ycombinator.com/item?id=21006995 for several others.


I wrote something very similar many years ago. I described it as a reverse search index: the queries/regexes get indexed into a search tree, and then the text is run through it. It supported hundreds of thousands of parallel regex searches. I called it dClass:

https://github.com/TheWeatherChannel/dClass
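The core idea can be sketched with literal strings only (dClass also handles patterns): insert all queries into a trie, then walk the text through it, so every query is matched in a single pass over the input. This is my simplification, not dClass's actual code.

```c
#include <stdlib.h>
#include <string.h>

#define ALPHA 128  /* ASCII alphabet */

typedef struct node {
    struct node *next[ALPHA];
    const char *pattern;   /* non-NULL if a pattern ends here */
} node;

node *node_new(void) {
    return calloc(1, sizeof(node));
}

/* Index a query into the trie (the "reverse" part: the queries
   are indexed, not the text). */
void trie_add(node *root, const char *pat) {
    node *n = root;
    for (const char *p = pat; *p; p++) {
        unsigned char c = (unsigned char)*p % ALPHA;
        if (!n->next[c]) n->next[c] = node_new();
        n = n->next[c];
    }
    n->pattern = pat;
}

/* Run the text through the trie; returns the first indexed
   pattern found anywhere in the text, or NULL. */
const char *trie_search(node *root, const char *text) {
    for (const char *s = text; *s; s++) {
        node *n = root;
        for (const char *p = s; *p && n; p++) {
            n = n->next[(unsigned char)*p % ALPHA];
            if (n && n->pattern) return n->pattern;
        }
    }
    return NULL;
}
```

Adding a pattern costs O(pattern length) once, after which the number of indexed queries barely affects per-scan cost, which is how hundreds of thousands of parallel searches become practical. (A production version would use Aho-Corasick failure links to make the scan linear.)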


This is a bit on the unsafe side, since it blindly trusts user input. At minimum, there needs to be some kind of magic number in the struct header to validate that it's looking at the right memory. Best case, some kind of pointer accounting. Unfortunately, magic doesn't come for free.
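A sketch of the magic-number check (hypothetical names): each struct carries a known constant as its first field, and every API entry point validates it before trusting the pointer.

```c
#include <stdint.h>
#include <stddef.h>

#define CONN_MAGIC 0x434F4E4EU  /* "CONN" in ASCII */

typedef struct {
    uint32_t magic;   /* must always equal CONN_MAGIC */
    int fd;
} conn;

void conn_init(conn *c, int fd) {
    c->magic = CONN_MAGIC;
    c->fd = fd;
}

/* Every entry point checks the magic before using the struct;
   a stale, freed, or miscast pointer is very unlikely to have
   the right constant in the right place. */
int conn_valid(const conn *c) {
    return c != NULL && c->magic == CONN_MAGIC;
}
```

This is a probabilistic check, not a security boundary: it catches accidents (wrong cast, use-after-free, corrupted memory) cheaply, but an attacker who controls the bytes can forge the magic, which is why stronger pointer accounting costs more.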


We can optimize that, just put a function pointer in the beginning that's called every time to validate the string... /s


Yup, we added this feature to Varnish Cache a few years ago: random key encryption. It generates a random key at startup and encrypts all memory with it. Since this kind of memory is only resident for the lifetime of the process, it works. We stored the random key in the Linux kernel using the crypto API [0], because it's not safe to store any kind of key in a memory space used for caching (Cloudbleed [1]). We then use the key to generate a per-object HMAC, so each piece of data ends up with its own key, which further mitigates something like Cloudbleed. Since we used kernel crypto, overhead was about 50%. If you stay completely in user space, it's probably much lower.

[0] https://www.kernel.org/doc/html/v4.17/crypto/userspace-if.ht...

[1] https://en.wikipedia.org/wiki/Cloudbleed
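The shape of the scheme can be sketched as follows. This is a toy, NOT real crypto: the key derivation and cipher below are stand-ins so the example is self-contained; the real system derives per-object keys with an HMAC and encrypts via the kernel crypto API.

```c
#include <stdint.h>
#include <stddef.h>

/* Master key, set once at startup from a CSPRNG (e.g. getrandom())
   and, in the real system, kept outside the cache address space. */
static uint64_t master_key;

/* Stand-in key derivation: mix the object id into the master key.
   A real implementation would use HMAC-SHA256(master_key, obj_id). */
uint64_t derive_key(uint64_t obj_id) {
    uint64_t k = master_key ^ (obj_id * 0x9E3779B97F4A7C15ULL);
    k ^= k >> 33;
    k *= 0xFF51AFD7ED558CCDULL;
    k ^= k >> 33;
    return k;
}

/* Stand-in cipher: XOR "keystream". Applying it twice decrypts.
   A real implementation would use an AEAD cipher. */
void xcrypt(uint8_t *buf, size_t len, uint64_t key) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (uint8_t)(key >> (8 * (i % 8)));
}
```

The property that matters for Cloudbleed-style bugs: a leaked region of cache memory is ciphertext, and each object is keyed differently, so even recovering one object's key does not decrypt its neighbors.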


Could this be implemented at the OS level? I.e., whenever a process launches, the OS generates a key that it keeps to itself and uses to transparently encrypt all memory allocated by that process.


My first thought was to try to use 'containers' (cgroups) combined with the AMD secure memory extensions to achieve this type of isolation using as much off the shelf hardware as possible.

https://en.wikichip.org/wiki/x86/sme

https://www.kernel.org/doc/Documentation/x86/amd-memory-encr...

From the quick description, it sounds like this provides a way of encrypting, per memory page, based on a symmetric key backed by some level of hardware encryption. It was not clear (on a quick read) how or where to specify the key with which an individual page is encrypted. That would be critical to determining whether this could be used to encrypt individual processes and further isolate memory. It sounds like it might be possible to establish per-process memory isolation, which is probably the best level of security possible without resorting to entirely isolated hardware.


Per-process keys aren't really possible because memory can change process ownership (vmsplice) or be shared across processes (fork, page cache, memfd). It might be possible for pages marked MADV_DONTFORK.

Additionally, a per-process key does not help against Spectre-style attacks, where you would trick the process into speculating on protected memory.


You'd probably want a hardware module to do that, lest performance plummet. Memory controllers can already deal with ECC efficiently; adding a simple cipher on top of that should definitely be feasible.


Possibly, but memory is accessed using plain CPU instructions, so it would be hard to transparently encrypt all memory for an application at the kernel level. You do have virtual memory, but I don't think that could be leveraged for this. But who knows what's possible there; maybe if you aligned and addressed each memory value at a page boundary and always forced a page fault, you could have a really poor implementation :)

Transparent disk encryption is not a problem, since devices have filesystems which can implement encryption at that layer.


Modern Intel chips can encrypt memory on the fly without performance loss (SGX does this). However, I think it's not exposed for non-enclave use. Perhaps it should be.

Note: inside the enclave there is a performance loss, but that's due to MAC checks. If you just want encryption without integrity against tampering, you don't need that.


But that wouldn't prevent (or mitigate) Cloudbleed anymore, as the problem is about isolating contexts within process boundaries.


Technically yes, but practically no, because mediating all memory reads through the kernel would be very slow.

SME/MKTME add hardware support for this.


Yes. Most research makes CPU modifications since that makes the most sense. Sometimes they try to use OS-level techniques. Here's a survey showing some of each:

https://thayer.dartmouth.edu/tr/reports/tr13-001.pdf


Just to clarify my understanding: the reason for doing this is so that random sampling/leakage of the contents of RAM stops being useful; an attacker would need to specifically get the key (and then, presumably, a whole chunk of encrypted RAM to decrypt)?


Yup. When something goes wrong in these kinds of applications, you sometimes tend to just randomly dump memory, which is a huge data leak. Or even worse, if someone figures out a way to force a data leak, then you are completely compromised. Having each piece of data encrypted with its own key, where that key is derived in part from data outside of the process address space, drastically lowers the chances of data leakage and total compromise.


Could you point me to the relevant source code? Am highly interested to take a look at it during the weekend.



No offense, I am genuinely curious: why would anyone use any closed source software for anything related to security after the Snowden revelations?


> Closed source

Ah, so we'll just have to trust you that it's doing anything at all, then.


"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

https://news.ycombinator.com/newsguidelines.html


> "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

> https://news.ycombinator.com/newsguidelines.html

Forgive me but can we not be skeptical of claims made about a commercial product?


Of course you can, and there are plenty of ways to do so that don't break the site guidelines. Cheap, snarky one-liners are not the way. If someone's posting about their own work, there's no need to be disrespectful.

It's also not helpful to post such a clichéd dismissal of what someone else says or their work. That's in the site guidelines too.

https://news.ycombinator.com/newsguidelines.html


Isn’t his reply valid and an exception to the rule given the context?


We have no problem sharing our codebase with customers, especially if there are concerns like this. Shoot me a msg if you are genuinely interested in anything you have read.


Edge facilities are warehouses in regional locations with excellent backbone connectivity, basically your modern datacenter. Cell phone towers can probably host a few racks, but that's not a profitable business and it's not "internet scale". If the regional datacenter has a 15ms ping to each tower in the region, then you have pretty good coverage.


I think you underestimate what the edge will become. There are already startups trying to store your data at your house, cell phone tower, ISP, etc., either in ways where there is no central store or in ways where everything is eventually consistent. Computing at the edge is a very interesting topic.


Storing my data at my house makes sense, especially if upkeep of the box is relegated to the end user. Storing my data at the tower in my neighborhood, instead of at a regional center, seems to be a large increase in maintenance cost for a minimal decrease in latency.

Accessing the tower is expensive in time, and equipment that runs at the tower is exposed to a wider variety of temperatures and RF stress than in a nice warehouse somewhere in the metro area.

It's possible the right caching at towers could reduce the backhaul bandwidth requirements, but seems iffy.


For an edge to work, you need security. That means transport encryption, which requires certs. I think this fact alone will keep the edge at modern secured datacenters. There is limited physical security at cell sites, and even less at the user's home. This could mean there is no more transport encryption for this kind of edge. Or even worse, private key loss.

Not saying no here, just pointing out a very large concern.


Have you taken a look at the Cloudflare Keyless SSL tech?


Just did, and I think it furthers my point. Cloudflare now owns your key, and the edge is now their network.

My concern is cert/key management where the edge is somewhere you have very little control over, like a cell tower, a random building network, or a user's house. Even with keyless, once that device is in my home, I'm pretty sure that entire thing can be reverse engineered. Not easy, as in probes-and-oscilloscopes-on-exposed-leads hard, but physical access is pretty much game over, no?

I've worked in this space, and the solution is detection and mitigation. Limit the damage to single devices, workflow the user in, look for human attack patterns. Defense is futile.


The point is that the key is never in the possession of the edge (i.e. Cloudflare). There is no way the edge could recover the key. They can use it to sign whatever they want, while you allow them to, although you can take whatever auditing measures you'd like there.


Why is storage at a (space-limited) cell tower more interesting than storage/compute at the ISP or packet core (or whatever's at the other end of the backhaul)?

How much latency do you think is incurred between the ISP and cell tower?



Exactly. I used to live in a low-rise apartment building with modern wood construction. Not only did I hear everything above me, but the shock waves (from foot strikes) would also travel through the structure, so you could feel everything as well.

