What Do We Trust to Keep Our Data Private?
Examine privacy and confidentiality in computation using PETs, looking at multiple scenarios in input and output privacy using encryption and enclaves.
8 min read
Feb 27, 2024

We find it fascinating to contemplate the future of privacy and confidentiality in computation. Privacy-enhancing technologies (PETs), a catch-all phrase for the broad basket of technologies focussed on these issues, aim to solve different aspects of the problem. But each PET makes a different trade-off and is applicable in a different scenario.
Previously in our blog posts, we have differentiated between input and output privacy; today we will focus on input privacy specifically. As a recap, input privacy is the goal of applying an algorithm to data while keeping that data secret from the algorithm provider and/or the other parties involved in the computation.
Broadly speaking, there are two branches of thought as to how we can achieve input privacy (well, three if you include blind quantum computing, but let's stick to things you can deploy today):
1. Pure encryption-based input privacy
2. Enclave-based input privacy (a.k.a. confidential computing or trusted execution environments)
In this post, we’ll touch on both.

Pure Encryption-based Input Privacy
Pure encryption-based input privacy is referred to in the cryptography community as secure multiparty computation (SMPC, or sometimes just MPC). This can be somewhat confusing, as multiparty computation can be achieved in other ways, but the nomenclature dates back to the late 1970s, so it's likely to stick around.
The general premise is that a special type of encryption scheme is chosen (not your standard AES or the like), and all of the parties involved encrypt and/or split their data into parts that, when combined, reveal the original values. These parts are then shared with their counterparts, and the process is repeated. If this sounds vague, it's because there are many approaches to achieving it, each following a different protocol and offering different functionality and efficiency trade-offs.
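As a minimal sketch of this splitting idea, here is additive secret sharing over a prime field in Python (the field size and helper names are our own illustrative choices, not any particular library's API):

```python
import secrets

# Additive secret sharing: split a value into random-looking parts that
# only reveal the original when all of them are combined.
PRIME = 2**61 - 1  # an illustrative prime field modulus

def split(secret, n_parties):
    """Split `secret` into n shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

shares = split(42, 3)
# Any proper subset of shares is uniformly random on its own;
# only all three together reveal the secret.
assert reconstruct(shares) == 42
```

Each party holds one share and can compute on it locally; this is the basic building block many of the protocols below assemble into larger computations.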
In some scenarios, there may be only a client and a server, with the client aiming to offload computation to the server without revealing its secret query. If the query is a look-up into a database, the relevant primitives are Oblivious Transfer (OT) and the closely related Private Information Retrieval (PIR), and there are theoretical bounds on the efficiency that can be achieved (spoiler: they don't scale well for big databases).
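To convey the flavour of a private look-up, here is a toy two-server scheme (an information-theoretic variant of private information retrieval, a primitive closely related to OT) under the assumption of two non-colluding servers that each hold a copy of the database; the database contents and function names are invented for illustration:

```python
import secrets
from functools import reduce

# Both (non-colluding) servers hold the same database.
DB = [7, 13, 42, 99, 5, 31, 8, 64]

def server_answer(db, index_set):
    """Each server XORs together the requested records; it learns nothing
    about which single index the client actually wants."""
    return reduce(lambda a, b: a ^ b, (db[i] for i in index_set), 0)

def query(i, db_size):
    """Build two random-looking index sets that differ only at index i."""
    s1 = {j for j in range(db_size) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference: toggle index i
    return s1, s2

i = 2
s1, s2 = query(i, len(DB))
# XORing the two answers cancels every record except record i.
assert server_answer(DB, s1) ^ server_answer(DB, s2) == DB[i]
```

Note that each server still has to scan up to the whole database per query, which is the intuition behind why these primitives scale poorly for large databases.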
If the query is to apply a polynomial function to the input, then the relevant class of protocol is Homomorphic Encryption (HE). While HE is certainly improving in efficiency, it is often still significantly slower than performing the calculation on the plaintext, and the client and server typically have to agree in advance on the structure and degree of the polynomial to be evaluated.
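To show the homomorphic property concretely, here is a sketch of the Paillier cryptosystem (an additively homomorphic scheme) with deliberately tiny, insecure hard-coded primes; a real deployment would use a vetted library with proper parameters:

```python
import random
from math import gcd

# Toy Paillier cryptosystem. WARNING: 17 and 19 are illustrative
# primes only -- this is not remotely secure.
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # inverse of L(g^lam mod n^2)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = 20, 30
# Multiplying ciphertexts adds the underlying plaintexts...
assert decrypt(encrypt(a) * encrypt(b) % n2) == (a + b) % n
# ...and raising a ciphertext to a constant multiplies the plaintext,
# which together let a server evaluate linear functions it cannot read.
assert decrypt(pow(encrypt(a), 4, n2)) == (4 * a) % n
```

Paillier only supports additions and scalar multiplications; fully homomorphic schemes extend this to arbitrary polynomials, at considerably greater cost.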
In scenarios with more than two parties, forms of secret sharing can be applied. These often make the computation itself more efficient, but typically require more rounds of communication between the parties, and collusion between parties can catastrophically reduce privacy.
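Sketching the multi-party case under the same additive-sharing assumption (party names and input values are hypothetical), three parties can learn the sum of their inputs without anyone seeing an individual value:

```python
import secrets

# Three parties privately compute the sum of their inputs.
PRIME = 2**61 - 1
inputs = {"alice": 100, "bob": 250, "carol": 50}

def split(secret, n):
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    return shares + [(secret - sum(shares)) % PRIME]

# Round 1: each party splits its input and sends one share to each party.
shared = {name: split(x, 3) for name, x in inputs.items()}

# Round 2: party k locally adds the k-th share it received from everyone.
local_sums = [sum(shared[name][k] for name in inputs) % PRIME
              for k in range(3)]

# Combining all three local sums yields the total -- and nothing else.
assert sum(local_sums) % PRIME == sum(inputs.values())
```

If two parties pool their shares, the third party's input can become exposed in richer protocols, which is why collusion assumptions matter so much in scheme selection.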
In general, the choice of protocol ends up being a compromise between:
How many parties can be involved
Which parties hold confidential data
How much resilience against collusion you wish to achieve
How trusted the parties involved are
The type of functionality you wish to perform
The amount of data and number of communication rounds between parties
The computational burden you are willing to accept
Whether you are willing to accept approximate (rather than exact) computation
Once a scheme is finally chosen, you are left with a set of building blocks, such as arithmetic over integers or comparisons, from which you create a circuit of operations implementing a specific computation. Small changes to the computation can, in some scenarios, result in radical changes to the circuit.
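To make the circuit idea concrete, here is a toy arithmetic-circuit evaluator run in the clear; an MPC protocol would evaluate each gate on shares rather than plain values (the gate encoding here is our own illustrative choice):

```python
# A computation expressed as an arithmetic circuit: just ADD and MUL
# gates over field elements -- the building blocks an MPC scheme provides.
PRIME = 2**61 - 1

# Gate list for f(x, y) = (x + y) * x, as ("op", left_ref, right_ref);
# integer refs point at the outputs of earlier gates.
circuit = [("ADD", "x", "y"), ("MUL", 0, "x")]

def evaluate(circuit, inputs):
    wires = []
    def val(ref):
        return wires[ref] if isinstance(ref, int) else inputs[ref]
    for op, left, right in circuit:
        out = (val(left) + val(right)) % PRIME if op == "ADD" \
              else (val(left) * val(right)) % PRIME
        wires.append(out)
    return wires[-1]

assert evaluate(circuit, {"x": 3, "y": 4}) == 21  # (3 + 4) * 3
```

Changing the computation, say from a sum to a comparison, means rebuilding the gate list, and in MPC each gate type carries a very different communication and computation cost, which is why small changes can reshape the whole protocol.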
One size really doesn’t fit all use cases, and combining them naively is often dangerous (“composability” proofs usually need to be derived and examined). From an academic perspective, this gives us great scope to find new overlooked settings and offer novel schemes to address them.

However, from the perspective of creating standards and general-purpose tooling, it becomes tricky. There are some informative standards which really only define the terminology of secret sharing, such as ISO/IEC 4922-2, but no normative standards exist which mandate specific implementations. Groups such as the MPC Alliance and HomomorphicEncryption.org are working actively to change this, and it will be interesting to keep an eye on their progress over the coming years.
The result is a collection of libraries and frameworks, usually open-sourced (which is good) but typically still at an experimental level of maturity and often maintained by academic groups as a research resource for the community.
There are exceptions to this, of course. Zama, for example, is building a library focused on deploying neural networks via homomorphic encryption. They’ve intentionally narrowed the scope of their goals to make developing with HE more accessible to non-cryptographers and be as efficient as possible. These types of hard restrictions appear, at least in the short term, to be necessary in order to drive any meaningful user adoption.
Pure encryption-based techniques don’t really have a tangible path toward general-purpose computation that you may want to use in production. By this, we simply mean you are unlikely to replace all of your micro-services with any of these techniques anytime soon, but for very specific interactions (like a certain calculation or private-set intersection), they appear to have a promising future.
Enclave-based Input Privacy
Enclaves, also referred to as confidential computing or trusted execution environments, take a very different approach to protecting data throughout a computation. Their story starts about eight years ago, when Intel's SGX became commonly available, and the field has been evolving and maturing rapidly ever since.
Confidential computing endeavours to create a separated and highly isolated environment in which sensitive data can be decrypted and processed, out of reach from any malicious actors (including any user of the main computer/server). It pairs this functionality with the ability to “attest” the software that is running within this isolated environment.
What this means in practice is that the underlying infrastructure hashes the software running inside the enclave and digitally signs those hashes within a document (known as an attestation document), which can be shared with parties who wish to upload data.
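The flow can be sketched as follows; note that HMAC stands in, purely for illustration, for the asymmetric signature and vendor certificate chain a real platform would use, and all names here are hypothetical:

```python
import hashlib
import hmac
import secrets

# Stand-in for the hardware vendor's signing key. In reality this is an
# asymmetric key whose public half is distributed via a certificate chain.
VENDOR_KEY = secrets.token_bytes(32)

def attest(enclave_image: bytes) -> dict:
    """The platform measures (hashes) the enclave software and signs it."""
    measurement = hashlib.sha384(enclave_image).hexdigest()
    signature = hmac.new(VENDOR_KEY, measurement.encode(),
                         hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": signature}

def verify(doc: dict, expected_image: bytes) -> bool:
    """A data owner checks the signature, then checks the measurement
    matches the software they expect, before uploading sensitive data."""
    expected = hashlib.sha384(expected_image).hexdigest()
    sig_ok = hmac.compare_digest(
        doc["signature"],
        hmac.new(VENDOR_KEY, doc["measurement"].encode(),
                 hashlib.sha256).hexdigest())
    return sig_ok and doc["measurement"] == expected

image = b"my-enclave-binary-v1"
doc = attest(image)
assert verify(doc, image)              # genuine software: accept
assert not verify(doc, b"tampered!!")  # different binary: reject
```

The key point is that trust is anchored in the platform's signing key, so the data owner never has to trust the operator of the machine, only the hardware (or hypervisor) vendor.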
Intel’s original push for enclaves was revolutionary; they essentially achieved the two desiderata (isolation and attestation) in the silicon of the processor itself. However, one challenge was that these enclaves had a very small memory footprint.
To get around this, the enclave would read and write encrypted data to main memory, but of course the access patterns it used became a side channel for learning about the behaviour of the internal process. Some frameworks mitigated this with Oblivious RAM, which obfuscates which addresses are read and written, at the cost of additional overhead.
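The crudest possible fix illustrates the idea: a linear-scan "ORAM" that touches and rewrites every block on every access, so an observer of memory traffic cannot tell which index was actually used. Blocks are kept in plaintext here for brevity; a real enclave would also re-encrypt each block with fresh randomness on every pass, and practical schemes such as Path ORAM reduce the per-access cost from linear to polylogarithmic:

```python
class LinearScanORAM:
    """Hide the access pattern by reading and writing *every* block
    on every access -- O(n) work per read or write."""

    def __init__(self, n_blocks):
        self.blocks = [0] * n_blocks

    def access(self, index, new_value=None):
        result = None
        for i in range(len(self.blocks)):
            v = self.blocks[i]
            if i == index:
                result = v
                if new_value is not None:
                    v = new_value
            self.blocks[i] = v  # every block is written back, used or not
        return result

ram = LinearScanORAM(8)
ram.access(3, new_value=42)
assert ram.access(3) == 42
assert ram.access(0) == 0
```

Since reads and writes are indistinguishable and every block is touched each time, the memory trace carries no information about the internal process, which is exactly the property the side channel exploited.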
Even today, when you say “secure enclave” in a meeting, one of the first responses you can expect to hear is “SGX?”, which shows the impact Intel has had on the domain over the years. However, alternative approaches have become extremely popular recently.
Hypervisor-Based Enclaves
Most notable has been the rise of hypervisor-based enclaves. When you use a cloud provider, unless you are using a bare metal machine, you will likely be using a virtual machine (VM) which may be sharing the same physical infrastructure as another cloud user.
For example, Netflix and your retail bank may actually be running within the same server box in a data centre somewhere. But of course, they can’t see each other’s data or interfere with one another’s processes because their sessions are partitioned via the hypervisor of the server running on the cloud.
This infrastructure is ubiquitous in cloud computing, which led the leading cloud providers to realise they could simply emulate Intel-style hardware-based enclaves with their hypervisors. This led to AWS’s Nitro Enclaves, amongst others.
There are, of course, pros and cons to hardware versus hypervisor-based enclaves. Hypervisor-based enclaves are far more flexible in nature; while they still have no PCI support (hard disks, GPUs, TPUs, etc.), they do allow you to specify how many vCPUs and how much RAM to allocate to your enclave.
However, your trust is really in the cloud provider. For most, this is not a problem, as all of their sensitive data already resides with the cloud provider, so implicit trust is already established. However, if you are strongly concerned by the CLOUD Act, or simply don’t trust your cloud provider, you may be uncomfortable with this approach.
Some would say hardware-based enclaves are more secure because even the cloud provider can’t “cheat”. The counter-argument is that, given enough time and energy, a physicist will always be able to read the charge off the silicon, although doing so is unlikely to be worth it in most scenarios.