Amazon Scholar, AWS Cryptography
Principal Product Manager, AWS Cryptography
AWS Cryptography tools and services use a wide range of encryption and storage technologies that can help customers protect their data both at rest and in transit. In some instances, customers also require protection of their data even while it is in use. To address these needs, Amazon Web Services (AWS) is developing new techniques for cryptographic computing, a set of technologies that allow computations to be performed on encrypted data, so that sensitive data is never exposed. This foundation is used to help protect the privacy and intellectual property of data owners, data users, and other parties involved in machine learning activities.
We recently spoke to Bill Horne, Principal Product Manager in AWS Cryptography, and Joan Feigenbaum, Amazon Scholar in AWS Cryptography, about their experiences with cryptographic computing, why it’s such an important topic, and how AWS is addressing it.
Tell me about yourselves: what made you decide to work in cryptographic computing? And, why did you come to AWS to do cryptographic computing?
Joan: I’m a computer science professor at Yale and an Amazon Scholar. I started graduate school at Stanford in Computer Science in the fall of 1981. Before that, I was an undergraduate math major at Harvard. Almost from the beginning, I have been interested in what has now come to be called cryptographic computing. During the fall of 1982, Andrew Yao, who was my PhD advisor, published a paper entitled “Protocols for Secure Computation,” which introduced the millionaire’s problem: Two millionaires want to run a protocol at the end of which they will know which one of them has more millions, but not know exactly how many millions the other one has. If you dig deeper, you’ll find a few antecedents, but that’s the paper that’s usually credited with launching the field of cryptographic computing. Over the course of my 40 years as a computer scientist, I’ve worked in many different areas of computer science research, but I’ve always come back to cryptographic computing, because it’s absolutely fascinating and has many practical applications.
Bill: I originally got my PhD in Machine Learning in 1993, but I switched over to security in the late 1990s. I’ve spent most of my career in industrial research laboratories, where I was always interested in how to bring technology out of the lab and get it into real products. There’s a lot of interest from customers right now around cryptographic computing, and so I think that we’re at a really interesting point in time, where this could take off in the next few years. Being a part of something like this is really exciting.
What exactly is cryptographic computing?
Bill: Cryptographic computing is not a single thing. Rather, it is a methodology for protecting data in use—a set of techniques for doing computation over sensitive data without revealing that data to other parties. For example, if you are a financial services company, you might want to work with other financial services companies to develop machine learning models for credit card fraud detection. You might need to use sensitive data about your customers as training data for your models, but you don’t want to share your customer data in plaintext form with the other companies, and vice versa. Cryptographic computing gives organizations a way to train models collaboratively without exposing plaintext data about their customers to each other, or even to an intermediate third party such as a cloud provider like AWS.
Why is it challenging to protect data in use? How does cryptographic computing help with this challenge?
Bill: Protecting data-at-rest and data-in-transit using cryptography is very well understood.
Protecting data-in-use is a little trickier. When we say we are protecting data-in-use, we mean protecting it while we are doing computation on it. One way to do that is with other types of security mechanisms besides encryption. Specifically, we can use isolation and access control mechanisms to tightly control who or what can gain access to those computations. The level of control can vary greatly from standard virtual machine isolation, all the way down to isolated, hardened, and constrained enclaves backed by a combination of software and specialized hardware. The data is decrypted and processed within the enclave, and is inaccessible to any external code and processes. AWS offers Nitro Enclaves, which is a very tightly controlled environment that uses this kind of approach.
Cryptographic computing offers a completely different approach to protecting data-in-use. Instead of using isolation and access control, data is always cryptographically protected, and the processing happens directly on the protected data. The hardware doing the computation doesn’t even have access to the cryptographic keys used to encrypt the data, so it is computationally intractable for that hardware, any software running on that hardware, or any person who has access to that hardware to learn anything about your data. In fact, you arguably don’t even need isolation and access control if you are using cryptographic computing, since nothing can be learned by viewing the computation.
What are some cryptographic computing techniques and how do they work?
Bill: Two applicable fundamental cryptographic computing techniques are homomorphic encryption and secure multi-party computation. Homomorphic encryption allows for computation on encrypted data. Basically, the idea is that there are special cryptosystems that support basic mathematical operations like addition and multiplication which work on encrypted data. From those simple operations, you can form complex circuits to implement any function you want.
Secure multi-party computation is a very different paradigm. In secure multi-party computation, you have two or more parties who want to jointly compute some function, but they don’t want to reveal their data to each other. An example might be that you have a list of customers and I have a list of customers, and we want to find out what customers we have in common without revealing anything else about our data to each other, in order to protect customer privacy. That’s a special kind of multi-party computation called private set intersection (PSI).
Joan: To add some detail to what Bill said, homomorphic encryption was heavily influenced by a 2009 breakthrough by Craig Gentry, who is now a Research Fellow at the Algorand Foundation. If a customer has dataset X, needs f(X), and is willing to reveal X to the server, he uploads X and has the cloud service compute Y= f(X) and return Y. If he wants (or is required by law or policy) to hide X from the cloud provider, he homomorphically encrypts X on the client side to get X’, uploads it, receives an encrypted result Y’, and homomorphically decrypts Y’ (again on the client side) to get Y. The confidential data, the result, and the cryptographic keys all remain on the client side.
In secure multi-party computation, there are n ≥ 2 parties that have datasets X1, X2, …, Xn, and they wish to compute Y=f(X1, X2, …, Xn). No party wants to reveal to the others anything about his own data that isn’t implied by the result Y. They execute an n-party protocol in which they exchange messages and perform local computations; at the end, all parties know the result, but none has obtained additional information about the others’ inputs or the intermediate results of the (often multi-round) distributed computation. Multi-party computation might use encryption, but often it uses other data-hiding techniques such as secret sharing.
Cryptographic computing seems to be appearing in the popular technical press a lot right now and AWS is leading work in this area. Why is this a hot topic right now?
Joan: There’s strong motivation to deploy this stuff now, because cloud computing has become a big part of our tech economy and a big part of our information infrastructure. Parties that might have previously managed compute environments on-premises where data privacy is easier to reason about are now choosing third-party cloud providers to provide this compute environment. Data privacy is harder to reason about in the cloud, so they’re looking for techniques where they don’t have to completely rely on their cloud provider for data privacy. There’s a tremendous amount of confidential data—in health care, medical research, finance, government, education, and so on—data which organizations want to use in the cloud to take advantage of state-of-the-art computational techniques that are hard to implement in-house. That’s exactly what cryptographic computing is intended for: using data without revealing it.
Bill: Data privacy has become one the most important issues in security. There is clearly a lot of regulatory pressure right now to protect the privacy of individuals. But progressive companies are actually trying to go above and beyond what they are legally required to do. Cryptographic computing offers customers a compelling set of new tools for being able to protect data throughout its lifecycle without exposing it to unauthorized parties.
Also, there’s a lot of hype right now about homomorphic encryption that’s driving a lot of interest in the popular tech press. But I don’t think people fully understand its power, applicability, or limitations. We’re starting to see homomorphic encryption being used in practice for some small-scale applications, but we are just at the beginning of what homomorphic encryption can offer. AWS is actively exploring ideas and finding new opportunities to solve customer problems with this technology.
Can you talk about the research that’s been done at AWS in cryptographic computing?
Joan: We researched and published on a novel use of homomorphic encryption applied to a popular machine learning algorithm called XGBoost. You have an XGBoost model that has been trained in the standard way, and a large set of users that want to query that model. We developed PPXGBoost inference (where the “PP” stands for privacy preserving). Each user stores a personalized, encrypted version of the model on a remote server, and then submits encrypted queries to that server. The user receives encrypted inferences, which are decrypted and stored on a personal device. For example, imagine a healthcare application, where over time the device uses these inferences to build up a health profile that is stored locally. Note that the user never reveals any personal health data to the server, because the submitted queries are all encrypted.
There’s another application our colleague Eric Crockett, Sr. Applied Scientist, published a paper about. It deals with a standard machine-learning technique called logistic regression. Crockett developed HELR, an application that trains logistic-regression models on homomorphically encrypted data.
Both papers are available on the AWS Cryptographic Computing webpage. The HELR code and PPXGBoost code are available there as well. You can download that code, experiment with it, and use it in your applications.
What are you working on right now that you’re excited about?
Bill: We’ve been talking with a lot of internal and external customers about their data protection problems, and have identified a number of areas where cryptographic computing offers solutions. We see a lot of interest in collaborative data analysis using secure multi-party computation. Customers want to jointly compute all sorts of functions and perform analytics without revealing their data to each other. We see interest in everything from simple comparisons of data sets through jointly training machine learning models.
Joan: To add to what Bill said: We’re exploring two use cases in which cryptographic computing (in particular, secure multi-party computation and homomorphic encryption) can be applied to help solve customers’ security and privacy challenges at scale. The first use case is privacy-preserving federated learning, and the second is private set intersection (PSI).
Federated learning makes it possible to take advantage of machine learning while minimizing the need to collect user data. Imagine you have a server and a large set of clients. The server has constructed a model and pushed it out to the clients for use on local devices; one typical use case is voice recognition. As clients use the model, they make personalized updates that improve it. Some of the local improvements made locally in my environment could also be relevant in millions of other users’ environments. The server gathers up all these local improvements and aggregates them into one improvement to the global model; then the next time it pushes out a new model to existing and new clients, it has an improved model to push out. To accomplish privacy-preserving federated learning, one uses cryptographic computing techniques to ensure that individual users’ local improvements are never revealed to the server or to other users in the process of computing a global improvement.
Using PSI, two or more AWS customers who have related datasets can compute the intersection of their datasets—that is, the data elements that they all have in common—while hiding crucial information about the data elements that are not common to all of them. PSI is a key enabler in several business use cases that we have heard about from customers, including data enrichment, advertising, and healthcare.
This post is meant to introduce some of the cryptographic computing and novel use cases AWS is exploring. If you are serious about exploring this approach, we encourage you to reach out to us and discuss what problems you are trying to solve and whether cryptographic computing can help you. Learn more and get in touch with us at our Cryptographic Computing webpage or send us an email at [email protected]
Want more AWS Security news? Follow us on Twitter.