We recently sat down with Galois Principal Scientist David Darais and Research Engineers Rawane Issa and Sourya Dey to discuss the security and privacy risks of AI/ML tools, as well as the cryptographic techniques and research Galois is pursuing to mitigate those risks. The Q&A below has been edited for brevity and clarity.
Q: How would you frame the challenge of maintaining security and privacy when using AI/ML tools? Why is this difficult? Why is this even a concern?
David: There are tons of privacy and security issues, but there doesn’t seem to be much interest in plugging the holes. No one is taking it seriously. So before we get to technical solutions, I think people need to understand that they even need a technical solution.
Sourya: One of the big issues right now is that LLMs are trained on a huge amount of data, and many of these LLMs are not open source, meaning that we don’t know exactly what they were trained on or what they contain inside the model. There have been demonstrated cases where prompting an LLM in different ways can make it give you more or different information, and that can be good or bad. Let’s say you’re trying to use the LLM to do math. Including phrases like, “Please answer truthfully and accurately,” or “Pretend you’re an expert” in the prompt can actually result in the model doing better at math. That’s a good result. But there are other phrases or sequences of characters that act as “jailbreaking prompts,” which can be used to make the LLM tell people things we’d rather they not know, including sensitive or potentially dangerous information.
Q: So, in that example, there’s this huge ecosystem of information that has been entered into the LLM, and you might have someone at a company or government agency who has been using sensitive information as part of their prompts, and the LLM is taking that information and helping them write their reports faster. And they’re thinking: “This is great! And, presumably, it’s secure!”
Rawane: Yes, but the problem is that LLMs, like ChatGPT, memorize these prompts and use them to train the model further. There was a case recently where I think someone at Samsung was writing prompts that included sensitive source code, and ChatGPT memorized it and then started giving that sensitive information out in responses to other people. That’s how the information was leaked.
David: Going beyond LLMs, this connects to one of the classic AI/ML security issues, known as a “Model Inversion Attack,” where you can ask the model to regurgitate its input data. If you’re training it on data that’s sensitive or classified and it can regurgitate that data, that’s a big security concern. Another classic issue is what people call a “Data Poisoning Attack.” This is used to trick image recognition ML models. For example, you might have a smart car that can identify and respond to stop signs, and you can put this weird sticker on a stop sign and all of a sudden your model won’t see it as a stop sign anymore.
Rawane: My favorite example is the one where you poison the data set so that when you give it an image of a tabby cat it tells you it’s an avocado.
David: Right! So, you can trick AI/ML models in these really specific ways.
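To make the poisoning idea concrete, here is a toy sketch in Python. Everything in it is hypothetical: made-up 2-D points stand in for images, a 1-nearest-neighbor rule stands in for the model, and a fixed offset stands in for the sticker. The point is simply that a handful of mislabeled, trigger-carrying training points can flip the prediction for any input that carries the trigger, while ordinary inputs are still classified correctly.

```python
# Toy backdoor-style data poisoning sketch (illustrative only; all data is synthetic).
import numpy as np

rng = np.random.default_rng(0)

# Clean training data: class 0 clustered near (0, 0), class 1 clustered near (5, 5).
clean_x = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
clean_y = np.array([0] * 50 + [1] * 50)

# Poison: a few class-0-looking points shifted by a "trigger" offset (the sticker)
# and deliberately mislabeled as class 1.
trigger = np.array([0.0, 4.0])
poison_x = rng.normal(0.0, 0.5, (10, 2)) + trigger
poison_y = np.array([1] * 10)

def predict_1nn(train_x, train_y, query):
    """Return the label of the single closest training point (1-nearest-neighbor)."""
    return train_y[np.argmin(np.linalg.norm(train_x - query, axis=1))]

query = np.array([0.2, -0.1])        # clearly a class-0 point
triggered = query + trigger          # the same point wearing the "sticker"

poisoned_x = np.vstack([clean_x, poison_x])
poisoned_y = np.concatenate([clean_y, poison_y])

print(predict_1nn(clean_x, clean_y, query))            # 0: clean model, plain input
print(predict_1nn(poisoned_x, poisoned_y, query))      # 0: poisoned model still fine on plain input
print(predict_1nn(poisoned_x, poisoned_y, triggered))  # 1: the trigger flips the prediction
```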
Q: So how do you mitigate that risk?
David: Most companies just want to not be liable for people putting proprietary information into LLMs. A lot of the solutions being proposed right now for LLM problems tend to start with: “Just retrain your own LLM from scratch and host it on your own premises.” So the IT department spins up a local instance of a crappier LLM that is only available within the company, forgets prompts, and is less useful as a result.
And right there, nobody’s actually going to use that. Everybody’s going to just use the latest version of ChatGPT, and you can’t get any assurances with that. So if I’m an organization and I’m worried about people putting company IP into ChatGPT, my practical option today is a firewall-style deep packet inspection kind of thing: I’m going to be sniffing all the packets that hit the boundary of my corporate network and flagging contents containing company information.
Q: It sounds like that’s the core dilemma here. You’ve got companies and government agencies saying: “Look, everybody is using this, and we need to keep up. And to do that, we need to be able to use the best and latest versions of these tools, but you’re telling me that it’s too risky.” And just designing a tool to forget any sensitive inputs doesn’t work either, because then your model loses the benefit of having that training data, which makes it less useful.
David: Yeah, you want it to remember those inputs for better results, but you don’t for security and privacy!
Q: Are there better solutions?
David: Yes! Two of the best techniques are Differential Privacy and Multi-Party Computation (MPC). Differential Privacy is a technique that prevents an AI/ML model from becoming overly dependent on any specific input. I think about it as: “Making stuff blurry.”
Rawane: At a high level, Differential Privacy is one way to decide what functions are private by adding noise either into the input or within the algorithm. So, you kind of put a blanket over the output of the function to muffle it, and as a result, users can plausibly deny that their individual inputs ever produced a particular output or were even part of the computation to begin with. The great thing about Differential Privacy is that while the output is noisy, the trend is preserved. So if you are interested in learning the racial makeup of a place in terms of ratios, not in terms of exact numbers, that trend is preserved. Consistently adding noise is not going to change it. And a lot of analytics are interested in trends rather than in specific outputs.
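A minimal sketch of what “adding noise” can look like in practice is the Laplace mechanism, shown below on a made-up counting query. The dataset, the group attribute, and the epsilon value are all invented for illustration; the takeaway is that no individual row can be pinned down from the noisy answer, while the overall ratio Rawane describes is still recognizable.

```python
# Laplace-mechanism sketch for a counting query (illustrative only; synthetic data).
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sensitive attribute: 1 if a resident belongs to some group, else 0.
residents = rng.integers(0, 2, size=10_000)

def dp_count(data, epsilon):
    """Counting query with Laplace noise. Adding or removing one person changes
    the true count by at most 1 (the sensitivity), so noise with scale 1/epsilon
    gives epsilon-differential privacy for this single query."""
    true_count = int(data.sum())
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

epsilon = 0.5
noisy_count = dp_count(residents, epsilon)
print(f"true ratio : {residents.mean():.4f}")
print(f"noisy ratio: {noisy_count / len(residents):.4f}")  # trend preserved, individuals hidden
```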
David: Multi-Party Computation, on the other hand, is about simulating a trusted third party. So let’s say you have a friend named Bob. Bob has infinite time, infinite compute, and everyone trusts Bob unconditionally. Everyone just sends their sensitive data to Bob, and he computes what needs computing and sends it back to the right people. If you don’t have Bob, you need MPC.
Rawane: Or let’s say you have two hospitals that want to collaborate and combine their data to make inferences and inform public health. For example, maybe they both want to see the trends of infectious diseases in certain populations, but to do so they would have to share data that is protected under HIPAA. MPC and Differential Privacy help with that because you can get the analytic results without risking patient privacy or HIPAA violations.
When I was doing my PhD, the Boston Women’s Workforce Council (BWWC) wanted to compute the wage gap between genders. Companies don’t want to share that data because they don’t want to risk being seen as the one paying women or ethnic minority groups less. So Boston University and the BWWC ran an MPC to enable that computation. Companies’ information remained hidden throughout the computation, and only the results of the computation, statistics about the wage gaps across groups, were revealed as part of the City of Boston’s and the BWWC’s report. If you’re a company in Boston, it’s really nice to have that badge showing that you participated in the computation while resting assured that your data stayed private, and the City of Boston could track progress toward closing that gap in the Greater Boston area.
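One classic building block behind this kind of protocol is additive secret sharing, sketched below in plain Python. The company names, salary figures, and field size are made up for illustration: each party splits its private value into random shares that individually reveal nothing, and only the recombined aggregate is ever published.

```python
# Additive secret-sharing sketch of an MPC sum (illustrative only; hypothetical inputs).
import secrets

PRIME = 2**61 - 1  # all shares live in the field of integers mod PRIME

def share(value, n_parties):
    """Split `value` into n random shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

private_inputs = {"CompanyA": 95_000, "CompanyB": 120_000, "CompanyC": 87_000}
n = len(private_inputs)

# Each party splits its input and sends one share to every party.
all_shares = {name: share(v, n) for name, v in private_inputs.items()}

# Party i locally adds up the i-th share it received from everyone...
partial_sums = [sum(all_shares[name][i] for name in private_inputs) % PRIME
                for i in range(n)]

# ...and only these partial sums are published. Recombining them reveals the
# aggregate (and hence the average), but never any individual input.
total = sum(partial_sums) % PRIME
print("sum of inputs:", total)      # 302000
print("average      :", total / n)  # about 100666.67
```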
Q: So is it better to use Differential Privacy or MPC or both?
Rawane: Usually both, because using just one can sometimes leave gaps. There is a classic example of Differential Privacy vs. MPC we can actually do right now. Imagine that we were looking to calculate the average of our salaries. Under MPC, we could compute this securely, learning the average without any of us ever knowing the others’ salaries. Now, let’s say Sourya drops out of the computation, and we compute the average again.
Q: Ah, I get it. So we see that the resulting number changes. It goes up or down, and we can infer: “Ok, Sourya’s salary is above or below X.”
Rawane: Right! Both of those computations were secure under MPC, but protecting what you can infer from the output was never the point of MPC. This is where Differential Privacy comes in. If we had used Differential Privacy, you could not tell with high assurance what Sourya’s salary was.
So Differential Privacy and Multi-Party Computation are orthogonal. One of them tells you what functions are ok to compute to preserve the privacy of the inputs, and the other one asks you: “How do you compute that function in order not to reveal the input during the computation?”
David: Yeah, you’ve really gotta use both of these technologies.
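The salary example itself fits in a few lines. In the sketch below, exact averages (what an MPC alone would release) let a differencing attack recover Sourya’s salary exactly, while averages released with Laplace noise leave the attacker with only a fuzzy estimate. The salaries, the assumed salary bound, and the epsilon value are all invented for illustration.

```python
# Differencing attack on exact averages vs. differentially private averages
# (illustrative only; all figures are hypothetical).
import numpy as np

rng = np.random.default_rng(2)
salaries = {"David": 150_000, "Rawane": 140_000, "Sourya": 130_000, "Host": 120_000}

def exact_average(values):
    return sum(values) / len(values)

def dp_average(values, epsilon, max_salary=200_000):
    """Release the average with Laplace noise on the sum. If every salary is
    assumed to be at most max_salary, one person joining or leaving changes the
    sum by at most max_salary, which bounds the sensitivity."""
    noisy_sum = sum(values) + rng.laplace(0.0, max_salary / epsilon)
    return noisy_sum / len(values)

everyone = list(salaries.values())
without_sourya = [v for name, v in salaries.items() if name != "Sourya"]

# Exact (MPC-only) releases: the two averages pin down Sourya's salary exactly.
leaked = exact_average(everyone) * len(everyone) - exact_average(without_sourya) * len(without_sourya)
print("inferred from exact averages:", leaked)  # 130000.0

# The same attack against noisy releases: the estimate is typically off by
# hundreds of thousands of dollars, so the inference carries little assurance.
leaked_dp = dp_average(everyone, 1.0) * len(everyone) - dp_average(without_sourya, 1.0) * len(without_sourya)
print("inferred from noisy averages:", round(leaked_dp))
```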
Q: Got it. What is Galois doing in this space?
David: A lot! Sourya and Rawane have been working on solving the problem of membership inference attacks related to a client’s public-facing statistics. It’s kind of a red teaming exercise where we’re showing: “Right now you can infer all this information that you probably don’t want to.”
And then we’re doing a ton of other stuff. Galois Principal Scientist Dave Archer led a project a few years ago where we developed a solution for the Department of Education that would allow two different internal offices to securely share and analyze data on how students finance their post-secondary studies. We showed we could do it in a way that preserved privacy and saved money by removing the need to hire a third party to do the analysis for them, a third party that would also have been a single point of failure should it fall victim to a cyber-attack. The solution was built within the Department’s existing ecosystem without relying on any specialized hardware. Just last year we helped a Fortune 100 company use private set intersection (PSI) to securely analyze confidential client records. I think that’s projected to save them something like $12 million per year.
Then we’ve been involved with a bunch of DARPA research in this area. One program was called Collaborative Secure Learning. That was all about using MPC to train AI/ML models on sensitive data from two different parties. The scenarios we looked at involved creating a joint image recognition model of allied and enemy vehicles. I don’t want to send you all the raw images of my vehicles, and you don’t want to send me all the raw images of your vehicles, because that risks giving away more information than intended. With MPC, you can combine those data sets without either party ever revealing its raw images, and we end up with a classifier we can both use.
Sourya: We’re also doing a project right now where we’re basically trying to find out: “Under what circumstances can a large language model spit out sensitive information?” Right now one of the biggest problems is that these LLMs are so new that we don’t know where the problems are. So we’re aiming to better understand how these models actually work.
Q: Right, it’s kind of a black box problem. They work, but we don’t totally understand them, so we don’t totally grasp all the risks. So, bringing this full circle, you mentioned at the very beginning that there are many security and privacy risks associated with using AI/ML tools, but most people are just using them anyway, reaping their benefits and hoping for the best.
David: There’s an analogous story we tell in the cyber world: Security is very much Whack-a-Mole. You build a system and discover “it could be hacked in this way,” and then you patch it. Then you find out “It could be hacked in that way,” so you patch it again. Galois’s whole Formal Methods approach to software engineering is trying to get to a place where you’ve just solved the problem and you’re not playing Whack-a-Mole when this or that pops up.
You can tell the exact same story with AI/ML tools and privacy or security. Like, we can play Whack-a-Mole on this privacy leak here and that privacy leak there. This model might regurgitate sensitive inputs if you jailbreak it this way, and then you patch it. Or, using these cryptography and data privacy techniques, you can just solve the problem and you’re done playing Whack-a-Mole. So what future do we want to be in? The Whack-a-Mole future or the problem-solved future?