The US Department of Education (DOE) was in a quandary. Every few years, it was required to report to Congress on the state of undergraduate student financial aid in the United States, but the confidential data needed to create the report was divided between two internal offices that were forbidden by policy to share data with each other.
The National Center for Education Statistics (NCES) held information on students’ programs of study and the colleges they attended; meanwhile, the National Student Loan Data System (NSLDS) kept detailed financial records like family income levels, demographics, and each student’s federal grants and loans.
“Now you can imagine that if you’re the Department of Education and you have to build this report for Congress, you want to be able to bring those two sets of data together,” said Galois Principal Scientist David Archer. “The problem is that, due to data privacy policies, they’re not allowed to share that information with one another.”
For years, the DOE’s workaround had been to hire a trusted third party to combine and analyze the data and write the report. Although these contractors worked under a strict non-disclosure agreement, the process was fraught with risk, extra cost, and inefficiency.
In 2020, the then-head of NCES approached Galois to see if we could come up with a solution that would allow the DOE to share and analyze the data on its own, while still preserving privacy. The result was impressive. After just two intense months of hands-on work, Galois implemented an innovative cryptographic technique known as Private Set Intersection (PSI). The multi-party protocol allowed information from each database to be cross-linked, computed on, and analyzed while delivering time efficiency, pinpoint accuracy, and, most crucially, rigorous privacy preservation.
“All you learn about the data is the statistics—the values coming out,” Archer explained. “We went from a very leaky security story to a cryptographically sound security story in two months. Now, using PSI, the DOE has the ability to run their analysis and create their report for Congress themselves in a way that accomplishes everything they need. They have a sustainable path forward.”
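To make the idea concrete, here is a minimal sketch of one textbook way to compute the size of a private set intersection, using Diffie-Hellman-style commutative blinding. It is an illustration only and is not the protocol Galois built for the DOE, which this article does not describe in detail. The function names, the student identifiers, and the toy-sized prime are all assumptions made for the example, and both parties are simulated in a single process.

```python
import hashlib
import secrets
from math import gcd

# Toy modulus (the Mersenne prime 2^127 - 1). Chosen for readability only;
# it is far too small and unstructured for real-world security.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero integer modulo P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1, so blinding is one-to-one."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if gcd(key, P - 1) == 1:
            return key


def blind(values, key):
    """Raise every value to the secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def intersection_size(ids_a, ids_b):
    """Simulate both parties and return how many identifiers they share."""
    key_a, key_b = keygen(), keygen()

    # Each party blinds its own hashed identifiers and sends them across.
    a_once = blind([hash_to_group(i) for i in ids_a], key_a)
    b_once = blind([hash_to_group(i) for i in ids_b], key_b)

    # Each party re-blinds what it received. Because exponentiation
    # commutes, H(x)^(a*b) == H(x)^(b*a) exactly when the identifiers match.
    a_twice = blind(a_once, key_b)
    b_twice = blind(b_once, key_a)

    # Shuffling hides which position matched, so only the count is learned.
    secrets.SystemRandom().shuffle(a_twice)
    return len(set(a_twice) & set(b_twice))


# Hypothetical identifiers held by two offices that cannot share raw data.
office_one = ["student-001", "student-002", "student-003", "student-004"]
office_two = ["student-003", "student-004", "student-005"]
shared_count = intersection_size(office_one, office_two)
```

In this sketch neither side ever sees the other's raw identifiers, only blinded values, and the only output is an aggregate count. A production system would use a vetted elliptic-curve group or a more modern PSI construction, and would typically compute richer statistics over the matched records than a simple count.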
While the program has yet to be rolled out, the prototype demonstrated the remarkable potential of private set intersection and is being actively evaluated for use across multiple sectors.
Beyond DOE: Unlocking Potential Across Sectors
In an era when the world’s most valuable resource is data, the dilemmas of privacy and technology are more pressing than ever. With data often locked behind walls of confidentiality or legislative constraints (often for very good reasons), meaningful statistics that could inform public policies or business strategies have remained frustratingly out of reach—until now. Galois’s innovative PSI work not only secures sensitive information but also opens the door to a wealth of previously inaccessible data analytics.
The implications of PSI are vast, and while the DOE’s use case is compelling, the technology holds promise for a broad swathe of other applications.
“For example, there are laws that specifically forbid the IRS and Census from sharing data,” said Archer. “But if you can share that data while keeping it encrypted, there are so many things researchers and policymakers would love to be able to do. For example, take data on college graduation: ‘What field did you study?’ and ‘At what college or university?’ You could take that data from the National Student Clearinghouse and correlate it with IRS data on financial income a decade later, and answer questions like: ‘What’s the most valuable job category in the country?’ or ‘What universities should students think about going to?’ You could use that data to predict your earning capacity 10 years from now based on the choices you make today, all without ever risking the exposure or misuse of the private information that was used to do the prediction.”
In short, big data for better decision-making, with risk removed.
PSI shows promise in the commercial sector as well. This past year, Galois helped a Fortune 100 company use Private Set Intersection to securely match and analyze disparate sets of confidential client records, an effort projected to save about $12 million annually.
The technology even has broad appeal for law enforcement and defense applications, where sensitive operations could potentially be compromised due to miscommunication or data breaches.
“In every state in the union there is an organization whose job is to deconflict law enforcement operations,” said Archer. “So if a law enforcement agency wants to let everyone know: ‘Hey, we’re going to conduct X kind of operation on Y night in place Z,’ they can add it to a database to ensure a different agency isn’t running a conflicting operation. Now, ostensibly it’s a secure database, but there’s always a risk that the database gets compromised.”
Galois’s PSI technology, demonstrated for U.S. defense agencies and others, allows agencies to cross-check their operations against a database without revealing compromising details. Agencies get the benefits of deconfliction without risking operational security by pooling sensitive information in a single location.
In other words, PSI can keep both secrets and people safe.
The Future of Private Set Intersection
As Galois continues to refine and implement Private Set Intersection across various domains, the enormous potential of this technology is becoming increasingly clear. PSI ensures that sensitive information remains protected, even as it is being analyzed for valuable or needed insights. In other words, PSI effectively bridges the gap between our simultaneous needs for privacy and progress, security and innovation.
From equipping policymakers with otherwise inaccessible information to improve decision-making, to securely deconflicting law enforcement and military operations, to saving companies millions, the potential horizon for PSI is exciting indeed.
A mission as complex as merging confidential data sets from multiple sources demands a solution as sophisticated as Private Set Intersection. Galois has shown that a new paradigm for data privacy, without sacrificing the data’s potential, is not only possible but practical.