In March 2024, researchers discovered a backdoor hidden in an update to XZ Utils, an open-source data compression tool used across Linux systems. The backdoor appears likely to be the result of a multi-year, state-sponsored supply chain attack. This close call is only the latest in a growing list of incidents underscoring the fragility of a modern digital ecosystem built on a foundation of open-source software.
In 2016, for example, a programmer unpublished “left-pad,” a tiny JavaScript package of just 11 lines of code, from the npm registry, triggering a cascade of build failures around the globe. In 2022, another open-source developer intentionally corrupted two of his own widely used libraries, colors.js and faker.js, affecting millions of users.
Incidents like these, along with countless smaller examples, illustrate the security risks inherent in our reliance on interconnected software supply chains, where the failure of a single component can disrupt an entire ecosystem. From the smallest apps to the largest enterprise systems, software is built on software, which is built on still more software, forming complex dependency chains that, when traced far enough, very often include open-source code. The open-source ethos has huge upsides, democratizing development and offering cost-effective, transparent alternatives to proprietary systems, but its openness also increases exposure to supply chain vulnerabilities.
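To make the idea of a dependency chain concrete, here is a minimal sketch that models a small, entirely hypothetical package ecosystem as a map from each package to its direct dependencies. It then walks the reversed edges breadth-first to find every package that would break if one component disappeared. Apart from the nod to left-pad, the package names are invented, the graph does not reflect left-pad's real dependents, and `transitive_dependents` is a hypothetical helper, not part of LAGOON or any package manager.

```python
from collections import defaultdict, deque


def transitive_dependents(deps: dict[str, list[str]], target: str) -> set[str]:
    """Return every package that depends on `target`, directly or indirectly."""
    # Invert the edges: for each package, record which packages depend on it.
    dependents: dict[str, set[str]] = defaultdict(set)
    for pkg, direct in deps.items():
        for dep in direct:
            dependents[dep].add(pkg)

    # Breadth-first walk outward from the target; `affected` doubles as the
    # visited set, so cycles in the graph cannot cause an infinite loop.
    affected: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical ecosystem: package -> direct dependencies (illustrative only).
deps = {
    "web-app": ["ui-kit", "http-client"],
    "ui-kit": ["string-utils"],
    "http-client": ["string-utils"],
    "cli-tool": ["string-utils"],
    "string-utils": ["left-pad"],
    "left-pad": [],
    "unrelated": [],
}

print(sorted(transitive_dependents(deps, "left-pad")))
# → ['cli-tool', 'http-client', 'string-utils', 'ui-kit', 'web-app']
```

Removing one 11-line leaf package reaches every package above it, even those (like `web-app`) that never list it directly, which is why a single small change can have such wide downstream effects.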
“Anybody can contribute to open source projects and their source code is available to everybody, but that also makes them vulnerable,” explained Galois research engineer Sourya Dey. “Somebody will make a relatively small change to an open-source project or library, and that can have a lot of bad downstream effects.”
The critical challenge lies in safeguarding the integrity of software supply chains without stifling the innovation and collaboration that drive them.
Analyzing Open-Source Software Ecosystems
In 2022, aiming to meet this challenge head-on, Galois and the University of Vermont developed LAGOON, an analysis tool designed to map open-source communities and identify vulnerabilities, threats, and potentially malicious contributors.
“LAGOON came out of work we did for the DARPA program known as SocialCyber,” Dey explained. “The goal was to analyze ecosystems of open-source projects to understand: ‘What are the vulnerabilities?’ ‘When might they fail?’ And ‘What can we be doing better?’”
The process begins with LAGOON ingesting multiple types of data from open-source ecosystems and combining them into a graph that users can explore through interactive spatiotemporal visualizations. Machine learning algorithms then analyze this graph, connecting the dots to yield actionable insights into the relationships among developers, commits, files, and discussions in open-source software. The analysis aims to predict critical disruptions such as developer disengagement, flag toxic interactions that may drive contributors away or signal malicious intent, and spot vulnerabilities that could compromise a project's integrity.
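LAGOON's own data model and machine learning are described here only at a high level. As a hedged illustration of the underlying idea, the following sketch links commits to the developers and files they touch and then applies a deliberately simple rule: flag files that only one developer has ever changed, a rough "bus factor" signal for disengagement risk. The `Commit` record, the helper names, and the rule itself are assumptions made for this example; they are not LAGOON's schema or algorithm, and a fixed rule like this is a stand-in for, not a reproduction of, the learned models the article describes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record (not LAGOON's actual schema)."""
    sha: str
    author: str
    files: tuple[str, ...]


def build_file_author_graph(commits: list[Commit]) -> dict[str, set[str]]:
    """Map each file to the set of developers who have committed to it."""
    graph: dict[str, set[str]] = defaultdict(set)
    for commit in commits:
        for path in commit.files:
            graph[path].add(commit.author)
    return dict(graph)


def single_maintainer_files(commits: list[Commit]) -> dict[str, str]:
    """Return files touched by exactly one developer, mapped to that developer.

    A crude stand-in heuristic: if that one person disengages, nobody else
    in the recorded history has worked on the file.
    """
    graph = build_file_author_graph(commits)
    return {path: next(iter(authors))
            for path, authors in graph.items() if len(authors) == 1}


# Illustrative, made-up commit history.
history = [
    Commit("a1", "alice", ("src/core.c", "README")),
    Commit("b2", "bob", ("README",)),
    Commit("c3", "alice", ("src/core.c", "build/configure.ac")),
]

print(single_maintainer_files(history))
# → {'src/core.c': 'alice', 'build/configure.ac': 'alice'}
```

In practice the history would come from a repository's version-control log, and a single-maintainer file is a prompt for a closer look rather than evidence of a problem.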
By providing a comprehensive view of open-source ecosystems, LAGOON can support any organization that relies on open-source software, from government agencies to large enterprises. Its batched integration system also keeps its database up to date with new activity without requiring users to re-upload data, enabling ongoing oversight with minimal effort. Finally, LAGOON exemplifies the principles it champions: it is itself an open-source tool available for public use, inviting collaboration and continuous improvement from the global community while helping to secure the open-source ecosystem.
“With LAGOON, open-source software, such as Linux tools like XZ, can be analyzed to potentially root out the problem ahead of time,” Dey said. “Now, moving forward, users who depend on these tools can use LAGOON to maintain security and flag suspicious activity.”
For more information, please visit the LAGOON Project Page.