An analysis tool for open-source communities

CHALLENGE: Dependency on Open-Source Software Increases Supply Chain Risk

The modern digital ecosystem is built on a fragile open-source foundation. From the smallest apps to the largest enterprise systems, software is built on software that is itself built on still more software, forming complex dependency chains that, when followed far enough, very often include open-source code. While the open-source ethos has huge upsides – democratizing development and offering cost-effective, transparent alternatives to proprietary systems – its open nature also increases exposure to supply chain vulnerabilities.

Because anyone can contribute, open-source software changes constantly. Volunteer contributors come and go, and often no single party is ultimately responsible for communicating or vetting those changes, or for ensuring the software stays up to date. In addition, while the majority of changes are helpful updates or improvements, sloppy or malicious code is sometimes introduced, with negative downstream effects. Because software supply chains are so complex and interconnected, even a relatively small edit to an open-source project or library can cause problems for millions of users.

In 2016, for example, a programmer unpublished a tiny JavaScript package called “left-pad,” just 11 lines of open-source code, from the npm registry. The result was a cascade of errors across the globe as developers discovered that one strand of their software’s dependency chain ultimately traced back to “left-pad.” In 2022, another open-source developer intentionally corrupted two of his own widely used libraries, impacting millions. Most recently, in 2024, a backdoor was discovered in a malicious update of the open-source Linux tool xz Utils – a compromise that appears likely to be the result of a multi-year operation.
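
To make the idea of tracing a dependency chain concrete, here is a minimal sketch of a breadth-first search over a dependency graph. It assumes a hypothetical in-memory map from each package to its direct dependencies; every package name other than left-pad is invented for illustration, and the code is not part of LAGOON.

```python
from collections import deque

# Hypothetical dependency map: package -> direct dependencies.
# Only "left-pad" comes from the text; all other names are invented.
DEPENDENCIES = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["template-engine"],
    "template-engine": ["string-utils"],
    "string-utils": ["left-pad"],
    "logger": [],
    "left-pad": [],
}


def find_dependency_chain(deps, root, target):
    """Return the shortest chain from root to target, or None if unreachable."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for dep in deps.get(current, []):
            if dep not in seen:  # skip packages already visited
                seen.add(dep)
                queue.append(path + [dep])
    return None


chain = find_dependency_chain(DEPENDENCIES, "my-app", "left-pad")
print(" -> ".join(chain))
```

In this invented example the application never depends on left-pad directly, yet the search still finds a chain four links deep, which is how a tiny package can break projects whose authors have never heard of it.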

Incidents like these, along with countless smaller examples, highlight the critical need for a robust tool to analyze the tangled webs of open-source ecosystems. 


Enter LAGOON: an advanced analysis tool designed to dissect the intricacies of open-source software communities and identify vulnerabilities, threats, and potentially malicious contributors. Stemming from Galois and the University of Vermont’s work for the DARPA Social Cyber program, LAGOON includes databases that can ingest multiple types of data from open-source ecosystems and combine them into interactive spatiotemporal graph visualizations. LAGOON’s machine learning algorithms can then analyze these graphs, yielding actionable insights that help users understand the relationships between developers, commits, files, and discussions in open-source software. This analysis can predict critical disruptions like developer disengagement, flag toxic interactions that may lead to talent attrition, and spot vulnerabilities that may compromise the integrity of a project.
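
As a rough illustration of the kind of graph described above, the sketch below links developers, commits, and files with typed edges and applies a simple inactivity rule as a stand-in for disengagement analysis. Everything here is an assumption made for illustration: the `Commit` record, the sample data, the edge labels, and the 90-day threshold are hypothetical, and the rule is a plain heuristic rather than LAGOON's schema or its machine learning models.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record; not LAGOON's actual data model."""
    sha: str
    author: str
    day: date
    files: tuple


# Invented sample history for illustration only.
COMMITS = [
    Commit("a1", "alice", date(2024, 1, 5), ("core.py",)),
    Commit("b2", "bob", date(2024, 1, 20), ("core.py", "util.py")),
    Commit("c3", "alice", date(2024, 2, 1), ("util.py",)),
    Commit("d4", "carol", date(2024, 6, 10), ("core.py",)),
    Commit("e5", "alice", date(2024, 6, 15), ("docs.md",)),
]


def build_graph(commits):
    """Return typed edges: developer -authored-> commit -touched-> file."""
    edges = []
    for c in commits:
        edges.append((("developer", c.author), "authored", ("commit", c.sha)))
        for f in c.files:
            edges.append((("commit", c.sha), "touched", ("file", f)))
    return edges


def inactive_developers(commits, as_of, max_gap_days=90):
    """List developers whose latest commit is older than the cutoff date."""
    last_seen = {}
    for c in commits:
        if c.author not in last_seen or c.day > last_seen[c.author]:
            last_seen[c.author] = c.day
    cutoff = as_of - timedelta(days=max_gap_days)
    return sorted(dev for dev, day in last_seen.items() if day < cutoff)


edges = build_graph(COMMITS)
print(len(edges), "edges")
print(inactive_developers(COMMITS, as_of=date(2024, 7, 1)))
```

A real analysis would draw on many more signals, such as discussion activity and review patterns, but even this toy version shows why timestamps matter: the same graph yields different answers depending on the date it is evaluated against.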

With LAGOON, open-source software ecosystems can be analyzed to root out potential problems ahead of time, mitigating risk and safeguarding the software supply chain.

By providing a comprehensive view of open-source ecosystems, LAGOON serves as an indispensable tool for any entity reliant on open-source software, from government agencies to corporate enterprises. In addition, its batched integration system offers ongoing oversight without the need for continuous re-uploads, keeping databases up to date with minimal fuss. Finally, LAGOON exemplifies the principles it champions: it is an open-source tool available for public use, inviting collaboration and continuous improvement from the global community while helping to secure the open-source ecosystem.


  • Predictive Capabilities: Uses machine learning to predict critical issues within a community, such as developer disengagement.
  • Toxicity Monitoring: Identifies and tracks toxic interactions, helping to prevent attrition of valuable community members.
  • Security Insight: Spots vulnerabilities and problematic code commits, reinforcing the integrity of open-source projects.
  • Time-Efficient Integration: Features a batching system that allows for efficient updating with new data, saving time without requiring full re-uploads.
  • Open Source for Open Source: As an open-source platform, it encourages broad community input and collaborative refinement.
  • Universally Applicable: Essential for any organization leveraging open-source software anywhere in its extended software supply chain, from government to corporate.
  • Innovation Safeguard: Ensures the reliability of the digital infrastructure that powers cross-industry innovation.