MATE: Interactive Program Analysis with Code Property Graphs

Galois is open-sourcing MATE, a suite of tools for interactive program analysis with a focus on hunting for bugs in C and C++ code. MATE unifies application-specific and low-level vulnerability analysis using code property graphs (CPGs), enabling the discovery of highly application-specific vulnerabilities that depend on both implementation details and the high-level semantics of target C/C++ programs.

MATE primarily finds vulnerabilities by static program analysis over the target’s CPG, which combines representations of a program’s syntax, control-flow, and dependencies into a unified graph structure that can be queried to identify potential flaws. The MATE CPG consists of the target’s:

  • abstract syntax tree (AST)
  • call graph (CG)
  • control-flow graph (CFG)
  • inter-procedural control-flow graph (ICFG)
  • inter-procedural dataflow-graph (DFG)
  • control-dependence graph (CDG)
  • points-to graph (PTG)
  • source-code to machine-code mapping
  • memory layout and DWARF type graph

MATE Tools

MATE comes with a number of applications built on top of the foundation of the CPG.

Flowfinder

Flowfinder displays a small fragment of a CPG in a browser window.

Flowfinder is an interactive, graphical, browser-based user interface for exploring a program’s code property graph. Similar to other program analysis tools, such as IDA Pro, Binary Ninja, and angr management, Flowfinder is designed to help answer questions such as “How does this data get from here to there and how is it changed along the way?” or “If I can control this buffer, what effect can I have on the execution of the program?” By leveraging the CPG, Flowfinder enables interprocedural analysis of program dataflows at a relatively high level of abstraction. Rather than navigating by jumping between views of the program’s concrete syntax (whether disassembly or source code), Flowfinder is designed to support expanding and contracting semantic representations of code and data as needed and creating and manipulating visualizations of high-level flows between different components.

MATE Notebooks

MATE has a Python API for querying the CPG and exposes browser-based, interactive Jupyter notebooks with this query interface pre-loaded. These notebooks can be used to write complex, recursive, whole-program queries that answer detailed questions like “What sequences of function calls can lead from point A to point B in this program?” or “Can user input flow into a memory location with a specific struct type, and from there to some particular function without passing through one of these three sanitization routines?”

MATE POIs

MATE ships with a number of automated analyses that detect potential vulnerabilities, called Points of Interest (POIs). These detectors are written in the same Python API available in the MATE notebooks; it’s easy to write additional application-, domain-, or API-specific detectors. Potential vulnerabilities found by these queries can be viewed in Flowfinder for collaboration and triage. The following table lists a few examples of MATE detectors (see the MATE documentation for a complete list):

AnalysisCPG LayersDescription
PathTraversalDFGFind calls to filesystem operations where the path may be influenced by user input
PointerDisclosureDFGFinds pointer-typed values that may be output to the user
OverflowableAllocationsDFGFinds calls to malloc where the size calculation may be influenced by user input
TruncatedIntegerDFGFinds calls to malloc where the size may be influenced by user input and the input is used elsewhere as a signed integer
IteratorInvalidationCFGFinds uses of C++ iterators subsequent to iterator-invalidating collection modifications
UninitializedStackMemoryCFG, PTGFinds potential intra- and inter-procedural uses of uninitialized stack memory

Mantiserve

Mantiserve integrates the CPG with the Manticore symbolic execution tool. Symbolic execution complements MATE’s facilities for reasoning about high-level data flows by providing means to explore detailed and low-level issues like memory corruption. There are two primary modes of using Manticore with MATE.

Exploration: Mantiserve ships with “detectors” which use data from the CPG to detect memory corruption during symbolic execution.

  • The Variable Bounds Access Detector searches for out of bounds memory access on the stack.
  • The Uninitialized Stack Variable Detector searches for variables allocated on the stack and used prior to initialization.
  • The Use After Free (UAF) Detector searches for and validates UAF vulnerabilities from calls to malloc and free.

Under-constrained symbolic execution: Unlike traditional symbolic execution which begins at the program entry point and executes until the program exits, under-constrained symbolic execution starts at an arbitrary function in the program. This specificity means that under-constrained symbolic execution can analyze parts of programs that would be too large or complex for traditional symbolic execution. As explained in our previous blog post Under-Constrained Symbolic Execution with Crucible, under-constrained symbolic execution may lead to false positives due to unknown preconditions. MATE’s under-constrained feature comes with a web UI that allows users to provide constraints to avoid this issue.

Use-Cases and Comparison to Other Tools

We built MATE with two primary use-cases in mind:

  1. Use by security researchers to find bugs in C and C++ programs
  2. Integration of the CPG and corresponding Python API into other applications

The following table compares MATE to tools with similar goals:

ToolAdvantages over MATEDisadvantages vs MATE
GitHub/Semmle CodeQL
  • Commercial code analysis for C/C++, Java, etc.
  • Custom logic-based query language
  • Library of detectors for vulnerabilities and flaws
  • Robust commercial implementation
  • Faster whole-program analysis
  • Large library of provided and community-authored vulnerability and code quality checks
  • IDE integration
  • Does not explicitly model memory relationships
  • Less comprehensive dataflow analyses
  • Limited support for interactive exploration and visualization
  • Difficult to integrate queries into custom workflows or applications
Joern
  • Open source CPG analysis
  • Custom query language DSL
  • Coverage of wide range of languages
  • Fuzzy parsing approach gives best-effort results on most programs without significant effort
  • Limited support for inter-procedural dataflow analysis
  • Limited support for interactive exploration and visualization
  • Difficult to integrate queries into custom workflows or app
Semgrep/weggli
  • AST-aware grep for code
  • Fast, low-latency interactions
  • Syntax-focused analysis works on all programs without pre-processing
  • Limited support for inter-procedural dataflow analysis
  • Limited support for interactive exploration and visualization
Other SAST tools, e.g., Coverity, CodeSonar, Veracode, Infer, Clang Static Analyzer
  • Robust implementations
  • Large libraries of provided and community-authored vulnerability and code quality checks
  • IDE integration
  • Limited support for interactive exploration and visualization
  • Difficult to integrate queries into custom workflows or app
Interactive binary-level bug-hunting tools, e.g., IDA Pro, Binary Ninja, and angr management
  • Robust commercial implementation
  • Can work without source code
  • Community support and features (e.g., community-authored plugins)
  • Limited support for inter-procedural dataflow analysis
  • Limited support for high-level (source-level) semantics

Limitations

MATE has several important limitations:

  • MATE analyzes only statically-linked code, so it can’t find bugs or follow control- and data-flows in dynamically-linked libraries without users writing detailed “signatures” for external code.
  • MATE analyzes LLVM bitcode. In practice, obtaining LLVM bitcode requires access to the source code, that the project can be compiled using clang/clang++, and may require some mucking around with the build system. Additionally, it’s much easier to use and understand MATE given familiarity with the LLVM language, but such familiarity is fairly uncommon.
  • MATE’s static analysis is fairly heavy-weight. The pointer analysis in particular requires a significant amount of time and RAM, on the order of hours and up to dozens of GB for large programs. Furthermore, these requirements don’t relate predictably to program size or other features.
  • MATE is still research-grade software. We have worked hard to make it robust, but not all of MATE’s tools and features will work well on all programs.

Conclusion and Acknowledgements

We’re happy to finally share MATE with the research community under the BSD 3-clause license. We look forward to discussing possible collaborations, additional use cases, and future research directions. Please reach out to mate@galois.com to start a conversation! More information about MATE is available in the project documentation.
MATE was developed collaboratively by Galois, Trail of Bits, and the lab of Dr. Stephen Chong at Harvard on the DARPA CHESS program. This material is based upon work supported by the United States Air Force and Defense Advanced Research Project Agency (DARPA) under Contract No. FA8750-19-C-0004.