Galois is open-sourcing MATE, a suite of tools for interactive program analysis with a focus on hunting for bugs in C and C++ code. MATE unifies application-specific and low-level vulnerability analysis using code property graphs (CPGs), enabling the discovery of highly application-specific vulnerabilities that depend on both implementation details and the high-level semantics of target C/C++ programs.
MATE primarily finds vulnerabilities by static program analysis over the target’s CPG, which combines representations of a program’s syntax, control-flow, and dependencies into a unified graph structure that can be queried to identify potential flaws. The MATE CPG consists of the target’s:
- abstract syntax tree (AST)
- call graph (CG)
- control-flow graph (CFG)
- inter-procedural control-flow graph (ICFG)
- inter-procedural dataflow-graph (DFG)
- control-dependence graph (CDG)
- points-to graph (PTG)
- source-code to machine-code mapping
- memory layout and DWARF type graph
MATE comes with a number of applications built on top of the foundation of the CPG.
Flowfinder is an interactive, graphical, browser-based user interface for exploring a program’s code property graph. Similar to other program analysis tools, such as IDA Pro, Binary Ninja, and angr management, Flowfinder is designed to help answer questions such as “How does this data get from here to there and how is it changed along the way?” or “If I can control this buffer, what effect can I have on the execution of the program?” By leveraging the CPG, Flowfinder enables interprocedural analysis of program dataflows at a relatively high level of abstraction. Rather than navigating by jumping between views of the program’s concrete syntax (whether disassembly or source code), Flowfinder is designed to support expanding and contracting semantic representations of code and data as needed and creating and manipulating visualizations of high-level flows between different components.
MATE has a Python API for querying the CPG and exposes browser-based, interactive Jupyter notebooks with this query interface pre-loaded. These notebooks can be used to write complex, recursive, whole-program queries that answer detailed questions like “What sequences of function calls can lead from point A to point B in this program?” or “Can user input flow into a memory location with a specific struct type, and from there to some particular function without passing through one of these three sanitization routines?”
MATE ships with a number of automated analyses that detect potential vulnerabilities, called Points of Interest (POIs). These detectors are written in the same Python API available in the MATE notebooks; it’s easy to write additional application-, domain-, or API-specific detectors. Potential vulnerabilities found by these queries can be viewed in Flowfinder for collaboration and triage. The following table lists a few examples of MATE detectors (see the MATE documentation for a complete list):
|PathTraversal||DFG||Find calls to filesystem operations where the path may be influenced by user input|
|PointerDisclosure||DFG||Finds pointer-typed values that may be output to the user|
|OverflowableAllocations||DFG||Finds calls to malloc where the size calculation may be influenced by user input|
|TruncatedInteger||DFG||Finds calls to malloc where the size may be influenced by user input and the input is used elsewhere as a signed integer|
|IteratorInvalidation||CFG||Finds uses of C++ iterators subsequent to iterator-invalidating collection modifications|
|UninitializedStackMemory||CFG, PTG||Finds potential intra- and inter-procedural uses of uninitialized stack memory|
Mantiserve integrates the CPG with the Manticore symbolic execution tool. Symbolic execution complements MATE’s facilities for reasoning about high-level data flows by providing means to explore detailed and low-level issues like memory corruption. There are two primary modes of using Manticore with MATE.
Exploration: Mantiserve ships with “detectors” which use data from the CPG to detect memory corruption during symbolic execution.
- The Variable Bounds Access Detector searches for out of bounds memory access on the stack.
- The Uninitialized Stack Variable Detector searches for variables allocated on the stack and used prior to initialization.
- The Use After Free (UAF) Detector searches for and validates UAF vulnerabilities from calls to malloc and free.
Under-constrained symbolic execution: Unlike traditional symbolic execution which begins at the program entry point and executes until the program exits, under-constrained symbolic execution starts at an arbitrary function in the program. This specificity means that under-constrained symbolic execution can analyze parts of programs that would be too large or complex for traditional symbolic execution. As explained in our previous blog post Under-Constrained Symbolic Execution with Crucible, under-constrained symbolic execution may lead to false positives due to unknown preconditions. MATE’s under-constrained feature comes with a web UI that allows users to provide constraints to avoid this issue.
Use-Cases and Comparison to Other Tools
We built MATE with two primary use-cases in mind:
- Use by security researchers to find bugs in C and C++ programs
- Integration of the CPG and corresponding Python API into other applications
The following table compares MATE to tools with similar goals:
|Tool||Advantages over MATE||Disadvantages vs MATE|
|Other SAST tools, e.g., Coverity, CodeSonar, Veracode, Infer, Clang Static Analyzer||
|Interactive binary-level bug-hunting tools, e.g., IDA Pro, Binary Ninja, and angr management
MATE has several important limitations:
- MATE analyzes only statically-linked code, so it can’t find bugs or follow control- and data-flows in dynamically-linked libraries without users writing detailed “signatures” for external code.
- MATE analyzes LLVM bitcode. In practice, obtaining LLVM bitcode requires access to the source code, that the project can be compiled using clang/clang++, and may require some mucking around with the build system. Additionally, it’s much easier to use and understand MATE given familiarity with the LLVM language, but such familiarity is fairly uncommon.
- MATE’s static analysis is fairly heavy-weight. The pointer analysis in particular requires a significant amount of time and RAM, on the order of hours and up to dozens of GB for large programs. Furthermore, these requirements don’t relate predictably to program size or other features.
- MATE is still research-grade software. We have worked hard to make it robust, but not all of MATE’s tools and features will work well on all programs.
Conclusion and Acknowledgements
We’re happy to finally share MATE with the research community under the BSD 3-clause license. We look forward to discussing possible collaborations, additional use cases, and future research directions. Please reach out to email@example.com to start a conversation! More information about MATE is available in the project documentation.
MATE was developed collaboratively by Galois, Trail of Bits, and the lab of Dr. Stephen Chong at Harvard on the DARPA CHESS program. This material is based upon work supported by the United States Air Force and Defense Advanced Research Project Agency (DARPA) under Contract No. FA8750-19-C-0004.