Introducing GREASE: An Open-Source Tool for Uncovering Hidden Vulnerabilities in Binary Code

Proactively and defensively ensuring the absence of vulnerabilities in binary code is crucial for deploying high-assurance systems. GREASE is an open-source tool leveraging under-constrained symbolic execution to help software reverse engineers analyze binaries and uncover hard-to-spot bugs, ultimately enhancing system security. This kind of binary analysis is especially important for systems that include COTS software that is only provided in binary form.

‍

GREASE can be used as a plug-in for the Ghidra reverse engineering framework, as a standalone command-line tool, or as a Haskell library. GREASE supports analysis of AArch32, PPC32, PPC64, and x86_64 Linux ELF binaries, as well as LLVM bitcode.

Demo

GREASE can help software reverse engineers discover bugs in binaries. For example, consider the following code derived from libpng, demonstrating CVE-2018-13785. Even at the source level, the bug is hard to spot. Can you see it? (Don’t worry about studying the code in detail, it won’t be necessary for understanding the rest of this post.)

void /* PRIVATE */
png_check_chunk_length(png_const_structrp png_ptr, const unsigned int length)
{
   png_alloc_size_t limit = PNG_UINT_31_MAX;

# ifdef PNG_SET_USER_LIMITS_SUPPORTED
   if (png_ptr->user_chunk_malloc_max > 0 &&
       png_ptr->user_chunk_malloc_max < limit)
      limit = png_ptr->user_chunk_malloc_max;
# elif PNG_USER_CHUNK_MALLOC_MAX > 0
   if (PNG_USER_CHUNK_MALLOC_MAX < limit)
      limit = PNG_USER_CHUNK_MALLOC_MAX;
# endif
   if (png_ptr->chunk_name == png_IDAT)
   {
      png_alloc_size_t idat_limit = PNG_UINT_31_MAX;
      size_t row_factor =
         (png_ptr->width * png_ptr->channels * (png_ptr->bit_depth > 8? 2: 1)
          + 1 + (png_ptr->interlaced? 6: 0));
      if (png_ptr->height > PNG_UINT_32_MAX/row_factor)
         idat_limit=PNG_UINT_31_MAX;
      else
         idat_limit = png_ptr->height * row_factor;
      row_factor = row_factor > 32566? 32566 : row_factor;
      idat_limit += 6 + 5*(idat_limit/row_factor+1); /* zlib+deflate overhead */
      idat_limit=idat_limit < PNG_UINT_31_MAX? idat_limit : PNG_UINT_31_MAX;
      limit = limit < idat_limit? idat_limit : limit;
   }
   // ...
}

‍

GREASE can automatically find this hard-to-spot bug:

$ clang test.c -o test
$ grease test

Finished analyzing 'png_check_chunk_length'. Possible bug(s):

At 0x100011bd:
div: denominator was zero
Concretized arguments:

rcx: 0000000000000000
rdx: 0000000000000000
rsi: 0000000000000000
rdi: 000000+0000000000000000
r8: 0000000000000000
r9: 0000000000000000
r10: 0000000000000000

000000: 54 41 44 49 01 00 00 00 f9 ff ff ff 00 00 00 00 00 80

‍

This output says that png_check_chunk_length will divide by zero when the register rdi holds a pointer to an allocation containing the bytes 54 41 44... Indeed, if we add the following main function:

int main() {
  char data[] = {0x54, 0x41, 0x44, 0x49, 0xf9, 0x00, 0x00, 0x00, 0x01, 0xb7, 0x3e, 0x9b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80};
  png_check_chunk_length((png_const_structrp)data, 0);
  return 0;
}

‍

We see exactly what GREASE described:

$ clang test.c -o test
$ ./test
Floating point exception (core dumped)

How it works

Fundamentally, GREASE works quite similarly to UC-Crux, our tool for under-constrained symbolic execution of LLVM. Essentially, GREASE analyzes each function in the target binary by running it on a slate of fully symbolic registers. When errors occur (for example, if the program reads from uninitialized memory), GREASE uses heuristics to refine this initial symbolic precondition (e.g., by initializing some memory) and re-runs the target function. This process continues until GREASE finds a bug, or concludes that the function is safe under some reasonable precondition on its inputs. The blog post introducing UC-Crux describes this algorithm in considerable detail. Further information is also available in the GREASE documentation.

In contrast with the above example from libpng, GREASE’s heuristics will not flag the following program as potentially problematic.

$ cat test.c
int test(int *x) { return *x + 1; }

$ clang test.c -o test
$ grease test

– snip –
All goals passed!

‍

If we ask GREASE for additional details, we can see that it deduces that rdi must point to (at least) four initialized bytes. The heuristics deem this a reasonable precondition for the test function.

$ grease test -v

rip: 0000000000401010
– snip –
rdi: 000007+0000000000000000
– snip –
000007: XX XX XX XX

‍

(In the above output, XX indicates a byte of memory initialized to a symbolic value. The syntax of the language used in GREASE’s output is detailed in the documentation.)

Limitations

GREASE has several key limitations. Most importantly, GREASE relies on heuristics to determine whether a fallible memory access should be reported as a bug or not. These heuristics may cause false positives (reporting a normal program behavior as suspicious) or false negatives (missing real bugs).

Like most program analysis tools, GREASE is subject to a litany of constraints and caveats. GREASE suffers from path explosion. GREASE’s memory model cannot express unbounded heap data structures (e.g., linked lists of symbolic length). Like other tools based on symbolic execution, GREASE unrolls loops and recursion (optionally up to a maximum bound), and can get stuck when loop conditions remain symbolic. GREASE cannot analyze certain pathological behaviors of machine code, such as run-time code generation (JITs), self-modifying code, or jumps to the “middle” of instructions. A more complete accounting of GREASE’s limitations can be found in the documentation.

Comparison to other tools

Where does GREASE fit into the landscape of binary analysis, symbolic execution, and software reverse engineering tools?

Concrete testing tools such as fuzzers (e.g., AFL) have the advantage of reporting complete inputs that cause crashes. However, they are more susceptible to path explosion (that is, they can struggle to reach code deep inside the target), and can only test a vanishingly small fraction of possible inputs to a program. Using symbolic execution, GREASE can effectively cover more of the input space.
angr is a toolkit for symbolic execution of binaries. angr is much more mature than GREASE, and is intended for a wider variety of use-cases. angr is first and foremost a Python API that reverse engineers can use to build tools, whereas GREASE provides a ready-to-use CLI. angr performs traditional symbolic execution by default, whereas GREASE performs under-constrained symbolic execution. Like fuzzing, traditional symbolic execution can struggle to reach code deep in the program due to path explosion. angr aggressively concretizes pointers, whereas GREASE’s memory model is capable of sophisticated symbolic memory operations such as reads and writes involving symbolic pointers and symbolic sizes.
KLEE is a widely-used symbolic execution tool for LLVM. In contrast to KLEE, GREASE supports analysis of binaries in addition to LLVM, and performs under-constrained symbolic execution.
UC-KLEE was an under-constrained symbolic execution tool for LLVM described in “Under-Constrained Symbolic Execution: Correctness Checking for Real Code”. To our knowledge, it was never open-sourced. In contrast to UC-KLEE, GREASE supports analysis of binaries in addition to LLVM. UC-KLEE used a lazy memory model, whereas GREASE’s approach to under-constrained symbolic execution is more iterative in nature.
UC-Crux was our previous tool for under-constrained symbolic execution of LLVM. Unfortunately, we made two fundamental mistakes in the design of UC-Crux: (1) its core is specialized to analysis of LLVM, limiting its potential applicability to binaries, and (2) its analysis makes use of the types of LLVM pointers. LLVM has phased such types out in favor of the “opaque pointer” type ptr, and accordingly it would be difficult to adapt UC-Crux to work well on recent versions of LLVM. We plan to deprecate UC-Crux in favor of GREASE.
Macaw is a binary analysis framework similar to angr, which is described in the pre-print “Macaw: A Machine Code Toolbox for the Busy Binary Analyst”. GREASE is built on top of Macaw.
GREASE can be integrated into binary analysis platforms such as Ghidra and Binary Ninja as a plugin.

Conclusion

We are happy to share GREASE with the binary analysis research community under the BSD 3-clause license. GREASE is under active development. Pull requests, issues, and questions are welcome! Please reach out to grease@galois.com to start a conversation.

‍

This material is based upon work supported by the Defense Advanced Research Projects Agency under Contract No. W31P4Q-22-C-0017 and W31P4Q-23-C-0020.

‍

Introducing GREASE: An Open-Source Tool for Uncovering Hidden Vulnerabilities in Binary Code

March 19, 2025

Demo

How it works

Limitations

Comparison to other tools

Conclusion

RELATED ARTICLES

News

Galois Releases New Versions of Verification Tools SAW, Cryptol, and Crux

Blog Post

Function Argument Nullability Using an LLM

Blog Post

Generative AI for Specifications