Proactively and defensively ensuring the absence of vulnerabilities in binary code is crucial for deploying high-assurance systems. GREASE is an open-source tool leveraging under-constrained symbolic execution to help software reverse engineers analyze binaries and uncover hard-to-spot bugs, ultimately enhancing system security. This kind of binary analysis is especially important for systems that include COTS software that is only provided in binary form.
GREASE can be used as a plug-in for the Ghidra reverse engineering framework, as a standalone command-line tool, or as a Haskell library. GREASE supports analysis of AArch32, PPC32, PPC64, and x86_64 Linux ELF binaries, as well as LLVM bitcode.
GREASE can help software reverse engineers discover bugs in binaries. For example, consider the following code derived from libpng, demonstrating CVE-2018-13785. Even at the source level, the bug is hard to spot. Can you see it? (Don’t worry about studying the code in detail, it won’t be necessary for understanding the rest of this post.)
void /* PRIVATE */
png_check_chunk_length(png_const_structrp png_ptr, const unsigned int length)
{
png_alloc_size_t limit = PNG_UINT_31_MAX;
# ifdef PNG_SET_USER_LIMITS_SUPPORTED
if (png_ptr->user_chunk_malloc_max > 0 &&
png_ptr->user_chunk_malloc_max < limit)
limit = png_ptr->user_chunk_malloc_max;
# elif PNG_USER_CHUNK_MALLOC_MAX > 0
if (PNG_USER_CHUNK_MALLOC_MAX < limit)
limit = PNG_USER_CHUNK_MALLOC_MAX;
# endif
if (png_ptr->chunk_name == png_IDAT)
{
png_alloc_size_t idat_limit = PNG_UINT_31_MAX;
size_t row_factor =
(png_ptr->width * png_ptr->channels * (png_ptr->bit_depth > 8? 2: 1)
+ 1 + (png_ptr->interlaced? 6: 0));
if (png_ptr->height > PNG_UINT_32_MAX/row_factor)
idat_limit=PNG_UINT_31_MAX;
else
idat_limit = png_ptr->height * row_factor;
row_factor = row_factor > 32566? 32566 : row_factor;
idat_limit += 6 + 5*(idat_limit/row_factor+1); /* zlib+deflate overhead */
idat_limit=idat_limit < PNG_UINT_31_MAX? idat_limit : PNG_UINT_31_MAX;
limit = limit < idat_limit? idat_limit : limit;
}
// ...
}
GREASE can automatically find this hard-to-spot bug:
$ clang test.c -o test
$ grease test
Finished analyzing 'png_check_chunk_length'. Possible bug(s):
At 0x100011bd:
div: denominator was zero
Concretized arguments:
rcx: 0000000000000000
rdx: 0000000000000000
rsi: 0000000000000000
rdi: 000000+0000000000000000
r8: 0000000000000000
r9: 0000000000000000
r10: 0000000000000000
000000: 54 41 44 49 01 00 00 00 f9 ff ff ff 00 00 00 00 00 80
This output says that png_check_chunk_length
will divide by zero when the register rdi
holds a pointer to an allocation containing the bytes 54 41 44
... Indeed, if we add the following main function:
int main() {
char data[] = {0x54, 0x41, 0x44, 0x49, 0xf9, 0x00, 0x00, 0x00, 0x01, 0xb7, 0x3e, 0x9b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80};
png_check_chunk_length((png_const_structrp)data, 0);
return 0;
}
We see exactly what GREASE described:
$ clang test.c -o test
$ ./test
Floating point exception (core dumped)
Fundamentally, GREASE works quite similarly to UC-Crux, our tool for under-constrained symbolic execution of LLVM. Essentially, GREASE analyzes each function in the target binary by running it on a slate of fully symbolic registers. When errors occur (for example, if the program reads from uninitialized memory), GREASE uses heuristics to refine this initial symbolic precondition (e.g., by initializing some memory) and re-runs the target function. This process continues until GREASE finds a bug, or concludes that the function is safe under some reasonable precondition on its inputs. The blog post introducing UC-Crux describes this algorithm in considerable detail. Further information is also available in the GREASE documentation.
In contrast with the above example from libpng, GREASE’s heuristics will not flag the following program as potentially problematic.
$ cat test.c
int test(int *x) { return *x + 1; }
$ clang test.c -o test
$ grease test
– snip –
All goals passed!
If we ask GREASE for additional details, we can see that it deduces that rdi
must point to (at least) four initialized bytes. The heuristics deem this a reasonable precondition for the test function.
$ grease test -v
rip: 0000000000401010
– snip –
rdi: 000007+0000000000000000
– snip –
000007: XX XX XX XX
(In the above output, XX
indicates a byte of memory initialized to a symbolic value. The syntax of the language used in GREASE’s output is detailed in the documentation.)
GREASE has several key limitations. Most importantly, GREASE relies on heuristics to determine whether a fallible memory access should be reported as a bug or not. These heuristics may cause false positives (reporting a normal program behavior as suspicious) or false negatives (missing real bugs).
Like most program analysis tools, GREASE is subject to a litany of constraints and caveats. GREASE suffers from path explosion. GREASE’s memory model cannot express unbounded heap data structures (e.g., linked lists of symbolic length). Like other tools based on symbolic execution, GREASE unrolls loops and recursion (optionally up to a maximum bound), and can get stuck when loop conditions remain symbolic. GREASE cannot analyze certain pathological behaviors of machine code, such as run-time code generation (JITs), self-modifying code, or jumps to the “middle” of instructions. A more complete accounting of GREASE’s limitations can be found in the documentation.
Where does GREASE fit into the landscape of binary analysis, symbolic execution, and software reverse engineering tools?
We are happy to share GREASE with the binary analysis research community under the BSD 3-clause license. GREASE is under active development. Pull requests, issues, and questions are welcome! Please reach out to grease@galois.com to start a conversation.
This material is based upon work supported by the Defense Advanced Research Projects Agency under Contract No. W31P4Q-22-C-0017 and W31P4Q-23-C-0020.