Under-Constrained Symbolic Execution with Crucible

UC-Crux is an open-source command-line tool and Haskell library for performing under-constrained symbolic execution on LLVM code for the sake of exposing bugs or verifying the absence of certain types of undefined behavior. It requires only LLVM bitcode as input. It is built on Galois’s Crucible library for symbolic execution.

Under-constrained symbolic execution (developed by Ramos and Engler) is a technique that can be used for many of the same purposes as more traditional symbolic execution, but sidesteps some scalability issues and requires little to no user intervention or setup. Its greatest strength and fundamental limitation is that it can start at any function in the target program.

If you’d like to try out UC-Crux before reading on, the README has a small demo.

Initial Results

A great way to understand the current capabilities and limitations of UC-Crux (and its particular approach to under-constrained symbolic execution) is to look at what it can do with a large, heavily used codebase like OpenSSH [1]:

Overall results:
  Unclear result, errors are either false or true positives (or timeouts were hit): 116
  Found likely bugs: 2
  Function is safe if deduced preconditions are met: 95
  Function is safe up to the specified bounds on loops/recursion: 3
  Function is always safe: 17

This summary shows that:

  • UC-Crux can prove [2] the absence of undefined behavior in 17 out of 233 (7.6%) of the functions.
  • UC-Crux hits its default bounds on loops and recursion on only three (1.3%) of the functions. This surprised us – though k-bounding makes under-constrained symbolic execution incomplete in theory, it doesn’t seem to be a significant problem in practice for at least some programs.
  • The heuristics used by UC-Crux are able to deduce sufficient preconditions for avoiding undefined behavior in 95 (40.7%) of functions, but are unable to do so in 116 (49.7%) cases.
  • The heuristics claim that two cases are likely bugs. These are probably just over-zealous, unfortunately we have not yet actually found any real bugs with the current heuristics.

This report is truncated, but the full output also indicates the source locations and nature of the purported “likely bugs,” descriptions of what kinds of contracts were inferred, an accounting of any unimplemented features that prevented analysis (in this case, 2 instances of variable-arity functions), and more.

Use Cases

UC-Crux is designed to support several related use cases:

  • For C/C++ developers, UC-Crux can help you:
    • Verify [3] the absence of undefined behavior in individual functions in your codebase, giving you (and your API consumers) confidence in their safety. This is like running tests with AddressSanitizer enabled, but on all possible inputs, not just the ones you wrote in your unit tests.
    • Reveal implicit preconditions necessary for avoiding undefined behavior, which you can then make explicit via assertions or documentation.
  • For bug hunters with access to source code [4]:
    • Automatically uncover control-flow paths that can lead to behaviors such as out-of-bounds reads and writes, double frees, signed arithmetic overflow, and more.
    • Avoid wasting time investigating provably-safe parts of codebases.
    • Understand the safety preconditions of interfaces in order to find violations of such conditions.
  • For proof engineers and others interested in formal verification:
    • Prove [5] memory safety for many functions with almost no effort.
    • Generate stake-in-the-ground preconditions on memory layout for function arguments and global variables. [6]

Under-Constrained Symbolic Execution

Classic Symbolic Execution

Symbolic execution is a powerful technique for understanding the behavior of a program. Programs are normally executed on concrete inputs, such as the integer 7 or the string “hello”. Symbolic execution can be thought of as interpreting a program on values that can be either concrete or symbolic, where symbolic values represent an unknown but possibly constrained value. If the symbolic executor reaches a branch with a symbolic condition, it follows both the true and false branches. Symbolic values and symbolic branching enable symbolic execution to explore all possible paths through a program. The result of symbolic execution is a description of how the function behaves for some class of inputs: we can ask whether the program can crash, or whether it computes the correct result from its inputs. Symbolic execution lies at the heart of Galois’s verification toolchain, and in particular, our industry-strength symbolic testing tool, Crux. For more information about Crux and symbolic execution see the Galois blog post introducing Crux and crux.galois.com.

Symbolic execution has to start somewhere. There are a few common approaches present in tools like KLEE or Crux-LLVM:

  1. Symbolic execution can begin at the program’s entry point (main). In this case, symbolic data is often introduced when the program reads from environment variables, command line arguments, the filesystem, or the network. [7]
  2. Symbolic execution can begin at a user-specified function that invokes a tool-specific C API to create symbolic data, add assumptions on its content, execute procedures in the program under test, and make assertions over program outputs. This is the predominant mode of use of Crux-LLVM, and is also possible with KLEE.

Each approach has its advantages and disadvantages. The first may have difficulty achieving adequate coverage due to path explosion, whereas the second requires more time and energy spent writing harnesses. Unit and property testing share many of the same disadvantages as the second approach. Under-constrained symbolic execution presents a third alternative, with its own set of trade-offs.

Under-Constrained Symbolic Execution to the Rescue

Roughly speaking, under-constrained symbolic execution uses heuristics to uncover what kind of inputs and environments are needed for the program to run without undefined behavior. (Or conversely, what kinds of inputs and environments can cause undefined behavior to occur.) Under-constrained symbolic execution can be thought of as a kind of contract (or precondition) inference: under-constrained symbolic execution tries to find out what preconditions are sufficient for successful execution of a given program (or procedure).

For a function like div that takes integers as arguments, it’s reasonably clear how this should work: run the function with symbolic integers as arguments and assert that no undefined behavior occurs in its execution.

Things get more interesting when considering functions that take pointer-rich data structures (e.g., linked lists) as arguments. Should the tool generate null pointers as input? Circular linked lists? Can the tool support heap structures of unbounded or symbolic size?

There’s another tricky issue: What constitutes a bug? Sure, perhaps your library function my_div has undefined behavior when it’s second argument is zero, but that’s probably expected, it might even be well-documented! Similarly, lots of functions that take pointer arguments assume that they’ll be non-null and freely dereference them. This is where the name under-constrained comes from: inputs to arbitrary functions may have preconditions that are not evident from the function’s signature nor even implementation, nothing in the code describes all the constraints on such input.

To address both of these issues, UC-KLEE (the tool described by Dawson and Engler) uses a technique called lazy initialization. Essentially, as the program under test reads and writes from memory during symbolic execution, the tool lazily/dynamically allocates and initializes such memory as necessary to make the read or write operation safe. For a more precise explanation of the behavior of UC-KLEE, see the paper that introduces it. UC-Crux takes a somewhat different approach (described in the next section), which enables a clean separation of constraint instantiation heuristics from the implementation of the memory model.

UC-Crux

UC-Crux takes an iterative approach to generating symbolic inputs. That is, the tool begins with a simple candidate input, and executes the procedure. If execution fails, this indicates the symbolic input must be refined. The tool examines the error, heuristically updates the input, and restarts the analysis. Of course, it may happen that the heuristics cannot understand and fix the failure: in this case, the program may contain a bug, or it may be that the heuristics are not sophisticated enough to compute appropriate constraints on the input. [8]

The following subsections explain this process in detail. They are pretty technical and can be safely skipped.

The Memory Model

To understand how UC-Crux generates symbolic inputs, we first need to take a look at its memory model (inherited from Crucible-LLVM). Reading from the Crucible-LLVM documentation on the subject (permalink):

Each allocation in the memory model is completely isolated from all other allocations and has a unique block identifier. Pointers into the memory model have two components: the identifier of the block they reference and an offset into that block. Note that both the identifier and the offset can be symbolic.


Said another way, the memory model is two-dimensional: One “axis” of a pointer determines which allocation (possibly allocations in the case of a symbolic block identifier) it points to, and the other “axis” is the offset within that allocation. The crucial point for UC-Crux is that pointers consist of a pair of data: a block identifier and an offset. [9] For example, the null pointer is represented by the pair (0, 0x0).

Generating Symbolic Inputs

Once a procedure in the target program has been selected [10] (this can be done manually by the user or every function in a target program can be explored), UC-Crux will generate the “smallest” fully symbolic input for that function based on its type signature, and then gradually “expand” it as necessary. For instance, if a function takes a single integer as an argument, UC-Crux generates a symbolic integer of the appropriate width. If it takes a pointer to an integer, UC-Crux generates a pointer with a block identifier of 0 and a symbolic offset. Call these initial pointers. Initial pointers don’t point to any valid memory, but they also don’t (concretely) compare equal to the null pointer.

UC-Crux will then symbolically execute the procedure on this input:

  • If symbolic execution encountered possible undefined behavior, then UC-Crux will examine the error, and either:
    • report it to the user (and optionally continue executing to find other problems), or
    • use heuristics to decide that the error was likely a false positive (i.e., due to an unsatisfied implicit precondition). The tool may then use further heuristics to generate inputs that satisfy the precondition, perhaps by
      • Allocating memory to back initial pointers in the input
      • Expanding the sizes of allocations pointed to by the pointers in the input
      • Initializing global variables with symbolic data
  • If the symbolic execution encountered no undefined behavior, the process is complete – the function is safe so long as its inputs always satisfy the constraints that UC-Crux deduced.

k-bounding

If the function reads from or writes to an initial pointer, such a read or write will always be undefined and cause that pointer to point to allocated memory on the next iteration. That is, the heuristic always thinks it’s reasonable (not a bug) for a procedure to read from and write to pointers that are reachable from its arguments or global variables. To ensure termination, UC-Crux (just like UC-KLEE before it) uses k-bounding (also known as k-limiting), i.e., it will only expand data structures up to some (optionally user-specified) constant depth. [11]

On Aliasing

Any two initial pointers may alias and may be null. However, when UC-Crux decides that a given pointer needs to be non-null and allocates memory to back it, each such pointer will have its own unique block of memory. Therefore, no valid pointer will alias any initial pointer, nor will two valid pointers alias. [12] In particular, UC-Crux won’t generate cyclic data structures.

Global Variables

Global variables are treated identically to arguments: They are initialized to the “smallest” symbolic value compatible with their type, and then “expanded” if/when the procedure under test accesses them. Therefore, the same caveats around aliasing apply to globals: No (valid) pointers that appear inside of the values of global variables will alias with one another, nor with any pointers that are reachable from the function’s arguments.

Intentionally Unsound Features

Symbolic execution can either be used for verification (as in Crux and KLEE), or for bug hunting (with tools like angr, Manticore, KLEE, and many others). During verification, [13] it’s crucial that the symbolic execution tool over-approximates the set of all possible program behaviors (or signals an error if this isn’t feasible). You don’t want to miss a bug and claim that your program has been proved correct! During bug-hunting, tools have more leeway to under-approximate possible behaviors. For instance, if the symbolic execution tool encounters a call to malloc with a symbolic argument, it can fork and explore just the minimum and maximum possible values (if they can be determined, and the tool designers think that this is a useful strategy for finding bugs).

UC-Crux is intended to be usable as both a (limited) verification tool and for bug hunting. Accordingly, it has several features that intentionally sacrifice soundness for code coverage.

Skipping External Functions

Programs often call functions that are defined externally, which are eventually resolved in a later build step or at runtime by the linker. It’s difficult to over-approximate the behavior of these functions (for example, they might have arbitrary effects on the values of global variables), but it’s a lot easier to under-approximate them: Simply assume that the function doesn’t mutate its arguments or any globals, and generate a fully symbolic return value (using the same strategy outlined above).

Underapproximating libc Functions and System Calls

In a similar vein, it can be a tedious and error-prone process for tool designers to implement over-approximate versions of libc functions and system calls. It’s much easier, for example, to implement a version of gethostname that simply always writes the constant string “galois” into its argument, rather than a version that chooses a string with symbolic length and content. Accordingly, UC-Crux ships with under-approximate versions of a few functions that Crux doesn’t fully support.

Limitations and Caveats

Threats to Soundness

  • As with any piece of software that is not formally verified, UC-Crux itself surely has bugs (and relies on the correctness of a complex stack of software and hardware including SMT solvers, operating systems, CPUs, etc. that themselves all have bugs). Any claims it makes are conditional upon not encountering such bugs.
  • C, C++, Rust, and LLVM [14] all lack official formalized semantics, so Crucible-LLVM (and so UC-Crux) contains a best-effort approximation based on the C standard and the LLVM language reference.
  • There are certain kinds of behaviors that are undefined at the C or C++ level that aren’t visible at the LLVM level, such as signed overflow and ill-typed reads out of unions. UC-Crux can’t detect these, since it operates on LLVM code. Crucible-LLVM has limited support for symbolically executing programs compiled with UndefinedBehaviorSanitizer, which partially addresses these issues.
  • UC-Crux will not (yet!) generate aliasing pointers, nor cyclic data structures. Undefined behaviors that occur only in such circumstances will be undetected.
  • UC-Crux contains features that are intentionally unsound for the sake of increasing coverage. These are not yet toggleable, though the output of UC-Crux is quite clear on when they have been used.
  • All of the limitations that apply to Crux-LLVM (permalink) apply to UC-Crux as well.

Threats to Completeness

  • As described in k-bounding, UC-Crux can only explore loops and recursive functions up to some user-defined number of iterations. Partial results are available when these bounds are hit.
  • UC-Crux can’t yet analyze variable-arity functions.
  • UC-Crux can’t yet analyze functions that call function pointers that are reachable from their arguments or global variables.
  • UC-Crux doesn’t support parallel or concurrent code (though there is a prototype in the works for support for such code in Crucible).
  • UC-Crux doesn’t support functions dereferencing pointer-sized but supposedly integer-typed values in their arguments or global variables.
  • Symbolic execution fundamentally relies on SMT solvers. Despite incredible research and engineering effort, the performance of such solvers remains unpredictable from problem to problem. At Galois, we’ve seen performance degrade especially when reasoning about programs with heavy use of floating-point values, non-linear arithmetic, and certain patterns of memory usage. [15] To alleviate these difficulties, UC-Crux allows users to select among many SMT solvers, and to set timeouts for SMT queries. If a query times out, partial results are available.
  • UC-Crux has limited support for string operations. For instance, calls to strlen will only work on concrete pointers. These operations are very difficult to handle in a performant way in full generality, as each additional character introduces a symbolic branch. Experiments with the SMT String theory may improve scalability, but it will likely remain a significant challenge.
  • UC-Crux can’t handle calls through non-concrete function pointers.
  • Support for programs that interact with the filesystem is experimental (permalink) and has its own caveats (permalink).
  • Crucible-LLVM (and so UC-Crux) doesn’t support the entire LLVM language or all of libc and libstdc++. See the Crux-LLVM documentation (permalink) for details of what functions and intrinsics are supported.

Contract Inference

Recall that under-constrained symbolic execution can be considered a form of contract inference. There are some important limitations on the situations in which UC-Crux can infer adequate contracts.

Loss of Precision for Disjunctive Preconditions

The algorithm UC-Crux uses for generating inputs can produce contracts/preconditions that are too strong, i.e., they are more restrictive than necessary. For example, UC-Crux would conclude that the pointer p in the following function must be non-null because it is dereferenced along one path through the function:

int f(int *p, bool x) {
  if (x) {
    return *p;
  }
  return x;
}

A more precise contract would state that p must be non-null whenever x is true, but UC-Crux doesn’t (yet) have the ability to deduce contracts involving disjunctions.

Inability to Infer Relations Between Arguments

UC-Crux can’t yet infer constraints involving two or more separate pieces of data (function arguments or global variables). For example, UC-Crux can’t deduce that in the following function, p needs to point to an allocation of size greater than the value of i:

int g(int *p, int i) {
  return p[i];
}

UC-Crux is still young and more exciting features are on the way. This new open-source tool uses under-constrained symbolic execution and will hopefully be useful to programmers, security researchers, and proof engineers alike. Pull requests, issues, and questions are welcome! Please reach out to crux@galois.com to start a conversation.

Footnotes

[1]: The following script records the steps taken to collect these results (after building uc-crux-llvm from commit d4d8d846 in GaloisInc/crucible). The output has been lightly edited for clarity. The results may differ from run to run depending on the SMT solver version, how busy the CPU is, the price of eggs, and cosmic rays.

git clone https://github.com/langston-barrett/nixpkgs-blight
cd nixpkgs-blight
git checkout 7741d44b359ab255b09fe2e6fe944b5f1c38da5c
make out/bitcode/openssh
uc-crux-llvm -- --solver=z3 --path-sat-solver=z3 --verbosity=3 --path-sat --path-strategy=split-dfs --explore --explore-parallel --explore-budget=500 --explore-timeout=5 --fail-fast --re-explore --no-compile out/bitcode/openssh/sshd.bc

[2]: With caveats

[3]: Again, with caveats.

[4]: Or binaries lifted to LLVM with tools like reopt or Remill.

[5]: Again, with caveats.

[6]: This feature is not currently supported via the command-line interface, but is possible with the Haskell library API.

[7]: For examples of this approach, see the “Using Symbolic Environment” KLEE tutorial, or the Manticore README.

[8]: From a very technical point of view, there’s always a precondition that would make any function execute successfully: an always-false one. The logical “principle of explosion” applies in this case. However, such a precondition isn’t very helpful, and UC-Crux won’t generate one.

[9]: Non-pointer bitvectors (e.g. the 5 in int x = 5;) are represented the same way as pointers, but with a block identifier that’s concretely 0.

[10]: While UC-Crux will start at a specific procedure, it will follow control flow into other functions as they’re called.

[11]: Technically, UC-Crux doesn’t specifically bound the depth/size of inputs, but rather has bounds on the number of loop iterations and recursive calls it will follow, which amounts to the same thing.

[12]: This isn’t a fundamental limitation, but rather an artifact of the current progress in developing UC-Crux. We hope to improve the input generation algorithm to consider aliasing in the future.

[13]: “Verification” can mean a lot of things – let’s focus on verifying the absence of undefined behavior for the moment, though some of these comments apply to functional specification as well.

[14]: As well as virtually every other language that compiles to LLVM, and pretty much every even halfway mainstream programming language with the sole, venerable exception of Standard ML.

[15]: In addition to these practical limitations, there’s also a more fundamental issue: SMT solvers are built to solve an NP-complete problem. Therefore, their performance can’t be uniformly good in all cases.