Protecting Applications with Automated Software Diversity

On the DARPA CFAR program, the Galois “RADSS” team is developing new ways to mitigate memory corruption attacks against legacy C/C++ systems without having to find and fix each individual bug. CFAR stands for “Cyber Fault-tolerant Attack Recovery,” and our general approach is:

  1. Given some application to defend, generate multiple variants of that application such that all variants behave the same when given benign input, but behave differently when given malicious input.
  2. Run the set of variants simultaneously in a multi-variant execution environment (MVEE) that unifies inputs and outputs, monitors the set, detects when variants diverge in behavior, and then reacts and recovers (a minimal sketch of the divergence-detection idea follows this list).
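
To make the divergence-detection idea concrete, here is a minimal sketch in C. The variant binaries (./variant_a, ./variant_b) and the input file (request.bin) are hypothetical placeholders, and the comparison is far coarser than in a real MVEE, which intercepts and unifies I/O and cross-checks long before final output.

```c
/* Minimal sketch of output-level divergence detection between two
 * hypothetical variant binaries.  A real MVEE intercepts and unifies
 * I/O and cross-checks at system calls; this just feeds both variants
 * the same input and compares what they print. */
#include <stdio.h>
#include <string.h>

/* Run one variant on a fixed input file and capture its output. */
static size_t run_variant(const char *cmd, char *out, size_t cap) {
    FILE *p = popen(cmd, "r");
    if (!p) return 0;
    size_t n = fread(out, 1, cap, p);
    pclose(p);
    return n;
}

int main(void) {
    char out_a[4096], out_b[4096];
    /* Both variants receive exactly the same (possibly malicious) input. */
    size_t na = run_variant("./variant_a < request.bin", out_a, sizeof out_a);
    size_t nb = run_variant("./variant_b < request.bin", out_b, sizeof out_b);

    if (na != nb || memcmp(out_a, out_b, na) != 0) {
        /* Benign input should produce identical behavior in all variants,
         * so any divergence is treated as evidence of an attack. */
        fprintf(stderr, "divergence detected: possible attack\n");
        return 1;
    }
    puts("variants agree");
    return 0;
}
```

Because every variant sees identical input, any difference in observable behavior means the input exercised behavior the variants do not share, which by construction points to an attack rather than to normal operation.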

The Galois-led team includes the University of California, Irvine; Immunant; and Trail of Bits. Our focus is on developing tools that generate a diverse set of variants that maximizes attack detection while staying within a performance envelope suitable for real-world systems. Our defenses have proven effective and practical in red-teaming exercises on CFAR, where we protect the Apache web server and other real-world applications from common forms of attack. In this post we outline our approach and where these defenses are most effective. A follow-up post will cover how our defenses protect against specific types of attacks, as well as some surprising ways that composing variant generation techniques can inadvertently negate the intended security protections.

Variant Generation Strategy

We focus on approaches that surface memory corruption vulnerabilities (such as buffer overflows) that are common in many legacy systems written in unsafe languages like C/C++. Our strategy is for variants to preserve well-defined behavior in the application but introduce diversity in the effect of undefined behavior (such as out-of-bounds accesses).
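
As a contrived illustration of that strategy (the variable names and layout assumptions here are ours, not drawn from any CFAR target), the program below writes past the end of a stack buffer. The C standard does not define what that write hits, so its effect depends entirely on how the compiler happens to lay out the frame; two variants with different but equally legal stack layouts serve the in-bounds input identically and diverge on the overflowing one.

```c
#include <stdio.h>
#include <string.h>

/* An overflow whose effect is undefined: the bytes it clobbers depend on
 * how this frame is laid out, which the C standard leaves to the
 * implementation.  Variants with different stack layouts (ordering,
 * padding, guards) behave identically for in-bounds input but diverge
 * once the write goes out of bounds. */
void handle(const char *input) {
    int is_admin = 0;    /* may or may not sit adjacent to buf */
    char buf[8];

    /* BUG: no bounds check; long input writes past the end of buf. */
    strcpy(buf, input);

    if (is_admin)
        puts("privileged path");   /* reached only under some layouts */
    else
        puts("normal path");
}

int main(void) {
    handle("short");                 /* well-defined: all variants agree      */
    handle("AAAAAAAAAAAAAAAA\x01");  /* undefined: effect is layout-dependent */
    return 0;
}
```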

Our team includes UC Irvine and Immunant, who develop the multicompiler. They’ve built the multicompiler on the clang/LLVM compiler framework by adding many transformations that introduce variation (known as software diversity) in the binaries produced. That is, from one input program, the multicompiler can produce many different output binaries that exhibit the same behavior under normal circumstances, but different behavior when attacked. This is possible because the code a programmer writes does not fully specify all aspects of the resulting binary representation, but low-level exploits are very sensitive to these properties. The multicompiler’s transformations range from randomization of memory layouts for the globals, stack, functions, and vtables, to more sophisticated changes in how the resulting variants store data in memory.
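
To make one of these transformations concrete, the sketch below imitates structure layout diversification by hand: the two structs stand in for two variants of the same logical type, something the multicompiler would produce automatically at the IR level rather than in source. An attack that writes at a fixed byte offset corrupts the privilege field in one variant but only harmless name bytes in the other, so the variants no longer fail the same way.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Two hand-written layouts standing in for what layout diversification
 * does automatically: the same logical type with its fields in a
 * different order in each variant. */
struct session_v1 { char name[16]; long privileges; };   /* variant 1 */
struct session_v2 { long privileges; char name[16]; };   /* variant 2 */

int main(void) {
    struct session_v1 a = { "guest", 0 };
    struct session_v2 b = { 0, "guest" };

    /* A memory-corruption primitive typically writes at a fixed offset
     * computed against one assumed layout. */
    size_t attacker_offset = offsetof(struct session_v1, privileges);
    long evil = 0x41414141L;

    memcpy((char *)&a + attacker_offset, &evil, sizeof evil);
    memcpy((char *)&b + attacker_offset, &evil, sizeof evil);

    printf("variant 1 privileges: %ld\n", a.privileges);  /* corrupted            */
    printf("variant 2 privileges: %ld\n", b.privileges);  /* intact; name bytes hit */
    return 0;
}
```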

While the multicompiler generates variants of a program directly by compiling its source code, we may also wish to protect systems (or subsystems/libraries) that are available only in binary form. Our team also includes Trail of Bits, who bring McSema to tackle this issue. McSema can lift a binary to LLVM bitcode that the multicompiler can diversify, allowing us to generate variants of applications even when they are only available in binary form.

Success: Strong Defenses

The CFAR program includes a separate team responsible for performing regular security evaluations of our techniques. In these evaluations, our defenses have prevented the evaluation team from exploiting concrete instances of:

  • code and data information disclosure vulnerabilities (including ones resembling the “Heartbleed” bug)
  • corruption of stack variables: overflow and offset corruption vulnerabilities
  • corruption of globals: overflow and offset corruption vulnerabilities
  • corruption of heap objects: overflow and offset corruption vulnerabilities
  • use-after-free vulnerabilities, including use of heap grooming
  • control-flow hijacking vulnerabilities, including use of ROP

The evaluation team modeled vulnerabilities after common CWEs (drawn from the community-developed Common Weakness Enumeration list), intended to be representative of vulnerabilities found in the wild.

Our defenses were successful even after granting the evaluation team strong capabilities, including:

  • full access to application source and knowledge of vulnerabilities in the original application
  • full knowledge of all variant binaries, with online and offline inspection capabilities

As the program progressed, the evaluations expanded to grant further capabilities to the evaluation team, intended to model an adversary armed with a once-in-a-decade vulnerability or some other nightmare scenario for defenders. This included giving the evaluation team information about runtime memory layouts, and even the full memory contents of each variant as it ran in the multi-variant environment, simulating side-channel leakage (such as via Meltdown/Spectre-style attacks). Our defenses succeeded even under these harsh operating conditions.

We generate variants that each behave differently when the set is attacked, allowing us to detect attacks by observing these differences in variant behavior. After detecting the attack we can restart or roll back variants to a known good state, and in some cases we can do even better by repairing the corrupted state and safely continuing execution. The details are beyond the scope of this post, but the idea is we construct variant sets so that certain attacks cannot corrupt the same state in all variants simultaneously, and we repair corrupted program state using uncorrupted state from other variants.
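
As a rough intuition for that repair step (this is only a sketch of the idea, not RADSS's actual recovery machinery), if the variant set is built so that an attack can corrupt a given piece of state in at most a minority of variants, then a simple vote across the variants' copies of that state recovers the good value:

```c
#include <stdio.h>

/* Intuition-only sketch of state repair: given each variant's copy of the
 * same logical word of state, take the value held by a majority of variants
 * and use it to overwrite the corrupted copies.  This assumes the variant
 * set was constructed so an attack cannot corrupt the same state in a
 * majority of variants at once. */
#define NVARIANTS 3

static long majority(const long copies[NVARIANTS]) {
    for (int i = 0; i < NVARIANTS; i++) {
        int votes = 0;
        for (int j = 0; j < NVARIANTS; j++)
            if (copies[j] == copies[i]) votes++;
        if (votes * 2 > NVARIANTS) return copies[i];
    }
    return copies[0];  /* no majority: a real system would restart instead */
}

int main(void) {
    /* Variant 1's copy has been corrupted by an attack; 2 and 3 are intact. */
    long copies[NVARIANTS] = { 0x41414141L, 42, 42 };
    printf("repaired value: %ld\n", majority(copies));  /* prints 42 */
    return 0;
}
```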

Practical Design Decisions

We’ve designed our variant set transformations to preserve the legitimate behavior specified by the programmer. Instead, we change aspects of the implementation that the source language specification leaves implementation-defined or undefined. This can prevent or reveal many common types of attacks, and correctly written code will not depend on the properties we change.

To support diversification of real-world code that may include violations of the language standard, we support various “whitelisting” techniques to omit diversification of specific problematic regions, functions, variables, or data structures (sketched below). Using these tools and techniques, we have generated diversified multi-variant sets of the Apache, nginx, lighttpd, and thttpd web servers, each of which passes its test suite.
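
The exact whitelisting interface is tool-specific, so the snippet below only sketches the general shape under our own naming: it uses clang's annotate attribute (which is real and visible to LLVM passes) with a hypothetical "no_diversify" marker that a diversification pass could honor. The multicompiler's actual mechanism may rely on different attributes, compiler flags, or symbol lists.

```c
/* Sketch only: mark declarations that must be left undiversified so a
 * (hypothetical) diversification pass can skip them.  clang's annotate
 * attribute exists and is visible to LLVM passes, but the "no_diversify"
 * marker and the pass that would honor it are illustrative, not the
 * multicompiler's actual interface. */
#if defined(__clang__)
#define NO_DIVERSIFY __attribute__((annotate("no_diversify")))
#else
#define NO_DIVERSIFY
#endif

/* A function that relies on pointer arithmetic the standard does not
 * bless, e.g. scanning across distinct globals; diversifying around it
 * would change behavior the application (incorrectly) depends on. */
NO_DIVERSIFY void scan_legacy_tables(void);

/* A global whose exact placement is assumed by hand-written assembly or
 * by code outside the diversified build. */
NO_DIVERSIFY unsigned char shared_dma_buffer[4096];
```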

The MVEE briefly pauses and cross-checks variants when they perform sensitive operations such as system calls, but otherwise variants run at full speed as long as the host has the CPU and memory to accommodate the multiple instances of the application. The most extensive performance evaluations on CFAR have been carried out on the variant sets we generate from the Apache web server. Our performance goals for RADSS are roughly 10% overhead for the MVEE and 10% overhead for our diversity transformations, and sets combining most of our transformations fit within this envelope in testing. The few transformations with a larger performance impact (e.g., fine-grained heap-object-ID checks) trade additional overhead for extra protection against sophisticated attacks.
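
To give a feel for where that cross-checking cost comes from, here is a simplified, x86-64 Linux sketch of syscall-level lockstep monitoring using ptrace. It is not the CFAR MVEE: it compares only the system call numbers of two hypothetical variant binaries and does nothing to unify I/O or recover after a divergence.

```c
/* Simplified sketch of syscall-level lockstep monitoring on x86-64 Linux:
 * trace two variant processes, stop each at its next system call, and
 * compare the syscall numbers.  A real MVEE also compares arguments,
 * unifies I/O so only one variant's effects reach the outside world, and
 * recovers after divergence; none of that is shown here. */
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>

static pid_t spawn_traced(const char *path) {
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* let the parent trace us */
        execl(path, path, (char *)NULL);
        _exit(127);
    }
    waitpid(pid, NULL, 0);                       /* initial stop after exec */
    return pid;
}

/* Resume a variant until its next syscall stop; return the syscall number,
 * or -1 once the variant has exited. */
static long next_syscall(pid_t pid) {
    int status;
    struct user_regs_struct regs;
    ptrace(PTRACE_SYSCALL, pid, NULL, NULL);
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) return -1;
    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
    return (long)regs.orig_rax;                  /* x86-64 syscall number */
}

int main(void) {
    /* ./variant_a and ./variant_b are hypothetical diversified builds. */
    pid_t a = spawn_traced("./variant_a");
    pid_t b = spawn_traced("./variant_b");

    for (;;) {
        long sa = next_syscall(a);
        long sb = next_syscall(b);
        if (sa != sb) {
            fprintf(stderr, "divergence: syscall %ld vs %ld\n", sa, sb);
            return 1;                            /* a real MVEE would recover */
        }
        if (sa == -1) break;                     /* both variants exited */
    }
    puts("variants ran in lockstep");
    return 0;
}
```

Because each variant pauses at every system call boundary, the cross-checking overhead grows with how syscall-heavy the workload is, which is why system-call-bound servers are the natural worst case for this kind of monitoring.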

Conclusion

The combination of software diversity and multi-variant execution can provide strong protection against sophisticated adversaries. But making the most of this combination can be surprisingly tricky, and simply maximizing software diversity can end up negating the protections you intended to add. Please see our follow-up post, where we describe some of these issues and the hazards of blindly applying software diversity techniques.

Acknowledgments

This material is based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-15-C-0124.

The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Distribution Statement “A” (Approved for Public Release, Distribution Unlimited).