2017 - Galois, Inc.

Technical Report
GALOIS-17-03
Oct 2017

Applying Formal Methods to Reinforcement Learning

We report on our research into formal-methods-guided testing of autonomous systems. In particular, we looked at closed-loop control systems that incorporate reinforcement learning components based on neural networks. Because high-reward actions are sparse, reinforcement learning approaches typically learn good control policies only after millions of experiments, which may be unacceptable in an online learning setting. Our solution to this problem is inspired by Imitation Learning, a learning-from-demonstration framework in which an agent learns a control policy by directly mimicking demonstrations provided by an expert.
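As a rough illustration of the imitation-learning idea, the Python sketch below shows behavioral cloning, its simplest form: record (state, action) pairs from an expert and fit a policy to them by supervised regression. The expert (`expert_policy`), the one-dimensional dynamics, and the linear policy class are hypothetical assumptions chosen for brevity. This is not the method developed in the report, which is described only as inspired by Imitation Learning.

```python
import random


def expert_policy(state: float) -> float:
    # Hypothetical expert: a proportional controller that drives the state toward 0.
    return -0.8 * state


def collect_demonstrations(n: int, seed: int = 0) -> list[tuple[float, float]]:
    # Query the expert on randomly sampled states and record (state, action) pairs.
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        s = rng.uniform(-5.0, 5.0)
        demos.append((s, expert_policy(s)))
    return demos


def fit_linear_policy(demos: list[tuple[float, float]]) -> tuple[float, float]:
    # Behavioral cloning: ordinary least-squares fit of action = w * state + b.
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


def rollout(w: float, b: float, x0: float, steps: int) -> float:
    # Closed-loop simulation under hypothetical dynamics x_{t+1} = x_t + a_t.
    x = x0
    for _ in range(steps):
        x = x + (w * x + b)
    return x


if __name__ == "__main__":
    w, b = fit_linear_policy(collect_demonstrations(100))
    print(f"learned w = {w:.3f}, b = {b:.3f}")
    print("state after 10 steps from 4.0:", rollout(w, b, 4.0, 10))
```

Because this toy expert is linear and noise-free, the fitted policy recovers it almost exactly after only 100 demonstrations. In practice, behavioral cloning can drift when the learner reaches states the expert never demonstrated, which is one reason more interactive imitation-learning schemes exist.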

Systems support for Hardware Anti-ROP

In 2007, Shacham introduced Return-oriented Programming (ROP), a technique in which an attacker strings together small snippets of existing executable code, known as gadgets, to exploit programs without injecting any new code. Despite numerous proposed mitigations, ROP attacks remain a widespread attack vector against modern software systems. Research on Control-Flow Integrity (CFI) has often shown that these protections incur a significant slowdown, which is generally considered too costly for general-purpose use.
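To make the gadget-chaining mechanism concrete, here is a toy Python model (not real machine code) of how `ret` strings gadgets together. The gadget addresses and the three gadgets (`pop rax; ret`, `pop rbx; ret`, `add rax, rbx; ret`) are hypothetical. The point is only that an attacker who controls the stack can sequence existing code fragments by writing a list of return addresses and data, without injecting any new instructions.

```python
from typing import Callable

Registers = dict[str, int]
Stack = list[int]  # the top of the stack is the end of the list


# Hypothetical gadgets already present in the program's code. Each one models
# a short instruction sequence that ends in `ret`.
def pop_rax(regs: Registers, stack: Stack) -> None:  # pop rax ; ret
    regs["rax"] = stack.pop()


def pop_rbx(regs: Registers, stack: Stack) -> None:  # pop rbx ; ret
    regs["rbx"] = stack.pop()


def add_rax_rbx(regs: Registers, stack: Stack) -> None:  # add rax, rbx ; ret
    regs["rax"] += regs["rbx"]


GADGETS: dict[int, Callable[[Registers, Stack], None]] = {
    0x401000: pop_rax,
    0x401010: pop_rbx,
    0x401020: add_rax_rbx,
}


def run_chain(payload: list[int]) -> Registers:
    # The payload is given in memory order: its first element is popped first.
    stack: Stack = list(reversed(payload))
    regs: Registers = {"rax": 0, "rbx": 0}
    while stack:
        addr = stack.pop()          # `ret` pops the next return address...
        GADGETS[addr](regs, stack)  # ...and control transfers to that gadget
    return regs


if __name__ == "__main__":
    # Attacker-supplied stack contents: gadget addresses interleaved with data.
    payload = [0x401000, 2, 0x401010, 3, 0x401020]
    print(run_chain(payload))  # {'rax': 5, 'rbx': 3}
```

No gadget here does anything harmful on its own; the computation (2 + 3) emerges entirely from the order of addresses the attacker placed on the stack.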

We investigate the design space of minor hardware extensions that offer potentially large performance savings over software-only CFI while introducing relatively few semantic changes. These hardware extensions significantly reduce the number of gadgets usable by attackers while requiring only minimal changes to existing software, and, in critical software, they can be supplemented with stronger software CFI protections.

We present a simulated hardware platform, implemented as a modification of the QEMU hardware emulator, that builds coarse-grained forward-edge CFI enforcement and fine-grained backward-edge CFI enforcement into the operation of the instruction set. We also present modified versions of the Linux operating system and the GNU Compiler Collection (GCC) infrastructure that allow us to run a typical Linux installation with minimal changes. We show that these simple hardware extensions, together with the corresponding software modifications, substantially reduce the number of usable ROP gadgets, making attacks against this platform significantly more difficult. Finally, we discuss the tradeoffs and challenges that surfaced during this implementation.
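The summary does not spell out the enforcement mechanisms, so the following Python sketch is only a hypothetical illustration of the two categories it names. It assumes a coarse-grained forward-edge check (an indirect call may target any address in a set of valid entry points, `entry_points`) and a fine-grained backward-edge check implemented as a shadow stack (each `ret` must return to exactly the address saved by its matching call). The class and method names are invented for illustration and do not describe the report's actual design.

```python
class CFIViolation(Exception):
    """Raised when a control transfer fails a CFI check."""


class ToyCFIMachine:
    """Hypothetical model of hardware-enforced CFI, not the report's design."""

    def __init__(self, entry_points: set[int]) -> None:
        self.entry_points = entry_points   # valid targets for indirect calls
        self.stack: list[int] = []         # ordinary stack, writable by an attacker
        self.shadow_stack: list[int] = []  # protected copy of return addresses

    def indirect_call(self, target: int, return_addr: int) -> int:
        # Forward edge (coarse-grained): any registered entry point is allowed,
        # but a transfer into the middle of a function is rejected.
        if target not in self.entry_points:
            raise CFIViolation(f"indirect call to non-entry address {target:#x}")
        self.stack.append(return_addr)
        self.shadow_stack.append(return_addr)
        return target

    def ret(self) -> int:
        # Backward edge (fine-grained): the return address on the ordinary stack
        # must match the one recorded by the corresponding call.
        addr = self.stack.pop()
        expected = self.shadow_stack.pop()
        if addr != expected:
            raise CFIViolation(f"return to {addr:#x}, expected {expected:#x}")
        return addr


if __name__ == "__main__":
    m = ToyCFIMachine(entry_points={0x401000, 0x402000})
    m.indirect_call(0x401000, return_addr=0x400500)
    m.stack[-1] = 0x401234  # attacker overwrites the saved return address
    try:
        m.ret()
    except CFIViolation as err:
        print(err)  # return to 0x401234, expected 0x400500
    try:
        m.indirect_call(0x401010, return_addr=0x400600)
    except CFIViolation as err:
        print(err)  # indirect call to non-entry address 0x401010
```

Under these assumptions, a ROP chain that returns into the middle of a function fails the shadow-stack check, while the coarse forward-edge rule still permits calls to any registered entry point. That asymmetry is one way such a design can reduce, rather than eliminate, the set of usable gadgets.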

Code re-use attacks and their mitigation

Code-Reuse Attacks (CRAs) are well studied in the academic community. In this article, we provide a brief summary of notable attacks and mitigations, with a focus on Return-oriented Programming (ROP). Our goal is to provide a roadmap for readers, whether or not they are already familiar with CRAs, who want to learn more about the research. Because this is a roadmap, we aim to be broad and concise, offering executive summaries with citations and otherwise deferring to the original publications for detailed accounts.

We include a glossary of technical terms and acronyms at the end. In addition, our bibliography includes more articles than are covered in the text, and we encourage the enthusiastic reader to review it for additional reading material in this area.
