POLYMORPH (Promotion to Optimal Languages Yielding Modular Operator-driven Replacements and Programmatic Hooks)
This Galois project, part of DARPA’s V-SPELLS program, seeks to bring high levels of assurance to the process of re-engineering legacy C/C++ systems. POLYMORPH has three research directions: (1) identify components within legacy systems; (2) enable developers to efficiently modify, optimize, and recompose components confidently and without fear of introducing regressions; and (3) generate executable code for each component, link the results together, and ensure that the resulting binary artifacts satisfy all of the required contracts governing the safe and correct interaction of components while ensuring that the result is compatible with the target system. Our goal is to conjure correct-by-construction code that can be composed safely for all levels of a system.
Legacy software, often written in C/C++, is the foundation of our most critical systems in both the commercial and government sectors. Even “new” developments usually incorporate significant amounts of legacy code, which is considered reliable by virtue of use in the field and for which a rewrite imposes unacceptable risk.
However, reusing components in new scenarios carries its own dangers, especially when the requirements for safely reusing a component are unknown. In many cases, the components were not designed for reuse. Even within an existing system, modifications of components to improve safety, security, or performance can carry significant risk.
DARPA’s V-SPELLS (Verified Security and Performance Enhancement of Large Legacy Software) program aims “to create a developer-accessible capability for piece-by-piece enhancement of software components with new verified code that is both correct-by-construction and safely composable with the rest of the system.”
Galois’s proposed solution as part of this program is a multi-part project named POLYMORPH (Promotion to Optimal Languages Yielding Modular Operator-driven Replacements and Programmatic Hooks). POLYMORPH’s planned goal would allow domain experts to re-engineer legacy C/C++ systems by replacing components, recomposing existing components, and distributing components to hardware accelerators while improving run-time efficiency.
POLYMORPH aims to bring a high degree of assurance to the legacy system reengineering process through automated and continual feedback on the validity and safety of each change. Domain experts will be able to view the feedback in an interactive development environment in real time, enabling users to improve large legacy codebases with confidence. Beyond just enabling the in-place re-engineering of components, POLYMORPH will enable developers to take advantage of hardware accelerators by expressing components in domain-specific languages (DSLs) that support compilation to specialized hardware targets. This takes advantage of the significant work that has gone into high-performance computational kernel compilations.
As with any type of assurance software, the “magic” is in the details. The POLYMORPH system is composed of three different research directions: Levitate (TA1), Transmute and Augury (TA2), and Revivify (TA3).
As Above, So Below
Levitate will provide domain experts with a visual user interface that allows them to take legacy C/C++ programs and lift them to DSLs. Levitate will lift into multiple DSLs, including: Cantrip, a DSL that will be developed for POLYMORPH to represent general-purpose computations and data transformations between components; Cryptol, a language for expressing cryptographic operations; and AnyDSL, a language for describing computational kernels with efficient compilation to GPUs and other accelerators. Levitate will be an automated process with a user in the loop, enabling domain experts to guide the automation to preferred software decomposition outcomes.
Domain experts will load the LLVM bitcode (compiled from C/C++) for a legacy system into Levitate, which will automatically identify component boundaries and interface specifications for those components. They will then be able to modify the abstractions identified by Levitate, using an IDE, after which Levitate will propagate the suggested changes throughout the system. Domain experts will iterate with Levitate until they are satisfied with the identified component boundaries. Levitate will then lift the resulting components into the most suitable DSL based on a set of heuristics and, where desired, domain expert guidance.
The resulting DSL components and their corresponding inferred specifications will be provided to Transmute and Augury for the next part of POLYMORPH: re-engineering and recomposition. POLYMORPH’s plan for the resulting tools is to support a wide range of developers with different levels of domain expertise and development practices. For example, developers could only be familiar with one DSL and component or could be familiar with multiple languages.
Invoking Cross-Language Compositional Programming
Cross-language interfaces are often a headache for developers to maintain, and inconsistencies in such interfaces are a frequent source of errors. Transmute is designed to provide developers with real time assistance for communicating between components written in different languages. It will internally maintain a database of component interfaces updated in real time as users make changes. It will also detect inconsistencies—such as function declaration in one component not matching the actual definition in another—and suggest fixes to automatically repair programs.
Portents of Cross-Language Verification
While Transmute is focused on real-time type-checking of multi-language systems, Augury is focused on the semantic meaning of programs. Augury will provide tools for model-based testing and verification of systems to ensure semantic interoperability between unlifted and lifted components.
Augury will build upon Galois’s previous work on open-source verification tools for symbolic execution, property specifications, decision procedures, and compositional verification. These tools map a variety of programming languages and IRs into a common format, called Crucible, for symbolic execution. Crucible already supports languages that compile to LLVM (including C/C++), Java and other JVM-based languages, MATLAB, and machine code.
Transmute and Augury will be designed to work closely with Levitate because developers may want to “reinvoke” Levitate’s algorithms during a system modification, especially when lifting failed or unexpected code.
Giving Life to New Systems
Revivify will allow users to reassemble the extracted components (written in both DSLs and legacy programming languages) into a secure and efficient system. To support the evolution of legacy systems to modern hardware, the reassembly process will take advantage of the high-level representations produced by Levitate to support translating components to the native instruction sets of different compute platforms. For example, this could involve transforming a monolithic system to run on a distributed computation fabric or adapting a component to make use of a hardware accelerator (such as a GPU).
Revivify will ensure that any compatibility requirements proven by Transmute and Augury are preserved through the compilation process and can be re-checked when the final executable system is assembled. Revivify will build on a low-level interface definition language (IDL) that specifies how components communicate and what they require of each other. It will then either translate compatibility proofs constructed by Transmute and Augury into proofs about the low-level IDL or reconstruct equivalent proofs from scratch.
Revivify will include a just-in-time (JIT) compiler for chosen DSLs to support user adaptation of previously static systems on an as-needed basis. Using this compiler, a component that existed in a single form in the original system can be dynamically replaced by the user during deployment. This increases the flexibility of the new system and enables more potential use cases for it.
Conjuring Correct-by-Construction Results
POLYMORPH’s planned result would allow developers to create correct-by-construction, formally verified code without needing to become formal methods experts. POLYMORPH would increase the security of systems and reduce risk when legacy code is introduced. It’s pure computer science, but it feels like magic.
For more information, please read our project announcement.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract No. N66001-21-C-4023. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA and NIWC Pacific.
Distribution Statement “A” (Approved for Public Release, Distribution Unlimited)