Creating open and accessible COVID-19 data models

Projections based on data models for COVID-19 are serving as a critical foundation for federal, state, and local government policy makers charged with making rapid and informed decisions to fight the spread of the novel coronavirus.

Data models such as the one presented by researchers at Imperial College London on March 16, which many believe led to a dramatic shift by the U.K. government to a strict lockdown, inform life-and-death decisions: whether to issue statewide “stay-at-home” orders or close schools and businesses, how to forecast potential hospital bed shortages, where current and emerging COVID-19 hot spots are, and countless other critical aspects of the pandemic.

These models and research papers continue to play a pivotal role. However, oftentimes models are published only as raw mathematics, data sets are not made available, and the code used to implement a model is not distributed. This means that it is a difficult, manual and lengthy process for others to re-use the model for their own specific purposes. This is particularly true for Governors and other state and local officials facing unique and rapidly changing circumstances.

To enable decision makers at every level to more rapidly adapt data models to their own specific needs and geographic factors, Galois is partnering with DARPA’s ASKE and World Modelers programs to produce reliable insights and data models. Most critically, we are releasing all data, models, and resources under the MIT open source license to encourage collaboration.

Making COVID-19 Models and Data More Widely Available 

As part of this effort, we are analyzing the data and models emerging from the COVID-19 pandemic. We are leveraging a novel pipeline and toolset to produce cleaned data sets, model analysis, and other resources for the federal, state, and local governments; the scientific community; and other friends and colleagues working on this crisis.

Our aim is to improve the agility, speed, and confidence of data and model analysis for crisis response, to allow domain experts to inform their decisions, policy, and actions with machine intelligence and automated reasoning. Three key areas of COVID-19 modeling where Galois believes it can have the most immediate impact relative to existing modeling tools are summarized below:

Rapidly adapt and localize existing COVID-19 models

There are good models out there today, great models actually, that can now be quickly and easily adapted by state and local government leaders and other officials for their unique requirements. For example, if local officials want to take a national COVID-19 contagion model and adjust it for their city’s population demographics, we can set new parameters with curve fitting tools for the local data to rapidly inform policy decisions.
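As a rough illustration of what this kind of localization can involve, here is a minimal sketch in Python. It assumes a textbook SIR contagion model and a simple grid search standing in for a real curve fitting tool; neither is necessarily what the models or tools mentioned here use. Every name and number in it (simulate_sir, fit_beta, the city population, the "observed" series) is hypothetical, and the observations are synthetic rather than real case data.

```python
def simulate_sir(population, beta, gamma, initial_infected, days):
    """Daily-step SIR model; returns the number of infected people per day."""
    susceptible = population - initial_infected
    infected = float(initial_infected)
    series = [infected]
    for _ in range(days - 1):
        new_infections = beta * susceptible * infected / population
        recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        series.append(infected)
    return series


def fit_beta(observed, population, gamma, candidates):
    """Pick the transmission rate with the lowest squared error on local data."""
    def squared_error(beta):
        model = simulate_sir(population, beta, gamma, observed[0], len(observed))
        return sum((m - o) ** 2 for m, o in zip(model, observed))
    return min(candidates, key=squared_error)


# Hypothetical city. The "observed" data is synthetic, generated with beta = 0.30,
# so the fit should recover that value.
city_population = 500_000
recovery_rate = 0.10
observed = simulate_sir(city_population, 0.30, recovery_rate, 10, 30)

candidates = [round(0.05 * k, 2) for k in range(1, 13)]  # 0.05, 0.10, ..., 0.60
local_beta = fit_beta(observed, city_population, recovery_rate, candidates)
```

In practice the observed series would come from local case reports, and a proper optimizer would replace the grid search, but the shape of the task is the same: keep the model structure and re-estimate its parameters from local data.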

In addition to adapting existing national models or those from other localities, we can also quickly create new models based on specific parameters and challenges, such as: when will my peak infections occur? When will hospital resources run out? Where are nearby cities in terms of peak cases and can resources be shared and staged out between cities (even across state lines)?
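Questions like these can be asked directly of a model's projected output. The sketch below is illustrative only: the function names, the projected series, the hospitalization rate, and the bed count are all made-up assumptions, not figures from any published model.

```python
def peak_day(projected_infections):
    """Index of the day on which projected infections are highest."""
    return max(range(len(projected_infections)),
               key=lambda day: projected_infections[day])


def first_day_over_capacity(projected_infections, hospitalization_rate, beds):
    """First day on which projected hospital demand exceeds available beds.

    Returns None if demand never exceeds capacity in the projection window.
    """
    for day, infections in enumerate(projected_infections):
        if infections * hospitalization_rate > beds:
            return day
    return None


# Hypothetical projection (infections per day) for one city.
projected = [100, 250, 600, 1400, 2500, 3100, 2800, 1900]

city_peak = peak_day(projected)
shortage_day = first_day_over_capacity(projected, hospitalization_rate=0.1, beds=200)
```

Running the same two functions over projections for neighboring cities would show whose peak comes first, which is the kind of comparison needed before sharing or staging resources between them.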

Identify errors with existing models

One could argue that the only scenario more dangerous than a lack of usable COVID-19 models is a model with errors, which can not only lead to devastating policy decisions but also be re-used by decision makers to build equally flawed models. When data models are published only as raw mathematics, however, it is difficult to identify flaws and bugs, which makes the models harder to validate.

We aim to produce cleaned data sets by quickly identifying and correcting model bugs and errors. As a result, we can build tools that let you rapidly develop validation for a model and generate re-usable versions of it.
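To give a sense of what a very simple automated data check can look like, here is a hedged sketch. It is not the Galois pipeline, which is described only at a high level here; the function name and the sample series are hypothetical. It flags days on which a cumulative case count is negative or lower than the previous day, two symptoms that commonly indicate a reporting or processing error.

```python
def find_anomalies(cumulative_cases):
    """Return the day indices where a cumulative series is negative or decreases."""
    problems = []
    for day, value in enumerate(cumulative_cases):
        decreased = day > 0 and value < cumulative_cases[day - 1]
        if value < 0 or decreased:
            problems.append(day)
    return problems


# Hypothetical cumulative case counts with two deliberate errors:
# a drop on day 3 and a negative value on day 5.
reported = [0, 3, 7, 6, 12, -1, 15]
flagged_days = find_anomalies(reported)
```

A check like this only locates suspicious entries. Deciding how to correct them still depends on the data source and on domain judgment.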

Develop consumable models and datasets

During health crises, the ability to compress the time between data generation and the production of models and datasets that are consumable and actionable for policy decision makers can make the difference between life and death. It can often take weeks or longer to present data in an intuitive way, and the task becomes harder when some of the most widely used data models change the way they publish data. This adds time to the process that no one has right now. Galois offers an alternative to traditional modeling and data analysis, which can be a slow, opaque, bespoke, and error-prone process. Galois is using methods and technologies developed under the Automated Scientific Knowledge Extraction (ASKE) and World Modelers programs to assemble, clean, and validate models and data from a wide range of sources, using automated reasoning and code synthesis.

Why are we doing this?

At Galois we take our commitment to developing ideas and technologies for the common good pretty seriously. It manifests itself in how we conduct work, and in things like our boundary policy. We’re lucky to be afforded the opportunity to work as part of programs like ASKE and World Modelers mentioned above, and want to make sure that we do our part in helping government decision makers at all levels make fully informed decisions on policies.

We are encouraged to see others commit to making COVID-19 datasets more accessible and open source so modeling isn’t restricted by models that are difficult to adapt, interpret or visualize.

Stay tuned for updates to this blog, the project page, and our GitHub page for the COVID-19 pandemic for a list of published models and data.