Data as a Digital Specimen

Posted on: 22nd July 2021
By Multiple Authors
Martin Carkett
Policy Lead, Science & Innovation Unit
Jennifer LaFreniere
Project Manager, Stanford Healthcare Innovation Lab
Alexander Honkala
Immunologist and Researcher, Stanford Healthcare Innovation Lab
Benjamin Rolnik
Director, Stanford Healthcare Innovation Lab
Benedict Macon-Cooney
Deputy Executive Director, Technology and Public Policy
Henry Fingerhut
Senior Policy Analyst, Science & Innovation Unit

    Executive Summary

    Throughout history, innovation has been born of information. Today, the technological revolution has given us the ability to access more information than ever before, including unprecedented insights into biomedical data at both the personal and population level. This has profound implications for health and offers the potential to create new systems that can provide world-class precision health for all. 

    However, access to data in itself isn’t sufficient to drive innovation and realise the benefits of this new age of information. Instead, we must take a more comprehensive, systematic and global approach to the generation, distribution and utilisation of data, including who generates it, for whom and for what purpose. This approach must promote the development of new, urgently needed data sets for patient populations both large and small while also fostering modern industrial policy to create new markets, develop improved privacy controls and support equitable health-care delivery. 

    The Tony Blair Institute and the Stanford Healthcare Innovation Lab have a shared vision to transform personal and global health in the 21st century. In this paper – the first in a series – we set out the key areas we believe policymakers, researchers, practitioners and private industry should jointly focus on in order to make this vision a reality. We also seek to identify a network of like-minded individuals and institutions who can help develop and deliver the systems-focused data, funding and policy structures needed to achieve precision health for all.

    In Search of Magic Bullets

    A little more than a century ago, the German Nobel Laureate Paul Ehrlich laid some of the foundational blocks of modern medicine. Searching for “magic bullets” in the belief that drugs could be developed to target specific diseases, he discovered that arsenic compounds could be used to treat patients with syphilis, leading to the world’s first man-made antibiotic: Salvarsan.

    Ehrlich made his discovery open and accessible by sending 65,000 free samples to doctors all over the world. His systematic, data-driven and collaborative approach paved the way for profound progress in the development of new drugs to treat myriad ailments that had plagued nations for centuries. 

    Today, as we look for ammunition in the battle against Covid-19, our most potent weapon has, in many ways, been data. Rapid advances in genomic sequencing technologies over the past two decades meant the genetic code of the SARS-CoV-2 coronavirus could be sequenced just days after the first Covid-19 cases were reported. This data was made available on open-access databases such as GenBank and GISAID by 6 January 2020, and within days the first vaccines were already under development, as were thousands of studies on how the disease works and how to defeat it. Journals and data repositories have been overwhelmed with new submissions as scientists and doctors around the world have marshalled their resources, organised critical clinical studies and set up reagent drives to increase testing capacity.

    However, this rapid progress has been unevenly distributed, and many nations face unprecedented struggles in providing access to testing and vaccination, organising large data sets and maintaining normal non-Covid-19 clinical functions. Medicine now stands on a precipice: can we overcome the inertia of data-set creation, fragile supply chains and under-resourced, inequitable health-care systems to deliver these new developments in medicine to the global populations that need them most? As we look to the future, how do we reimagine global health and the systems to support it? At the heart of this mission lie data, technology and patients.

    Collected and harnessed at scale, data offers the ability to transform the health of our society. At a global level, multi-omics research, which extends biological characterisation across multiple dimensions beyond genomics, offers the opportunity to greatly improve our functional understanding of how disease and health develop or degrade. These data sets, coupled with a growing ability to analyse them at scale through AI and machine learning, can greatly accelerate our efforts to alleviate disease and suffering while preserving health and health span. 

    Meanwhile, at the individual level, the vast amounts of data now available through wearable fitness technologies can provide unprecedented insights into our personal health and daily routines on a continuous streaming basis. Top athletes including Patrick Mahomes, Rory McIlroy and LeBron James all use them – and in the coming decade, as the cost comes down and the utility increases, more and more of us will look to smart devices for signs of progress or deterioration, providing greater autonomy over our own health and helping establish the basis for truly personalised precision medicine.

    However, despite the promise of new technologies and the insights we can derive from data today, in far too many cases data is still not systematically collected or is underutilised, often locked away for one-off uses by siloed research teams. Where it is available, divergent systems often hold it in different formats, with diverse or inconsistent labelling. Innovation will never happen in the dark, so there is an urgent need to build better tools, develop novel institutional and technical solutions, and make new global commitments to build and share biomedical data more consistently and more fluidly.

    Structuring the Future

    Our teams at the Tony Blair Institute and the Stanford Healthcare Innovation Lab share a belief in finding solutions for a new, data-driven era of medicine and in trying to build world-class health care for all. Our aim is to bring together global communities to reimagine biomedical data generation, analysis and uses in the modern age to create the self-sustaining structures necessary to capture its full value and drive significant improvements in health outcomes across the world.

    From a technical standpoint, we will address the technologies and support structures necessary to generate and share cutting-edge data effectively and responsibly. Building on the success of genomics, this should include partnerships across an expanded global network of biobanks and the combination of data sets to widen the scope of multi-omics research, which should, in turn, be greatly accelerated.

    This will also require a greater focus on interoperability and standardisation. In other places, we need to rethink the overarching infrastructure around frontier innovations such as federated learning, where insights can be derived without data ever leaving individual devices. Creating the infrastructure required for this data will be daunting, given that data is non-rivalrous, has zero marginal cost of reproduction and has increasing returns to use, but it will be transformative.
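
    To make the idea of federated learning concrete, the following is a minimal sketch of one common variant, federated averaging, applied to a toy linear model. It is an illustration under stated assumptions, not a description of any system used by the authors: the three "devices", their data points and the function names (`local_update`, `federated_average`, `train`) are all hypothetical.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately on one device


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
    """Refine the shared model (w, b) using only this device's own data."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error for the model y = w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[float, float]],
                      sizes: List[int]) -> Tuple[float, float]:
    """Combine device models, weighting each by how much data it holds."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(devices: List[Dataset], rounds: int = 50) -> Tuple[float, float]:
    """Coordinator loop: send the model out, collect updates, average them."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_update(w, b, data) for data in devices]
        w, b = federated_average(updates, [len(d) for d in devices])
    return w, b


# Hypothetical data: every device holds a few points on the line y = 2x + 1
devices = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(-1.0, -1.0), (0.5, 2.0)],
]
w, b = train(devices)
print(f"learned slope {w:.3f}, intercept {b:.3f}")
```

    In this sketch only the fitted parameters travel between the devices and the coordinator; the raw readings stay where they were recorded. Real deployments typically add further safeguards, such as secure aggregation, which this toy example omits.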

    We also need to recast the debate around social utility. As it stands, the overriding and often justified concern in this space is privacy. Questions of governance are fundamental and always need to be addressed: what data is required and why, who will process it, and how and where they will do so. Development of modern privacy regulations, improved health-care IT infrastructure and much better controls for patients’ health-care data are therefore essential in improving patient trust and ensuring that personal biomedical data is securely protected. Moreover, a more granular discussion on how data can be used to provide better individual understanding of one’s health, as well as for the common good, will be critical in increasing patient cooperation.

    Key to all of these points is the development of roadmaps and standards for improved multi-omic data generation; privacy-protecting standards for informed consent and data transfer, analysis and application; technological interoperability between projects, data sets, technologies, borders and patient conditions; sharing of technology, standard operating procedures and best practices; and data-driven public policy for ethical oversight, dialogue between key stakeholders and implementation of future-forward health-care infrastructure. Each of these themes will be explored in more detail in subsequent publications.

    Some of these changes have already happened during the current crisis. We have increased our ability to track pathogen evolution in real time through genomic surveillance, telemedicine has greatly accelerated and will continue to scale in coming years, and platforms such as PREDICT have been transformative for health-data machine learning. The team at the Stanford Healthcare Innovation Lab has also been collecting data from people with fitness wearables and creating algorithms that can predict the onset of certain viral illnesses. This has profound consequences for stopping transmission not only of Covid-19 but also of the flu and other viruses, and can even be applied to the management of some chronic inflammatory conditions.
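
    As a rough illustration of how a wearable signal might be screened for early signs of illness, the sketch below flags days on which resting heart rate rises well above a person's own recent baseline. This is a deliberately simple rolling z-score check and an assumption on our part, not the algorithm developed by the Stanford Healthcare Innovation Lab; the function name, the seven-day window, the threshold and the readings are all hypothetical.

```python
from statistics import mean, stdev
from typing import List


def flag_elevated_days(resting_hr: List[float],
                       baseline_days: int = 7,
                       threshold: float = 3.0) -> List[int]:
    """Return indices of days whose resting heart rate is unusually high
    compared with the preceding `baseline_days` days for the same person."""
    flagged = []
    for day in range(baseline_days, len(resting_hr)):
        window = resting_hr[day - baseline_days:day]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            continue  # no variation in the baseline, so no z-score is defined
        z = (resting_hr[day] - mu) / sigma
        if z > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily resting heart rates (beats per minute) for one person
readings = [60, 61, 59, 60, 62, 61, 60, 61, 60, 68, 61]
print(flag_elevated_days(readings))  # day index 9 stands out from the baseline
```

    Comparing each person against their own history, rather than against a population average, is the design choice that matters here: a reading that is ordinary for one wearer can be a marked deviation for another.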

    As the pandemic has shown, we were always capable of moving faster in health-care innovation, and many standard practices were a consequence of institutional inertia rather than a lack of ability. Yet the changing nature of the pandemic in the Western world still risks a return to slow, business-as-usual health-care progress. A coalition of the willing is needed around the world to increase biomedical-data generation and its utilisation so that we can transform our practices around health care in the ways that Salvarsan and genomics have done previously. Even as the pandemic shifts, it has shown that a better, data-driven world with faster-paced biomedical innovations is not only feasible but already exists, waiting to be capitalised upon and taken into new areas of global health and precision personal health-care delivery.

    Part of this drive will require a new model of thinking of data as a digital specimen. Whereas consent and collection have previously been handled top-down, a bottom-up, voluntary approach can be based on individual actions and collective efforts, which can then be used to advance discovery and innovation on the terms of the patients who need it most. By setting up platforms, collaborations, tools and more, we hope to foster this transformation of health-care data around the world to usher in a new era of improved health care, health span and individual agency in managing conditions, risks and privacy for transformative research outcomes.



    The biotechnology revolution sweeping the world presents profound possibilities to leverage the collective power of humanity in ways that have never before been possible, while also providing more protection of the freedom and liberty of individuals. In health, this is no different. On its own, data will never be a magic bullet, but the more of it we build in ways that empower patients, doctors and researchers, the more potential we have to open up new innovations. To do so, we must accelerate scientific advances in multi-omics, digital health, sensing and clinical medicine while also reimagining clinical, policy, financial and technical structures that enable both the research underlying these developments and the translation of promising methods into new clinical and individual health practices.

    Lead Image: Getty Images
