Swarm Learning: Expert Explainer

This piece is part of an expert explainer series produced through a collaboration between the Tony Blair Institute and the Stanford Healthcare Innovation Lab. TBI and SHIL share a vision to transform personal and global health in the 21st century. Combining leading policy and research capabilities, they are working to develop the technology and knowledge to improve lives around the world.

Introduction

One of the central battles of the internet era is that between centralisation and decentralisation. At the radical end of this spectrum, many who fully believe in a decentralised future are building crypto, developing blockchain and creating new markets and communities beyond the control of the traditional gatekeepers. But this fight might be most transformative in health, where new tools can analyse data at vast scale while addressing the problem with centralisation that has long impeded progress: privacy. Central among these tools is a new type of machine learning: Swarm Learning.

The biotech revolution

We are at the beginning of a generation-defining revolution in biology. For the past two decades, breakthroughs in our understanding of genetics and genomics, coupled with those in AI and machine learning, have presented us with opportunities to radically improve healthcare around the world. Data is now a digital specimen, but as more and more of it is collected, often in different formats and on disparate platforms, new solutions are needed to integrate, store, compute on and secure it. At the margins, cheaper data storage might mean saving around $1 per person. But in aggregate, reducing the cost of monthly data storage would save healthcare systems millions of dollars annually.

The Stanford Healthcare Innovation Lab (SHIL) is focused on building the future of precision health, deep medicine, and truly 21st century healthcare. Our work combines biology and data science to extend the boundaries of knowledge and raise the standard of care for people around the world. Analysing multiomics datasets stored on different cloud providers is central to deepening our understanding of the mechanisms of disease and to delivering personalised medicine. In short, the more biological data and associated metadata researchers can access, the more insights we can deliver. However, progress has often been impeded by the cost, friction and privacy risks of accessing and combining such data.

To push the frontiers, the team at SHIL have therefore developed a new framework for federated learning, a pioneering concept for handling data in an era where our devices, smartphones, tablets, and applications are interconnected warehouses of our digital identity. Rather than centralising data, federated learning means that analysis can be run on the computing frameworks where the data already reside, so the identifiable information that often raises privacy concerns is not stored elsewhere. In essence, federated learning allows two distant, secure datasets to be temporarily treated as one, without moving or sharing the entire dataset.
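The principle of treating two secure datasets as one without moving them can be illustrated with a deliberately small sketch. It is not Swarm's implementation and it is far simpler than training a model: each site computes a summary of its own records locally, and only those summaries are combined. The names `SiteSummary`, `summarise_locally` and `pooled_mean`, and the measurement values, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SiteSummary:
    """The only information a site shares: a record count and a sum."""
    count: int
    total: float


def summarise_locally(values: Sequence[float]) -> SiteSummary:
    # Runs where the data live; individual records never leave the site.
    return SiteSummary(count=len(values), total=float(sum(values)))


def pooled_mean(summaries: Sequence[SiteSummary]) -> float:
    # Combines per-site summaries into the mean of the pooled dataset.
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records at any site")
    return sum(s.total for s in summaries) / n


# Hypothetical measurements held by two separate sites.
site_a = [5.1, 6.3, 5.8]
site_b = [6.0, 7.2]

summaries = [summarise_locally(site_a), summarise_locally(site_b)]
federated = pooled_mean(summaries)
print(f"pooled mean from summaries only: {federated:.3f}")
```

The result equals the mean that would be obtained by pooling every record in one place, yet each site transmits only two numbers. Real federated learning exchanges richer objects, such as model updates, but the pattern of computing locally and sharing only aggregates is the same.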

Swarm

Swarm (Bahmani and Ferriter et al. 2021) minimises the movement of data between different cloud platforms and improves the security and privacy of health data being transferred. It provides an Application Programming Interface (API) built on a serverless computing model, in which the underlying infrastructure is managed by cloud providers. The API evaluates how much data needs to move, performs computation in situ as much as possible, and facilitates data transfer, opening up new avenues for collaboration at a low cost.

Figure 1: A federated cloud framework for large-scale variant analysis

In proving the concept, the team at Stanford ran four experiments across cloud providers including Google Cloud Platform (GCP) and Amazon Web Services (AWS), leveraging GCP BigQuery, AWS Athena, Apache Presto and MySQL. These each have different properties and represent the diversity of platforms that such a framework needs to work on. Across these, our analysis was accurate and decreased the amount of data that needed to be scanned or transferred, while also reducing time and cost (e.g., moving only 0.001% of a full annotation table of genomic variants: 4KB of 500GB). Although this technology is still in its infancy, it has the potential to open up new possibilities to analyse important data at scale, cheaply and quickly. But more importantly, this new technology also improves security and protects privacy.
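The idea of working out how much data needs to move, and computing in situ before transferring anything, can be sketched as follows. This is a minimal illustration under stated assumptions, not Swarm's actual API: two in-memory SQLite databases stand in for tables held on two different cloud platforms, and the table names, column names, helper `make_db` and sample rows are all hypothetical.

```python
import sqlite3


def make_db(schema: str, insert_sql: str, rows: list) -> sqlite3.Connection:
    """Create an in-memory database standing in for one cloud platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn


# Platform A holds a cohort's variants (hypothetical rows).
variants_db = make_db(
    "CREATE TABLE variants (variant_id TEXT, sample TEXT)",
    "INSERT INTO variants VALUES (?, ?)",
    [("rs1", "s1"), ("rs3", "s1"), ("rs3", "s2")],
)

# Platform B holds a much larger annotation table (tiny here).
annotations_db = make_db(
    "CREATE TABLE annotations (variant_id TEXT, gene TEXT)",
    "INSERT INTO annotations VALUES (?, ?)",
    [("rs1", "GENE_A"), ("rs2", "GENE_B"), ("rs3", "GENE_C"), ("rs4", "GENE_D")],
)

# Step 1: compute in situ on platform A which annotation keys are needed.
keys = [
    row[0]
    for row in variants_db.execute(
        "SELECT DISTINCT variant_id FROM variants ORDER BY variant_id"
    )
]

# Step 2: send only the keys to platform B and filter there, so that
# only the matching annotation rows ever cross between platforms.
placeholders = ",".join("?" for _ in keys)
moved = annotations_db.execute(
    "SELECT variant_id, gene FROM annotations "
    f"WHERE variant_id IN ({placeholders}) ORDER BY variant_id",
    keys,
).fetchall()

total = annotations_db.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]
print(f"moved {len(moved)} of {total} annotation rows")
```

In this toy setup half of the annotation table is moved. On a real annotation table that is far larger than the set of variants of interest, the fraction transferred would be correspondingly smaller, which is the effect the experiments describe.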

Future direction

Often, different teams across the same organisation store data on different platforms, so solutions such as Swarm can make data sharing easier. This is especially helpful when making discoveries through data-driven approaches, where there is a need to combine data from various secured sources.

For healthcare, where progress in our understanding is reliant on data, this framework can push us forward. Platforms like Swarm are critical for enhancing inclusion by handling sensitive data for underrepresented cohorts who might be concerned about using any public cloud providers. In the case of building better clinical decision support systems, federated data shared from different healthcare systems can provide more training data.

To move further towards a world of precision medicine, it is also critical that we have diversity of data. In particular, this means that all countries and ethnicities need to be represented. This may still be a way off, but technologies such as federated learning can provide technical solutions that help share data across countries without compromising privacy and security.

Those of us in the research and policy communities and working in technology need to work together to build innovative solutions that advance biomedical research. We believe that everyone should have access to world-class health, and to help build a community, the team at SHIL are making our tools open-source. This way others can easily adopt these tools and build them into even more impactful applications. It is essential for policy-makers and politicians to engage with those advancing technical and scientific knowledge so that we can work together to overcome the challenges that hinder progress. The prize is significant: as our biological and computational abilities increase at a faster pace, we can translate this into better health outcomes for people around the world.

