Lacuna Fund aims to reduce health disparities by fostering interdisciplinary collaborations that create, expand, or aggregate labeled training and evaluation datasets. Ultimately, the goal of these datasets is to help providers and patients make decisions that lead to more equitable healthcare outcomes.

The value of machine learning (ML) in healthcare is its ability to process huge datasets beyond the scope of human capability, and then reliably convert analysis of that data into clinical insights that aid physicians in planning and providing care, ultimately leading to better outcomes, lower costs of care, and increased patient satisfaction. [1]

However, a major concern with ML implementation is bias in the datasets behind the algorithms that inform diagnosis and treatment. Healthcare datasets often lack broad representation across demographic and socioeconomic groups. Where representation is addressed, the data may still be inaccurate, resulting in an incorrect understanding of differences across groups. Current data may also omit important socioeconomic, environmental, and other factors that can determine health outcomes. Improving these datasets may improve machine learning models and, ultimately, health outcomes. [2]

Lacuna Funding

Lacuna Fund seeks Expressions of Interest (EOIs) from multi-disciplinary teams to develop open and accessible training and evaluation datasets for ML applications that address inequities in healthcare outcomes in the United States and in Low- and Middle-Income Countries (LMICs) globally. The purpose of this call for EOIs is to identify promising ideas and invite teams to submit full proposals.

Our goal is to support the creation, augmentation, or aggregation of datasets that are representative of affected populations and are therefore less biased and more likely to lead to equitable health outcomes. Given historic inequities related to race in the U.S., we are interested in datasets that could help reduce racial disparities in healthcare outcomes in the U.S., and in datasets that can mitigate inequities in healthcare outcomes related to identity in LMICs (e.g., ethnicity, tribal affiliation, or gender). We know that a lack of diversity in terms of sexuality, age, ability, geographic location, and even type of care setting (e.g., community health system vs. a large academic medical system, inpatient vs. outpatient) can make a dataset less representative and lead to unfair outcomes for subsets of a population. We encourage applicants to consider taking an intersectional approach by including data for multiple underserved groups, or to explain why a certain type of diversity is important to ensure equitable outcomes for a specific use case.

Our Technical Advisory Panel, which is responsible for identifying data gaps, developing the EOI, and reviewing and selecting proposals, has identified needs for datasets that can be used to address a health disparity in the following areas:

  • cancer
  • infectious disease
  • chronic disease

However, Lacuna Fund EOIs are intentionally open to encourage new and innovative ideas that we may not have identified. Applicants may also make a case for why a dataset in another area could significantly reduce health inequities.

The Technical Advisory Panel sees great value in unlocking, augmenting, or aggregating existing datasets and is also open to proposals to create new datasets. Most critically, we want to ensure the underlying dataset is not fundamentally flawed so as to avoid perpetuating bias. All of the approaches described below are of interest:

  • Pooling existing data from healthcare systems, health insurance companies, or health data intermediaries and making it accessible to researchers so they can reveal and correct algorithmic bias (e.g., MIMIC, PhysioNet).
  • Filling gaps in existing datasets or making them more representative (e.g., images of skin cancer or lung x-rays to detect COVID).
  • Linking existing clinical datasets with data on social determinants of health to create more robust, informative datasets.
  • Cleaning up existing datasets to ensure accuracy in data about race, ethnicity, gender, disability, etc.

Successful applicants will propose the creation, expansion, or aggregation of one or more datasets that can be used to address multiple research questions and correct a bias or disparity.

Open RFPs

See information on how to apply as well as open and past RFPs here.


The 2021 request for EOIs to create or aggregate datasets that can fill a gap, correct a bias, or address a disparity in health is supported by The Rockefeller Foundation, Wellcome Trust, and the Gordon and Betty Moore Foundation.

[1] Corbett, Ed. “Real-World Benefits of Machine Learning in Healthcare.” Health Catalyst, 19 Nov. 2020,

[2] Wawira Gichoya J, McCoy LG, Celi LA, et al. “Equity in essence: a call for operationalising fairness in machine learning for healthcare”. BMJ Health & Care Informatics 2021;28:e100289. doi:10.1136/bmjhci-2020-100289