Health Domain

Lacuna Fund health datasets reduce health disparities by helping providers and patients make decisions that lead to more equitable healthcare outcomes. These datasets can be used to train chatbots, provide reliable medical information to the public, assist with disease screening and diagnosis, and assess the health and treatment of large populations over time (e.g. maternal health or HIV data). Learn more and download released datasets below.

Health Datasets

Description: This dataset will help in the real-time and remote diagnosis of rabies in humans and animals in low-resource settings. A time series approach can be applied to the outbreak dataset to predict the number of rabies cases likely to occur within an area after a given time interval. This approach can also help with resource mobilization, such as identifying the number of vaccines required in a specific area at a given time. The collection comprises three datasets with 12,684 observations in total. The two rabies diagnosis datasets, for animals and humans, contain 7,081 and 4,585 observations, respectively, and the outbreak prediction dataset contains 1,018 observations.
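As a rough illustration of the time series idea described above, the following minimal sketch forecasts the next period's case count with a simple moving average and converts it into a vaccine estimate. It is not the authors' method. The function names, the window size, the `doses_per_case` value, and the sample counts are all hypothetical placeholders and are not taken from the dataset.

```python
import math
from statistics import mean


def moving_average_forecast(counts, window=3):
    """Forecast the next period's case count as the mean of the last `window` counts."""
    if window < 1 or len(counts) < window:
        raise ValueError("need at least `window` observations")
    return mean(counts[-window:])


def doses_needed(forecast_cases, doses_per_case=4):
    """Round the forecast up to whole cases and multiply by an assumed doses-per-case value.

    `doses_per_case` is a placeholder for illustration, not a clinical recommendation.
    """
    return math.ceil(forecast_cases) * doses_per_case


# Made-up monthly case counts for one area (illustrative only, not from the dataset).
monthly_cases = [4, 6, 5, 7, 9, 8]

forecast = moving_average_forecast(monthly_cases, window=3)
estimated_doses = doses_needed(forecast)
print(f"forecast cases next month: {forecast}")
print(f"estimated vaccine doses: {estimated_doses}")
```

A moving average is only a baseline. A seasonal or regression-based model would likely suit real outbreak data better, but the same two-step shape (forecast cases, then derive resource needs) would apply.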

Contact: Asa Emmanuel | asakalonga@gmail.com and Kennedy Lushasi | klushasi@ihi.or.tz 

Authors and Affiliations: Asa Emmanuel, Rebecca Chaula, Deogratias Mzurikwao, Joel Changalucha, Kennedy Lushasi 

Dataset: access here.

Contact: Maria Paz Hermosilla | goblab@uai.cl 

Description: This data repository will evaluate factors that contribute to child malnutrition in Chile and children's nutritional status, as well as the associated costs. The focus at this stage is on estimating health costs associated with child malnutrition and identifying the biopsychosocial determinants that lead to it.

Authors and Affiliations:  

  • Ministry of Health, Chile  
  • GobLab, School of Government, Adolfo Ibañez University, Chile 
  • FONASA (Public health insurance agency) 
  • Health Superintendency 
  • JUNAEB (national school aid and scholarship board) 

Dataset: Given the sensitive nature of the data in this repository, access is controlled and limited to relevant awarded research projects. Those interested can visit the project website: https://goblab.uai.cl/proyecto-reduccion-de-la-malnutricion-infantil-en-chile/

Contact: Rose Nakasi | g.nakasi.rose@gmail.com or rose.nakasi@mak.ac.ug 

Description: This dataset will aid in the diagnosis of malaria. It contains annotated images of blood samples collected in Uganda and Ghana, with objects of interest, including parasites and white blood cells, marked. It significantly increases the number of available microscopy images, including metadata, by 6,000 thick blood slides and 2,000 thin blood slides for use in object detection research and other areas of inquiry. 

Authors and Affiliations:  

  • Makerere Artificial Intelligence Lab  
  • minoHealth 

Dataset: https://doi.org/10.7910/DVN/VEADSE 

Region: Sub-Saharan Africa

Contact: Bhiken Naik | bin4n@uvahealth.org

This dataset can be used to identify patterns of intraoperative anesthesia practice and predict postoperative length of stay and risk of mortality based on intraoperative variables. It includes 2,066 intraoperative anesthesia records from two academic centers in sub-Saharan Africa. The team photographed completed intraoperative anesthesia records using a smartphone, de-identified the images, and securely uploaded them to a HIPAA-compliant server. Using a combination of computer vision AI and manual extraction techniques, the team collected the following comprehensive intraoperative data: demographic data, medication data, hemodynamic data, physiological data, anesthesia type, surgery type, postoperative length of stay, and 30-day postoperative mortality.

Intraoperative anesthesia data encompasses a wide range of information that is essential for patient care during surgical procedures. However, capturing this depth of information is particularly challenging in low- and middle-income countries (LMICs), where current electronic intraoperative anesthesia datasets are often limited in scope. As a result, many key data elements that could be vital for clinical decision-making and research are missing or unavailable. This limitation hinders the ability to fully understand and improve patient outcomes in LMICs. This dataset fills a critical gap by developing a method to capture all data elements from intraoperative anesthesia records.

Authors and Affiliations:

  • University of Virginia: Bhiken Naik
  • School of Medicine and Pharmacy, University of Rwanda and King Faisal Hospital, African Health Sciences University: Paulin Banguti
  • Safe Surgery South Africa: Hyla Kluyts
  • University of Virginia: Ryan Folks

Dataset: https://portal.ithriv.org/#/public_commons/project/d9fc062c-64c9-4481-80e7-3db4aba17e00

Country: Nigeria

Contact: Udunna Anazodo | udunna.anazodo@mcgill.ca

The BraTS-Africa dataset is an aggregation of magnetic resonance imaging (MRI) scans from six centers in Nigeria aimed at providing a public dataset for the development of machine-learning solutions for the management of brain tumors in African patients. This dataset serves as a starting framework for future expansion in other regions of Africa. The team processed and annotated a total of 584 images from 146 patient scans. Ninety-five of these scans are presumed to have diffuse glioma, and 51 of them have other types of central nervous system (CNS) neoplasms. Expert radiologists annotated three distinct tumor sub-regions to delineate the enhancing tumor (ET), the necrotic tumor core (NCR), and the peritumoral oedematous/infiltrated tissue (ED) sub-regions.

Prior to this study, there was no known comprehensive annotated brain imaging dataset from Africa available to the public. This study filled that gap so that novel machine-learning solutions for managing neurological diseases, such as brain tumors, can address unmet clinical needs in Sub-Saharan Africa.

Authors and Affiliations:

  • Medical Artificial Intelligence (MAI) Lab (Lagos, Nigeria): Maruf Adewole, Abiodun Fatade, Oluyemisi Toyobo, Farouk Dako, Udunna Anazodo
  • The National Hospital (Abuja, Nigeria): Feyisayo Daji, Chinasa Kalaiwo
  • Lagos University Teaching Hospital: Olubukola Omidiji
  • Lagos State University Teaching Hospital: Rachel Akinola
  • NSIA-Kano Diagnostic Center: Mohammad Abba Suwaid
  • Federal Medical Centre (Umuahia, Nigeria): Kenneth Aguh
  • Lily Hospital (Benin, Nigeria): Mayomi Onuwaje
  • University of Pennsylvania (Philadelphia, USA): Farouk Dako
  • Indiana University (Indianapolis, USA): Spyridon Bakas
  • Scripps Clinic Medical Group (San Diego, USA): Jeffery Rudie
  • McGill University (Montreal, Canada): Udunna Anazodo

Dataset: https://www.cancerimagingarchive.net

Country: Nepal

Contact: Bishesh Khanal | bishesh.khanal@naamii.org.np

This dataset helps to detect diarrhea-causing parasites in resource-limited rural areas, particularly across the Global South, where access to expensive diagnostic tools is limited. It contains approximately 400,000 microscopic slide images from water, vegetable, and stool samples from four different provinces across Nepal, making it one of the largest datasets of its kind. The team collected water samples from different sources (i.e., tap water, bottled water, lake, river, pond, stream, spring water, wetland, well, and borewell) and used seven different types of vegetables. Using the dataset and annotations available, this team trained different deep-learning models to automatically detect parasites, specifically Giardia and Cryptosporidium cysts.

The sample images were captured using both smartphone and brightfield microscopes before being uploaded to an online data collection and annotation platform. This platform allows multiple users to upload images of samples with permission-based features for quality control. Permitted users can review the uploaded images, approve or reject them, add comments on individual images, and filter the view for samples based on a certain date range or province. As a first step, this dataset focused on Nepal, but it is designed to be applicable across similar regions worldwide.

Authors and Affiliations:

  • Nepal Applied Mathematics and Informatics Institute for Research (NAAMII): Bishesh Khanal, Udit Chandra Aryal, Safal Thapaliya
  • Kathmandu Institute of Applied Science (KIAS): Dr. Basant Giri, Dr. Susma Giri, Dr. Bhanu Neupane, Asmita Adhikari, Asmita Karki, Ramdeep Shrestha, Aayusha Upreti, Pramikshya Bagale, Deepa Prajapati, Prashamsa Shrestha, Celeus Baral
  • Nyaya Health Nepal, Bayalpata: Mandeep Pathak, Ekendra Kunwar, Khadak Chaudhary, Sunil Buda, Tapendra Kunwar, Ramesh Badahit, Nim Prakash Sharma
  • Provincial Public Health Laboratory (PPHL)-Janakpur: Shravan Kumar Mishra, Santosh Kumar Yadav, Jitendra Kumar Sah, Amrendra Kumar Mishra, Sarwajit Yadav, Ashish Jha
  • Kathmandu Institute of Child Health (KIOCH)-Damak: Dr. Bhagawan Koirala, Dr. Sandeepa Karki, Dr. Jayamani Shrestha

Dataset: https://zenodo.org/records/13913469

All Lacuna Fund datasets are licensed under the CC-BY 4.0 International license unless otherwise noted.