Agriculture Datasets
Description: This machine learning dataset of smallholder farmers’ fields includes georeferenced crop images along with labels on input use, crop management, phenology, crop damage, and yields, collected across 8 counties in Kenya.
Authors: Lilian Waithaka, Koen Hufkens, Berber Kramer and Benson Njuguna
Dataset: access here
Description: This dataset includes corrected geolocations of fields, improving the usability of the most expansive crop-cut yield estimation dataset for Eastern Africa. Collected by the non-profit One Acre Fund from 2015 to 2019, this dataset covers major crop-producing regions in Kenya, Rwanda, and Tanzania.
Dataset: access here
Description: This project built a remotely monitored and controlled Internet of Things (IoT) fish pond water quality management system to generate labeled datasets for both conventional ponds and aquaponic pond systems.
Authors: Udanor Collins, Blessing Ogbuokiri, and Nweke Onyiny
Dataset: access here
Description: This dataset contains a repository of image and spectrometry datasets for five main food security crops in Sub-Saharan Africa: cassava, maize, beans, bananas, and cocoa. Collected and curated in collaboration with in-country agricultural experts, the datasets support a wide range of machine learning applications, including classification, object detection, early crop disease detection, and spatial analysis. The team collected and annotated 127,046 images and 39,300 spectral data points.
Authors: Joyce Nakatumba-Nabende, Andrew Katumba, Claire Babirye, Jeremy Francis Tusubira, Godliver Owomugisha, Neema Mduma, Darlington Akogo, Blessing Sibanda
Dataset: access here
Description: This dataset focuses on locations with predominantly pastoral communities in northern Tanzania to identify fine and broad-scale movements of livestock and land use patterns and to understand how these relate to communal conflicts. It is a high-quality, accurate, and labeled (image, location, and time stamps) dataset containing detailed information on ~2,000 communal resources (e.g., rangelands, water points, and dips) and their use patterns for over 220 villages across four large districts in northern Tanzania, representative of pastoral systems of livestock production in East Africa. The dataset can be used to describe forage and livestock resource management in managed ecosystems such as community rangelands; identify major migration routes among pastoralist herds and the location and type of infrastructure required to support livestock production; anticipate the location of conflicts with crop farmers; and determine the best locations to establish forage banks and support infrastructure along livestock migratory routes.
Contact: Gladness Mwanga | gladnessg@nm-aist.ac.tz and Divine Ekwem | divine.ekwem@glasgow.ac.uk
Authors and Affiliations: Dr. Divine Ekwem (University of Glasgow); Gladness Mwanga (Nelson Mandela African Institution of Science and Technology), Professor Gabriel Shirima (Nelson Mandela African Institution of Science and Technology), Professor Mizech Chagunda (University of Hohenheim)
Dataset: access here.
Description: The project created labeled yield estimates from 3,000 farmers and used them to train yield prediction models across the country; the dataset was then used to generate high-resolution crop mask layers for the different value chains. The yield prediction models were enhanced with other biophysical datasets, ranging from soil properties to climate-related indicators. The datasets served as a proof of concept for training scalable machine learning models, which may be able to respond more appropriately and cost-effectively to agricultural stressors, thereby ensuring a positive impact on agricultural practices (e.g., good agricultural practices), yields (e.g., harvest quality and quantity), and farmer access to financing (e.g., crop insurance).
Contact: Seth Odhiambo | sodhiambo@pula.io
Authors and Affiliations: Pula Advisors
Dataset: access here.
Contacts:
- Mary Dziedzorm Afenyo | Farmerline | mary@farmerline.co
- Lyndon Estes | Clark University | lestes@clarku.edu
- Primož Kovačič | Spatial Collective | primoz@spatialcollective.com
Description: This dataset provides continent-wide crop field labels for Africa, improving the availability and use of crop field boundary (parcel) maps. It contains 42,403 annotated geospatial polygons indicating the boundaries of individual crop fields spanning the years 2017-2023.
Authors and Affiliations:
- Wussah, A., Afenyo, M., Osei, A.K., Gathigi, M., Kovačič, P., Muhando, J., Addai, F., Akakpo, E.S., Allotey, M., Amkoya, P., Amponsem, E., Dadon, K.D., Gyan, V., Harrison, X.G., Heltzel, E., Juma, C., Mdawida, R., Miroyo, A., Mucha, J., Mugami, J., Mwawaza, F., Nyarko, D., Oduor, P., Ohemeng, K., Segbefia, S.I.D., Tumbula, T., Wambua, F., Yeboah, F., Estes, L.D., 2024.
Dataset:
- Zenodo: https://zenodo.org/records/11060871
- Github: https://github.com/agroimpacts/lacunalabels
- AWS Open Data Registry: https://registry.opendata.aws/africa-field-boundary-labels/
Countries: Kenya, Mali, Togo, Rwanda, Uganda, Ethiopia, Malawi, Zambia, Tanzania, Namibia, Sudan, and Nigeria
Contact: Catherine Nakalembe | cnakalem@umd.edu
CropHarvest increases the understanding of the main types of food production in Sub-Saharan Africa and can help inform decision-making around agricultural development, early warning systems, and regional trade. It is a global, open-source remote sensing dataset for crop-type classification in Sub-Saharan Africa – specifically in Kenya, Mali, Togo, Rwanda, Uganda, Ethiopia, Malawi, Zambia, Tanzania, Namibia, Sudan, and Nigeria.
The team expanded on an existing dataset published in 2021 to now include the following: new labeled data points using Collect Earth Online, ground data for crop type mapping, street-level images, crowdsourced labeled images, and price data. In addition, the Collect Earth Online data was randomly sampled to cover the entire country, filling critical data gaps in crop patterns and yields.
Authors and Affiliations:
- NASA Harvest: Tseng, G.
- University of Maryland, College Park: Zvonkov, I., Nakalembe, C.L. and Kerner, H.
Countries: Ghana, Uganda
Contact: Darlington Akogo | darlington@gudra-studio.com
This dataset supports yield estimation, crop type detection and classification, fruit detection and counting, and fruit maturity stage detection (unripe, ripe, and spoiled) for three products that are important sources of livelihood for millions of households in Sub-Saharan Africa.
It contains 14,870 drone images with bounding box annotations of cashew, cocoa, and coffee trees collected across multiple farms in Ghana and Uganda. Conventional methods of yield estimation are expensive, labor-intensive, and time-consuming, and are prone to error due to incomplete ground observations. This results in poor crop yield estimations and hinders farmers’ ability to appropriately plan and manage their fields and production pipelines. This dataset will help transform African agriculture into agribusiness by allowing for the development of yield estimation solutions that enable farmers to make good business decisions. Having key details about agricultural production readily accessible enables a timely harvest, helping farmers ensure healthy, fresh produce and, in addition, better sales.
Authors and Affiliations:
- KaraAgro AI: Darlington Akogo, Cyril Akafia, Harriet Fiagbor, Stephen Torkpo, Christian Kusi
- Makerere AI Lab: Joyce Nakatumba-Nabende
All Lacuna Fund datasets are licensed under the CC-BY 4.0 International license unless otherwise noted.