2020 Awards
Description: This dataset is the first large-scale human-annotated Twitter sentiment dataset for Hausa, Igbo, Nigerian-Pidgin, and Yorùbá, the four most widely spoken languages in Nigeria.
Authors: Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, Ibrahim Said Ahmad, Idris Abdulmumin, Bello Shehu Bello, Monojit Choudhury, Chris Chinenye Emezue, Saheed Salahudeen Abdullahi, Anuoluwapo Aremu, Alipio Jeorge, and Pavel Brazdil
Languages: Hausa, Igbo, Nigerian-Pidgin, and Yorùbá
Dataset: access here

Description: This evaluation dataset enables automatic measurement of the quality of machine translation systems for Afar, Amharic, Oromo, Somali and Tigrinya.
Authors: Asmelash Teka Hadgu, Gebrekirstos G. Gebremeskel, Abel Aregawi
Translators: Afar – Mohammed Deresa, Yasin Nur; Amharic – Tigist Taye, Selamawit Hailemariam, Wako Tilahun; Oromo – Gemechis Melkamu, Galata Girmaye; Somali – Abdiselam Mohamed, Beshir Abdi; Tigrinya – Michael Minassie, Berhanu Abadi Weldegiorgis, Nureddin Mohammedshiek
Languages: Afar, Amharic, Oromo, Somali and Tigrinya
Dataset: access here

All Lacuna Fund datasets are licensed under the CC-BY 4.0 International license unless otherwise noted.