
Somali-language AI and Innovation Lab — Pioneering the digital frontier for Somali language through cutting-edge AI research and innovation.

Jamhuriya University of Science and Technology
Mogadishu, Somalia
sail@just.edu.so
+252 - 61- 2223999


NLP | Completed

CIRAL: A Test Collection for CLIR Evaluation in African Languages

March 8, 2026
SAIL Team

Abstract

Cross-lingual information retrieval (CLIR) continues to be an actively studied topic in information retrieval (IR), and there have been consistent efforts in curating test collections to support its research. However, there is a lack of high-quality human-annotated CLIR resources for African languages: the few existing collections are mostly curated synthetically or from sources with limited corpora for these languages. We present CIRAL, a test collection for cross-lingual retrieval with English queries and passages in four African languages: Hausa, Somali, Swahili, and Yoruba. CIRAL’s corpora are obtained from Indigenous African websites and consist of a total of over 2.5 million passages. We gathered over 1,600 queries and 30k high-quality binary relevance judgments annotated by native speakers of the languages. Additional pools were also obtained at CIRAL’s shared task, which was hosted at the Forum for Information Retrieval Evaluation 2023 to encourage community participation in CLIR for African languages. We describe the design and curation process of our test collection and provide reproducible baselines that demonstrate CIRAL’s utility in evaluating the effectiveness of systems. CIRAL is available at https://github.com/ciralproject/ciral.
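As a rough illustration of how binary relevance judgments like those described above can be used to score a retrieval system, the sketch below computes Recall@k and nDCG@k for a ranked list of passages. The query and passage IDs, the `qrels` and `run` dictionaries, and the function names are all hypothetical; this is not CIRAL's official evaluation code, and the metrics and cutoffs are generic IR choices rather than the ones reported for the collection.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the judged-relevant passages found in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG with binary gains: 1 for a relevant passage, 0 otherwise."""
    # Discounted gain of the system ranking (rank is 0-based, hence +2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, pid in enumerate(ranked_ids[:k])
        if pid in relevant_ids
    )
    # Ideal ranking places all relevant passages first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: one English query, passages judged relevant or not.
qrels = {"q1": {"p3", "p7"}}             # query ID -> set of relevant passage IDs
run = {"q1": ["p3", "p1", "p7", "p9"]}   # query ID -> ranked passage IDs

for qid, ranking in run.items():
    print(qid, recall_at_k(ranking, qrels[qid], 2), ndcg_at_k(ranking, qrels[qid], 3))
```

In practice, judgments and system outputs are usually stored in TREC-style qrels and run files and scored with an established evaluation toolkit; the functions here only show the underlying arithmetic.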

Related Projects


Research Paper
NLP

Morphologically-informed Somali Lemmatization Corpus built with a Web-based Crowdsourcing Platform

Somali NLP Engine
AI/NLP

Detection of Somali-written Fake News and Toxic Messages on the Social Media Using Large Language Models

AI Chatbot
AI

AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages