GSMA AI Talent Mapping
Data Engineer & Analyst
About the Project
A data-driven research project mapping AI and LLM talent distribution across 5 African regions and 7 skill clusters. The system collects empirical signals from multiple sources, including job market APIs (Adzuna, Remotive), GitHub repositories (topics, contributors, stars), research publications (Semantic Scholar, arXiv, Google Scholar), and ecosystem programs, to build a comprehensive picture of Africa's AI talent landscape.
Key Highlights
- Built a data collection pipeline across multiple APIs (Adzuna, Remotive, Serper, Semantic Scholar)
- Analyzed AI talent across 5 African regions and 7 skill clusters
- Analyzed GitHub repositories, using topics, contributors, and stars as talent signals
- Tracked research publications across arXiv and Google Scholar
- Built a Power BI dashboard and a Next.js web app for interactive visualization
Technical Challenges
Aggregating talent signals from heterogeneous data sources (job postings, code repositories, academic papers) into a unified view required careful normalization, because raw counts from one source are not directly comparable with counts from another. Each source also carries its own bias: GitHub skews toward open-source activity and job APIs toward formal employment, so the mapping needed to weight each signal accordingly.
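One way to picture the normalization and weighting step is the minimal Python sketch below. It is not the project's actual implementation: the region names, the raw counts, the weights, and the function names `min_max_normalize` and `composite_scores` are all hypothetical, chosen only to illustrate the idea. Each source is rescaled to the range 0 to 1 across regions so that no source dominates through sheer volume, and the rescaled signals are then combined with per-source weights.

```python
from typing import Dict

# Hypothetical raw counts per region for each source (illustrative values only,
# not project results). Region names are assumed for the example.
RAW_SIGNALS: Dict[str, Dict[str, float]] = {
    "jobs":   {"North": 120, "West": 300, "East": 180, "Central": 20, "Southern": 260},
    "github": {"North": 40,  "West": 90,  "East": 70,  "Central": 10, "Southern": 80},
    "papers": {"North": 55,  "West": 60,  "East": 35,  "Central": 5,  "Southern": 50},
}

# Assumed weights expressing how much each source is trusted as a talent signal.
WEIGHTS: Dict[str, float] = {"jobs": 0.4, "github": 0.3, "papers": 0.3}


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one source's counts to [0, 1] across regions."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All regions identical: the source carries no relative information.
        return {key: 0.0 for key in values}
    return {key: (value - lo) / (hi - lo) for key, value in values.items()}


def composite_scores(
    raw: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Dict[str, float]:
    """Combine normalized per-source signals into one weighted score per region."""
    normalized = {source: min_max_normalize(counts) for source, counts in raw.items()}
    total_weight = sum(weights[source] for source in raw)
    regions = sorted(set().union(*(counts.keys() for counts in raw.values())))
    return {
        region: sum(
            # A region missing from a source is treated as 0 for that source.
            weights[source] * normalized[source].get(region, 0.0)
            for source in raw
        ) / total_weight
        for region in regions
    }


if __name__ == "__main__":
    for region, score in composite_scores(RAW_SIGNALS, WEIGHTS).items():
        print(f"{region}: {score:.2f}")
```

Min-max scaling is only one option here. Rank-based or per-capita normalization would be reasonable alternatives when a few regions dominate the raw counts, and the weights themselves are a judgment call that should be stated alongside any resulting map.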