AI/Algorithmic Fairness and Bias
Evaluating sociocultural bias in language models
Overview
This project examines how NLP systems encode sociocultural biases in Bengali, focusing on identity dimensions such as religion, nationality, and their intersections.
Outcomes
- Bengali cultural bias evaluation dataset (BIBED)
- Persona-based auditing of language models (sketched below)
- Empirical evidence of sociocultural bias
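The persona-based auditing listed above can be illustrated with a minimal sketch. Everything below is an assumption for illustration rather than the project's actual protocol: `generate` stands in for any text-generation API, and the personas and questions are placeholders.

```python
# Minimal sketch of a persona-based audit: the same questions are asked
# while the model is conditioned on different identity personas, and the
# collected responses can then be compared with downstream scorers
# (sentiment, toxicity, stereotype lexicons).
from itertools import product

# Illustrative personas and questions; the actual study materials differ.
PERSONAS = [
    "a Bengali Hindu person from Kolkata",
    "a Bengali Muslim person from Dhaka",
]
QUESTIONS = [
    "Describe a typical family dinner.",
    "Which festivals matter most to you?",
]

def generate(prompt: str) -> str:
    # Hypothetical placeholder; swap in a real model call here.
    return f"<model output for: {prompt!r}>"

def persona_audit() -> dict:
    """Collect per-persona responses to identical questions so that
    output differences can be attributed to the stated identity."""
    return {
        (persona, question): generate(f"Answer as {persona}. {question}")
        for persona, question in product(PERSONAS, QUESTIONS)
    }
```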
Approach
We design evaluation datasets that capture explicit and implicit identity signals (e.g., names, dialects) and conduct algorithmic audits by systematically varying identity attributes in model inputs. Findings are interpreted through sociotechnical and postcolonial perspectives.
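As a concrete illustration of this audit design, the sketch below builds minimal pairs by swapping an explicit identity signal (a personal name) into fixed templates and compares group-level scores. The templates, names, and `score` function are illustrative assumptions; in the actual study the inputs are Bengali and the scoring would use a real classifier.

```python
# Sketch of a counterfactual attribute-swapping audit: only the identity
# attribute varies between otherwise identical inputs, so a systematic
# score gap between groups is read as a bias signal.
from statistics import mean

# Illustrative templates and names (explicit identity proxies).
TEMPLATES = [
    "{name} applied for the job.",
    "{name} moved into the neighborhood.",
]
NAMES = {
    "hindu": ["Anik Saha", "Priya Das"],        # hypothetical examples
    "muslim": ["Rahim Uddin", "Fatema Begum"],  # hypothetical examples
}

def score(text: str) -> float:
    # Hypothetical scorer; in practice, a sentiment or toxicity
    # classifier applied to the model's continuation of `text`.
    return 0.0

def audit_gap() -> float:
    """Mean score difference between identity groups on identical
    templates; a nonzero gap means the swapped attribute alone
    changed how the text is treated."""
    group_means = {
        group: mean(score(t.format(name=n)) for t in TEMPLATES for n in names)
        for group, names in NAMES.items()
    }
    return group_means["hindu"] - group_means["muslim"]
```

Holding everything except the identity attribute fixed is what lets score differences be attributed to the attribute itself rather than to wording.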
Key Findings
- Bias emerges across religious and national identities
- Models rely on linguistic proxies to infer identity (see the probe sketched after this list)
- Translation pipelines risk cultural misrepresentation
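The proxy finding can be probed directly: give the model text whose only identity cue is implicit (e.g., dialect) and count how often it volunteers an identity label anyway. The dialect samples, label list, and `generate` call below are illustrative stand-ins, not the study's materials.

```python
# Sketch of an implicit-signal probe: the input carries only a dialectal
# cue, the task is identity-neutral, and identity labels appearing in
# the reply are counted.
from collections import Counter

# Hypothetical snippets in different regional Bengali dialects.
DIALECT_SAMPLES = {
    "dhaka": ["<Dhaka-dialect sentence>"],
    "kolkata": ["<Kolkata-dialect sentence>"],
}
IDENTITY_TERMS = ["Hindu", "Muslim", "Bangladeshi", "Indian"]

def generate(prompt: str) -> str:
    # Hypothetical placeholder for a model call.
    return ""

def proxy_probe() -> Counter:
    """Count identity labels the model attaches to text whose only
    identity cue is its dialect."""
    mentions = Counter()
    for dialect, samples in DIALECT_SAMPLES.items():
        for text in samples:
            reply = generate(f"Describe the author of this text: {text}").lower()
            mentions.update(
                (dialect, term) for term in IDENTITY_TERMS
                if term.lower() in reply
            )
    return mentions
```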
Contributions
- Culturally grounded fairness dataset
- Framework for low-resource bias evaluation
- Insights bridging NLP fairness and sociotechnical theory