AI/Algorithmic Fairness and Bias

Evaluating sociocultural bias in language models

Overview

This project examines how NLP systems encode sociocultural biases in Bengali, focusing on identity dimensions such as religion, nationality, and their intersections.

Outcomes

  • Bengali cultural bias evaluation dataset (BIBED)
  • Persona-based auditing of language models
  • Empirical evidence of sociocultural bias

Approach

We design evaluation datasets that capture both explicit and implicit identity signals (e.g., names, dialects) and conduct algorithmic audits by systematically varying identity attributes in model inputs while holding the rest of the prompt fixed. Findings are interpreted through sociotechnical and postcolonial perspectives.
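The auditing approach above can be sketched as a counterfactual substitution test: fill the same template with identity terms that differ only in the attribute under study, score each variant, and compare scores across groups. This is a minimal illustrative sketch, not the project's actual pipeline; the template, placeholder names, and the toy scorer (a stand-in for a real model's sentiment or toxicity score) are all assumptions introduced here.

```python
# Minimal sketch of a counterfactual identity audit. The template, names,
# and scorer are illustrative placeholders, not drawn from the BIBED dataset.
from statistics import mean

TEMPLATE = "{name} is a neighbor who often hosts dinner parties."
GROUPS = {
    "group_a": ["Name A1", "Name A2"],  # placeholder names signaling one identity
    "group_b": ["Name B1", "Name B2"],  # placeholder names signaling another
}

def toy_score(text: str) -> float:
    """Stand-in for a real model's score in [0, 1) (e.g., sentiment)."""
    # Deterministic toy scorer so the sketch runs end to end without a model.
    return (sum(map(ord, text)) % 100) / 100

def audit(template: str, groups: dict) -> dict:
    """Mean score per identity group for the same template."""
    return {
        group: mean(toy_score(template.format(name=n)) for n in names)
        for group, names in groups.items()
    }

scores = audit(TEMPLATE, GROUPS)
gap = max(scores.values()) - min(scores.values())
print(scores, f"gap={gap:.2f}")
```

In a real audit, `toy_score` would be replaced by the model under test, and a large score gap between groups for otherwise identical inputs is evidence of identity-conditioned bias.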

Key Findings

  • Bias emerges across religious and national identities
  • Models rely on linguistic proxies to infer identity
  • Translation pipelines risk cultural misrepresentation

Contributions

  • Culturally grounded fairness dataset
  • Framework for low-resource bias evaluation
  • Insights bridging NLP fairness and sociotechnical theory