Refine-LM
Mitigating Language Model Stereotypes via Reinforcement Learning
Rameez Qureshi, Naim Es-Sebbani, Luis Galárraga, Yvette Graham, Miguel Couceiro & Zied Bouraoui
REFINE-LM is a novel approach to mitigating stereotypical biases in large language models (LLMs) using reinforcement learning. Unlike existing methods that require extensive fine-tuning or manual annotations, REFINE-LM debiases a model by acting on its word probability distributions, reducing biases related to gender, ethnicity, religion, and nationality without degrading the model's performance. It is efficient, scalable, and applicable to various LLMs, providing a versatile solution for reducing harmful stereotypes in NLP applications.
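To make the idea of "acting on the word probability distributions" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical setting in which a model assigns logits to two candidate answers (for example, two names from different demographic groups) for the same prompt. The function names `bias_gap`, `reward`, and `refine`, the additive offsets, and the example logits are all illustrative assumptions. The sketch shows only a possible reward signal and a distribution adjustment; it does not include a reinforcement learning training loop.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_gap(probs):
    """Absolute probability gap between the two candidate answers."""
    return abs(probs[0] - probs[1])


def reward(probs):
    # Hypothetical reward: maximal (zero) when both candidates are equally likely.
    return -bias_gap(probs)


def refine(logits, offsets):
    """Apply additive offsets (assumed to come from a learned policy) and renormalise."""
    return softmax([l + o for l, o in zip(logits, offsets)])


# Illustrative logits for two candidate answers; the values are made up.
base_logits = [2.0, 0.0]
base = softmax(base_logits)

# Offsets that happen to cancel the preference for the first candidate.
refined = refine(base_logits, [-1.0, 1.0])

print("reward before:", reward(base))
print("reward after: ", reward(refined))
```

In this toy setting, an agent trained to maximise such a reward would be pushed toward offsets that equalise the candidates' probabilities, while the underlying model's weights remain untouched.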