Self-Explaining Emotion Classification through Preference-Aligned Large Language Models

Authors

Muhammad Hammad Fahim Siddiqui and Diana Inkpen, University of Ottawa, Canada; Alexander Gelbukh, Instituto Politécnico Nacional, Mexico

Abstract

Recent advances in large language models (LLMs) have shown promise for NLP applications, yet producing accurate explanations remains a challenge. In this work, we introduce a self-explaining model for classifying emotions in X posts and construct a novel preference dataset using chain-of-thought prompting with GPT-4o. Using this dataset, we align GPT-4o with Direct Preference Optimization (DPO). Beyond GPT-4o, we adapt smaller models such as LLaMA 3 (8B) and DeepSeek (32B distilled) through preference tuning with Odds Ratio Preference Optimization (ORPO), significantly boosting their classification accuracy and explanation quality. Our approach achieves state-of-the-art performance (68.85%) on the SemEval 2018 E-c multi-label emotion classification benchmark, exhibits comparable results on the DAIR AI multi-class dataset, and attains a high sufficiency score, indicating the standalone effectiveness of the generated explanations. These findings highlight the impact of preference alignment for improving interpretability and enhancing classification.
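The abstract does not include code, so the following is only a minimal sketch of what ORPO-based preference tuning of an open model such as LLaMA 3 (8B) could look like using the Hugging Face trl library. The model name, hyperparameter values, and the toy preference pair (prompt, chosen, rejected) are illustrative assumptions, not the authors' actual configuration or data.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Illustrative base model; the paper's exact checkpoint is not specified here.
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data in the standard (prompt, chosen, rejected) format:
# "chosen" pairs a correct emotion label with a faithful explanation,
# "rejected" a weaker label/explanation. This single row is a placeholder.
pairs = Dataset.from_dict({
    "prompt": ["Classify the emotions in this post and explain your answer: ..."],
    "chosen": ["Emotions: joy, optimism. Explanation: ..."],
    "rejected": ["Emotions: anger. Explanation: ..."],
})

config = ORPOConfig(
    output_dir="orpo-emotion",
    beta=0.1,  # weight of the odds-ratio penalty; value chosen for illustration
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=pairs,
    processing_class=tokenizer,  # named `tokenizer=` in older trl releases
)
trainer.train()
```

Unlike DPO, ORPO folds the preference signal into the supervised loss via an odds-ratio term, so no separate frozen reference model is needed, which makes it a lighter-weight option for the smaller models mentioned above.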

Keywords

LLMs, preference alignment, emotion classification

Full Text | Volume 15, Number 10