Abstract:
In the era of social media, a growing volume of hateful and discriminatory expression has been observed online, posing psychological and physical risks to targeted individuals and groups. This has created a need to detect and address offensive language on these platforms. Current solutions rely on manual review or on automated systems designed primarily for English, and approaches tailored to other languages, including Spanish, remain scarce.
This thesis develops an effective and robust method for detecting offensive language in Spanish targeted at the LGBTIQ+ community. It employs six supervised learning methods: neural networks, decision trees, support vector machines, a naive Bayes classifier, logistic regression, and random forests. The class imbalance typical of web data is addressed using a corpus of 6,716 previously labeled Twitter documents selected from a larger set, together with bag-of-words vectorization and a customized polarity lexicon.
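As a rough illustration of the feature representation described above, the sketch below turns each document into bag-of-words counts and appends one extra column holding the summed polarity of the tokens found in a lexicon. The tokenizer, the lexicon entries in POLARITY_LEXICON, and the helper names are hypothetical placeholders; the abstract does not specify the thesis's actual preprocessing or the contents of its LGBTIQ+ lexicon.

```python
import re

# Hypothetical polarity lexicon with placeholder Spanish tokens and scores in
# [-1, 1]; this is NOT the thesis's customized LGBTIQ+ lexicon.
POLARITY_LEXICON = {"odio": -1.0, "asco": -0.8, "respeto": 0.8, "orgullo": 1.0}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode word characters (so á, é, ñ survive).
    return re.findall(r"\w+", text.lower())


def build_vocabulary(docs: list[str]) -> dict[str, int]:
    # Assign each distinct token a column index in order of first appearance.
    vocab: dict[str, int] = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(doc: str, vocab: dict[str, int]) -> list[float]:
    # Bag-of-words counts followed by one extra column: summed lexicon polarity.
    tokens = tokenize(doc)
    counts = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:  # tokens unseen when building the vocabulary are dropped
            counts[vocab[tok]] += 1.0
    polarity = sum(POLARITY_LEXICON.get(tok, 0.0) for tok in tokens)
    return counts + [polarity]


corpus = ["Todo mi respeto", "Qué asco y qué odio"]
vocab = build_vocabulary(corpus)
print(vectorize("qué odio, qué odio", vocab))
# → [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, -2.0]
```

Tokens outside the vocabulary are dropped from the counts but still contribute to the polarity column, so in this sketch the lexicon can add signal for terms the vocabulary never saw.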
The proposed approach combines supervised learning with a lexicon tailored to the LGBTIQ+ community in order to capture the nuances of offensive language in this context. The research aims to improve the detection of offensive comments online and thereby contribute to countering the expression of hatred on digital platforms.
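A minimal sketch of how lexicon features might be combined with one of the listed supervised methods, logistic regression, trained here with plain batch gradient descent. The toy texts, labels, lexicon entries, learning rate, and epoch count are invented for illustration. The "balanced" class weighting (n_samples / (n_classes * count_c), the same formula scikit-learn uses for class_weight="balanced") is an assumed extra measure against class imbalance, not a technique the abstract describes.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon and toy data for illustration only (not the thesis's
# lexicon or corpus). Label 1 = offensive, 0 = not offensive.
LEXICON = {"odio": -1.0, "ofensa": -1.0, "apoyo": 1.0, "respeto": 1.0}
TEXTS = ["odio y ofensa", "mensaje de apoyo", "respeto para todos",
         "buenos días", "todo mi apoyo", "ofensa total"]
LABELS = [1, 0, 0, 0, 0, 1]


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def featurize(text, vocab):
    # Bag-of-words counts plus one column holding the summed lexicon polarity.
    tokens = tokenize(text)
    row = [0.0] * len(vocab)
    for t in tokens:
        if t in vocab:
            row[vocab[t]] += 1.0
    return row + [sum(LEXICON.get(t, 0.0) for t in tokens)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.1, epochs=500):
    # Assumed balanced class weights: n_samples / (n_classes * count_c).
    counts = Counter(y)
    cw = {c: len(y) / (len(counts) * k) for c, k in counts.items()}
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Gradient of the class-weighted log loss for one example.
            err = cw[yi] * (sigmoid(b + sum(a * x for a, x in zip(w, xi))) - yi)
            gw = [g + err * x for g, x in zip(gw, xi)]
            gb += err
        w = [a - lr * g / len(X) for a, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


vocab = {}
for text in TEXTS:
    for t in tokenize(text):
        vocab.setdefault(t, len(vocab))
X = [featurize(t, vocab) for t in TEXTS]
w, b = train(X, LABELS)


def predict(text):
    # 1 if the linear score is positive (probability above 0.5), else 0.
    return int(b + sum(a * x for a, x in zip(w, featurize(text, vocab))) > 0)


print(predict("odio y ofensa"), predict("todo mi apoyo"), w[-1])
# expected: 1 0, then a negative weight for the lexicon column
```

Since offensive terms carry negative polarity in this toy lexicon, the classifier learns a negative weight for the lexicon column, letting one coefficient summarize evidence spread across many lexicon entries.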