Abstract
Sentiment analysis is a relevant area in the natural language processing context–(NLP) that allows extracting opinions about different topics such as customer service and political elections.
Sentiment analysis is usually carried out through supervised learning approaches and using labeled data. However, obtaining such labels is generally expensive or even infeasible. The above problems can be faced by using models based on self-supervised learning, which aims to deal with various machine learning paradigms in the absence of labels. Accordingly, we propose a self-supervised approach for sentiment analysis in Spanish that comprises a lexicon-based method and a supervised classifier. We test our proposal over three corpora; the first two are labeled datasets, namely, CorpusCine and PaperReviews. Further, we use an unlabeled corpus conformed by news related to the Colombian conflict to understand the university journalistic narrative of the war in Colombia.
Obtained results demonstrate that our proposal can deal with sentiment analysis settings in scenarios with unlabeled corpus; in fact, it acquires competitive performance compared with state-of-the-art techniques in partially-labeled datasets.
Sentiment analysis is usually carried out through supervised learning approaches and using labeled data. However, obtaining such labels is generally expensive or even infeasible. The above problems can be faced by using models based on self-supervised learning, which aims to deal with various machine learning paradigms in the absence of labels. Accordingly, we propose a self-supervised approach for sentiment analysis in Spanish that comprises a lexicon-based method and a supervised classifier. We test our proposal over three corpora; the first two are labeled datasets, namely, CorpusCine and PaperReviews. Further, we use an unlabeled corpus conformed by news related to the Colombian conflict to understand the university journalistic narrative of the war in Colombia.
Obtained results demonstrate that our proposal can deal with sentiment analysis settings in scenarios with unlabeled corpus; in fact, it acquires competitive performance compared with state-of-the-art techniques in partially-labeled datasets.
Original language | Spanish (Colombia) |
---|---|
Article number | 5472 |
Pages (from-to) | 1-16 |
Number of pages | 17 |
Journal | Applied Sciences (Switzerland) |
Volume | 12 |
Issue number | 5472 |
DOIs | |
State | Published - May 28 2022 |