Toxic comment detection in Swedish

Detecting toxic messages in the Swedish language

Although the rapid growth of the Internet and social media has done a great deal to connect people, it has also made toxic behavior more common online. As a result, toxic comment classification has been an active research topic in Machine Learning for the past few years. Recently, one of our clients asked us to teach Ebbot to detect toxic messages in conversations. Thanks to this special request, we got the chance to work on one of the most difficult topics in Natural Language Processing (NLP). And yes, we could not be more excited! 🥳

Challenges with collecting a dataset

To implement this classification task, we have to train Ebbot on a dataset of toxic text. Large labeled training datasets exist, but they are not available in Swedish. Machine translation is not an ideal workaround either, since a lot of slang cannot be translated accurately by machines.

Ebbot's solution to toxic message detection

After some research, we found an open-source, highly accurate pretrained model built by Laura Hanu at Unitary. In addition to the original version, which supports only English and was trained on Wikipedia comments, Unitary also provides a multilingual model trained on 7 languages (English, French, Spanish, Italian, Portuguese, Turkish and Russian).
At the same time, we found a machine translation model by the Language Technology Research Group at the University of Helsinki. Combining the two lets us work around the lack of a Swedish dataset and meet our client's request. After receiving input text in Swedish, Ebbot first translates it to English, then runs it through the toxicity classifier. The output is a score for each of six categories of toxic messages: toxicity, severe toxicity, obscene, threat, insult and identity hate. With this method, we can not only decide whether a message is toxic, but also see which type of inappropriate behavior it contains.
Examples of toxic comment detection from our sample web app
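The translate-then-classify flow described above can be sketched roughly as follows. This is a minimal illustration, not Ebbot's actual implementation. The function names, the 0.5 threshold, and the choice of the `detoxify` package and the `Helsinki-NLP/opus-mt-sv-en` checkpoint are all assumptions, since the post does not name the exact packages or settings. The translator and classifier are passed in as plain functions, so the flow can be tried with stand-ins before any model is downloaded.

```python
from typing import Callable, Dict

Scores = Dict[str, float]  # category name -> score between 0 and 1


def detect_toxicity(
    text_sv: str,
    translate: Callable[[str], str],
    classify: Callable[[str], Scores],
    threshold: float = 0.5,  # hypothetical cut-off, not a value from the post
) -> dict:
    """Translate Swedish text to English, score it, and flag high-scoring categories."""
    text_en = translate(text_sv)
    scores = classify(text_en)
    flagged = sorted(label for label, score in scores.items() if score >= threshold)
    return {
        "translation": text_en,
        "scores": scores,
        "flagged": flagged,
        "is_toxic": bool(flagged),
    }


def load_real_models():
    """Assumed wiring; needs `pip install transformers detoxify` and model downloads."""
    from detoxify import Detoxify
    from transformers import pipeline

    # Assumed checkpoint: a Swedish-to-English OPUS-MT model from Helsinki-NLP.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-sv-en")
    # Assumed classifier: the English-only "original" Detoxify model.
    classifier = Detoxify("original")

    def translate(text: str) -> str:
        return translator(text)[0]["translation_text"]

    def classify(text: str) -> Scores:
        return {label: float(s) for label, s in classifier.predict(text).items()}

    return translate, classify


# Stand-in functions with made-up scores, used only to exercise the flow offline.
def fake_translate(text: str) -> str:
    return "you are an idiot"


def fake_classify(text: str) -> Scores:
    return {"toxicity": 0.91, "insult": 0.84, "threat": 0.01}


result = detect_toxicity("du ar en idiot", fake_translate, fake_classify)
print(result["flagged"])  # ['insult', 'toxicity']
```

To try it with real models, replace the two stand-ins with the pair returned by `load_real_models()`. Keeping one score per category, rather than a single yes/no answer, is what makes it possible to report which kind of toxicity a message contains.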
We are aware that this is not the ideal way to solve a Machine Learning/Artificial Intelligence problem. Nevertheless, when no training dataset is available, we consider it one of the quickest and easiest ways to tackle multilingual NLP challenges. We are currently testing the model and gathering user feedback to improve the app’s performance. Please feel free to contact us if you have any inquiries about our bot-builder product or special NLP integrations 🙌 We are usually very responsive 😉

Wanna know more about our product?

If you are curious and want to know more about how Ebbot – a helpful digital employee – can assist you, let’s meet and talk about it! All you need to do is click the button below 👇
