Autocorrecting spelling mistakes in the Swedish language

As human beings, we find it easy to understand text messages that contain spelling mistakes. But did you know that Ebbot sometimes cannot comprehend a message if it contains too many misspelled words? Since typos can be one of the reasons why Ebbot fails to give the right response, our NLP team at Hello Ebbot decided to develop a new feature that autocorrects spelling mistakes, specifically for the Swedish language! Our spellcheck corrector not only considers context to provide better corrections, but is also fast. In this blog post, we will let you in on our little secret for achieving this result 🤖

Ebbot's response before applying spellchecking
Ebbot's response after applying spellchecking

Introducing Jamspell - a spellcheck library

Jamspell is an open-source library for autocorrecting spelling mistakes. It is written in C++ but is available in many other programming languages through SWIG, a software tool that connects libraries written in C or C++ with other languages such as Python, PHP and JavaScript.

Furthermore, it is very simple to train a custom model for the language of your choice. To customize Jamspell for a new language, you need two UTF-8 files: alphabet.txt, which contains the language's alphabet, and a corpus of sentences to train on. Because the purpose of training is to "teach" the model how to spell words correctly, it is recommended to use a corpus with minimal spelling mistakes. In our case, we chose OpenSubtitles as our training corpus, but there are other options as well, such as Wikipedia or literary texts. Detailed steps can be found on Jamspell's GitHub page.
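Before training, the two files have to be prepared. Below is a minimal Python sketch of that step. It assumes that alphabet.txt holds the letters on a single line and that the corpus holds one sentence per line; please check Jamspell's README for the exact formats it expects. The cleaning rules for subtitle-style text, the `prepare` function and the corpus.txt file name are hypothetical illustrations, not our actual pipeline.

```python
# A hypothetical preparation script for Jamspell's two UTF-8 training files.
# Assumed formats (verify against Jamspell's README): alphabet.txt holds the
# letters on one line; the corpus holds one sentence per line.
import re
from pathlib import Path

# Swedish letters a-z plus å, ä, ö. Sentences containing other letters (for
# example é in "idé") are dropped; extend this string if you need them.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

TAG_RE = re.compile(r"<[^>]+>")    # subtitle markup such as <i>...</i>
DASH_RE = re.compile(r"^\s*-\s*")  # leading dialogue dash: "- Hej!"


def clean_line(line: str) -> str:
    """Remove subtitle markup and dialogue dashes, collapse whitespace."""
    line = TAG_RE.sub("", line)
    line = DASH_RE.sub("", line)
    return " ".join(line.split())


def keep_line(line: str) -> bool:
    """Keep non-empty lines whose letters all belong to the alphabet."""
    letters = [ch for ch in line.lower() if ch.isalpha()]
    return bool(letters) and all(ch in SWEDISH_ALPHABET for ch in letters)


def prepare(raw_lines, out_dir: Path) -> tuple[Path, Path]:
    """Write alphabet.txt and corpus.txt (both UTF-8) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    alphabet_path = out_dir / "alphabet.txt"
    corpus_path = out_dir / "corpus.txt"
    alphabet_path.write_text(SWEDISH_ALPHABET + "\n", encoding="utf-8")

    seen: set[str] = set()
    kept: list[str] = []
    for raw in raw_lines:
        line = clean_line(raw)
        if keep_line(line) and line not in seen:  # skip exact duplicates
            seen.add(line)
            kept.append(line)
    corpus_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return alphabet_path, corpus_path
```

The two resulting files are then what you pass to the training step described in Jamspell's README. Filtering out lines with foreign letters and duplicates is one simple way to follow the advice of keeping the corpus as clean as possible.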

How Jamspell performs spellchecking in Swedish

To integrate this easily into our applications, our NLP team decided to wrap the model in an API using FastAPI. For security reasons, we unfortunately cannot share access to the API publicly. But here are some extremely challenging examples that our model corrected successfully 👇

Misspelled input:
  1. Jdg springwr mwd my dog every day
  2. Vusstw du toll example that there are also 5,000 somethingurrrservat in Svetihe?
  3. Jqg her still intw gets my ordwr

Corrected output (translated into English):
  1. I run with my dog every day
  2. For example, did you know that there are over 5,000 nature reserves in Sweden?
  3. I still haven't received my order.
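The key ingredient in corrections like these is context: a misspelled word often has several plausible fixes, and the surrounding words decide which one fits. To illustrate the general idea only, here is a minimal Python sketch. It is not Jamspell's implementation (Jamspell has its own, more elaborate language model and candidate search). It assumes a tiny hypothetical Swedish corpus, Norvig-style candidates within one edit, and a bigram language model with add-one smoothing; the `BigramCorrector` class and the corpus are hypothetical.

```python
# A toy context-aware spelling corrector (hypothetical; NOT Jamspell's code).
# Candidates: known words one edit away. Ranking: bigram language model with
# add-one smoothing, scoring each candidate against its left (already
# corrected) neighbour and its right (original) neighbour.
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def edits1(word: str) -> set[str]:
    """All strings one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class BigramCorrector:
    def __init__(self, sentences: list[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def _logp(self, prev: str, word: str) -> float:
        """Add-one smoothed log P(word | prev)."""
        return math.log((self.bigrams[(prev, word)] + 1)
                        / (self.unigrams[prev] + self.vocab_size))

    def candidates(self, word: str) -> list[str]:
        if word in self.unigrams:
            return [word]  # known words are left untouched in this sketch
        known = sorted(w for w in edits1(word) if w in self.unigrams)
        return known or [word]  # nothing close: keep the original word

    def correct(self, text: str) -> str:
        words = text.lower().split()
        padded = ["<s>"] + words + ["</s>"]
        fixed = ["<s>"]
        for i, word in enumerate(words, start=1):
            prev, nxt = fixed[-1], padded[i + 1]
            # Score each candidate by how well it fits between its neighbours.
            best = max(self.candidates(word),
                       key=lambda c: self._logp(prev, c) + self._logp(c, nxt))
            fixed.append(best)
        return " ".join(fixed[1:])


if __name__ == "__main__":
    corpus = [
        "jag springer med min hund varje dag",
        "min hund är glad",
        "han lyfte sin hand",
        "sin hand var kall",
        "hon tog hans hand",
    ]
    corrector = BigramCorrector(corpus)
    print(corrector.correct("jqg springwr mwd min hxnd varje dag"))
    # → jag springer med min hund varje dag
    print(corrector.correct("han lyfte sin hxnd"))
    # → han lyfte sin hand
```

Both "hund" and "hand" are one edit away from "hxnd", and "hand" is the more frequent word in this toy corpus, so picking by frequency alone would produce "min hand". The bigram scores instead choose "hund" after "min" and "hand" after "sin". A real model is trained on a far larger corpus, such as the OpenSubtitles data we used.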

Now, that's what we call 🪄 magic! We are working on integrating this into Ebbot's skill set, but did you know that we already have a magical dishwasher ready to use? If you want to know more about this special dishwasher, feel free to contact us! See you at our next magic show... 🧙‍♀️

