
Autocorrecting spelling mistakes in the Swedish language

As human beings, we find it easy to understand text messages that contain spelling mistakes. But did you know that Ebbot sometimes cannot comprehend a message if it contains too many misspelled words? Since typos can be one of the reasons Ebbot fails to give the right response, our NLP team at Hello Ebbot decided to develop a new feature that autocorrects spelling mistakes, specifically for the Swedish language! Our spelling corrector not only takes context into account to produce better corrections, but is also fast. In this blog post, we will share our little secret for achieving this result 🤖

Ebbot's response before applying spellchecking
Ebbot's response after applying spellchecking

Introducing Jamspell - a spellcheck library

Jamspell is an open-source library for autocorrecting spelling mistakes. It is written in C++ but is available in many other programming languages through SWIG, a software tool that connects libraries written in C or C++ with other languages such as Python, PHP and JavaScript.
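To give a feel for the Python binding, here is a minimal sketch. It assumes the `jamspell` package is installed and that a trained model file is available; `TSpellCorrector`, `LoadLangModel` and `FixFragment` are the names used in Jamspell's README, while `load_corrector` and `fix_text` are illustrative helper names that are not part of Jamspell.

```python
def load_corrector(model_path):
    """Load a trained Jamspell model from disk.

    `model_path` is whatever file your own training run produced.
    """
    import jamspell  # imported lazily so fix_text() can be used without it

    corrector = jamspell.TSpellCorrector()
    # LoadLangModel reports failure through its return value.
    if not corrector.LoadLangModel(model_path):
        raise FileNotFoundError(f"could not load Jamspell model: {model_path}")
    return corrector


def fix_text(corrector, text):
    """Return `text` corrected by any object with a FixFragment(text) method."""
    if not text.strip():
        return text  # nothing to correct in an empty or whitespace-only message
    return corrector.FixFragment(text)
```

With a Swedish model saved under a hypothetical name such as `sv.bin`, calling `fix_text(load_corrector("sv.bin"), message)` would return the corrected message as a string.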

Furthermore, it is very simple to train a custom model for the language of your choice. To adapt Jamspell to a new language, you need two UTF-8 files: alphabet.txt, which contains the language's alphabet, and a corpus of sentences to train on. Because the purpose of training is to "teach" the model how to spell words correctly, it is recommended to use a corpus with as few spelling mistakes as possible. In our case, we chose OpenSubtitles as our training corpus, but there are other options as well, such as Wikipedia or literary texts. Detailed steps can be found on Jamspell's GitHub.
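Preparing the two input files could look roughly like the sketch below. It is an illustration, not our actual pipeline: the file names, the helper names and the exact layout of the alphabet file (one line of lowercase letters) are assumptions, so check Jamspell's GitHub for the format it expects. The filter only drops lines containing letters outside the alphabet. It does not detect misspellings, and it would also discard valid Swedish words with accented letters such as "idé".

```python
from pathlib import Path

# Assumption: the alphabet file is a single line listing the allowed letters.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def is_clean(sentence, alphabet=SWEDISH_ALPHABET):
    """True if the sentence has letters and all of them are in the alphabet."""
    letters = [ch for ch in sentence.lower() if ch.isalpha()]
    return bool(letters) and all(ch in alphabet for ch in letters)


def write_training_files(sentences, out_dir, alphabet=SWEDISH_ALPHABET):
    """Write alphabet.txt and corpus.txt as UTF-8; return the number of kept sentences."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "alphabet.txt").write_text(alphabet + "\n", encoding="utf-8")
    # Keep one stripped sentence per line, skipping lines that fail the filter.
    kept = [s.strip() for s in sentences if is_clean(s)]
    (out / "corpus.txt").write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(kept)
```

The resulting files are then passed to Jamspell's own training tool, as described in its documentation.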

How Jamspell performs spellchecking in the Swedish language

To integrate this easily into our applications, our NLP team decided to wrap the model in an API using FastAPI. For security reasons, we unfortunately cannot share access to the API publicly. But here are some extremely challenging examples that our model corrected successfully 👇

Original
  1. Jdg springwr mwd min hund varjw dag
  2. Vusstw du toll exemprl att det finns äver 5 000 nåturrrservat i Svetihe?
  3. Jqg her fortfsrsnde intw fårt min ordwr
Corrected
  1. Jag springer med min hund varje dag
  2. Visste du till exempel att det finns över 5 000 naturreservat i Sverige?
  3. Jag har fortfarande inte fått min order
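
A thin FastAPI wrapper around such a corrector could be sketched as follows. This is not our production API: the route name `/correct`, the response fields and the helper names are hypothetical, and `corrector` stands for any object with a Jamspell-style `FixFragment(text)` method. FastAPI is imported inside the factory only so that the helper can be tried without it installed.

```python
def build_response(corrector, text):
    """Return the original and corrected text as a JSON-serialisable dict."""
    return {"original": text, "corrected": corrector.FixFragment(text)}


def create_app(corrector):
    """Build a FastAPI app exposing a single spelling-correction endpoint."""
    from fastapi import FastAPI  # requires the fastapi package

    app = FastAPI(title="Spellcheck API (sketch)")

    @app.get("/correct")  # hypothetical route; `text` becomes a query parameter
    def correct(text: str):
        return build_response(corrector, text)

    return app
```

An app created this way can be served with an ASGI server such as uvicorn, after which a request to `/correct?text=...` returns both versions of the message.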

Now, that’s what we call 🪄 magic! We are working on integrating this into Ebbot’s skill set, but did you know that we already have a magical dishwasher ready to use? If you want to know more about this special dishwasher, feel free to contact us! See you at our next magic show… 🧙‍♀️


