example of the sentence similarity application

Extending SentenceTransformers to the Swedish language

The story of the NLP team from Hello Ebbot extending SentenceTransformers to Swedish started a month ago, when we unexpectedly received a call from Santa Claus...

🎅🏼 Santa: Hello, is this Hello Ebbot's NLP team? It's Santa Claus speaking! Hello Ebbot is on the Nice List this year and I have a gift for you.

👾 Hello Ebbot team: Oh Santa!! Really, you have a gift for us?

🎅🏼 Santa: Yes, of course! You have all been working very hard in 2020. How can I help reduce your workload?

👾 Hello Ebbot team: Hmm, there is actually one thing that we want to improve right now! For our digital co-worker to respond to human language, he has to be trained to detect intent, which is basically the purpose of a message. He learns to predict it accurately from 10-20 example sentences for every intent. It would be nice to have an application that takes one sentence as input and returns many sentences with the same meaning, so we don't have to come up with these examples ourselves.

🎅🏼 Santa: Aaah, then I know exactly what you need. How about my intelligent SentenceTransformers model? He can help you translate sentences into numbers, and you can use cosine similarity to find similar sentences in a big corpus.

👾 Hello Ebbot team: That's great! We will prepare and clean our list of example sentences in our database and wait for your gift!

🎅🏼 Santa: One little problem: you have to teach SentenceTransformers Swedish! He only speaks English.

👾 Hello Ebbot team: That's okay, Santa, we know you have to talk to other companies on the Nice List. Let us take care of this from here.

That's when we decided to train SentenceTransformers so that the model could embed Swedish text. And finally, after hours of training and many cups of coffee...
SentenceTransformers now speaks Swedish fluently! 🥳 🎉


How we extended SentenceTransformers to Swedish

Based on the publication "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation", we extended the English "teacher" SentenceTransformers model to a Swedish "student" model using a dataset of English-Swedish parallel sentences: the TED2020 corpus, containing 119,602 sentences. We trained our Transformer with UKPLab's example training script in a Colab Pro notebook. Using Colab Pro's graphics processing unit (GPU), training took only two hours, and the model achieved an accuracy of 95.6% on the test set.
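
The training objective described in that publication can be sketched in a few lines. This is a toy illustration, not our training script: the function names mse and distillation_loss and all the three-dimensional vectors are made up for the example. The idea is that the student is pushed to map both the English sentence and its Swedish translation onto the embedding that the teacher produces for the English sentence.

```python
# Toy sketch of the knowledge-distillation objective for one sentence pair.
# All vectors are invented; real embeddings have hundreds of dimensions.


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en, student_en, student_sv):
    """Loss for one English-Swedish sentence pair.

    teacher_en: teacher embedding of the English sentence (the target)
    student_en: student embedding of the same English sentence
    student_sv: student embedding of the Swedish translation
    """
    return mse(student_en, teacher_en) + mse(student_sv, teacher_en)


teacher_en = [0.2, 0.7, 0.1]  # hypothetical teacher output
student_en = [0.2, 0.6, 0.1]  # hypothetical student output, English input
student_sv = [0.5, 0.1, 0.4]  # hypothetical student output, Swedish input

loss = distillation_loss(teacher_en, student_en, student_sv)
print(f"loss for this pair: {loss:.4f}")
```

Training lowers this loss across the parallel corpus, so a Swedish sentence ends up close to its English counterpart in the teacher's vector space.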

Hello Ebbot's application built using SentenceTransformers

After extending SentenceTransformers to Swedish, we used the model to embed our corpus, a cleaned list of 56,538 example phrases that we had written in the past to teach Ebbot. We then applied cosine similarity to measure the semantic similarity between the given text and each sentence in the corpus. The application prints the most similar sentences along with their similarity scores.
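
As a rough sketch of that ranking step, assuming the sentences have already been embedded: the helper names cosine_similarity and rank_by_similarity and the tiny three-dimensional vectors below are invented for illustration, and a real setup would take the embeddings from the model instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus):
    """Score every (sentence, embedding) pair against the query, best first."""
    scored = [(sentence, cosine_similarity(query_vec, vec)) for sentence, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; the numbers are made up for the example.
corpus = [
    ("when do I get my package that I ordered", [0.9, 0.1, 0.2]),
    ("thanks for your help", [0.1, 0.9, 0.1]),
    ("where are my things that I have ordered", [0.8, 0.2, 0.3]),
]
query_vec = [1.0, 0.1, 0.2]  # hypothetical embedding of the input sentence

ranked = rank_by_similarity(query_vec, corpus)
for position, (sentence, score) in enumerate(ranked, start=1):
    print(f"{position}. {sentence} (Score: {score:.2f})")
```

With these toy vectors, the two order-related sentences rank above the thank-you sentence, which is the behaviour we rely on when searching the real corpus.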
Using Streamlit, our NLP team built a simple web app that lets users choose how many similar phrases they want to generate. There is also an option to print out either the most similar sentences or all sentences within a chosen percentage range.
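
The two output options could be expressed as one small selection function. This is only a sketch: select_matches and its parameters top_k, min_score and max_score are hypothetical names, not the ones used in our app, and the Streamlit widgets that would supply these values are left out.

```python
def select_matches(scored, top_k=None, min_score=None, max_score=None):
    """Pick which (sentence, score) pairs to show.

    top_k: keep only the k best matches (None keeps all)
    min_score / max_score: keep only scores inside this range (None disables a bound)
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ranked = [pair for pair in ranked if pair[1] >= min_score]
    if max_score is not None:
        ranked = [pair for pair in ranked if pair[1] <= max_score]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Sentences and scores taken from the first example list in this post.
scored = [
    ("when do I get my ordered goods", 0.89),
    ("wondering when you send away what I ordered from you", 0.93),
    ("when sent my order", 0.88),
    ("I wonder about when I get the stuff that I ordered", 0.91),
]

top_two = select_matches(scored, top_k=2)
in_range = select_matches(scored, min_score=0.88, max_score=0.90)

for sentence, score in top_two:
    print(f"{sentence} (Score: {score:.2f})")
```

Here top_two holds the two highest-scoring sentences, while in_range holds every sentence scoring between 88% and 90%.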

Let's take a look at more examples!

  1. wondering when you send away what I ordered from you (Score: 0.93)
  2. I wonder about when I get the stuff that I ordered (Score: 0.91)
  3. hello I have ordered goods out of you got wondering where the rest gone (Score: 0.90)
  4. when do I get my ordered goods (Score: 0.89)
  5. when do I get my package that I ordered (Score: 0.89)
  6. when do I have to download the order (Score: 0.89)
  7. when will things I order arrive (Score: 0.88)
  8. when sent my order (Score: 0.88)
  9. and you wonder how I should go about it, you send someone here to pick it up when I had home delivery (Score: 0.88)
  10. where are my things that I have ordered (Score: 0.88)

  1. great thanks for all the help to have it good (Score: 0.98)
  2. super good thank you very much for your help (Score: 0.98)
  3. top thank you very much for your help (Score: 0.98)
  4. top thanks for your help 👍🏾 (Score: 0.98)
  5. thanks for the help have it so good (Score: 0.98)
  6. many thanks you have been very helpful (Score: 0.98)
  7. perfect thank you very much for the help (Score: 0.98)
  8. top thanks thanks for good service (Score: 0.98)
  9. oh top thanks for your help (Score: 0.98)
  10. excellent thanks for your help

You can see that the application does not just find other sentences with similar words; it actually returns sentences with the same meaning. This is what makes SentenceTransformers a powerful and helpful tool for us, because the more creative we are with the example phrases, the better Ebbot becomes at detecting intents!

Extremely excited about our result, Santa 🎅🏼 called to congratulate us and asked when we would have the application ready for production. Even though we are proud of ourselves for successfully extending SentenceTransformers to Swedish, we told him that we still want to test it internally and make improvements before the official release. We thanked Santa 🎅🏼 again and promised him we would work even harder in 2021 to stay on the Nice List 🎄 And so Hello Ebbot's journey for 2021 begins...


Want to know more about why Santa contacted us?

If you are curious and want to know more about how Hello Ebbot made it onto Santa's Nice List, let's meet and talk about it! All you need to do is click the button below 👇

