Grouping similar sentences for faster intent training

Back in December 2020, we successfully extended a powerful Natural Language Processing (NLP) model called “SentenceTransformers” to the Swedish language. We hate to brag, but after publishing the blog post about our achievement, we received a lot of attention 😎 So we decided to continue with another exciting project using the Swedish SentenceTransformers model, in which we aim to semi-automate the intent training process by grouping similar sentences. Sounds complicated, doesn’t it? Don’t worry, we will keep the explanation as simple as possible, so please keep on reading 👀

Problems with the intent training process

We already explained in the last post that Ebbot responds to you based on the purpose of your messages (intents), which he learns through example phrases. To provide the best customer experience, we continuously use data from real conversations between Ebbot and chatbot users, which are stored inside Ebbot’s training center, to teach Ebbot new intents or, in some cases, to provide more examples that improve his accuracy in detecting existing intents.

Even though we love having a lot of data, it is sometimes extremely difficult for our Customer Implementation Manager team to sort out training data for Ebbot when there are thousands of sentences in the training center. Imagine going through that many sentences and deciding the intent of each one. It’s a lot of work! If only we could group all the similar sentences together, the training process would be so much faster… 🤔

Faster intent training by clustering sentences

As in our past projects, we used Streamlit to build the web app for grouping similar sentences. Combining our Swedish SentenceTransformers model with UKPLab’s community detection function gives the app three modes: grouping a list of at least five sentences, or grouping the sentences in a .csv file exported from the training center, with the results returned either as another .csv file or as a .txt file.
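To make the grouping idea concrete, here is a minimal sketch of threshold-based community detection on sentence embeddings. It is a simplified, standard-library-only stand-in and not UKPLab’s actual code or our production app: the function name `group_similar`, the `threshold` and `min_size` values, and the tiny hand-made vectors are all assumptions for illustration. In a real setup, the vectors would come from a SentenceTransformers model, and the grouping would most likely be done with the library’s own community detection utility.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_similar(embeddings, threshold=0.75, min_size=2):
    """Greedy grouping: every sentence proposes the set of sentences close to it,
    the largest proposals win, and no sentence ends up in two groups."""
    n = len(embeddings)
    candidates = []
    for i in range(n):
        members = [j for j in range(n)
                   if cosine(embeddings[i], embeddings[j]) >= threshold]
        if len(members) >= min_size:
            candidates.append(members)
    candidates.sort(key=len, reverse=True)  # biggest communities first

    assigned, groups = set(), []
    for members in candidates:
        fresh = [j for j in members if j not in assigned]
        if len(fresh) >= min_size:
            groups.append(fresh)
            assigned.update(fresh)
    return groups

# Hypothetical sentences with made-up 3-dimensional "embeddings".
sentences = [
    "Where is my order?",
    "Track my package",
    "When does my parcel arrive?",
    "Reset my password",
    "I forgot my password",
    "asdfgh",
]
vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.95, 0.05, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]

groups = group_similar(vectors)
for number, group in enumerate(groups, start=1):
    print(f"Group {number}:", [sentences[i] for i in group])
```

Sentences that are not close enough to anything else, like the last one here, are simply left out of every group, which is convenient when the training center contains a lot of noise.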

What makes our app special is that it uses our spam classifier to remove all spam messages, and it also offers a keyword suggestion feature to help decide the intent of each group. Fortunately for us, Multi-RAKE delivers exactly what we need for that. At the moment, Multi-RAKE supports up to 26 languages, so with just a few lines of code, you could have the same feature too! 😉
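For readers curious about what happens behind a keyword suggestion, below is a rough sketch of the RAKE idea (Rapid Automatic Keyword Extraction) using only the Python standard library. It is not Multi-RAKE’s API and not the code in our app: the tiny `STOPWORDS` set, the function names, and the example sentences are assumptions made up for this illustration. The principle is that text is split into candidate phrases at stopwords and punctuation, and each phrase is scored by how often its words appear inside longer phrases.

```python
import re
from collections import defaultdict

# A deliberately tiny, hypothetical stopword list; real libraries ship
# much larger per-language lists.
STOPWORDS = {"a", "an", "the", "is", "my", "i", "to", "do", "how", "not", "it", "and"}

def candidate_phrases(text):
    """Split text into runs of content words, breaking at stopwords and punctuation."""
    tokens = re.findall(r"[a-zåäö]+|[.,!?;:]", text.lower())
    phrases, current = [], []
    for token in tokens:
        if token in STOPWORDS or not token.isalpha():
            if current:
                phrases.append(tuple(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(tuple(current))
    return phrases

def rake_keywords(sentences, top_n=3):
    """Score each phrase as the sum of degree(word) / frequency(word) over its words."""
    phrases = [p for s in sentences for p in candidate_phrases(s)]
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in set(phrases)}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# One hypothetical group of similar sentences.
group = [
    "I forgot my password",
    "How do I reset my password?",
    "Password reset link not working",
]
keywords = rake_keywords(group)
print(keywords)
```

Running this on the output of a grouping step gives each group a short list of suggested keywords, which can serve as a hint when naming the intent.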

Allow us to “show off” a little and demonstrate how flexible our clustering function is! For privacy reasons, we have to censor the information in the image below. We hope you understand 👇

Our app allows you to download results as a .csv file

Even though the program already works smoothly and we cannot wait to roll it out for our clients, it will take a while longer before it is integrated into the system. Meanwhile, we encourage you to follow our LinkedIn for weekly updates. Or, if you have any questions, let’s have a little chat and we will tell you more about us!


Curious about our products?

If you are curious and want to know more about Ebbot, or want to see a live demo of our AI projects, let’s meet and talk about it! All you need to do is click the button below 👇
