Chat messages spam classifier using machine learning

In the last blog post, Ebbot explained the training process that helps him respond correctly to your queries. As mentioned at the end of that post, every time Ebbot fails to understand you, he learns from the messages in the conversation to improve his performance. But did you know that not every sentence is useful for the learning process? There is information that we do not want Ebbot to memorize, such as phone numbers, email addresses and spam messages (e.g. asdfda, wqrherewrere safdfa). That is why we decided to build a Machine Learning (ML) model that classifies messages as spam or not spam, so that unnecessary data can be filtered out. Keep reading to find out how we trained this spam classifier and how accurate it is!

Collecting and labeling the dataset

Using data from a conversation script between users and Ebbot, we collected 2924 phrases in total and labeled each of them as "Spam" or "Not Spam". With the help of our sentence similarity model, we were able to cluster meaningful and spam sentences into two groups. This let us avoid manual data labeling as much as we could and saved a lot of time.
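To give a feel for how similarity-based grouping can cut down manual labeling, here is a minimal sketch. It is not our sentence similarity model, which is not described in this post. As a stand-in it uses cosine similarity over character trigram counts and a simple greedy grouping, so a reviewer could label a whole group at once. The function names and the 0.5 threshold are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def trigram_counts(text):
    """Count character trigrams of a lowercased, space-padded phrase."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_similar(phrases, threshold=0.5):
    """Greedy grouping: a phrase joins the first group whose first phrase is similar enough."""
    groups = []  # each entry: (trigram vector of the group's first phrase, member phrases)
    for phrase in phrases:
        vec = trigram_counts(phrase)
        for rep_vec, members in groups:
            if cosine(vec, rep_vec) >= threshold:
                members.append(phrase)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((vec, [phrase]))
    return [members for _, members in groups]


if __name__ == "__main__":
    for group in group_similar([
        "what are your opening hours",
        "what are the opening hours",
        "asdfda",
    ]):
        print(group)
```

Character trigrams only capture surface overlap, so a real sentence similarity model would group paraphrases that this stand-in misses. The labeling workflow stays the same either way: review one group, assign one label.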

Training and evaluating the model

Inspired by the article "Create a SMS spam classifier in Python", we chose a Multinomial Naive Bayes classifier for this project. 75% of the dataset was used for training and 25% was held out for testing. While analyzing the dataset, we also noticed that spam messages are much shorter on average (12.4 characters) than non-spam messages (21.5 characters).
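For readers who want to see the idea in code, below is a small, dependency-free sketch of a 75/25 split and a Multinomial Naive Bayes classifier with add-one smoothing. It is an illustration, not our production code. The features (character bigrams), the class and function names, and the toy messages are all assumptions. In practice a library implementation such as scikit-learn's `MultinomialNB` would be the usual choice.

```python
import math
import random
from collections import Counter, defaultdict


def char_bigrams(text):
    """Feature extractor (an assumption): character bigrams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def split_dataset(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows):
        """rows is a list of (message, label) pairs."""
        self.class_counts = Counter(label for _, label in rows)
        self.feature_counts = defaultdict(Counter)
        for text, label in rows:
            self.feature_counts[label].update(char_bigrams(text))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def log_scores(self, text):
        """Unnormalised log P(label) + sum of log P(feature | label) per label."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feature in char_bigrams(text):
                if feature in self.vocab:  # skip features never seen in training
                    score += math.log((counts[feature] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    toy_rows = [  # hypothetical examples, not our dataset
        ("hello how are you", "Not Spam"),
        ("what are your opening hours", "Not Spam"),
        ("can you help me", "Not Spam"),
        ("asdfda", "Spam"),
        ("wqrherewrere", "Spam"),
        ("safdfa", "Spam"),
    ]
    model = MultinomialNB().fit(toy_rows)
    print(model.predict("asdfsa"))
    print(model.predict("how are you"))
```

With 2924 phrases, `split_dataset` with the default `test_fraction=0.25` would leave 2193 phrases for training and 731 for testing.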

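After fitting the model to the training set, we used the test set to see how it performs. The AUC-ROC (Area Under the Curve, Receiver Operating Characteristic) score was very high, approximately 0.97! For reference, a score of 1.0 means the model always ranks spam above non-spam, while 0.5 is no better than guessing.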

[Figure: AUC-ROC score for the model after fitting our dataset]
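If you are wondering what the AUC-ROC number actually measures, the sketch below computes it directly from its definition: the probability that a randomly chosen spam message gets a higher spam score than a randomly chosen non-spam message, with ties counted as half. The function name and the example scores are made up for illustration. For real evaluation, a library function such as scikit-learn's `roc_auc_score` does the same job more efficiently.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (spam), 0 for the negative class.
    scores: higher means "more likely spam". Tied pairs count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores: one spam message (0.4) is ranked below one non-spam (0.5).
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of messages, which is fine for a test set of a few hundred phrases but not for large datasets.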

Building and deploying a web app to host the spam classifier

Before giving Ebbot this filter to help him collect only meaningful data, we still want to test and improve the model. Thanks to Streamlit, we were able to build a web app in under an hour. We also included a feedback system that collects incorrect predictions so that we can use them to improve the model. The web app was then deployed on Heroku. If you want to test it live, here is 🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁 the link!

[Figure: The web app built using Streamlit and deployed with Heroku]
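The feedback system can be pictured as a small helper that stores every prediction a user corrects. The sketch below is an assumption about how such a helper might look, not a description of our app. The CSV format, the column names and the function name `record_feedback` are hypothetical, and the Streamlit user interface is left out. In a web app, a function like this would be called when the user submits a correction.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def record_feedback(path, message, predicted, corrected):
    """Append a row to the feedback CSV when the user corrects a prediction.

    Returns True if a row was written, False if the prediction was already right.
    """
    if predicted == corrected:
        return False
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header once, when the file is first created.
            writer.writerow(["timestamp", "message", "predicted", "corrected"])
        timestamp = datetime.now(timezone.utc).isoformat()
        writer.writerow([timestamp, message, predicted, corrected])
    return True
```

The collected rows can later be added to the training set with their corrected labels before the model is retrained.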

We hope to implement this filter in our Bot-builder product in the near future, to help our clients reduce the training time of their digital assistants. As soon as it is launched, we will share the good news in another blog post. Until then, please feel free to look at other posts on our website or follow us on LinkedIn for exciting news almost every week!
