LSTM News Categorizer

Project Details

There are many sources of news. 

Sometimes you want to read news only from specific categories (for example, just Sports and/or Business).

Some news sources don't categorize their articles, or their categorization is rather arbitrary.

Our client came up with the idea of a news aggregator that they would like to see in their app. 

At the core of the aggregator is a model that categorizes news text from several sources into 10 topics:

  • World
  • Health
  • Society
  • Incidents
  • Politics
  • Sports
  • Business
  • Culture
  • Economics
  • Technology

Classification models were trained for two languages: Russian and Kazakh. 

The language of each news article is detected by a perceptron-based model that was trained on collected Russian and Kazakh data.
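To make the idea of a perceptron-based language detector concrete, here is a minimal sketch in plain Python. It is not the project's actual model: the character-trigram features, the toy training sentences, the `kk`/`ru` labels, and all function names are assumptions chosen for illustration only.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def score(text, weights, bias):
    """Linear score: positive leans Kazakh, negative leans Russian."""
    feats = char_ngrams(text)
    return bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())

def train_perceptron(samples, epochs=100):
    """Classic perceptron rule: update the weights only on mistakes."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        mistakes = 0
        for text, label in samples:  # label is +1 (Kazakh) or -1 (Russian)
            if label * score(text, weights, bias) <= 0:
                for g, c in char_ngrams(text).items():
                    weights[g] = weights.get(g, 0.0) + label * c
                bias += label
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: training is done
            break
    return weights, bias

def detect_language(text, weights, bias):
    return "kk" if score(text, weights, bias) > 0 else "ru"

# Toy training data (hypothetical; a real detector needs a large corpus).
SAMPLES = [
    ("Бүгін ауа райы өте жақсы", 1),
    ("Сегодня очень хорошая погода", -1),
    ("Мен қызықты кітап оқып жатырмын", 1),
    ("Я читаю интересную книгу", -1),
    ("Жаңалықтар күн сайын шығады", 1),
    ("Новости выходят каждый день", -1),
    ("Біздің қала өте үлкен", 1),
    ("Наш город очень большой", -1),
]

WEIGHTS, BIAS = train_perceptron(SAMPLES)
print(detect_language("Бүгін жаңалықтар шығады", WEIGHTS, BIAS))
```

Character n-grams are a common choice for this kind of task because Kazakh uses Cyrillic letters that Russian lacks, so even short texts tend to carry distinctive features.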

The categorizer model is based on an LSTM (long short-term memory) neural network. In practice, the model has shown good news categorization quality (validation accuracy above 0.93), given that the task is multi-label and multi-category.
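As an illustration of what "multi-label" means here, the sketch below encodes a set of topics as a multi-hot target vector and decodes per-topic probabilities back into topic names. It assumes that one article may carry several topics at once and that the model emits one independent probability per topic; the 0.5 threshold and the function names are hypothetical, not taken from the project.

```python
TOPICS = [
    "World", "Health", "Society", "Incidents", "Politics",
    "Sports", "Business", "Culture", "Economics", "Technology",
]

def to_multi_hot(labels):
    """Encode a collection of topic names as a 0/1 vector over TOPICS."""
    chosen = set(labels)
    unknown = chosen - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in chosen else 0 for topic in TOPICS]

def decode(probabilities, threshold=0.5):
    """Keep every topic whose predicted probability reaches the threshold."""
    if len(probabilities) != len(TOPICS):
        raise ValueError("expected one probability per topic")
    return [t for t, p in zip(TOPICS, probabilities) if p >= threshold]

# An article about the business side of a sports league gets two labels.
target = to_multi_hot(["Sports", "Business"])
predicted = decode([0.1, 0.05, 0.2, 0.0, 0.3, 0.9, 0.7, 0.1, 0.4, 0.2])
print(target, predicted)
```

With this encoding, an article is never forced into a single category, which is why accuracy figures for multi-label tasks are usually harder to achieve than for single-label ones.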

Features

  1. Innovative news categorization with LSTM models

Our team developed LSTM models in Python to categorize news articles. LSTM, a recurrent neural network architecture, was chosen for its capacity to capture long-term dependencies in data sequences, which makes it well suited to understanding and categorizing news content. The models were trained to classify news into the ten predefined topics (World, Health, Business, and so on), so that users can browse information by theme. Integrating the LSTM models improved the platform's categorization accuracy and user engagement.
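The "long-term relationships" mentioned above come from the LSTM's cell state, which is updated through gates rather than overwritten at every step. The following sketch implements one step of a scalar LSTM cell (hidden size 1) in plain Python using the standard gate equations. The hand-picked weights are hypothetical and only show how a forget gate near 1 preserves information across many steps; a production model would use a library such as TensorFlow with learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (illustration only)."""
    def gate(name, activation):
        # Each gate has an input weight, a recurrent weight, and a bias.
        w, u, b = params[name]
        return activation(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # share of the old cell state to keep
    i = gate("input", sigmoid)         # share of new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # share of the cell state to expose

    c = f * c_prev + i * g             # updated cell state (long-term memory)
    h = o * math.tanh(c)               # updated hidden state (step output)
    return h, c

# Hypothetical weights: forget gate close to 1, input gate close to 0.
PARAMS = {
    "forget": (0.0, 0.0, 6.0),
    "input": (0.0, 0.0, -6.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 6.0),
}

def run(inputs, c0=1.0):
    """Feed a sequence through the cell, starting from cell state c0."""
    h, c = 0.0, c0
    for x in inputs:
        h, c = lstm_step(x, h, c, PARAMS)
    return h, c

h, c = run([0.5] * 50)
print(h, c)  # the cell state stays close to its initial value after 50 steps
```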

  2. Language detector

This feature recognizes the language of each news article, so that news can be sorted and organized by detected language. It makes the news aggregator more versatile: users can access news in their preferred languages through language-based sorting options, which makes news consumption more personalized and accessible.

  3. Creating a robust database

The team built a robust system that collects news articles from diverse sources, processes them through the ML models for categorization, and securely stores the information in a database. This involved integrating with various APIs and services to ensure an extensive pool of news sources.
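The collect, categorize, and store flow described above can be sketched with Python's built-in `sqlite3` module. Everything in this example is an assumption made for illustration: the table layout, the function names, the source names, and the `fake_classify` stub that stands in for the trained language detector and LSTM categorizer. The source does not say which database or schema the project uses.

```python
import sqlite3

def init_db(conn):
    """Create one table for articles and one for their topic labels."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id INTEGER PRIMARY KEY,
               source TEXT NOT NULL,
               language TEXT NOT NULL,
               title TEXT NOT NULL,
               body TEXT NOT NULL)"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS article_topics (
               article_id INTEGER NOT NULL REFERENCES articles(id),
               topic TEXT NOT NULL,
               PRIMARY KEY (article_id, topic))"""
    )

def store_article(conn, source, title, body, classify):
    """Run the classifier on the body and persist the article with its topics."""
    language, topics = classify(body)
    cur = conn.execute(
        "INSERT INTO articles (source, language, title, body) VALUES (?, ?, ?, ?)",
        (source, language, title, body),
    )
    article_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO article_topics (article_id, topic) VALUES (?, ?)",
        [(article_id, topic) for topic in topics],
    )
    conn.commit()
    return article_id

def find_articles(conn, topic, source=None):
    """Return titles for a topic, optionally restricted to one source."""
    query = (
        "SELECT a.title FROM articles a "
        "JOIN article_topics t ON t.article_id = a.id WHERE t.topic = ?"
    )
    params = [topic]
    if source is not None:
        query += " AND a.source = ?"
        params.append(source)
    query += " ORDER BY a.id"
    return [row[0] for row in conn.execute(query, params)]

def fake_classify(body):
    """Hypothetical stand-in for the real models: a single keyword rule."""
    topics = ["Sports"] if "матч" in body.lower() else ["World"]
    return "ru", topics

conn = sqlite3.connect(":memory:")
init_db(conn)
store_article(conn, "example-source", "Финал кубка",
              "Матч завершился со счётом 2:1", fake_classify)
store_article(conn, "other-source", "Саммит",
              "Лидеры стран встретились сегодня", fake_classify)
print(find_articles(conn, "Sports"))
```

Keeping topics in a separate table lets one article carry several labels, and the same query shape supports filtering by category and by source.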

  4. Intuitive and visually appealing user interface

The Onix team developed a webview-based interface that is accessible from the Android and iOS apps. The frontend lets users view the latest categorized news, customize their preferences, and filter news by category and source.

  5. Integration with client's services

Integration with the client's existing services was a critical component. The team seamlessly integrated the news aggregator with the client's infrastructure, ensuring a cohesive user journey and enabling user identification within the app.  

Technologies
Python
Machine Learning (ML)
Next.js
TensorFlow
SpaCy
NLTK
Gensim
Scikit-learn
LSTM models
Language detector

Project Crew

Frontend Development
Next.js, LSTM models, Language detector, TensorFlow, SpaCy, NLTK, Gensim, Scikit-learn
Backend Development
Python