LSTM News Categorizer
There are many sources of news, and readers often want to follow only specific categories (for example, just Sports and/or Business). Some news sources have no categories at all, or categorize their articles rather arbitrarily. Our client came up with the idea of a news aggregator for their app. At the core of the aggregator is a model that sorts news texts from several sources into 10 topics: World, Health, Society, Incidents, Politics, Sports, Business, Culture, Economics, Technology. Classification models were trained for two languages, Russian and Kazakh. The language of each news item is detected by a perceptron-based model trained on collected Russian and Kazakh texts. The categorizer itself is based on an LSTM (long short-term memory) neural network. In practice, the model categorizes news well (validation accuracy above 0.93), which is a solid result given that the task is multi-label and multi-class.
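To make the language-detection step more concrete, here is a minimal sketch of a perceptron that separates Russian from Kazakh text. It is an illustration only, not the production model: the letter-count features, the six toy training sentences, and the names `featurize`, `train_perceptron`, and `detect_language` are all assumptions made for this example.

```python
from collections import Counter

# Toy training data invented for this sketch; label +1 = Kazakh, -1 = Russian.
TRAIN = [
    ("сегодня в городе прошел футбольный матч", -1),
    ("правительство утвердило новый бюджет", -1),
    ("врачи рекомендуют больше гулять", -1),
    ("бүгін қалада футбол матчы өтті", +1),
    ("үкімет жаңа бюджетті бекітті", +1),
    ("дәрігерлер көбірек серуендеуге кеңес береді", +1),
]


def featurize(text):
    """Count each letter of the lowercased text; non-letters are ignored."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def score(weights, bias, features):
    """Linear score: bias plus the weighted sum of letter counts."""
    return bias + sum(weights.get(ch, 0.0) * n for ch, n in features.items())


def train_perceptron(samples, max_epochs=1000):
    """Classic perceptron rule: update weights only on misclassified samples."""
    weights, bias = {}, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in samples:
            features = featurize(text)
            if label * score(weights, bias, features) <= 0:
                for ch, n in features.items():
                    weights[ch] = weights.get(ch, 0.0) + label * n
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return weights, bias


def detect_language(text, weights, bias):
    """Return 'kk' for a positive score (Kazakh), otherwise 'ru' (Russian)."""
    return "kk" if score(weights, bias, featurize(text)) > 0 else "ru"


weights, bias = train_perceptron(TRAIN)
print(detect_language("үкімет жаңа бюджетті бекітті", weights, bias))
```

The toy data is linearly separable because letters such as ә, қ, ң, ө, ү and і occur only in the Kazakh sentences, so training stops as soon as an epoch passes without mistakes. In a pipeline like the one described, the detected language code would then select which of the two LSTM categorizers receives the text.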