Text Classifier
The text_classifiers module classifies texts by city function, such as housing and utilities, public amenities, transportation, and health care. It uses rubert-tiny2, a pre-trained Russian-language model of the BERT family, trained on 90,000 labeled messages. The main method, run_text_classifier(), takes a text as input, runs the model, and returns up to three predicted city functions, each with the probability the model assigns to it.
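To illustrate what "up to three predicted city functions with probabilities" can look like in practice, here is a minimal sketch in plain Python. It assumes the probabilities come from a softmax over raw model scores, which the source does not state. The function name `top_k_predictions`, the label list, and the scores are all hypothetical and are not part of the module's API.

```python
import math


def top_k_predictions(logits, labels, k=3):
    """Return up to k (label, probability) pairs, most probable first.

    Hypothetical helper for illustration; not part of the module.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not logits:
        return []
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by probability, highest first, and keep at most k entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores, for illustration only.
labels = ["housing and utilities", "transportation", "health care", "public amenities"]
logits = [2.0, 0.5, -1.0, 1.0]
for label, prob in top_k_predictions(logits, labels):
    print(f"{label}: {prob:.3f}")
```

With fewer than three labels available, the helper simply returns fewer pairs, which matches the "up to three" wording.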
This module contains the TextClassifiers class, which classifies input texts into themes or structured event types. It uses a Hugging Face transformer model based on rubert-tiny. For many themes, the number of messages was too low to train on effectively, so synthetic themes derived from the upper-level category were used instead (for example, "unknown_ЖКХ", where ЖКХ is the housing and utilities category).
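The synthetic-theme idea can be sketched as a relabeling step applied to the training data. This is an assumption about how such a step might look, not the authors' actual preprocessing: the function name `merge_rare_themes`, the `(category, theme)` pair representation, and the `min_count` threshold are hypothetical. Only the `unknown_<category>` naming pattern comes from the source.

```python
from collections import Counter


def merge_rare_themes(records, min_count=50):
    """Relabel rare themes as 'unknown_<category>'.

    records: list of (category, theme) pairs, one per message.
    min_count: hypothetical threshold below which a theme counts as rare.
    """
    # Count messages per (category, theme) pair.
    counts = Counter(records)
    return [
        (category, theme if counts[(category, theme)] >= min_count
         else f"unknown_{category}")
        for category, theme in records
    ]


# Hypothetical toy data: "elevator" has too few messages to keep.
records = (
    [("ЖКХ", "heating")] * 3
    + [("ЖКХ", "elevator")]
    + [("transport", "bus")] * 2
)
relabeled = merge_rare_themes(records, min_count=2)
print(sorted(set(relabeled)))
```

Themes that meet the threshold keep their names, while every rare theme in a category collapses into that category's single synthetic theme.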
Attributes:
- repository_id (str): The repository ID.
- number_of_categories (int): The number of categories.
- device_type (str): The type of device.
The TextClassifiers class has the following methods:
- initialize_classifier: Initializes the text classification pipeline with the specified model, tokenizer, and device type.
- run_text_classifier_topics: Takes a text as input and returns the predicted themes and their probabilities.
- run_text_classifier: Takes a text as input and returns the predicted categories and their probabilities.
- class soika.src.risks.text_classifier.TextClassifiers(repository_id, number_of_categories=1, device_type=None)
Bases: object
- classify_text(text, is_topic=False)
- initialize_classifier()
- run_text_classifier(text)
- run_text_classifier_topics(text)