StreetExtractor

class soika.src.geocoder.street_extractor.StreetExtractor
static extract_ner_street(text: str, classifier) → Series

Extract street addresses from text using a pre-trained custom NER model.

This function cleans the text by removing unnecessary content, applies a custom NER model to extract the addresses it mentions, and returns the extracted address together with its confidence score.

Parameters:
  • text (str) – The input text to process and extract addresses from.

  • classifier – The pre-trained custom NER model applied to the text.

Returns:

A Series containing the extracted address and its confidence score, or [None, None] if extraction fails or the score is below the threshold.

Return type:

pd.Series
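
The return contract described above can be sketched without the actual model. This is only an illustration under stated assumptions: `extract_street_sketch`, `fake_classifier`, the `(address, score)` candidate format, and the `SCORE_THRESHOLD` value are all hypothetical and not part of soika, and the sketch returns a plain two-element list where the documented method returns a pd.Series.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical threshold: the real cutoff used by StreetExtractor is not stated here.
SCORE_THRESHOLD = 0.7

# Hypothetical classifier interface: maps text to (address, score) candidates.
Classifier = Callable[[str], List[Tuple[str, float]]]


def extract_street_sketch(text: str, classifier: Classifier) -> List[Optional[object]]:
    """Return [address, score], or [None, None] if nothing passes the threshold."""
    try:
        candidates = classifier(text)
    except Exception:
        # A failed extraction maps to the documented [None, None] result.
        return [None, None]
    if not candidates:
        return [None, None]
    # Keep the candidate with the highest confidence score.
    address, score = max(candidates, key=lambda c: c[1])
    if score < SCORE_THRESHOLD:
        return [None, None]
    return [address, score]


def fake_classifier(text: str) -> List[Tuple[str, float]]:
    """Stand-in for the NER model: 'recognizes' one hard-coded street."""
    if "Nevsky" in text:
        return [("Nevsky prospekt", 0.93)]
    return []
```

Callers can rely on the result always having two positions, so a missing address shows up as a pair of empty values instead of an exception.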

static extract_toponym(text: str, street_name: str) → str | None

Extract toponyms near the specified street name in the text.

This function identifies the position of a street name in the text and searches for related toponyms within a specified range around the street name.

Parameters:
  • text (str) – The text containing the address.

  • street_name (str) – The name of the street to search around.

Returns:

The first toponym found if present, otherwise None.

Return type:

Optional[str]
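
The idea of searching a fixed range around the street name can be sketched as follows. This is a simplified illustration, not the library's implementation: the function name, the word-based `WINDOW`, and the English `TOPONYMS` vocabulary are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical values: the real range and toponym vocabulary are not stated here.
WINDOW = 2
TOPONYMS = {"street", "avenue", "lane", "embankment", "boulevard"}


def extract_toponym_sketch(text: str, street_name: str) -> Optional[str]:
    """Return the first toponym within WINDOW words of street_name, else None."""
    words = re.findall(r"\w+", text.lower())
    name_words = re.findall(r"\w+", street_name.lower())
    if not name_words:
        return None
    n = len(name_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] != name_words:
            continue
        # Look at the words just before and just after the street name.
        start = max(0, i - WINDOW)
        end = min(len(words), i + n + WINDOW)
        nearby = words[start:i] + words[i + n:end]
        for word in nearby:
            if word in TOPONYMS:
                return word
    return None
```

For instance, `extract_toponym_sketch("Broken lamp on Sadovaya street near the park", "Sadovaya")` returns `"street"`, while a text that mentions the name with no toponym nearby yields `None`.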

extractor = <soika.src.geocoder.text_address_extractor_by_rules.NatashaExtractor object>
static process_pipeline(df: DataFrame, text_column: str, classifier) → DataFrame