Geocoder
- class soika.src.geocoder.geocoder.Geocoder(df, model_path: str = 'Geor111y/flair-ner-addresses-extractor', device: str = 'cpu', territory_name: str = None, osm_id: int = None, city_tags: dict = {'place': ['state']}, stemmer_lang: str = 'russian', text_column_name: str = 'text')
This class provides simple geocoding functionality.
- assign_street()
Simple workaround
- create_gdf(df: DataFrame) → GeoDataFrame
Creates a GeoDataFrame from the recognized and geocoded geometries.
- get_df_areas(osm_id, tags)
Retrieves the GeoDataFrame of areas corresponding to the given OSM ID and tags.
- Parameters:
osm_id (int) – The OpenStreetMap ID.
tags (dict) – The tags to filter by.
date (str) – The date of the data to retrieve (not part of the signature shown above).
- Returns:
The GeoDataFrame containing the areas.
- Return type:
gpd.GeoDataFrame
This function first checks whether the GeoDataFrame for the given OSM ID is already in the cache. If it is, the cached GeoDataFrame is returned. Otherwise, the function retrieves the GeoDataFrame from the HistGeoDataGetter, filters out the "way" elements, and stores the result in the cache. In both cases the value returned is the GeoDataFrame held in the cache.
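The caching behaviour described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `fetch_elements` is a hypothetical stand-in for the HistGeoDataGetter call, `AreaCache` is an invented wrapper, and plain lists of dicts stand in for GeoDataFrames. The cache is keyed by OSM ID only, as the description suggests.

```python
from typing import Callable, Dict, List


def fetch_elements(osm_id: int, tags: dict) -> List[dict]:
    """Hypothetical data source; the real class queries HistGeoDataGetter."""
    return [
        {"type": "relation", "name": "District A"},
        {"type": "way", "name": "Some boundary way"},
        {"type": "relation", "name": "District B"},
    ]


class AreaCache:
    """Invented helper that mimics the described check-fetch-filter-store flow."""

    def __init__(self, fetch: Callable[[int, dict], List[dict]]) -> None:
        self._fetch = fetch
        self._cache: Dict[int, List[dict]] = {}
        self.fetch_calls = 0  # counts how often the data source was hit

    def get_df_areas(self, osm_id: int, tags: dict) -> List[dict]:
        if osm_id not in self._cache:
            # Cache miss: fetch, drop "way" elements, then store.
            self.fetch_calls += 1
            elements = self._fetch(osm_id, tags)
            self._cache[osm_id] = [e for e in elements if e["type"] != "way"]
        # Hit or miss, the returned object is the cached one.
        return self._cache[osm_id]
```

Calling `get_df_areas` twice with the same OSM ID returns the same cached object and triggers only one fetch. Note that with a cache keyed by OSM ID alone, a second call with different tags would still return the first result.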
- static get_level(row: Series) → str
Addresses in the messages are recognized at one of three levels:
1. House level: both the street name and the house number are known.
2. Street level: only the street name is known (the geometry is the centroid of the street).
3. Global level: nothing is known except the city.
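The three levels can be expressed as a small decision function. The sketch below assumes hypothetical column names `street` and `house_number` and uses a plain mapping in place of a pandas Series; the actual column names used by the library are not given in this page.

```python
from typing import Mapping, Optional


def get_level(row: Mapping[str, Optional[str]]) -> str:
    """Classify a geocoded row as 'house', 'street', or 'global'."""
    street = row.get("street")        # hypothetical column name
    house = row.get("house_number")   # hypothetical column name
    if street and house:
        return "house"   # street name and house number are both known
    if street:
        return "street"  # only the street name is known
    return "global"      # nothing but the city is known
```

With a pandas DataFrame, a function of this shape would typically be applied row by row, for example via `df.apply(get_level, axis=1)`.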
- static get_stem(street_names_df: DataFrame) → DataFrame
Finds the stem of a word so that the stem can be looked up in the street names dictionary (df).
- match_group_to_area(group_name, df_areas)
Matches a given group name to an area in a DataFrame of areas.
- Parameters:
group_name (str) – The name of the group to match.
df_areas (DataFrame) – The DataFrame containing the areas to match against.
- Returns:
A tuple containing the best match for the group name and the admin level of the match. If no match is found, returns (None, None).
- Return type:
tuple
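The return contract above, a `(best_match, admin_level)` tuple or `(None, None)`, can be illustrated with a small fuzzy matcher. The source does not say which matching algorithm the library uses, so this sketch substitutes `difflib.SequenceMatcher` from the standard library; the `threshold` parameter and the `admin_level` key are assumptions, and a list of dicts stands in for the DataFrame.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def match_group_to_area(
    group_name: str,
    areas: List[dict],
    threshold: float = 0.8,  # assumed cut-off, not taken from the library
) -> Tuple[Optional[str], Optional[int]]:
    """Return (area name, admin level) of the closest area, or (None, None)."""
    best_name, best_level, best_score = None, None, 0.0
    query = group_name.lower()
    for area in areas:
        # Similarity ratio in [0, 1] between the query and the area name.
        score = SequenceMatcher(None, query, area["area_name"].lower()).ratio()
        if score > best_score:
            best_name, best_level, best_score = (
                area["area_name"], area["admin_level"], score
            )
    if best_score < threshold:
        return None, None  # nothing similar enough
    return best_name, best_level
```

A threshold keeps unrelated group names from being forced onto the least-bad area; the right value depends on how noisy the group names are.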
- merge_to_initial_df(gdf: GeoDataFrame, initial_df: DataFrame) → GeoDataFrame
Merges the geocoded GeoDataFrame with the initial DataFrame so that all original attributes are kept.
- preprocess_area_names(df_areas)
Preprocesses the area names in the given DataFrame by removing specified stopwords, converting the names to lowercase, and stemming them.
- Parameters:
df_areas (DataFrame) – The DataFrame containing the area names.
- Returns:
The DataFrame with preprocessed area names, where the "area_name" column contains the original names with stopwords removed, the "area_name_processed" column contains the lowercase names with special characters removed, and the "area_stems" column contains the stemmed names.
- Return type:
DataFrame
- preprocess_group_name(group_name)
Preprocesses a group name by converting it to lowercase, removing special characters, and removing specified stopwords.
- Parameters:
group_name (str) – The group name to preprocess.
- Returns:
The preprocessed group name.
- Return type:
str
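The three steps named above (lowercase, strip special characters, drop stopwords) can be sketched as follows. The stopword set here is purely illustrative, since the list the library actually uses is not shown on this page, and treating "special characters" as anything that is neither a word character nor whitespace is an assumption.

```python
import re
from typing import AbstractSet

# Illustrative stopwords only; the library's real list is not documented here.
STOPWORDS = frozenset({"district", "news", "official"})


def preprocess_group_name(
    group_name: str, stopwords: AbstractSet[str] = STOPWORDS
) -> str:
    """Lowercase the name, strip special characters, and drop stopwords."""
    lowered = group_name.lower()
    # Replace every character that is not a word character or whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", lowered)
    words = [word for word in cleaned.split() if word not in stopwords]
    return " ".join(words)
```

Because `\w` is Unicode-aware in Python 3, the same function works on Cyrillic group names without changes.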
- run(df: DataFrame = None, tags: dict | None = None, group_column: str | None = 'group_name', search_for_objects=False)
Runs the data processing pipeline on the input DataFrame.
- Parameters:
tags (dict) – The tags to filter by.
date (str) – The date of the data to retrieve (not part of the signature shown above).
df (pd.DataFrame) – The input DataFrame.
text_column (str, optional) – The name of the text column in the DataFrame. Defaults to "text" (not part of the signature shown above).
- Returns:
The processed DataFrame after running the data processing pipeline.
- Return type:
gpd.GeoDataFrame
This function retrieves the GeoDataFrame of areas corresponding to the given OSM ID and tags, preprocesses the area names, and matches each group name to an area. The best match and its admin level are assigned to the DataFrame. The function then retrieves other geographic objects and street names, preprocesses the street names, finds the word form, creates a GeoDataFrame, merges it with the other geographic objects, assigns the street tag, and returns the final GeoDataFrame.
- set_global_repr_point(gdf: GeoDataFrame) → GeoDataFrame
Assigns the centroid (more precisely, the representative point) of the geocoded addresses to the texts that were not geocoded, or that contained no addresses according to the trained NER model.