streamsight.preprocessing.DataFramePreprocessor

class streamsight.preprocessing.DataFramePreprocessor(item_ix: str, user_ix: str, timestamp_ix: str)

Bases: object

Preprocesses a pandas DataFrame into an InteractionMatrix object.

The DataFramePreprocessor class lets you register filters that are applied to the data before it is transformed into an InteractionMatrix object. After applying the filters, the preprocessor maps the original item and user IDs to internal IDs, which reduces the computational load and simplifies the matrix representation.

Parameters:
  • item_ix (str) – Name of the column in which item identifiers are listed.

  • user_ix (str) – Name of the column in which user identifiers are listed.

  • timestamp_ix (str) – Name of the column in which timestamps are listed.

__init__(item_ix: str, user_ix: str, timestamp_ix: str)

Methods

__init__(item_ix, user_ix, timestamp_ix)

add_filter(filter)

Add a preprocessing filter to be applied.

process(df)

Attributes

item_id_mapping

Map from original item IDs to internal item IDs.

user_id_mapping

Map from original user IDs to internal user IDs.

_print_log_message(step: Literal['before', 'after'], stage: Literal['preprocess', 'filter'], df: DataFrame)

Logging for change tracking.

Prints a log message with the number of interactions, items and users in the DataFrame.

Parameters:
  • step (Literal["before", "after"]) – Whether the log message is emitted before or after the given stage

  • stage (Literal["preprocess", "filter"]) – The current stage of the preprocessing

  • df (pd.DataFrame) – The dataframe being processed
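The counts described above can be sketched with plain pandas. This is only an illustration: the function name format_log_message, the extra item_ix and user_ix arguments, and the message wording are assumptions, not the library's actual output.

```python
import pandas as pd


def format_log_message(step, stage, df, item_ix, user_ix):
    """Build a one-line summary of the DataFrame (hypothetical format)."""
    n_interactions = len(df)          # one row per interaction
    n_items = df[item_ix].nunique()   # distinct item identifiers
    n_users = df[user_ix].nunique()   # distinct user identifiers
    return (
        f"{step} {stage}: {n_interactions} interactions, "
        f"{n_items} items, {n_users} users"
    )


# Toy data; the column names are illustrative only.
interactions = pd.DataFrame({
    "user": ["u42", "u7", "u42", "u99"],
    "item": ["b", "a", "a", "b"],
    "ts": [10, 11, 12, 13],
})

message = format_log_message("before", "filter", interactions, "item", "user")
print(message)
```

Comparing the message emitted before a stage with the one emitted after it shows how many interactions, items and users that stage removed.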

_update_id_mappings(df: DataFrame) → None

Update the internal ID mappings for users and items.

The internal ID mappings are updated to reduce the computational load and to simplify the matrix representation.

Parameters:

df (pd.DataFrame) – DataFrame from which the ID mappings are updated
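A minimal sketch of what a mapping from original IDs to consecutive internal IDs can look like, assuming IDs are numbered in order of first appearance. The helper build_id_mapping and the column names user, item, uid and iid are hypothetical; the library's own numbering scheme and column names may differ.

```python
import pandas as pd


def build_id_mapping(df, column, internal_name):
    """Map each distinct value in `column` to a consecutive integer ID."""
    # pd.unique keeps the order of first appearance, so the first value
    # seen gets internal ID 0, the next new value gets 1, and so on.
    originals = pd.unique(df[column])
    return pd.DataFrame({column: originals, internal_name: range(len(originals))})


interactions = pd.DataFrame({
    "user": ["u42", "u7", "u42", "u99"],
    "item": ["b", "a", "a", "b"],
    "ts": [10, 11, 12, 13],
})

user_map = build_id_mapping(interactions, "user", "uid")
item_map = build_id_mapping(interactions, "item", "iid")

# Attach the internal IDs to every interaction by merging on the original ID.
encoded = interactions.merge(user_map, on="user").merge(item_map, on="item")
print(encoded)
```

Consecutive integer IDs can be used directly as row and column indices of a matrix, which is why such a mapping makes the matrix representation straightforward.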

add_filter(filter: Filter)

Add a preprocessing filter to be applied.

The filter is applied before the data is transformed into an InteractionMatrix object.

Filters are applied in the order in which they were added. Different orderings can lead to different results!

Parameters:

filter (Filter) – The filter to be applied
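The effect of filter ordering can be illustrated with two stand-in filters written as plain functions. They are hypothetical and do not use the library's Filter class; the column names and thresholds are chosen only to make the difference visible.

```python
import pandas as pd


def min_items_per_user(df, n):
    """Keep interactions of users that have at least n interactions."""
    counts = df.groupby("user")["item"].transform("count")
    return df[counts >= n]


def min_users_per_item(df, n):
    """Keep interactions of items that have at least n interactions."""
    counts = df.groupby("item")["user"].transform("count")
    return df[counts >= n]


df = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3"],
    "item": ["a", "b", "a", "c", "a"],
})

# Users first: u3 is dropped, then only item "a" still has two interactions.
user_then_item = min_users_per_item(min_items_per_user(df, 2), 2)

# Items first: only item "a" survives, leaving every user with a single
# interaction, so the user filter then removes everything.
item_then_user = min_items_per_user(min_users_per_item(df, 2), 2)

print(len(user_then_item), len(item_then_user))
```

The same two filters keep two interactions in one order and none in the other, so the order of the add_filter calls is part of the preprocessing configuration.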

property item_id_mapping: DataFrame

Map from original item IDs to internal item IDs.

pandas DataFrame whose columns hold the original item IDs and the corresponding internal (consecutive) item IDs.

Returns:

DataFrame containing the mapping from original item IDs to internal item IDs.

Return type:

pd.DataFrame

process(df: DataFrame) → InteractionMatrix

property user_id_mapping: DataFrame

Map from original user IDs to internal user IDs.

pandas DataFrame whose columns hold the original user IDs and the corresponding internal (consecutive) user IDs.

Returns:

DataFrame containing the mapping from original user IDs to internal user IDs.

Return type:

pd.DataFrame