streamsight.matrix.InteractionMatrix

class streamsight.matrix.InteractionMatrix(df: DataFrame, item_ix: str, user_ix: str, timestamp_ix: str, shape: Tuple[int, int] | None = None, skip_df_processing: bool = False)

Bases: object

Matrix of interaction data between users and items.

It provides a number of properties and methods for easy manipulation of this interaction data.

Attention

  • The InteractionMatrix does not assume binary user-item pairs. If a user interacts with an item more than once, there will be one entry per interaction for this user-item pair.

  • We assume that the user and item IDs are integers starting from 0. The ID “-1” is reserved to label the user or item to be predicted. This assumption is crucial: it is relied on by the split scheme and during evaluation of the recommender system (RS), because it determines the 2D shape of the CSR matrix.

Parameters:
  • df (pd.DataFrame) – Dataframe containing user-item interactions. Must contain at least item ids and user ids.

  • item_ix (str) – Item ids column name.

  • user_ix (str) – User ids column name.

  • timestamp_ix (str) – Interaction timestamps column name.

  • shape (Tuple[int, int], optional) – The desired shape of the matrix, i.e. the number of users and items. If no shape is specified, the number of users will be equal to the maximum user id plus one, the number of items to the maximum item id plus one.

  • skip_df_processing (bool, optional) – Skip processing of the dataframe. This is useful when the dataframe is already processed and the columns are already renamed.
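As a rough illustration of the default described for shape, the sketch below infers a (users, items) shape from raw ID lists. It is plain Python and does not call streamsight; infer_shape is a hypothetical helper name, and the sketch assumes both lists are non-empty.

```python
from typing import List, Tuple


def infer_shape(user_ids: List[int], item_ids: List[int]) -> Tuple[int, int]:
    """Hypothetical helper: shape = (max user id + 1, max item id + 1)."""
    # IDs start at 0, so the highest ID plus one is the number of rows/columns.
    n_users = max(user_ids) + 1
    n_items = max(item_ids) + 1
    return n_users, n_items


# User IDs 0..3 and item IDs 0..5 appear, with gaps (user 2, items 1, 3, 4).
shape = infer_shape(user_ids=[0, 1, 1, 3], item_ids=[2, 0, 2, 5])
print(shape)
```

Note that IDs missing from the data still occupy a row or column, since the size follows the maximum ID rather than the number of distinct IDs.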

__init__(df: DataFrame, item_ix: str, user_ix: str, timestamp_ix: str, shape: Tuple[int, int] | None = None, skip_df_processing: bool = False)

Methods

__init__(df, item_ix, user_ix, timestamp_ix)

concat(im)

Concatenate this InteractionMatrix with another.

copy()

Create a deep copy of this InteractionMatrix.

copy_df([reset_index])

Create a deep copy of the dataframe.

difference(im)

Difference between this InteractionMatrix and another.

get_interaction_data()

Get the data that is not denoted by "-1".

get_items_n_first_interaction([n_seq_data, ...])

Select the first n interactions for each item.

get_items_n_last_interaction([n_seq_data, ...])

Select the last n interactions for each item.

get_prediction_data()

Get the data to be predicted.

get_users_n_first_interaction([n_seq_data, ...])

Select the first n interactions for each user.

get_users_n_last_interaction([n_seq_data, ...])

Select the last n interactions for each user.

interactions_in(interaction_ids[, inplace])

Select the interactions by their interaction ids.

items_in()

Keep only interactions with the specified items.

items_not_in(I[, inplace])

Keep only interactions not with the specified items.

mask_shape([shape, drop_unknown_user, ...])

Masks global user and item ID.

nonzero()

timestamps_gt()

Select interactions after a given timestamp.

timestamps_gte()

Select interactions after and including a given timestamp.

timestamps_lt()

Select interactions up to a given timestamp.

timestamps_lte()

Select interactions up to and including a given timestamp.

union(im)

Combine events from this InteractionMatrix with another.

users_in()

Keep only interactions by one of the specified users.

users_not_in(U[, inplace])

Keep only interactions not by the specified users.

Attributes

INTERACTION_IX

ITEM_IX

MASKED_LABEL

TIMESTAMP_IX

USER_IX

binary_values

All user-item interactions as a sparse, binary matrix of size (users, items).

has_timestamps

Boolean indicating whether instance has timestamp information.

indices

Returns a tuple of lists of user IDs and item IDs corresponding to interactions.

item_ids

The set of all item IDs.

max_item_id

The highest item ID in the interaction matrix.

max_timestamp

The latest timestamp in the interaction matrix.

max_user_id

The highest user ID in the interaction matrix.

min_timestamp

The earliest timestamp in the interaction matrix.

num_interactions

The total number of interactions.

user_ids

The set of all user IDs.

values

All user-item interactions as a sparse matrix of size (|`global_users`|, |`global_items`|).

shape

The shape of the interaction matrix, i.e. |user| x |item|.

INTERACTION_IX = 'interactionid'
ITEM_IX = 'iid'
MASKED_LABEL = -1
TIMESTAMP_IX = 'ts'
USER_IX = 'uid'
_apply_mask(mask: Series, inplace=False) InteractionMatrix
_apply_mask(mask: Series, inplace=True) None
_check_shape()
_get_first_n_interactions(by: ItemUserBasedEnum, n_seq_data: int, t_lower: int | None = None, inplace=False) InteractionMatrix
_get_last_n_interactions(by: ItemUserBasedEnum, n_seq_data: int, t_upper: int | None = None, id_in: Set[int] | None = None, inplace=False) InteractionMatrix
_timestamps_cmp(op: Callable, timestamp: float, inplace: bool = False) InteractionMatrix | None

Filter interactions based on timestamp. Keep only interactions for which op(t, timestamp) is True.

Parameters:
  • op (Callable) – Comparison operator.

  • timestamp (float) – Timestamp to compare against in seconds from epoch.

  • inplace (bool, optional) – Modify the data matrix in place. If False, returns a new object.
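The rule "keep interactions for which op(t, timestamp) is True" can be sketched with the standard operator module. This is a minimal stand-alone illustration, not the library's code: the (uid, iid, ts) tuple layout and the filter_by_timestamp name are assumptions, and the mapping of timestamps_gt, timestamps_gte, timestamps_lt and timestamps_lte to operator.gt, operator.ge, operator.lt and operator.le is inferred from their descriptions.

```python
import operator
from typing import Callable, List, Tuple

# Each interaction is a (uid, iid, ts) tuple; this layout is an assumption
# made for the sketch, not the library's internal representation.
Interaction = Tuple[int, int, float]


def filter_by_timestamp(
    rows: List[Interaction],
    op: Callable[[float, float], bool],
    timestamp: float,
) -> List[Interaction]:
    """Keep only the rows for which op(row_ts, timestamp) is True."""
    return [row for row in rows if op(row[2], timestamp)]


rows = [(0, 0, 1.0), (1, 2, 2.0), (2, 1, 3.0)]

after = filter_by_timestamp(rows, operator.gt, 2.0)        # ts > 2.0
at_or_after = filter_by_timestamp(rows, operator.ge, 2.0)  # ts >= 2.0
before = filter_by_timestamp(rows, operator.lt, 2.0)       # ts < 2.0
up_to = filter_by_timestamp(rows, operator.le, 2.0)        # ts <= 2.0
```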

property binary_values: csr_matrix

All user-item interactions as a sparse, binary matrix of size (users, items).

An entry is 1 if there is at least one interaction between that user and item. In all other cases the entry is 0.

Returns:

Binary csr_matrix of interactions.

Return type:

csr_matrix
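To make the difference between the binary view and a count view concrete, here is a small sketch in plain Python. It uses dictionaries and nested lists instead of a csr_matrix, so it only illustrates the semantics; the variable names are made up for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (uid, iid) pairs; user 0 interacts with item 1 twice.
pairs: List[Tuple[int, int]] = [(0, 1), (0, 1), (1, 0), (2, 2)]

# Count view: number of interactions per user-item pair.
counts: Dict[Tuple[int, int], int] = dict(Counter(pairs))

# Binary view: 1 wherever at least one interaction exists.
binary: Dict[Tuple[int, int], int] = {pair: 1 for pair in counts}

n_users, n_items = 3, 3
dense_counts = [
    [counts.get((u, i), 0) for i in range(n_items)] for u in range(n_users)
]
dense_binary = [
    [binary.get((u, i), 0) for i in range(n_items)] for u in range(n_users)
]
```

The repeated pair (0, 1) yields 2 in the count view and 1 in the binary view; pairs without interactions are 0 in both.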

concat(im: InteractionMatrix | DataFrame) InteractionMatrix

Concatenate this InteractionMatrix with another.

Note

This is an in-place operation and will modify the current object.

Parameters:

im (Union[InteractionMatrix, pd.DataFrame]) – InteractionMatrix to concat with.

Returns:

InteractionMatrix with the interactions from both matrices.

Return type:

InteractionMatrix

copy() InteractionMatrix

Create a deep copy of this InteractionMatrix.

Returns:

Deep copy of this InteractionMatrix.

Return type:

InteractionMatrix

copy_df(reset_index: bool = False) DataFrame

Create a deep copy of the dataframe.

Returns:

Deep copy of dataframe.

Return type:

pd.DataFrame

difference(im: InteractionMatrix) InteractionMatrix

Difference between this InteractionMatrix and another.

Parameters:

im (InteractionMatrix) – InteractionMatrix to subtract from this.

Returns:

Difference between this InteractionMatrix and the other.

Return type:

InteractionMatrix

get_interaction_data() InteractionMatrix

Get the data that is not denoted by “-1”.

get_items_n_first_interaction(n_seq_data: int = 1, t_lower: int | None = None, inplace=False) InteractionMatrix

Select the first n interactions for each item.

Parameters:
  • n_seq_data (int, optional) – Number of interactions to select, defaults to 1

  • t_lower (Optional[int], optional) – Seconds past t. Lower limit for the timestamp of the interactions to select, defaults to None

  • inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix

get_items_n_last_interaction(n_seq_data: int = 1, t_upper: int | None = None, item_in: Set[int] | None = None, inplace: bool = False) InteractionMatrix

Select the last n interactions for each item.

Parameters:
  • n_seq_data (int, optional) – Number of interactions to select, defaults to 1

  • t_upper (Optional[int], optional) – Seconds past t. Upper limit for the timestamp of the interactions to select, defaults to None

  • item_in (Optional[Set[int]], optional) – Set of item IDs to select the interactions from, defaults to None

  • inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix

get_prediction_data() InteractionMatrix

Get the data to be predicted.

Returns:

InteractionMatrix with only the data to be predicted.

Return type:

InteractionMatrix
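The split between interaction data and prediction data can be pictured as follows. This sketch assumes that a row belongs to the prediction data when either its user ID or its item ID equals the masked label -1; that reading follows from the class-level note on reserved IDs but is not confirmed here. The tuple layout and the is_masked helper are hypothetical.

```python
from typing import List, Tuple

MASKED_LABEL = -1  # same value as InteractionMatrix.MASKED_LABEL

Interaction = Tuple[int, int, int]  # (uid, iid, ts), assumed layout

rows: List[Interaction] = [
    (0, 0, 0),
    (1, 1, 1),
    (2, 2, 2),
    (3, 3, 3),
    (4, MASKED_LABEL, 4),  # the item for user 4 is what should be predicted
]


def is_masked(row: Interaction) -> bool:
    """Assumption: a row is 'to be predicted' if either ID is the masked label."""
    return row[0] == MASKED_LABEL or row[1] == MASKED_LABEL


prediction_rows = [row for row in rows if is_masked(row)]
interaction_rows = [row for row in rows if not is_masked(row)]
```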

get_users_n_first_interaction(n_seq_data: int = 1, t_lower: int | None = None, inplace=False) InteractionMatrix

Select the first n interactions for each user.

Parameters:
  • n_seq_data (int, optional) – Number of interactions to select, defaults to 1

  • t_lower (Optional[int], optional) – Seconds past t. Lower limit for the timestamp of the interactions to select, defaults to None

  • inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix

get_users_n_last_interaction(n_seq_data: int = 1, t_upper: int | None = None, user_in: Set[int] | None = None, inplace: bool = False) InteractionMatrix

Select the last n interactions for each user.

Parameters:
  • n_seq_data (int, optional) – Number of interactions to select, defaults to 1

  • t_upper (Optional[int], optional) – Seconds past t. Upper limit for the timestamp of the interactions to select, defaults to None

  • user_in (Optional[Set[int]], optional) – Set of user IDs to select the interactions from, defaults to None

  • inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix
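One possible reading of "last n interactions for each user" is sketched below in plain Python. The last_n_per_user helper is hypothetical, and treating t_upper as an exclusive upper bound is an assumption, since the text only calls it an upper limit. The sketch also assumes n_seq_data is at least 1.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Interaction = Tuple[int, int, int]  # (uid, iid, ts), assumed layout


def last_n_per_user(
    rows: List[Interaction],
    n_seq_data: int = 1,
    t_upper: Optional[int] = None,
) -> List[Interaction]:
    """Hypothetical helper: keep each user's n most recent interactions."""
    by_user: Dict[int, List[Interaction]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[2]):  # oldest first
        # Assumption: t_upper is an exclusive upper bound on the timestamp.
        if t_upper is None or row[2] < t_upper:
            by_user[row[0]].append(row)
    selected: List[Interaction] = []
    for uid in sorted(by_user):
        selected.extend(by_user[uid][-n_seq_data:])  # tail of each history
    return selected


rows = [(0, 0, 1), (0, 1, 2), (0, 2, 3), (1, 0, 2), (1, 1, 5)]
last_two = last_n_per_user(rows, n_seq_data=2)
last_one_before_5 = last_n_per_user(rows, n_seq_data=1, t_upper=5)
```

Selecting the first n interactions, or grouping by item instead of user, follows the same pattern with the head of each group or a different grouping key.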

property has_timestamps: bool

Boolean indicating whether instance has timestamp information.

Returns:

True if timestamps information is available, False otherwise.

Return type:

bool

property indices: Tuple[List[int], List[int]]

Returns a tuple of lists of user IDs and item IDs corresponding to interactions.

Returns:

Tuple of lists of user IDs and item IDs that correspond to at least one interaction.

Return type:

Tuple[List[int], List[int]]

interactions_in(interaction_ids: List[int], inplace: bool = False) InteractionMatrix | None

Select the interactions by their interaction ids.

Parameters:
  • interaction_ids (List[int]) – A list of interaction ids

  • inplace (bool, optional) – Apply the selection in place, or return a new InteractionMatrix object, defaults to False

Returns:

None if inplace, otherwise new InteractionMatrix object with the selected interactions

Return type:

Union[None, InteractionMatrix]

property item_ids: Set[int]

The set of all item IDs.

Returns:

Set of all item IDs.

Return type:

Set[int]

items_in(I: Set[int], inplace=False) InteractionMatrix
items_in(I: Set[int], inplace=True) None

Keep only interactions with the specified items.

Parameters:
  • I (Set[int]) – A Set or List of items to select the interactions.

  • inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]
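The inplace convention shared by the selection methods (return None when modifying in place, otherwise return a new object) can be sketched with a tiny stand-in class. TinyMatrix is hypothetical and only mimics the documented return behaviour; it is not how InteractionMatrix is implemented.

```python
from typing import List, Optional, Set, Tuple


class TinyMatrix:
    """Hypothetical stand-in for InteractionMatrix holding (uid, iid) pairs."""

    def __init__(self, pairs: List[Tuple[int, int]]) -> None:
        self.pairs = list(pairs)

    def items_in(self, I: Set[int], inplace: bool = False) -> Optional["TinyMatrix"]:
        kept = [pair for pair in self.pairs if pair[1] in I]
        if inplace:
            self.pairs = kept  # mutate this object and return nothing
            return None
        return TinyMatrix(kept)  # leave this object untouched


m = TinyMatrix([(0, 0), (1, 1), (2, 2)])

subset = m.items_in({0, 2})             # new object, m is unchanged
untouched = list(m.pairs)               # snapshot before the in-place call

result = m.items_in({1}, inplace=True)  # m itself is filtered
```

A practical consequence is that calls with inplace=True should not be chained or assigned, because the return value is None.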

items_not_in(I: Set[int], inplace=False) InteractionMatrix | None

Keep only interactions not with the specified items.

Parameters:
  • I (Set[int]) – A Set or List of items to exclude from the interactions.

  • inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

mask_shape(shape: Tuple[int, int] | None = None, drop_unknown_user: bool = False, drop_unknown_item: bool = False, inherit_max_id: bool = False) None

Masks global user and item ID.

This ensures that the matrix released to the models contains only data that is intended to be released, which addresses the data leakage issue. It is recommended that the programmer defines the shape of the matrix such that the model only sees the data that is intended to be seen.

Example

Given the following case where the data is as follows:

> uid: [0, 1, 2, 3, 4]
> iid: [0, 1, 2, 3, -1]
> ts : [0, 1, 2, 3, 4]

Here user 4 is the user to be predicted. Assume that user 4 is an unknown user, that is, the model has never seen user 4 before. The shape of the matrix should then be (4, 4), and this should be passed through the shape parameter when calling the function.

If the shape is defined and the data contains the ID of an unknown user or item, a warning is raised when the corresponding drop_unknown_user or drop_unknown_item flag is False. When the flag is True, the unknown users or items are dropped from the data: user IDs of shape[0] or greater and item IDs of shape[1] or greater are removed. This follows from the initial assumption that user and item IDs start from 0, as defined in the dataset class.

Otherwise, if shape is not defined, the shape is inferred from the data and is determined by the number of unique users and items. In this case the shape will be (5, 4). Note that the shape may not be what the programmer intended if the data contains unknown users/items or if the dataframe does not contain all historical users/items.

Parameters:
  • shape (Optional[Tuple[int, int]], optional) – Shape of the known user and item base. This value is usually set by the evaluator during the evaluation run. It can also be set manually by the programmer if there is a need to alter the known user/item base. Defaults to None

  • drop_unknown_user (bool, optional) – To drop unknown users in the dataset, defaults to False

  • drop_unknown_item (bool, optional) – To drop unknown items in the dataset, defaults to False

  • inherit_max_id (bool, optional) – To inherit the maximum user and item ID from the given shape and the dataframe. This is useful when the shape is defined and the dataframe contains unknown users/items. Defaults to False
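The worked example for mask_shape can be traced by hand with a few lines of plain Python. This sketch does not call the library. It assumes an ID counts as unknown when it does not fit inside the known shape (uid >= shape[0]), and that the masked label -1 is ignored when counting unique items, which is consistent with the (5, 4) result quoted in the example.

```python
uid = [0, 1, 2, 3, 4]
iid = [0, 1, 2, 3, -1]
ts = [0, 1, 2, 3, 4]
rows = list(zip(uid, iid, ts))

known_shape = (4, 4)  # 4 known users (IDs 0..3) and 4 known items (IDs 0..3)

# Assumption: a user is unknown when its ID falls outside the known shape.
unknown_users = {u for u, _, _ in rows if u >= known_shape[0]}

# Effect of dropping unknown users: only rows of known users remain.
kept = [row for row in rows if row[0] < known_shape[0]]

# Without an explicit shape: count unique users and items, ignoring the
# masked label -1 (assumption, chosen to match the documented result).
inferred_shape = (len(set(uid)), len({i for i in iid if i != -1}))
```

With the explicit shape (4, 4), user 4 is flagged as unknown; with the inferred shape (5, 4), user 4 silently becomes part of the known user base, which is the leakage the explicit shape is meant to prevent.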

property max_item_id: int

The highest item ID in the interaction matrix.

In the case of an empty matrix, the highest item ID is -1. This is consistent with the definition that -1 denotes the item that is unknown. It would be incorrect to use any other value, since 0 is a valid item ID.

Returns:

The highest item ID.

Return type:

int
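The reason for returning -1 on an empty matrix can be seen in a short sketch. The max_id helper below is hypothetical; it only shows why 0 would be an ambiguous sentinel while -1 is not.

```python
from typing import Iterable


def max_id(ids: Iterable[int]) -> int:
    """Hypothetical helper: highest ID, or -1 when there are no IDs at all."""
    return max(ids, default=-1)


empty = max_id([])          # no items at all
single = max_id([0])        # exactly one item, whose ID is 0
several = max_id([0, 3, 2])

# With -1 as the empty value, "highest ID + 1" still gives the item count
# under the assumption that IDs are contiguous and start at 0.
n_items_empty = empty + 1
n_items_single = single + 1
```

Had 0 been used for the empty case, an empty matrix and a matrix containing only item 0 would be indistinguishable.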

property max_timestamp: int

The latest timestamp in the interaction matrix.

Returns:

The latest timestamp.

Return type:

int

property max_user_id: int

The highest user ID in the interaction matrix.

Returns:

The highest user ID.

Return type:

int

property min_timestamp: int

The earliest timestamp in the interaction matrix.

Returns:

The earliest timestamp.

Return type:

int

nonzero() Tuple[List[int], List[int]]
property num_interactions: int

The total number of interactions.

Returns:

Total interaction count.

Return type:

int

shape: Tuple[int, int]

The shape of the interaction matrix, i.e. |user| x |item|.

timestamps_gt(timestamp: float) InteractionMatrix
timestamps_gt(timestamp: float, inplace: Literal[True]) None

Select interactions after a given timestamp.

Parameters:
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

timestamps_gte(timestamp: float) InteractionMatrix
timestamps_gte(timestamp: float, inplace: Literal[True]) None

Select interactions after and including a given timestamp.

Parameters:
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

timestamps_lt(timestamp: float) InteractionMatrix
timestamps_lt(timestamp: float, inplace: Literal[True]) None

Select interactions up to a given timestamp.

Parameters:
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

timestamps_lte(timestamp: float) InteractionMatrix
timestamps_lte(timestamp: float, inplace: Literal[True]) None

Select interactions up to and including a given timestamp.

Parameters:
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

union(im: InteractionMatrix) InteractionMatrix

Combine events from this InteractionMatrix with another.

Parameters:

im (InteractionMatrix) – InteractionMatrix to union with.

Returns:

Union of interactions in this InteractionMatrix and the other.

Return type:

InteractionMatrix

property user_ids: Set[int]

The set of all user IDs.

Returns:

Set of all user IDs.

Return type:

Set[int]

users_in(U: Set[int], inplace=False) InteractionMatrix
users_in(U: Set[int], inplace=True) None

Keep only interactions by one of the specified users.

Parameters:
  • U (Set[int]) – A Set or List of users to select the interactions from.

  • inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

users_not_in(U: Set[int], inplace=False) InteractionMatrix | None

Keep only interactions not by the specified users.

Parameters:
  • U (Set[int]) – A Set or List of users to exclude from the interactions.

  • inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

property values: csr_matrix

All user-item interactions as a sparse matrix of size (|`global_users`|, |`global_items`|).

Each entry is the number of interactions between that user and item. If there are no interactions between a user and item, the entry is 0.

Returns:

Interactions between users and items as a csr_matrix.

Return type:

csr_matrix