streamsight.matrix.InteractionMatrix

class streamsight.matrix.InteractionMatrix(df: DataFrame, item_ix: str, user_ix: str, timestamp_ix: str, shape: Tuple[int, int] | None = None, skip_df_processing: bool = False)

Bases: object

Matrix of interaction data between users and items.

It provides a number of properties and methods for easy manipulation of this interaction data.

Attention

The InteractionMatrix does not assume binary user-item pairs. If a user interacts with an item more than once, there will be one entry per interaction for this user-item pair.
We assume that user and item IDs are integers starting from 0. The ID “-1” is reserved to label the user or item to be predicted. This assumption is crucial: it is used by the split scheme and during evaluation of the recommender system (RS), since it affects the 2D shape of the CSR matrix.

Parameters:

df (pd.DataFrame) – Dataframe containing user-item interactions. Must contain at least item ids and user ids.
item_ix (str) – Item ids column name.
user_ix (str) – User ids column name.
timestamp_ix (str) – Interaction timestamps column name.
shape (Tuple[int, int], optional) – The desired shape of the matrix, i.e. the number of users and items. If no shape is specified, the number of users will be equal to the maximum user id plus one, the number of items to the maximum item id plus one.
skip_df_processing (bool, optional) – Skip processing of the dataframe. This is useful when the dataframe is already processed and the columns are already renamed.
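
The default for `shape` described above can be illustrated without the library. The following is a plain-Python sketch of that rule, not streamsight code: `infer_shape` is a hypothetical helper, and it assumes the reserved label -1 is the only negative ID that can appear.

```python
from typing import Iterable, Tuple


def infer_shape(user_ids: Iterable[int], item_ids: Iterable[int]) -> Tuple[int, int]:
    """Hypothetical helper: (max user ID + 1, max item ID + 1).

    IDs are assumed to be integers starting from 0. An empty input
    gives a maximum of -1 and therefore a dimension of 0.
    """
    n_users = max(user_ids, default=-1) + 1
    n_items = max(item_ids, default=-1) + 1
    return n_users, n_items


# User 2 interacts with item 5 twice; both rows count as separate interactions.
uids = [0, 2, 2, 1]
iids = [3, 5, 5, 0]
shape = infer_shape(uids, iids)  # (3, 6)
```

Passing an explicit `shape` is the way to avoid this inference when the dataframe does not contain every historical user or item.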

__init__(df: DataFrame, item_ix: str, user_ix: str, timestamp_ix: str, shape: Tuple[int, int] | None = None, skip_df_processing: bool = False)

Methods

`__init__`(df, item_ix, user_ix, timestamp_ix)
`concat`(im)	Concatenate this InteractionMatrix with another.
`copy`()	Create a deep copy of this InteractionMatrix.
`copy_df`([reset_index])	Create a deep copy of the dataframe.
`difference`(im)	Difference between this InteractionMatrix and another.
`get_interaction_data`()	Get the data that is not denoted by "-1".
`get_items_n_first_interaction`([n_seq_data, ...])	Select the first n interactions for each item.
`get_items_n_last_interaction`([n_seq_data, ...])	Select the last n interactions for each item.
`get_prediction_data`()	Get the data to be predicted.
`get_users_n_first_interaction`([n_seq_data, ...])	Select the first n interactions for each user.
`get_users_n_last_interaction`([n_seq_data, ...])	Select the last n interactions for each user.
`interactions_in`(interaction_ids[, inplace])	Select the interactions by their interaction ids.
`items_in`(I[, inplace])	Keep only interactions with the specified items.
`items_not_in`(I[, inplace])	Keep only interactions not with the specified items.
`mask_shape`([shape, drop_unknown_user, ...])	Masks global user and item ID.
`nonzero`()
`timestamps_gt`(timestamp[, inplace])	Select interactions after a given timestamp.
`timestamps_gte`(timestamp[, inplace])	Select interactions at or after a given timestamp.
`timestamps_lt`(timestamp[, inplace])	Select interactions before a given timestamp.
`timestamps_lte`(timestamp[, inplace])	Select interactions up to and including a given timestamp.
`union`(im)	Combine events from this InteractionMatrix with another.
`users_in`(U[, inplace])	Keep only interactions by one of the specified users.
`users_not_in`(U[, inplace])	Keep only interactions not by the specified users.

Attributes

`INTERACTION_IX`
`ITEM_IX`
`MASKED_LABEL`
`TIMESTAMP_IX`
`USER_IX`
`binary_values`	All user-item interactions as a sparse, binary matrix of size (users, items).
`has_timestamps`	Boolean indicating whether instance has timestamp information.
`indices`	Returns a tuple of lists of user IDs and item IDs corresponding to interactions.
`item_ids`	The set of all item IDs.
`max_item_id`	The highest item ID in the interaction matrix.
`max_timestamp`	The latest timestamp in the interaction matrix.
`max_user_id`	The highest user ID in the interaction matrix.
`min_timestamp`	The earliest timestamp in the interaction matrix.
`num_interactions`	The total number of interactions.
`user_ids`	The set of all user IDs.
`values`	All user-item interactions as a sparse matrix of size (|`global_users`|, |`global_items`|).
`shape`	The shape of the interaction matrix, i.e. |user| x |item|.

INTERACTION_IX = 'interactionid'

ITEM_IX = 'iid'

MASKED_LABEL = -1

TIMESTAMP_IX = 'ts'

USER_IX = 'uid'

_apply_mask(mask: Series, inplace=False) → InteractionMatrix
_apply_mask(mask: Series, inplace=True) → None

_check_shape()

_get_first_n_interactions(by: ItemUserBasedEnum, n_seq_data: int, t_lower: int | None = None, inplace=False) → InteractionMatrix

_get_last_n_interactions(by: ItemUserBasedEnum, n_seq_data: int, t_upper: int | None = None, id_in: Set[int] | None = None, inplace=False) → InteractionMatrix

_timestamps_cmp(op: Callable, timestamp: float, inplace: bool = False) → InteractionMatrix | None

Filter interactions based on timestamp. Keep only interactions for which op(t, timestamp) is True.

Parameters:

op (Callable) – Comparison operator.
timestamp (float) – Timestamp to compare against in seconds from epoch.
inplace (bool, optional) – Modify the data matrix in place. If False, returns a new object.
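
The rule "keep only interactions for which op(t, timestamp) is True" can be sketched with the standard `operator` module. This is a minimal illustration in plain Python, not the library's implementation: the `(uid, iid, ts)` row layout and the `timestamps_cmp` helper are hypothetical, and pairing each public `timestamps_*` method with the operator of the same name is an assumption.

```python
import operator
from typing import Callable, List, Tuple

# Hypothetical rows of (uid, iid, ts); not the library's internal storage.
Row = Tuple[int, int, float]


def timestamps_cmp(
    rows: List[Row], op: Callable[[float, float], bool], timestamp: float
) -> List[Row]:
    """Keep only rows for which op(row_ts, timestamp) is True."""
    return [row for row in rows if op(row[2], timestamp)]


rows = [(0, 0, 1.0), (1, 2, 2.0), (0, 1, 3.0)]

after = timestamps_cmp(rows, operator.gt, 2.0)   # strictly after 2.0
from_t = timestamps_cmp(rows, operator.ge, 2.0)  # at or after 2.0
before = timestamps_cmp(rows, operator.lt, 2.0)  # strictly before 2.0
up_to = timestamps_cmp(rows, operator.le, 2.0)   # up to and including 2.0
```

The only difference between the strict and inclusive variants is whether a row whose timestamp equals the threshold is kept.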

property binary_values: csr_matrix

All user-item interactions as a sparse, binary matrix of size (users, items).

An entry is 1 if there is at least one interaction between that user and item. In all other cases the entry is 0.

Returns:: Binary csr_matrix of interactions.
Return type:: csr_matrix

concat(im: InteractionMatrix | DataFrame) → InteractionMatrix

Concatenate this InteractionMatrix with another.

Note

This is an in-place operation and will modify the current object.

Parameters:: im (Union[InteractionMatrix, pd.DataFrame]) – InteractionMatrix to concat with.
Returns:: InteractionMatrix with the interactions from both matrices.
Return type:: InteractionMatrix

copy() → InteractionMatrix

Create a deep copy of this InteractionMatrix.

Returns:: Deep copy of this InteractionMatrix.
Return type:: InteractionMatrix

copy_df(reset_index: bool = False) → DataFrame

Create a deep copy of the dataframe.

Returns:: Deep copy of dataframe.
Return type:: pd.DataFrame

difference(im: InteractionMatrix) → InteractionMatrix

Difference between this InteractionMatrix and another.

Parameters:: im (InteractionMatrix) – InteractionMatrix to subtract from this.
Returns:: Difference between this InteractionMatrix and the other.
Return type:: InteractionMatrix

get_interaction_data() → InteractionMatrix

Get the data that is not denoted by “-1”.

get_items_n_first_interaction(n_seq_data: int = 1, t_lower: int | None = None, inplace=False) → InteractionMatrix

Select the first n interactions for each item.

Parameters:

n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_lower (Optional[int], optional) – Seconds past t. Lower limit for the timestamp of the interactions to select, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix

get_items_n_last_interaction(n_seq_data: int = 1, t_upper: int | None = None, item_in: Set[int] | None = None, inplace: bool = False) → InteractionMatrix

Select the last n interactions for each item.

Parameters:

n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_upper (Optional[int], optional) – Seconds past t. Upper limit for the timestamp of the interactions to select, defaults to None
item_in (Optional[Set[int]], optional) – Set of item IDs to select the interactions from, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix

get_prediction_data() → InteractionMatrix

Get the data to be predicted.

Returns:: InteractionMatrix with only the data to be predicted.
Return type:: InteractionMatrix
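
How `get_interaction_data` and `get_prediction_data` divide the data can be pictured as a partition on the masked label. The sketch below is plain Python under stated assumptions, not the library's code: the `(uid, iid, ts)` row layout and `split_masked` are hypothetical, and a row is treated as "to be predicted" when either its user ID or its item ID is -1.

```python
from typing import List, Tuple

MASKED_LABEL = -1  # same value as the documented InteractionMatrix.MASKED_LABEL
Row = Tuple[int, int, int]  # hypothetical (uid, iid, ts) layout


def split_masked(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (known_rows, rows_to_predict).

    Assumption: a row is "to be predicted" when its user ID or item ID is -1.
    """
    known = [r for r in rows if MASKED_LABEL not in (r[0], r[1])]
    to_predict = [r for r in rows if MASKED_LABEL in (r[0], r[1])]
    return known, to_predict


rows = [(0, 0, 0), (1, 1, 1), (4, -1, 4)]
known, to_predict = split_masked(rows)
```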

get_users_n_first_interaction(n_seq_data: int = 1, t_lower: int | None = None, inplace=False) → InteractionMatrix

Select the first n interactions for each user.

Parameters:

n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_lower (Optional[int], optional) – Seconds past t. Lower limit for the timestamp of the interactions to select, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix

get_users_n_last_interaction(n_seq_data: int = 1, t_upper: int | None = None, user_in: Set[int] | None = None, inplace: bool = False) → InteractionMatrix

Select the last n interactions for each user.

Parameters:

n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_upper (Optional[int], optional) – Seconds past t. Upper limit for the timestamp of the interactions to select, defaults to None
user_in (Optional[Set[int]], optional) – Set of user IDs to select the interactions from, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix
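
The "last n interactions per user" selection can be sketched in plain Python. This is an illustration only, not the library's implementation: the `(uid, iid, ts)` row layout and `last_n_per_user` are hypothetical, `t_upper` is assumed to be an exclusive bound, and `n_seq_data` is assumed to be at least 1.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[int, int, int]  # hypothetical (uid, iid, ts) layout


def last_n_per_user(
    rows: List[Row],
    n_seq_data: int = 1,
    t_upper: Optional[int] = None,
    user_in: Optional[Set[int]] = None,
) -> List[Row]:
    """Keep the n most recent rows of each user, in chronological order."""
    per_user: Dict[int, List[Row]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[2]):  # oldest first
        uid, _, ts = row
        if t_upper is not None and ts >= t_upper:  # assumption: exclusive bound
            continue
        if user_in is not None and uid not in user_in:
            continue
        per_user[uid].append(row)
    # Slice the tail of each user's history (n_seq_data assumed >= 1).
    selected = [r for user_rows in per_user.values() for r in user_rows[-n_seq_data:]]
    return sorted(selected, key=lambda r: r[2])


rows = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (0, 2, 4), (1, 3, 5)]
last_two = last_n_per_user(rows, n_seq_data=2, t_upper=5)
only_user_0 = last_n_per_user(rows, n_seq_data=1, user_in={0})
```

The "first n" variants follow the same pattern with a lower bound and a slice from the head of each group instead of the tail.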

property has_timestamps: bool

Boolean indicating whether instance has timestamp information.

Returns:: True if timestamps information is available, False otherwise.
Return type:: bool

property indices: Tuple[List[int], List[int]]

Returns a tuple of lists of user IDs and item IDs corresponding to interactions.

Returns:: Tuple of lists of user IDs and item IDs that correspond to at least one interaction.
Return type:: Tuple[List[int], List[int]]

interactions_in(interaction_ids: List[int], inplace: bool = False) → InteractionMatrix | None

Select the interactions by their interaction ids.

Parameters:

interaction_ids (List[int]) – A list of interaction ids
inplace (bool, optional) – Apply the selection in place, or return a new InteractionMatrix object, defaults to False

Returns:

None if inplace, otherwise new InteractionMatrix object with the selected interactions

Return type:

Union[None, InteractionMatrix]

property item_ids: Set[int]

The set of all item IDs.

Returns:: Set of all item IDs.
Return type:: Set[int]

items_in(I: Set[int], inplace=False) → InteractionMatrix

items_in(I: Set[int], inplace=True) → None

Keep only interactions with the specified items.

Parameters:

I (Set[int]) – A Set or List of item IDs; only interactions with these items are kept.
inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]
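
The `inplace` convention shared by the filtering methods (return a new object by default, or mutate the current one and return None) can be sketched with a small stand-in class. `TinyInteractions` and its `(uid, iid, ts)` row layout are hypothetical and exist only for this illustration; they are not part of streamsight.

```python
from typing import List, Optional, Set, Tuple

Row = Tuple[int, int, int]  # hypothetical (uid, iid, ts) layout


class TinyInteractions:
    """Hypothetical stand-in used only to illustrate the `inplace` convention."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)

    def items_in(self, I: Set[int], inplace: bool = False) -> Optional["TinyInteractions"]:
        kept = [row for row in self.rows if row[1] in I]
        if inplace:
            self.rows = kept  # mutate this object and return nothing
            return None
        return TinyInteractions(kept)  # leave this object untouched


im = TinyInteractions([(0, 0, 1), (0, 1, 2), (1, 2, 3)])
subset = im.items_in({0, 2})             # new object, `im` still has 3 rows
result = im.items_in({1}, inplace=True)  # returns None, `im` now has 1 row
```

Because the in-place form returns None, chaining calls such as `im.items_in(..., inplace=True).users_in(...)` would fail; chaining only works with the default `inplace=False`.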

items_not_in(I: Set[int], inplace=False) → InteractionMatrix | None

Keep only interactions not with the specified items.

Parameters:

I (Set[int]) – A Set or List of items to exclude from the interactions.
inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

mask_shape(shape: Tuple[int, int] | None = None, drop_unknown_user: bool = False, drop_unknown_item: bool = False, inherit_max_id: bool = False) → None

Masks global user and item ID.

Ensures that the matrix released to the models only contains data that is intended to be released. This addresses the data leakage issue. It is recommended that the programmer defines the shape of the matrix such that the model only sees the data that is intended to be seen.

Example

Given the following case where the data is as follows:

> uid: [0, 1, 2, 3, 4]
> iid: [0, 1, 2, 3, -1]
> ts : [0, 1, 2, 3, 4]

Here user 4 is the user to be predicted. Assume that user 4 is an unknown user, that is, the model has never seen user 4 before. The shape of the matrix should then be (4, 4), and it should be passed through the `shape` parameter when calling this method.

If the shape is defined and the data contains IDs of unknown users/items, a warning is raised when drop_unknown_user or drop_unknown_item is set to False. When the corresponding flag is set to True, the unknown users/items are dropped from the data: every user ID greater than or equal to shape[0] and every item ID greater than or equal to shape[1] is dropped. This follows from the initial assumption that user/item IDs start from 0, as defined in the dataset class.

Otherwise, if `shape` is not defined, the shape is inferred from the data and is determined by the number of unique users/items. In this case the shape will be (5, 4). Note that the shape may not be as intended by the programmer if the data contains unknown users/items or if the dataframe does not contain all historical users/items.

Parameters:

shape (Optional[Tuple[int, int]], optional) – Shape of the known user and item base. This value is usually set by the evaluator during the evaluation run. It can also be set manually by the programmer if there is a need to alter the known user/item base. Defaults to None
drop_unknown_user (bool, optional) – Whether to drop unknown users from the dataset, defaults to False
drop_unknown_item (bool, optional) – Whether to drop unknown items from the dataset, defaults to False
inherit_max_id (bool, optional) – Whether to inherit the maximum user and item ID from the given shape and the dataframe. This is useful when the shape is defined and the dataframe contains unknown users/items. Defaults to False
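
The worked example for `mask_shape` can be reproduced in plain Python. The sketch below is not the library's implementation: `mask_rows` and the `(uid, iid, ts)` row layout are hypothetical, the -1 label is assumed to be ignored when counting unique IDs, and an ID is treated as unknown when it is not smaller than the matching dimension of `shape`. Warnings and `inherit_max_id` are left out.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, int]  # hypothetical (uid, iid, ts) layout


def mask_rows(
    rows: List[Row],
    shape: Optional[Tuple[int, int]] = None,
    drop_unknown_user: bool = False,
    drop_unknown_item: bool = False,
) -> Tuple[List[Row], Tuple[int, int]]:
    """Return (rows, shape) after applying the masking rules from the example."""
    if shape is None:
        # Inferred shape: number of unique users/items, ignoring the -1 label.
        users = {uid for uid, _, _ in rows if uid != -1}
        items = {iid for _, iid, _ in rows if iid != -1}
        return rows, (len(users), len(items))
    if drop_unknown_user:
        rows = [r for r in rows if r[0] < shape[0]]  # IDs start at 0
    if drop_unknown_item:
        rows = [r for r in rows if r[1] < shape[1]]  # -1 (to be predicted) is kept
    return rows, shape


rows = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, -1, 4)]
_, inferred = mask_rows(rows)                                    # (5, 4)
kept, _ = mask_rows(rows, shape=(4, 4), drop_unknown_user=True)  # user 4 removed
```

With the inferred shape, user 4 becomes part of the known user base, which is exactly the leakage the explicit `shape=(4, 4)` is meant to prevent.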

property max_item_id: int

The highest item ID in the interaction matrix.

In the case of an empty matrix, the highest item ID is -1. This is consistent with the definition that -1 denotes the item that is unknown. It would be incorrect to use any other value, since 0 is a valid item ID.

Returns:: The highest item ID.
Return type:: int

property max_timestamp: int

The latest timestamp in the interaction matrix.

Returns:: The latest timestamp.
Return type:: int

property max_user_id: int

The highest user ID in the interaction matrix.

Returns:: The highest user ID.
Return type:: int

property min_timestamp: int

The earliest timestamp in the interaction matrix.

Returns:: The earliest timestamp.
Return type:: int

nonzero() → Tuple[List[int], List[int]]

property num_interactions: int

The total number of interactions.

Returns:: Total interaction count.
Return type:: int

shape: Tuple[int, int]: The shape of the interaction matrix, i.e. |user| x |item|.

timestamps_gt(timestamp: float) → InteractionMatrix

timestamps_gt(timestamp: float, inplace: Literal[True]) → None

Select interactions after a given timestamp.

Parameters:

timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

timestamps_gte(timestamp: float) → InteractionMatrix

timestamps_gte(timestamp: float, inplace: Literal[True]) → None

Select interactions after and including a given timestamp.

Parameters:

timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

timestamps_lt(timestamp: float) → InteractionMatrix

timestamps_lt(timestamp: float, inplace: Literal[True]) → None

Select interactions before a given timestamp.

Parameters:

timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

timestamps_lte(timestamp: float) → InteractionMatrix

timestamps_lte(timestamp: float, inplace: Literal[True]) → None

Select interactions up to and including a given timestamp.

Parameters:

timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

union(im: InteractionMatrix) → InteractionMatrix

Combine events from this InteractionMatrix with another.

Parameters:: im (InteractionMatrix) – InteractionMatrix to union with.
Returns:: Union of interactions in this InteractionMatrix and the other.
Return type:: InteractionMatrix

property user_ids: Set[int]

The set of all user IDs.

Returns:: Set of all user IDs.
Return type:: Set[int]

users_in(U: Set[int], inplace=False) → InteractionMatrix

users_in(U: Set[int], inplace=True) → None

Keep only interactions by one of the specified users.

Parameters:

U (Set[int]) – A Set or List of user IDs; only interactions by these users are kept.
inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

users_not_in(U: Set[int], inplace=False) → InteractionMatrix | None

Keep only interactions not by the specified users.

Parameters:

U (Set[int]) – A Set or List of users to exclude from the interactions.
inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns:

None if inplace, otherwise returns a new InteractionMatrix object

Return type:

Union[InteractionMatrix, None]

property values: csr_matrix

All user-item interactions as a sparse matrix of size (|`global_users`|, |`global_items`|).

Each entry is the number of interactions between that user and item. If there are no interactions between a user and item, the entry is 0.

Returns:: Interactions between users and items as a csr_matrix.
Return type:: csr_matrix
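
The difference between the count-valued entries of `values` and the 0/1 entries of `binary_values` can be shown with a dictionary-based sparse representation. This is a plain-Python sketch of the documented semantics, not the library's CSR implementation; the `dense` helper and the sample pairs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (uid, iid) pairs; user 0 interacts with item 1 twice.
pairs: List[Tuple[int, int]] = [(0, 1), (0, 1), (1, 0), (2, 2)]

# Count-valued entries, as described for `values`; absent pairs are implicitly 0.
counts: Dict[Tuple[int, int], int] = dict(Counter(pairs))

# Binary entries, as described for `binary_values`: 1 where the count is at least 1.
binary: Dict[Tuple[int, int], int] = {pair: 1 for pair in counts}


def dense(entries: Dict[Tuple[int, int], int], shape: Tuple[int, int]) -> List[List[int]]:
    """Expand sparse entries to a dense list of rows (users x items)."""
    return [[entries.get((u, i), 0) for i in range(shape[1])] for u in range(shape[0])]


dense_counts = dense(counts, (3, 3))  # [[0, 2, 0], [1, 0, 0], [0, 0, 1]]
dense_binary = dense(binary, (3, 3))  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```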