streamsight.matrix.InteractionMatrix
- class streamsight.matrix.InteractionMatrix(df: DataFrame, item_ix: str, user_ix: str, timestamp_ix: str, shape: Tuple[int, int] | None = None, skip_df_processing: bool = False)
Bases:
object
Matrix of interaction data between users and items.
It provides a number of properties and methods for easy manipulation of this interaction data.
Attention
The InteractionMatrix does not assume binary user-item pairs. If a user interacts with an item more than once, there will be one entry per interaction for this user-item pair.
We assume that user and item IDs are integers starting from 0. The ID “-1” is reserved to label the user or item to be predicted. This assumption is crucial: the split scheme and the evaluation of the recommender system (RS) rely on it, because it determines the 2D shape of the CSR matrix.
- Parameters:
df (pd.DataFrame) – Dataframe containing user-item interactions. Must contain at least item ids and user ids.
item_ix (str) – Item ids column name.
user_ix (str) – User ids column name.
timestamp_ix (str) – Interaction timestamps column name.
shape (Tuple[int, int], optional) – The desired shape of the matrix, i.e. the number of users and items. If no shape is specified, the number of users will be equal to the maximum user id plus one, the number of items to the maximum item id plus one.
skip_df_processing (bool, optional) – Skip processing of the dataframe. This is useful when the dataframe is already processed and the columns are already renamed.
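The default-shape rule described for the shape parameter can be illustrated without the library. The following is a minimal sketch, not streamsight's implementation: `infer_shape` is a hypothetical helper, and it assumes IDs are integers starting from 0 with -1 as the masked label.

```python
from typing import Iterable, Tuple

MASKED_LABEL = -1  # same sentinel value as InteractionMatrix.MASKED_LABEL


def infer_shape(user_ids: Iterable[int], item_ids: Iterable[int]) -> Tuple[int, int]:
    """Hypothetical helper: (max user id + 1, max item id + 1)."""
    # default=MASKED_LABEL makes an empty input yield a dimension of 0.
    n_users = max(user_ids, default=MASKED_LABEL) + 1
    n_items = max(item_ids, default=MASKED_LABEL) + 1
    return n_users, n_items


# User 2 never appears, but still gets a row because IDs are positional.
shape = infer_shape([0, 1, 3], [0, 0, 5])
print(shape)
```

Because the shape follows the largest ID rather than the number of distinct IDs, gaps in the ID range still occupy rows and columns.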
- __init__(df: DataFrame, item_ix: str, user_ix: str, timestamp_ix: str, shape: Tuple[int, int] | None = None, skip_df_processing: bool = False)
Methods
__init__(df, item_ix, user_ix, timestamp_ix)
concat(im) – Concatenate this InteractionMatrix with another.
copy() – Create a deep copy of this InteractionMatrix.
copy_df([reset_index]) – Create a deep copy of the dataframe.
difference(im) – Difference between this InteractionMatrix and another.
get_interaction_data() – Get the data that is not denoted by "-1".
get_items_n_first_interaction([n_seq_data, ...]) – Select the first n interactions for each item.
get_items_n_last_interaction([n_seq_data, ...]) – Select the last n interactions for each item.
get_prediction_data() – Get the data to be predicted.
get_users_n_first_interaction([n_seq_data, ...]) – Select the first n interactions for each user.
get_users_n_last_interaction([n_seq_data, ...]) – Select the last n interactions for each user.
interactions_in(interaction_ids[, inplace]) – Select the interactions by their interaction ids.
items_in(I[, inplace]) – Keep only interactions with the specified items.
items_not_in(I[, inplace]) – Keep only interactions not with the specified items.
mask_shape([shape, drop_unknown_user, ...]) – Masks global user and item ID.
nonzero()
timestamps_gt(timestamp[, inplace]) – Select interactions after a given timestamp.
timestamps_gte(timestamp[, inplace]) – Select interactions after and including a given timestamp.
timestamps_lt(timestamp[, inplace]) – Select interactions up to a given timestamp.
timestamps_lte(timestamp[, inplace]) – Select interactions up to and including a given timestamp.
union(im) – Combine events from this InteractionMatrix with another.
users_in(U[, inplace]) – Keep only interactions by one of the specified users.
users_not_in(U[, inplace]) – Keep only interactions not by the specified users.
Attributes
binary_values – All user-item interactions as a sparse, binary matrix of size (users, items).
has_timestamps – Boolean indicating whether instance has timestamp information.
indices – Returns a tuple of lists of user IDs and item IDs corresponding to interactions.
item_ids – The set of all item IDs.
max_item_id – The highest item ID in the interaction matrix.
max_timestamp – The latest timestamp in the interaction matrix.
max_user_id – The highest user ID in the interaction matrix.
min_timestamp – The earliest timestamp in the interaction matrix.
num_interactions – The total number of interactions.
user_ids – The set of all user IDs.
values – All user-item interactions as a sparse matrix of size (|`global_users`|, |`global_items`|).
shape – The shape of the interaction matrix, i.e. |user| x |item|.
- INTERACTION_IX = 'interactionid'
- ITEM_IX = 'iid'
- MASKED_LABEL = -1
- TIMESTAMP_IX = 'ts'
- USER_IX = 'uid'
- _apply_mask(mask: Series, inplace=False) InteractionMatrix
- _apply_mask(mask: Series, inplace=True) None
- _check_shape()
- _get_first_n_interactions(by: ItemUserBasedEnum, n_seq_data: int, t_lower: int | None = None, inplace=False) InteractionMatrix
- _get_last_n_interactions(by: ItemUserBasedEnum, n_seq_data: int, t_upper: int | None = None, id_in: Set[int] | None = None, inplace=False) InteractionMatrix
- _timestamps_cmp(op: Callable, timestamp: float, inplace: bool = False) InteractionMatrix | None
Filter interactions based on timestamp. Keep only interactions for which op(t, timestamp) is True.
- Parameters:
op (Callable) – Comparison operator.
timestamp (float) – Timestamp to compare against in seconds from epoch.
inplace (bool, optional) – Modify the data matrix in place. If False, returns a new object.
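The rule "keep only interactions for which op(t, timestamp) is True" can be sketched with the standard `operator` module. This is an illustration only: `timestamps_cmp` is a hypothetical free function working on plain tuples, and the idea that the public timestamps_gt, timestamps_gte, timestamps_lt and timestamps_lte methods correspond to `operator.gt`, `operator.ge`, `operator.lt` and `operator.le` is an assumption based on their names.

```python
import operator
from typing import Callable, List, Tuple

Interaction = Tuple[int, int, float]  # (uid, iid, ts)


def timestamps_cmp(
    rows: List[Interaction],
    op: Callable[[float, float], bool],
    timestamp: float,
) -> List[Interaction]:
    """Keep the rows for which op(row_timestamp, timestamp) is True."""
    return [row for row in rows if op(row[2], timestamp)]


rows = [(0, 0, 10.0), (0, 1, 20.0), (1, 1, 30.0)]

# Strictly after 20.0: the row at exactly 20.0 is excluded.
after = timestamps_cmp(rows, operator.gt, 20.0)

# Up to and including 20.0: the row at exactly 20.0 is kept.
up_to_incl = timestamps_cmp(rows, operator.le, 20.0)
```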
- property binary_values: csr_matrix
All user-item interactions as a sparse, binary matrix of size (users, items).
An entry is 1 if there is at least one interaction between that user and item. In all other cases the entry is 0.
- Returns:
Binary csr_matrix of interactions.
- Return type:
csr_matrix
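To make the binary semantics concrete, the sketch below builds a count matrix and its binary counterpart from a handful of interactions. Dense nested lists stand in for the csr_matrix purely for readability; the variable names are illustrative and not part of the library.

```python
from collections import Counter

# (uid, iid) pairs; user 0 interacted with item 1 twice.
interactions = [(0, 1), (0, 1), (2, 0)]
n_users, n_items = 3, 2

# Count how often each (user, item) pair occurs.
counts = Counter(interactions)
count_matrix = [[counts[(u, i)] for i in range(n_items)] for u in range(n_users)]

# Binary view: 1 if there is at least one interaction, else 0.
binary_matrix = [[1 if c > 0 else 0 for c in row] for row in count_matrix]
```

The repeated pair contributes 2 to the count matrix but only 1 to the binary matrix.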
- concat(im: InteractionMatrix | DataFrame) InteractionMatrix
Concatenate this InteractionMatrix with another.
Note
This is an in-place operation and will modify the current object.
- Parameters:
im (Union[InteractionMatrix, pd.DataFrame]) – InteractionMatrix to concat with.
- Returns:
InteractionMatrix with the interactions from both matrices.
- Return type:
InteractionMatrix
- copy() InteractionMatrix
Create a deep copy of this InteractionMatrix.
- Returns:
Deep copy of this InteractionMatrix.
- Return type:
InteractionMatrix
- copy_df(reset_index: bool = False) DataFrame
Create a deep copy of the dataframe.
- Returns:
Deep copy of dataframe.
- Return type:
pd.DataFrame
- difference(im: InteractionMatrix) InteractionMatrix
Difference between this InteractionMatrix and another.
- Parameters:
im (InteractionMatrix) – InteractionMatrix to subtract from this.
- Returns:
Difference between this InteractionMatrix and the other.
- Return type:
InteractionMatrix
- get_interaction_data() InteractionMatrix
Get the data that is not denoted by “-1”.
User and item IDs that are not denoted by “-1” are the ones that are known to the model.
- Returns:
InteractionMatrix with only the known data.
- Return type:
InteractionMatrix
- get_items_n_first_interaction(n_seq_data: int = 1, t_lower: int | None = None, inplace=False) InteractionMatrix
Select the first n interactions for each item.
- Parameters:
n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_lower (Optional[int], optional) – Seconds past t. Lower limit for the timestamp of the interactions to select, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False
- Returns:
Resulting interaction matrix
- Return type:
InteractionMatrix
- get_items_n_last_interaction(n_seq_data: int = 1, t_upper: int | None = None, item_in: Set[int] | None = None, inplace: bool = False) InteractionMatrix
Select the last n interactions for each item.
- Parameters:
n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_upper (Optional[int], optional) – Seconds past t. Upper limit for the timestamp of the interactions to select, defaults to None
item_in (Optional[Set[int]], optional) – Set of item IDs to select the interactions from, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False
- Returns:
Resulting interaction matrix
- Return type:
InteractionMatrix
- get_prediction_data() InteractionMatrix
Get the data to be predicted.
- Returns:
InteractionMatrix with only the data to be predicted.
- Return type:
InteractionMatrix
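The split between known data and data to be predicted can be sketched on plain tuples. This is not the library's code: it assumes a row belongs to the prediction data when its user ID or item ID equals the masked label -1, and to the known data otherwise. `is_masked` is a hypothetical helper.

```python
MASKED_LABEL = -1  # same sentinel value as InteractionMatrix.MASKED_LABEL

# (uid, iid, ts); the last row has a masked item, so it is a prediction target.
rows = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, -1, 4)]


def is_masked(row):
    """Assumption: a row is masked if its user or item ID is -1."""
    uid, iid, _ts = row
    return MASKED_LABEL in (uid, iid)


known = [row for row in rows if not is_masked(row)]
to_predict = [row for row in rows if is_masked(row)]
```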
- get_users_n_first_interaction(n_seq_data: int = 1, t_lower: int | None = None, inplace=False) InteractionMatrix
Select the first n interactions for each user.
- Parameters:
n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_lower (Optional[int], optional) – Seconds past t. Lower limit for the timestamp of the interactions to select, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False
- Returns:
Resulting interaction matrix
- Return type:
InteractionMatrix
- get_users_n_last_interaction(n_seq_data: int = 1, t_upper: int | None = None, user_in: Set[int] | None = None, inplace: bool = False) InteractionMatrix
Select the last n interactions for each user.
- Parameters:
n_seq_data (int, optional) – Number of interactions to select, defaults to 1
t_upper (Optional[int], optional) – Seconds past t. Upper limit for the timestamp of the interactions to select, defaults to None
user_in (Optional[Set[int]], optional) – Set of user IDs to select the interactions from, defaults to None
inplace (bool, optional) – If operation is inplace, defaults to False
- Returns:
Resulting interaction matrix
- Return type:
InteractionMatrix
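The "last n interactions per user" selection can be sketched as a group-and-slice over timestamp-sorted rows. This is a standalone illustration with a hypothetical `last_n_per_user` function; in particular, treating t_upper as an exclusive bound is an assumption, since the text above does not say whether the limit is inclusive.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Interaction = Tuple[int, int, int]  # (uid, iid, ts)


def last_n_per_user(
    rows: List[Interaction], n: int = 1, t_upper: Optional[int] = None
) -> List[Interaction]:
    """Return the n most recent rows of each user (n must be >= 1)."""
    # Assumption: t_upper is exclusive, i.e. only rows with ts < t_upper remain.
    if t_upper is not None:
        rows = [row for row in rows if row[2] < t_upper]
    # Group rows by user in ascending timestamp order.
    per_user: Dict[int, List[Interaction]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[2]):
        per_user[row[0]].append(row)
    # Take the tail of each user's history.
    selected: List[Interaction] = []
    for uid in sorted(per_user):
        selected.extend(per_user[uid][-n:])
    return selected


rows = [(0, 5, 1), (0, 6, 2), (0, 7, 3), (1, 5, 2), (1, 8, 9)]
result = last_n_per_user(rows, n=2, t_upper=9)
```

The same shape of logic applies to the per-item variants by grouping on the item ID instead of the user ID.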
- property has_timestamps: bool
Boolean indicating whether instance has timestamp information.
- Returns:
True if timestamps information is available, False otherwise.
- Return type:
bool
- property indices: Tuple[List[int], List[int]]
Returns a tuple of lists of user IDs and item IDs corresponding to interactions.
- Returns:
Tuple of lists of user IDs and item IDs that correspond to at least one interaction.
- Return type:
Tuple[List[int], List[int]]
- interactions_in(interaction_ids: List[int], inplace: bool = False) InteractionMatrix | None
Select the interactions by their interaction ids.
- Parameters:
interaction_ids (List[int]) – A list of interaction ids
inplace (bool, optional) – Apply the selection in place, or return a new InteractionMatrix object, defaults to False
- Returns:
None if inplace, otherwise new InteractionMatrix object with the selected interactions
- Return type:
Union[None, InteractionMatrix]
- property item_ids: Set[int]
The set of all item IDs.
- Returns:
Set of all item IDs.
- Return type:
Set[int]
- items_in(I: Set[int], inplace=False) InteractionMatrix
- items_in(I: Set[int], inplace=True) None
Keep only interactions with the specified items.
- Parameters:
I (Set[int]) – A Set or List of item IDs whose interactions are kept.
inplace (bool, optional) – Apply the selection in place or not, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
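The inplace convention shared by the selection methods (return a new object by default, or modify the current object and return None) can be sketched with a small stand-in class. `TinyMatrix` is hypothetical and only mirrors the documented behavior of items_in; it is not how InteractionMatrix stores its data.

```python
from typing import List, Optional, Set, Tuple


class TinyMatrix:
    """Hypothetical stand-in for InteractionMatrix holding (uid, iid) pairs."""

    def __init__(self, rows: List[Tuple[int, int]]) -> None:
        self.rows = list(rows)

    def items_in(self, I: Set[int], inplace: bool = False) -> Optional["TinyMatrix"]:
        kept = [row for row in self.rows if row[1] in I]
        if inplace:
            # Modify this object and return None.
            self.rows = kept
            return None
        # Leave this object untouched and return a new one.
        return TinyMatrix(kept)


m = TinyMatrix([(0, 0), (0, 1), (1, 2)])

subset = m.items_in({0, 2})               # new object, m is unchanged
untouched_len = len(m.rows)

returned = m.items_in({1}, inplace=True)  # m itself is filtered
```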
- items_not_in(I: Set[int], inplace=False) InteractionMatrix | None
Keep only interactions not with the specified items.
- Parameters:
I (Set[int]) – A Set or List of items to exclude from the interactions.
inplace (bool, optional) – Apply the selection in place or not, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- mask_shape(shape: Tuple[int, int] | None = None, drop_unknown_user: bool = False, drop_unknown_item: bool = False, inherit_max_id: bool = False) None
Masks global user and item ID.
Ensures that the matrix released to the models contains only data that is intended to be released, which addresses the data leakage issue. It is recommended that the programmer define the shape of the matrix so that the model sees only the data it is intended to see.
Example
Given the following case where the data is as follows:
> uid: [0, 1, 2, 3, 4]
> iid: [0, 1, 2, 3, -1]
> ts : [0, 1, 2, 3, 4]
Here user 4 is the user to be predicted. Assuming that user 4 is an unknown user, that is, the model has never seen user 4 before, the shape of the matrix should be (4, 4). This should be passed as the shape parameter when calling the function.
If the shape is defined and the data contains the ID of an unknown user/item, a warning will be raised if the corresponding drop_unknown_user / drop_unknown_item flag is set to False. If the flag is set to True, the unknown user/item will be dropped from the data: all user/item IDs that do not fit within the given shape are dropped. This follows from the initial assumption that user/item IDs start from 0, as defined in the dataset class.
Otherwise, if shape is not defined, the shape will be inferred from the data and is determined by the number of unique users/items. In this case the shape will be (5, 4). Note that the shape may not be as intended by the programmer if the data contains unknown users/items or if the dataframe does not contain all historical users/items.
- Parameters:
shape (Optional[Tuple[int, int]], optional) – Shape of the known user and item base. This value is usually set by the evaluator during the evaluation run. It can also be set manually by the programmer if there is a need to alter the known user/item base. Defaults to None
drop_unknown_user (bool, optional) – To drop unknown users in the dataset, defaults to False
drop_unknown_item (bool, optional) – To drop unknown items in the dataset, defaults to False
inherit_max_id (bool, optional) – To inherit the maximum user and item ID from the given shape and the dataframe. This is useful when the shape is defined and the dataframe contains unknown users/items. Defaults to False
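The example above can be worked through on plain lists. This sketch does not call mask_shape; it only reproduces the arithmetic of the example, under the assumption that a user counts as unknown when its ID does not fit within the known shape (ID >= shape[0]), which is what the (4, 4) case implies. All variable names are illustrative.

```python
MASKED_LABEL = -1  # same sentinel value as InteractionMatrix.MASKED_LABEL

uid = [0, 1, 2, 3, 4]
iid = [0, 1, 2, 3, -1]
known_shape = (4, 4)  # users 0..3 and items 0..3 are known to the model

# Users whose ID does not fit in the known shape are treated as unknown.
unknown_users = sorted({u for u in uid if u >= known_shape[0]})

# Effect of dropping unknown users: only rows of known users remain.
kept = [(u, i) for u, i in zip(uid, iid) if u < known_shape[0]]

# Without an explicit shape: number of unique users x unique non-masked items.
inferred = (len(set(uid)), len({i for i in iid if i != MASKED_LABEL}))
```

The inferred shape counts user 4 as a known user, which is exactly the leakage that passing an explicit shape is meant to prevent.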
- property max_item_id: int
The highest item ID in the interaction matrix.
In the case of an empty matrix, the highest item ID is -1. This is consistent with the definition that -1 denotes an unknown item. It would be incorrect to use any other value, since 0 is a valid item ID.
- Returns:
The highest item ID.
- Return type:
int
- property max_timestamp: int
The latest timestamp in the interaction matrix.
- Returns:
The latest timestamp.
- Return type:
int
- property max_user_id: int
The highest user ID in the interaction matrix.
- Returns:
The highest user ID.
- Return type:
int
- property min_timestamp: int
The earliest timestamp in the interaction matrix.
- Returns:
The earliest timestamp.
- Return type:
int
- nonzero() Tuple[List[int], List[int]]
- property num_interactions: int
The total number of interactions.
- Returns:
Total interaction count.
- Return type:
int
- shape: Tuple[int, int]
The shape of the interaction matrix, i.e. |user| x |item|.
- timestamps_gt(timestamp: float) InteractionMatrix
- timestamps_gt(timestamp: float, inplace: Literal[True]) None
Select interactions after a given timestamp.
- Parameters:
timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- timestamps_gte(timestamp: float) InteractionMatrix
- timestamps_gte(timestamp: float, inplace: Literal[True]) None
Select interactions after and including a given timestamp.
- Parameters:
timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- timestamps_lt(timestamp: float) InteractionMatrix
- timestamps_lt(timestamp: float, inplace: Literal[True]) None
Select interactions up to a given timestamp.
- Parameters:
timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- timestamps_lte(timestamp: float) InteractionMatrix
- timestamps_lte(timestamp: float, inplace: Literal[True]) None
Select interactions up to and including a given timestamp.
- Parameters:
timestamp (float) – The timestamp with which the interactions timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- union(im: InteractionMatrix) InteractionMatrix
Combine events from this InteractionMatrix with another.
- Parameters:
im (InteractionMatrix) – InteractionMatrix to union with.
- Returns:
Union of interactions in this InteractionMatrix and the other.
- Return type:
InteractionMatrix
- property user_ids: Set[int]
The set of all user IDs.
- Returns:
Set of all user IDs.
- Return type:
Set[int]
- users_in(U: Set[int], inplace=False) InteractionMatrix
- users_in(U: Set[int], inplace=True) None
Keep only interactions by one of the specified users.
- Parameters:
U (Set[int]) – A Set or List of users to select the interactions from.
inplace (bool, optional) – Apply the selection in place or not, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- users_not_in(U: Set[int], inplace=False) InteractionMatrix | None
Keep only interactions not by the specified users.
- Parameters:
U (Set[int]) – A Set or List of users to exclude from the interactions.
inplace (bool, optional) – Apply the selection in place or not, defaults to False
- Returns:
None if inplace, otherwise returns a new InteractionMatrix object
- Return type:
Union[InteractionMatrix, None]
- property values: csr_matrix
All user-item interactions as a sparse matrix of size (|`global_users`|, |`global_items`|).
Each entry is the number of interactions between that user and item. If there are no interactions between a user and item, the entry is 0.
- Returns:
Interactions between users and items as a csr_matrix.
- Return type:
csr_matrix