streamsight.settings.SlidingWindowSetting
- class streamsight.settings.SlidingWindowSetting(background_t: int, window_size: int = 2147483647, n_seq_data: int = 10, top_K: int = 10, t_upper: int = 2147483647, t_ground_truth_window: int | None = None, seed: int | None = None)
Bases:
Setting
Sliding window setting for splitting data.
The data is split into a background set and an evaluation set. The evaluation set is defined by a sliding window that moves over the data. The window size is defined by the window_size parameter. The evaluation set comprises the unlabeled data and the ground truth data, each stored in a list. The unlabeled data contains the last n_seq_data interactions of the users/items before the split point along with masked interactions after the split point. The number of interactions per user/item is limited to top_K. The ground truth data is the interactions after the split point and spans window_size seconds.
Core attributes
- background_data: Data used for training the model. Interval is [0, background_t).
- unlabeled_data: List of unlabeled data. Each element is an InteractionMatrix object of interval [0, t).
- ground_truth_data: List of ground truth data. Each element is an InteractionMatrix object of interval [t, t + window_size).
- data_timestamp_limit: List of timestamps that the splitter will slide over the data.
- incremental_data: List of data that is used to incrementally update the model. Each element is an InteractionMatrix object of interval [t, t + window_size).
- param background_t:
Time point to split the data into background and evaluation data. Split will be from [0, t)
- type background_t:
int
- param window_size:
Size of the window in seconds to slide over the data. Affects the incremental data being released to the model. If :param:`t_ground_truth_window` is not provided, the ground truth data also uses this window size.
- type window_size:
int, optional
- param n_seq_data:
Number of last sequential interactions to provide as data for the model to make predictions.
- type n_seq_data:
int, optional
- param top_K:
Number of interactions per user that should be selected for evaluation purposes.
- type top_K:
int, optional
- param t_upper:
Upper bound on the timestamp of interactions. Defaults to maximal integer value (acting as infinity).
- type t_upper:
int, optional
- param t_ground_truth_window:
Size of the window in seconds to slide over the data for ground truth data. If not provided, defaults to window_size during computation.
- type t_ground_truth_window:
int, optional
- param seed:
Seed for random number generator.
- type seed:
int, optional
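To make the interplay of background_t, window_size, t_upper, and t_ground_truth_window concrete, here is a minimal pure-Python sketch of how the window boundaries could be derived. It does not call streamsight. The helper window_bounds, the max_timestamp argument, and the stepping rule (start at background_t, advance by window_size, stop at the smaller of t_upper and the last timestamp) are assumptions made for illustration; the library's own boundary handling may differ.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end)


def window_bounds(
    background_t: int,
    window_size: int,
    max_timestamp: int,
    t_upper: int = 2147483647,
    t_ground_truth_window: Optional[int] = None,
) -> List[Tuple[Interval, Interval]]:
    """Hypothetical helper: (incremental, ground_truth) intervals per split."""
    # Fall back to window_size when no separate ground truth window is given.
    gt_window = window_size if t_ground_truth_window is None else t_ground_truth_window
    # Assumption: t_upper is an exclusive bound on interaction timestamps.
    end = min(t_upper, max_timestamp + 1)

    bounds = []
    t = background_t
    while t < end:
        incremental = (t, t + window_size)
        ground_truth = (t, t + gt_window)
        bounds.append((incremental, ground_truth))
        t += window_size
    return bounds


if __name__ == "__main__":
    for inc, gt in window_bounds(background_t=100, window_size=50, max_timestamp=220):
        print("incremental", inc, "ground truth", gt)
```

Under these assumptions, passing t_ground_truth_window=20 shortens only the ground truth interval of each split, while the incremental interval still spans window_size seconds.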
- __init__(background_t: int, window_size: int = 2147483647, n_seq_data: int = 10, top_K: int = 10, t_upper: int = 2147483647, t_ground_truth_window: int | None = None, seed: int | None = None)
Methods
__init__(background_t[, window_size, ...])
destruct_generators(): Destruct data generators.
get_params(): Get the parameters of the setting.
next_ground_truth_data([reset]): Get the next ground truth data.
next_incremental_data([reset]): Get the next incremental data.
next_t_window([reset]): Get the next data timestamp limit.
next_unlabeled_data([reset]): Get the next unlabeled data.
reset_data_generators(): Reset data generators.
restore_generators([n]): Restore data generators.
split(data): Splits :param:`data` according to the setting.
Attributes
background_data: Background data provided for the model for the initial training.
ground_truth_data: Ground truth data to evaluate the model's predictions on.
identifier: Name of the setting.
incremental_data: Data that is used to incrementally update the model.
is_ready: Flag on setting if it is ready to be used for evaluation.
is_sliding_window_setting: Flag to indicate if the setting is SlidingWindowSetting.
name: Name of the setting.
num_split: Number of splits created from dataset.
params: Parameters of the setting.
t_window: The upper timestamp of the window in split.
unlabeled_data: Unlabeled data for the model to make predictions on.
window_size: Window size in seconds for splitter to slide over the data.
t_upper: Upper bound on the timestamp of interactions.
n_seq_data: Number of last sequential interactions to provide in unlabeled_data as data for the model to make predictions.
top_K: Number of interactions per user that should be selected for evaluation purposes in ground_truth_data.
- _abc_impl = <_abc._abc_data object>
- _background_data: InteractionMatrix
Data used as the initial set of interactions to train the model.
- _check_size()
Warns user if any of the sets is unusually small or empty
- _check_split()
Checks that the splits have been done properly.
Makes sure all expected attributes are set.
- _check_split_complete()
Check if the setting is ready to be used for evaluation.
- Raises:
KeyError – If the setting is not ready to be used for evaluation.
- _create_generator(attribute: str) Any
Creates generator for provided attribute name
- Parameters:
attribute (str) – the attribute name to be used to create the generator
- Yield:
Data returned from the attribute
- Return type:
Any
- _ground_truth_data: InteractionMatrix | List[InteractionMatrix]
Data containing the ground truth interactions for _unlabeled_data. For a SlidingWindowSetting, this is a list of InteractionMatrix.
- _ground_truth_data_generator()
Generates ground truth data.
Allow for iteration over the ground truth data. If the setting is a sliding window setting, then it will iterate over the list of ground truth data.
Note
A private method is specifically created to abstract the creation of the generator and to allow for easy resetting when needed.
- _incremental_data: List[InteractionMatrix]
Data that is used to incrementally update the model. Unique to SlidingWindowSetting.
- _incremental_data_generator()
Generates incremental data.
Allow for iteration over the incremental data. If the setting is a sliding window setting, then it will iterate over the list of incremental data.
Note
A private method is specifically created to abstract the creation of the generator and to allow for easy resetting when needed.
- _next_t_window_generator()
Generates t_window data.
Allow for iteration over the t_window data. If the setting is a sliding window setting, then it will iterate over the list of data timestamp limit.
Note
A private method is specifically created to abstract the creation of the generator and to allow for easy resetting when needed.
- _num_full_interactions: int
- _split(data: InteractionMatrix)
Splits dataset into a background, unlabeled and ground truth data.
- Parameters:
data (InteractionMatrix) – Interaction matrix to be split. Must contain timestamps.
- _split_complete
Number of splits created from sliding window. Defaults to 1 (no splits on training set).
- _t_window: None | int | List[int]
This is the upper timestamp of the window in split. The actual interactions might have a smaller timestamp value than this because this is the cut-off value t.
- _unlabeled_data: InteractionMatrix | List[InteractionMatrix]
- _unlabeled_data_generator()
Generates unlabeled data.
Allow for iteration over the unlabeled data. If the setting is a sliding window setting, then it will iterate over the list of unlabeled data.
Note
A private method is specifically created to abstract the creation of the generator and to allow for easy resetting when needed.
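The note above describes a pattern in which a private factory builds each generator so that resetting amounts to building a fresh one. A minimal sketch of that pattern follows. TinySetting is a hypothetical stand-in, not the streamsight class, and it assumes the attribute always holds a list.

```python
from typing import Any, Iterator, List


class TinySetting:
    """Hypothetical stand-in for a setting; not the streamsight class."""

    def __init__(self, unlabeled_data: List[Any]) -> None:
        self._unlabeled_data = unlabeled_data
        self._reset()

    def _create_generator(self, attribute: str) -> Iterator[Any]:
        # Yield each element of the list stored under the given attribute name.
        for item in getattr(self, attribute):
            yield item

    def _reset(self) -> None:
        # Resetting is just building a fresh generator from the same factory.
        self._unlabeled_gen = self._create_generator("_unlabeled_data")

    def next_unlabeled_data(self, reset: bool = False) -> Any:
        if reset:
            self._reset()
        return next(self._unlabeled_gen)


if __name__ == "__main__":
    setting = TinySetting(["window-0", "window-1"])
    print(setting.next_unlabeled_data())            # first element
    print(setting.next_unlabeled_data())            # second element
    print(setting.next_unlabeled_data(reset=True))  # back to the first element
```

Keeping generator creation in one private method means the reset logic never has to know how the underlying data is stored.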
- property background_data: InteractionMatrix
Background data provided for the model for the initial training.
This data is used as the initial set of interactions to train the model.
- Returns:
Interaction Matrix of training interactions.
- Return type:
InteractionMatrix
- destruct_generators() None
Destruct data generators.
Destructs the data generators of the setting object. This method is useful when the setting object needs to be pickled or saved to disk.
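The reason generators get in the way of pickling is that Python's pickle module cannot serialize live generator objects. The sketch below demonstrates this with a plain dictionary standing in for an object's state; the key names are hypothetical and nothing here calls streamsight.

```python
import pickle


def is_picklable(state: dict) -> bool:
    """Return True if the whole state dictionary can be pickled."""
    try:
        pickle.dumps(state)
        return True
    except TypeError:
        # pickle raises TypeError when it meets a generator object.
        return False


# Hypothetical stand-in for the attribute dictionary of a setting object.
state = {
    "unlabeled_data": ["window-0", "window-1"],
    "unlabeled_generator": (w for w in ["window-0", "window-1"]),
}
before = is_picklable(state)

# "Destruct" the generator: drop the live object, keep the underlying list.
state["unlabeled_generator"] = None
after = is_picklable(state)

print(before, after)
```

The underlying lists survive the round trip, so a generator can be rebuilt from them after loading.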
- get_params() Dict[str, Any]
Get the parameters of the setting.
- property ground_truth_data: InteractionMatrix | List[InteractionMatrix]
Ground truth data to evaluate the model’s predictions on.
Contains the actual user-item interactions that the model is supposed to predict.
- Returns:
Either a single InteractionMatrix or a list of InteractionMatrix if the setting is a sliding window setting.
- Return type:
Union[InteractionMatrix, List[InteractionMatrix]]
- property identifier: str
Name of the setting.
- property incremental_data: List[InteractionMatrix]
Data that is used to incrementally update the model.
Unique to sliding window setting.
- Returns:
List of InteractionMatrix objects used to incrementally update the model.
- Return type:
List[InteractionMatrix]
- property is_ready: bool
Flag on setting if it is ready to be used for evaluation.
- Returns:
If the setting is ready to be used for evaluation.
- Return type:
bool
- property is_sliding_window_setting: bool
Flag to indicate if the setting is SlidingWindowSetting.
- Returns:
If the setting is SlidingWindowSetting.
- Return type:
bool
- n_seq_data: int
Number of last sequential interactions to provide in unlabeled_data as data for the model to make predictions.
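As an illustration of what "last n_seq_data interactions before the split point" can mean, here is a small pure-Python sketch. The tuple layout (user_id, item_id, timestamp) and the helper last_n_before are assumptions for this example. It leaves out the masked interactions after the split point and is not the library's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interaction = Tuple[int, int, int]  # (user_id, item_id, timestamp), assumed layout


def last_n_before(
    interactions: List[Interaction], t: int, n: int
) -> Dict[int, List[Interaction]]:
    """Keep each user's n most recent interactions with timestamp < t."""
    per_user: Dict[int, List[Interaction]] = defaultdict(list)
    # Sort by timestamp so each user's history is in chronological order.
    for user, item, ts in sorted(interactions, key=lambda row: row[2]):
        if ts < t:
            per_user[user].append((user, item, ts))
    # Slice from the end; guard n <= 0 because rows[-0:] would return everything.
    return {user: (rows[-n:] if n > 0 else []) for user, rows in per_user.items()}


if __name__ == "__main__":
    data = [(1, 10, 1), (1, 11, 2), (1, 12, 3), (2, 20, 2), (1, 13, 9)]
    print(last_n_before(data, t=5, n=2))
```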
- property name: str
Name of the setting.
- Returns:
Name of the setting.
- Return type:
str
- next_ground_truth_data(reset=False) InteractionMatrix
Get the next ground truth data.
Get the next ground truth data for the corresponding split. If the setting is a sliding window setting, then it will iterate over the list of ground truth data.
- Parameters:
reset (bool, optional) – To reset the generator, defaults to False
- Raises:
EOWSetting – If there is no more ground truth data to iterate over.
- Returns:
The next ground truth data for the corresponding split.
- Return type:
InteractionMatrix
- next_incremental_data(reset=False) InteractionMatrix
Get the next incremental data.
Get the next incremental data for the corresponding split. If the setting is a sliding window setting, then it will iterate over the list of incremental data.
- Parameters:
reset (bool, optional) – To reset the generator, defaults to False
- Raises:
AttributeError – If the setting is not a sliding window setting.
EOWSetting – If there is no more incremental data to iterate over.
- Returns:
The next incremental data for the corresponding split.
- Return type:
InteractionMatrix
- next_t_window(reset=False) int
Get the next data timestamp limit.
Get the next upper timestamp limit for the corresponding split. If the setting is a sliding window setting, then it will iterate over the list of timestamps that specify the timestamp cut off for the data.
- Parameters:
reset (bool, optional) – To reset the generator, defaults to False
- Raises:
EOWSetting – If there is no more data timestamp limit to iterate over.
- Returns:
The next t_window for the corresponding split.
- Return type:
int
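One way to consume the next_* methods is to call them in lockstep, one split at a time, until the end-of-window exception signals that the data is exhausted. The sketch below uses hypothetical stand-ins: FakeSetting mimics only the method names, and EndOfWindow plays the role of EOWSetting. The order shown (unlabeled data, then ground truth, then incremental data) is an assumption about a typical evaluation loop, not documented behaviour.

```python
from typing import Any, Dict, Iterator, List, Tuple


class EndOfWindow(Exception):
    """Hypothetical stand-in for the EOWSetting exception."""


class FakeSetting:
    """Hypothetical stand-in exposing the same next_* method names."""

    def __init__(self, unlabeled: List[Any], ground_truth: List[Any],
                 incremental: List[Any]) -> None:
        self._iters: Dict[str, Iterator[Any]] = {
            "unlabeled": iter(unlabeled),
            "ground_truth": iter(ground_truth),
            "incremental": iter(incremental),
        }

    def _next(self, key: str) -> Any:
        try:
            return next(self._iters[key])
        except StopIteration:
            # Translate iterator exhaustion into the end-of-window signal.
            raise EndOfWindow(f"no more {key} data") from None

    def next_unlabeled_data(self) -> Any:
        return self._next("unlabeled")

    def next_ground_truth_data(self) -> Any:
        return self._next("ground_truth")

    def next_incremental_data(self) -> Any:
        return self._next("incremental")


def evaluation_loop(setting: FakeSetting) -> List[Tuple[Any, Any, Any]]:
    """Collect one (unlabeled, ground_truth, incremental) triple per split."""
    steps = []
    while True:
        try:
            unlabeled = setting.next_unlabeled_data()    # model predicts on this
            truth = setting.next_ground_truth_data()     # predictions are scored
            incremental = setting.next_incremental_data()  # model is updated
        except EndOfWindow:
            break
        steps.append((unlabeled, truth, incremental))
    return steps
```

In a real loop the three values would be passed to the model and metric code rather than collected into a list.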
- next_unlabeled_data(reset=False) InteractionMatrix
Get the next unlabeled data.
Get the next unlabeled data for the corresponding split. If the setting is a sliding window setting, then it will iterate over the list of unlabeled data.
- Parameters:
reset (bool, optional) – To reset the generator, defaults to False
- Raises:
EOWSetting – If there is no more unlabeled data to iterate over.
- Returns:
The next unlabeled data for the corresponding split.
- Return type:
InteractionMatrix
- property num_split: int
Number of splits created from dataset.
This property defaults to 1 (no splits on training set) on a typical setting. Usually for the SlidingWindowSetting this property will be greater than 1 if there are multiple splits created from the sliding window on the dataset.
- Returns:
Number of splits created from dataset.
- Return type:
int
- property params
Parameters of the setting.
- reset_data_generators() None
Reset data generators.
Resets the data generators of the setting object to the beginning of the data series.
- restore_generators(n: int | None = None) None
Restore data generators.
Restores the data generators of the setting object. If :param:`n` is provided, then it will restore the data generators to the iteration number :param:`n`. If :param:`n` is not provided, then it will restore the data generators to the beginning of the data series.
- Parameters:
n (int, optional) – iteration number to restore generator to, defaults to None
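Restoring a generator to a given iteration can be sketched by rebuilding it from the underlying list and skipping the elements already consumed. The helper restore below is hypothetical, and it assumes n counts the items already consumed (zero-based), which the documentation above does not specify.

```python
from itertools import islice
from typing import Any, Iterator, List, Optional


def restore(data: List[Any], n: Optional[int] = None) -> Iterator[Any]:
    """Hypothetical helper: rebuild an iterator positioned at iteration n."""
    # With no n, start from the beginning of the data series.
    start = 0 if n is None else n
    # islice skips the first `start` elements without copying the list.
    return islice(iter(data), start, None)


if __name__ == "__main__":
    windows = ["window-0", "window-1", "window-2"]
    print(list(restore(windows)))     # all three windows
    print(next(restore(windows, 2)))  # resumes at the third window
```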
- split(data: InteractionMatrix) None
Splits :param:`data` according to the setting.
Calling this method will change the state of the setting object to be ready for evaluation. The method will split the data into background_data, ground_truth_data, and unlabeled_data.
This method will perform a basic check on the split to ensure that the split did not result in any empty or unusually small datasets.
Note
SlidingWindowSetting will have the additional attribute incremental_data.
- Parameters:
data (InteractionMatrix) – Interaction matrix that should be split.
- t_upper
Upper bound on the timestamp of interactions. Defaults to maximal integer value (acting as infinity).
- property t_window: None | int | List[int]
The upper timestamp of the window in split.
In settings that respect the global timeline, a timestamp value will be returned. In the case of SlidingWindowSetting, a list of timestamp values will be returned.
Settings such as LeaveNOutSetting will return None since there is no split with respect to time.
- Returns:
timestamp limit for the data.
- Return type:
Union[None, int, List[int]]
- top_K: int
Number of interactions per user that should be selected for evaluation purposes in ground_truth_data.
- property unlabeled_data: InteractionMatrix | List[InteractionMatrix]
Unlabeled data for the model to make predictions on.
Contains the user/item IDs for prediction along with the previous sequential interactions of each user/item, if they exist. This data is used to make predictions that are evaluated against the ground truth data.
- Returns:
Either a single InteractionMatrix or a list of InteractionMatrix if the setting is a sliding window setting.
- Return type:
Union[InteractionMatrix, List[InteractionMatrix]]
- window_size
Window size in seconds for splitter to slide over the data.