sliding_window_setting
logger = logging.getLogger(__name__) module-attribute
SlidingWindowSetting
Bases: Setting
Sliding window setting for splitting data.
The data is split into a training set and an evaluation set. The evaluation set is defined by a sliding window that moves over the data, and its size is set by the window_size parameter. The evaluation set comprises the unlabeled data and the ground truth data, each stored in a list. The unlabeled data contains the last n_seq_data interactions of each user/item before the split point, along with masked interactions after the split point. The number of interactions per user/item is limited to top_K. The ground truth data consists of the interactions after the split point and spans window_size seconds (or t_ground_truth_window seconds, if provided).
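To make the construction of one evaluation window concrete, here is a minimal sketch assuming interactions are plain (user, item, timestamp) tuples rather than the library's InteractionMatrix. The names Interaction and split_window are hypothetical, and so are three conventions: a masked interaction is represented by item=None, the per-user top_K limit keeps the earliest interactions in the window, and users with no interactions in the window are skipped. This illustrates the description above; it is not recnexteval's implementation.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    user: int
    item: Optional[int]  # None marks a masked item (hypothetical convention)
    timestamp: int


def split_window(
    interactions: list[Interaction],
    split_t: int,
    window_size: int,
    n_seq_data: int = 0,
    top_K: int = 10,
) -> tuple[list[Interaction], list[Interaction]]:
    """Return (unlabeled, ground_truth) for the window [split_t, split_t + window_size)."""
    # Group each user's interactions in chronological order.
    by_user: dict[int, list[Interaction]] = defaultdict(list)
    for inter in sorted(interactions, key=lambda x: x.timestamp):
        by_user[inter.user].append(inter)

    unlabeled: list[Interaction] = []
    ground_truth: list[Interaction] = []
    for events in by_user.values():
        history = [e for e in events if e.timestamp < split_t]
        future = [e for e in events if split_t <= e.timestamp < split_t + window_size]
        future = future[:top_K]  # assumption: keep the earliest top_K per user
        if not future:
            continue  # assumption: users with nothing to predict are skipped
        if n_seq_data > 0:  # guard: history[-0:] would return the whole history
            unlabeled.extend(history[-n_seq_data:])
        # Masked copies reveal who interacted and when, but not the item.
        unlabeled.extend(Interaction(e.user, None, e.timestamp) for e in future)
        ground_truth.extend(future)
    return unlabeled, ground_truth
```

For example, with split_t=10, window_size=10 and n_seq_data=1, a user with interactions at timestamps 0, 5 and 12 contributes the interaction at 5 plus a masked entry at 12 to the unlabeled data, and the interaction at 12 to the ground truth.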
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
training_t | int | Time point at which the data is split into training and evaluation data. Training data covers timestamps in [0, training_t). | required |
window_size | int | Size of the sliding window in seconds. Affects the incremental data released to the model. If t_ground_truth_window is not provided, the ground truth data also uses this window size. Defaults to np.iinfo(np.int32).max. | np.iinfo(np.int32).max |
n_seq_data | int | Number of most recent interactions before the split point to provide to the model as input for prediction. Defaults to 0. | 0 |
top_K | int | Number of interactions per user selected for evaluation. Defaults to 10. | 10 |
t_upper | int | Upper bound on interaction timestamps. Defaults to the maximum integer value (acting as infinity). | max |
t_ground_truth_window | Union[None, int] | Size of the window in seconds used for the ground truth data. If not provided, window_size is used during computation. | None |
seed | int | Seed for random number generator. Defaults to 42. | 42 |
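The parameters above interact as the window moves forward. The hypothetical helper window_schedule below lists the windows under two assumptions that the table does not state: the window advances by window_size starting at training_t, and sliding stops at t_upper or once the latest timestamp in the data has been covered. It applies the documented fallback of t_ground_truth_window to window_size and clips window ends to t_upper.

```python
from typing import Optional


def window_schedule(
    training_t: int,
    window_size: int,
    max_timestamp: int,
    t_upper: int,
    t_ground_truth_window: Optional[int] = None,
) -> list[tuple[int, int, int]]:
    """Return (split_point, data_end, ground_truth_end) for each window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # Documented fallback: ground truth uses window_size when no window is given.
    gt_window = window_size if t_ground_truth_window is None else t_ground_truth_window
    # Assumption: stop at t_upper or once the latest timestamp has been covered.
    stop = min(t_upper, max_timestamp + 1)
    windows = []
    split_point = training_t
    while split_point < stop:
        data_end = min(split_point + window_size, t_upper)
        gt_end = min(split_point + gt_window, t_upper)
        windows.append((split_point, data_end, gt_end))
        split_point += window_size  # assumption: the window slides by its own size
    return windows
```

Under these assumptions, num_split would equal the length of the returned list, and with the default window_size of np.iinfo(np.int32).max only a single window is produced.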
Source code in src/recnexteval/settings/strategy/sliding_window_setting.py
IS_BASE = False class-attribute instance-attribute
t = training_t instance-attribute
window_size = window_size instance-attribute
Window size in seconds for splitter to slide over the data.
n_seq_data = n_seq_data instance-attribute
top_K = top_K instance-attribute
t_upper = t_upper instance-attribute
Upper bound on the timestamp of interactions. Defaults to the maximum integer value (acting as infinity).
t_ground_truth_window = t_ground_truth_window instance-attribute
name property
Name of the object's class.
Returns:
| Type | Description |
|---|---|
str | Name of the object's class. |
params property
Parameters of the object.
Returns:
| Type | Description |
|---|---|
dict | Parameters of the object. |
identifier property
Name of the setting.
seed = seed instance-attribute
prediction_data_processor = PredictionDataProcessor() instance-attribute
num_split property
Get the number of splits created from the dataset.
For typical single-split settings this property is 1. For SlidingWindowSetting, it is the number of splits created by sliding the window over the data, which is typically greater than 1.
Returns:
| Type | Description |
|---|---|
int | Number of splits created from dataset. |
is_ready property
Check if setting is ready for evaluation.
Returns:
| Type | Description |
|---|---|
bool | True if the setting has been split and is ready to use. |
is_sliding_window_setting property
Check whether the setting is a SlidingWindowSetting.
Returns:
| Type | Description |
|---|---|
bool | True if this is a SlidingWindowSetting instance. |
training_data property
Get background data for initial model training.
Returns:
| Type | Description |
|---|---|
InteractionMatrix | InteractionMatrix of training interactions. |
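The half-open interval [0, training_t) means an interaction stamped exactly at training_t belongs to the evaluation side, not to the training data. A minimal sketch with hypothetical names, using plain tuples instead of InteractionMatrix and assuming non-negative timestamps:

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    user: int
    item: int
    timestamp: int


def training_split(
    interactions: list[Interaction], training_t: int
) -> tuple[list[Interaction], list[Interaction]]:
    """Partition into training data [0, training_t) and the remaining interactions."""
    # Timestamps are assumed non-negative, so `< training_t` selects [0, training_t).
    training = [x for x in interactions if x.timestamp < training_t]
    remaining = [x for x in interactions if x.timestamp >= training_t]
    return training, remaining
```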
t_window property
Get the upper timestamp of the split window.
In settings that respect the global timeline, this returns a single timestamp. In SlidingWindowSetting, it returns a list of timestamps. In settings such as LeaveNOutSetting, it returns None.
Returns:
| Type | Description |
|---|---|
Union[None, int, list[int]] | Timestamp limit for the data (int, list of ints, or None). |
unlabeled_data property
Get unlabeled data for model predictions.
Contains the user/item IDs to predict for, along with their preceding sequential interactions. Predictions made on this data are evaluated against the ground truth data.
Returns:
| Type | Description |
|---|---|
Union[InteractionMatrix, list[InteractionMatrix]] | Single InteractionMatrix, or a list of InteractionMatrix for the sliding window setting. |
ground_truth_data property
Get ground truth data for model evaluation.
Contains the actual user-item interactions that the model should predict.
Returns:
| Type | Description |
|---|---|
Union[InteractionMatrix, list[InteractionMatrix]] | Single InteractionMatrix, or a list of InteractionMatrix for the sliding window setting. |
incremental_data property
Get data for incrementally updating the model.
Only available for SlidingWindowSetting.
Returns:
| Type | Description |
|---|---|
list[InteractionMatrix] | List of InteractionMatrix objects for incremental updates. |
Raises:
| Type | Description |
|---|---|
AttributeError | If setting is not SlidingWindowSetting. |
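One way to express the documented contract is a property that raises AttributeError for settings that are not sliding window settings. The classes SettingSketch and SlidingWindowSketch below are hypothetical stand-ins, not recnexteval's classes.

```python
class SettingSketch:
    """Hypothetical stand-in for a single-split setting."""

    @property
    def is_sliding_window_setting(self) -> bool:
        return False

    @property
    def incremental_data(self) -> list:
        # Mirrors the documented contract: only sliding window settings have it.
        if not self.is_sliding_window_setting:
            raise AttributeError(
                f"{type(self).__name__} has no incremental_data; "
                "it is only available for sliding window settings"
            )
        return self._incremental_data


class SlidingWindowSketch(SettingSketch):
    """Hypothetical sliding-window variant that stores one block per window."""

    def __init__(self, incremental: list) -> None:
        self._incremental_data = incremental

    @property
    def is_sliding_window_setting(self) -> bool:
        return True
```

Because the getter raises AttributeError, hasattr(setting, "incremental_data") evaluates to False for the single-split sketch, so callers can test for the capability instead of checking the class.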
get_params()
Get the parameters of the setting.
Source code in src/recnexteval/settings/base.py
split(data)
Split data according to the setting.
Calling this method changes the state of the setting object to be ready for evaluation. The method splits data into training_data, ground_truth_data, and unlabeled_data.
Note
SlidingWindowSetting will have an additional attribute incremental_data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data | InteractionMatrix | Interaction matrix to be split. | required |
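The state change performed by split can be sketched with a hypothetical MiniSlidingWindow class over (user, item, timestamp) tuples. It assumes is_ready simply tracks whether split has run and num_split counts the ground-truth windows, with windows advancing by window_size until the latest timestamp in the data is covered. These are simplifications for illustration, not the library's logic.

```python
from typing import Optional

Row = tuple[int, int, int]  # (user, item, timestamp)


class MiniSlidingWindow:
    """Hypothetical, simplified stand-in; not recnexteval's SlidingWindowSetting."""

    def __init__(self, training_t: int, window_size: int) -> None:
        self.t = training_t
        self.window_size = window_size
        self._training: Optional[list[Row]] = None
        self._ground_truth: list[list[Row]] = []

    @property
    def is_ready(self) -> bool:
        # Assumption: the setting counts as ready once split() has run.
        return self._training is not None

    @property
    def num_split(self) -> int:
        # Assumption: one split per ground-truth window.
        return len(self._ground_truth)

    def split(self, data: list[Row]) -> None:
        self._training = [r for r in data if r[2] < self.t]
        self._ground_truth = []
        last = max((r[2] for r in data), default=self.t - 1)
        start = self.t
        while start <= last:  # slide until the latest timestamp is covered
            end = start + self.window_size
            self._ground_truth.append([r for r in data if start <= r[2] < end])
            start = end
```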
Source code in src/recnexteval/settings/base.py
restore(n=0)
Restore the setting to an earlier iteration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n | int | Iteration number to restore to. If None, restores to beginning. | 0 |
Source code in src/recnexteval/settings/base.py
get_split_at(index)
Get the split data at a specific index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
index | int | The index of the split to retrieve. | required |
Returns:
| Type | Description |
|---|---|
SplitResult | SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'. |
Raises:
| Type | Description |
|---|---|
IndexError | If index is out of range. |
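A minimal sketch of how a SplitResult could be assembled from per-split lists, assuming SplitResult behaves like a dict with the four documented keys and that all four lists are index-aligned. The standalone function signature and the rejection of negative indices are assumptions made for illustration.

```python
from typing import Any, TypedDict


class SplitResult(TypedDict):
    """Hypothetical mirror of the four documented keys."""

    unlabeled: Any
    ground_truth: Any
    t_window: int
    incremental: Any


def get_split_at(
    unlabeled: list, ground_truth: list, t_window: list[int], incremental: list, index: int
) -> SplitResult:
    """Assemble the split at `index`; assumes the four lists are index-aligned."""
    if not 0 <= index < len(ground_truth):
        # Assumption: negative indices are rejected rather than counted from the end.
        raise IndexError(f"split index {index} out of range [0, {len(ground_truth)})")
    return SplitResult(
        unlabeled=unlabeled[index],
        ground_truth=ground_truth[index],
        t_window=t_window[index],
        incremental=incremental[index],
    )
```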
Source code in src/recnexteval/settings/base.py