strategy

`LeaveNOutSetting` ¶

Bases: Setting

Leave-N-Out setting for splitting data.

Splits the dataset into training and test sets by leaving out the last N interactions for each user as test data, using the previous n_seq_data interactions as context.

Source code in src/recnexteval/settings/strategy/leave_n_out_setting.py

class LeaveNOutSetting(Setting):
    """Leave-N-Out setting for splitting data.

    Splits the dataset into training and test sets by leaving out the last N interactions
    for each user as test data, using the previous n_seq_data interactions as context.
    """

    IS_BASE: bool = False

    def __init__(
        self,
        n_seq_data: int = 1,
        N: int = 1,
        seed: int = 42,
    ) -> None:
        super().__init__(seed=seed)
        self.n_seq_data = n_seq_data
        # we use top_K to denote the number of items to predict
        self.top_K = N
        logger.info("Splitting data")
        self._splitter = NLastInteractionSplitter(N, n_seq_data)

    def _split(self, data: InteractionMatrix) -> None:
        """Splits the dataset into training and test sets based on interaction timestamps.

        Args:
            data: Interaction matrix to be split. Must contain timestamps.
        """

        self._training_data, future_interaction = self._splitter.split(data)
        # we need to copy the data to avoid modifying the background data
        past_interaction = self._training_data.copy()

        self._unlabeled_data, self._ground_truth_data = self.prediction_data_processor.process(
            past_interaction=past_interaction,
            future_interaction=future_interaction,
            top_K=self.top_K,
        )
        self._t_window = None
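The leave-N-out idea described above can be sketched in plain Python. This is a minimal illustration, not the library's `NLastInteractionSplitter`: the `(user_id, item_id, timestamp)` tuple layout, the function name `leave_n_out`, and the three returned lists are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical record layout for illustration: (user_id, item_id, timestamp).
Interaction = tuple[int, int, int]


def leave_n_out(
    interactions: list[Interaction], n: int = 1, n_seq_data: int = 1
) -> tuple[list[Interaction], list[Interaction], list[Interaction]]:
    """Return (training, context, held_out) for a per-user leave-last-N-out split."""
    if n < 1:
        raise ValueError("n must be at least 1")

    # Group interactions per user in chronological order.
    by_user: dict[int, list[Interaction]] = defaultdict(list)
    for row in sorted(interactions, key=lambda r: r[2]):
        by_user[row[0]].append(row)

    training, context, held_out = [], [], []
    for history in by_user.values():
        past, future = history[:-n], history[-n:]
        training.extend(past)
        # The last n_seq_data earlier interactions serve as prediction context.
        context.extend(past[-n_seq_data:] if n_seq_data > 0 else [])
        held_out.extend(future)
    return training, context, held_out


data = [(1, 10, 1), (1, 11, 2), (1, 12, 3), (2, 20, 1), (2, 21, 5)]
training, context, held_out = leave_n_out(data, n=1, n_seq_data=1)
print(held_out)  # [(1, 12, 3), (2, 21, 5)]
```

Because the split is made per user, no global timestamp separates training from test data, which is consistent with `_t_window` being set to `None` in this setting.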

`name` `property` ¶

Name of the object's class.

Returns:

Type	Description
`str`	Name of the object's class.

`params` `property` ¶

Parameters of the object.

Returns:

Type	Description
`dict`	Parameters of the object.

`identifier` `property` ¶

Name of the setting.

`seed = seed` `instance-attribute` ¶

`prediction_data_processor = PredictionDataProcessor()` `instance-attribute` ¶

`num_split` `property` ¶

Get number of splits created from dataset.

Defaults to 1 for typical settings, which produce a single split. For SlidingWindowSetting, the value is greater than 1 when the sliding window produces multiple splits.

Returns:

Type	Description
`int`	Number of splits created from dataset.

`is_ready` `property` ¶

Check if setting is ready for evaluation.

Returns:

Type	Description
`bool`	True if the setting has been split and is ready to use.

`is_sliding_window_setting` `property` ¶

Check if setting is SlidingWindowSetting.

Returns:

Type	Description
`bool`	True if this is a SlidingWindowSetting instance.

`training_data` `property` ¶

Get background data for initial model training.

Returns:

Type	Description
`InteractionMatrix`	InteractionMatrix of training interactions.

`t_window` `property` ¶

Get the upper timestamp of the window in split.

In settings that respect the global timeline, returns a timestamp value. In SlidingWindowSetting, returns a list of timestamp values. In settings like LeaveNOutSetting, returns None.

Returns:

Type	Description
`Union[None, int, list[int]]`	Timestamp limit for the data (int, list of ints, or None).

`unlabeled_data` `property` ¶

Get unlabeled data for model predictions.

Contains the user/item ID for prediction along with previous sequential interactions. Used to make predictions on ground truth data.

Returns:

Type	Description
`InteractionMatrix | list[InteractionMatrix]`	Single InteractionMatrix or list of InteractionMatrix for sliding window setting.

`ground_truth_data` `property` ¶

Get ground truth data for model evaluation.

Contains the actual user-item interactions that the model should predict.

Returns:

Type	Description
`InteractionMatrix | list[InteractionMatrix]`	Single InteractionMatrix or list of InteractionMatrix for sliding window setting.

`incremental_data` `property` ¶

Get data for incrementally updating the model.

Only available for SlidingWindowSetting.

Returns:

Type	Description
`list[InteractionMatrix]`	List of InteractionMatrix objects for incremental updates.

Raises:

Type	Description
`AttributeError`	If setting is not SlidingWindowSetting.

`IS_BASE = False` `class-attribute` `instance-attribute` ¶

`n_seq_data = n_seq_data` `instance-attribute` ¶

`top_K = N` `instance-attribute` ¶

`get_params()` ¶

Get the parameters of the setting.

Source code in src/recnexteval/settings/base.py

def get_params(self) -> dict[str, Any]:
    """Get the parameters of the setting."""
    # Get all instance attributes that don't start with underscore
    # and are not special attributes
    exclude_attrs = {"prediction_data_processor"}

    params = {}
    for attr_name, attr_value in vars(self).items():
        if not attr_name.startswith("_") and attr_name not in exclude_attrs:
            params[attr_name] = attr_value

    return params
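The attribute filter used by `get_params` can be tried on a small stand-in class. `ToySetting` below is hypothetical and exists only to show which attributes survive the filter; it is not part of the library.

```python
from typing import Any


class ToySetting:
    # Hypothetical stand-in used only to illustrate the attribute filter.

    def __init__(self, n_seq_data: int = 1, top_K: int = 1, seed: int = 42) -> None:
        self.seed = seed
        self.n_seq_data = n_seq_data
        self.top_K = top_K
        self.prediction_data_processor = object()  # excluded by name
        self._split_complete = False  # excluded by the underscore prefix

    def get_params(self) -> dict[str, Any]:
        exclude_attrs = {"prediction_data_processor"}
        return {
            name: value
            for name, value in vars(self).items()
            if not name.startswith("_") and name not in exclude_attrs
        }


params = ToySetting(n_seq_data=3, top_K=5).get_params()
print(params)  # {'seed': 42, 'n_seq_data': 3, 'top_K': 5}
```

Since the filter works on `vars(self)`, any public attribute assigned in `__init__` is reported as a parameter without being listed explicitly.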

`split(data)` ¶

Split data according to the setting.

Calling this method changes the state of the setting object to be ready for evaluation. The method splits data into training_data, ground_truth_data, and unlabeled_data.

Note

SlidingWindowSetting will have an additional attribute incremental_data.

Parameters:

Name	Type	Description	Default
`data`	`InteractionMatrix`	Interaction matrix to be split.	required

Source code in src/recnexteval/settings/base.py

def split(self, data: InteractionMatrix) -> None:
    """Split data according to the setting.

    Calling this method changes the state of the setting object to be ready
    for evaluation. The method splits data into training_data, ground_truth_data,
    and unlabeled_data.

    Note:
        SlidingWindowSetting will have an additional attribute incremental_data.

    Args:
        data: Interaction matrix to be split.
    """
    logger.debug("Splitting data...")
    self._num_full_interactions = data.num_interactions
    start = time.time()
    self._split(data)
    end = time.time()
    logger.info(f"{self.name} data split - Took {end - start:.3}s")

    logger.debug("Checking split attribute and sizes.")
    self._check_split()

    self._split_complete = True
    logger.info(f"{self.name} data split complete.")
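The public `split` method wraps a subclass-specific `_split` and flips the ready state afterwards. The sketch below shows that division of labour on toy classes; `ToyBaseSetting` and `HalfSplit` are hypothetical, and tying `is_ready` to `_split_complete` is an assumption based on the listing above.

```python
import time


class ToyBaseSetting:
    # Hypothetical base class mirroring the split/_split division of labour.

    def __init__(self) -> None:
        self._split_complete = False

    @property
    def is_ready(self) -> bool:
        return self._split_complete

    def _split(self, data: list[int]) -> None:
        raise NotImplementedError

    def split(self, data: list[int]) -> None:
        start = time.time()
        self._split(data)  # subclass-specific splitting logic
        self.elapsed = time.time() - start
        self._split_complete = True  # only reached if _split did not raise


class HalfSplit(ToyBaseSetting):
    # Toy strategy: first half is training data, second half is ground truth.

    def _split(self, data: list[int]) -> None:
        mid = len(data) // 2
        self.training_data, self.ground_truth_data = data[:mid], data[mid:]


setting = HalfSplit()
ready_before = setting.is_ready
setting.split([1, 2, 3, 4])
print(ready_before, setting.is_ready)  # False True
```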

`restore(n=0)` ¶

Restore last run.

Parameters:

Name	Type	Description	Default
`n`	`int`	Iteration number to restore to. The default of 0 restores to the beginning.	`0`

Source code in src/recnexteval/settings/base.py

def restore(self, n: int = 0) -> None:
    """Restore last run.

    Args:
        n: Iteration number to restore to. The default of 0 restores to the beginning.
    """
    logger.debug(f"Restoring setting to iteration {n}")
    self.current_index = n

`get_split_at(index)` ¶

Get the split data at a specific index.

Parameters:

Name	Type	Description	Default
`index`	`int`	The index of the split to retrieve.	required

Returns:

Type	Description
`SplitResult`	SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'.

Raises:

Type	Description
`IndexError`	If index is out of range.

Source code in src/recnexteval/settings/base.py

def get_split_at(self, index: int) -> SplitResult:
    """Get the split data at a specific index.

    Args:
        index: The index of the split to retrieve.

    Returns:
        SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'.

    Raises:
        IndexError: If index is out of range.
    """
    if index < 0 or index > self.num_split:
        raise IndexError(f"Index {index} out of range for {self.num_split} splits")

    if self._sliding_window_setting:
        if not (
            isinstance(self._unlabeled_data, list)
            and isinstance(self._ground_truth_data, list)
            and isinstance(self._t_window, list)
        ):
            raise ValueError("Expected list of InteractionMatrix for sliding window setting.")
        result = SplitResult(
            unlabeled=self._unlabeled_data[index],
            ground_truth=self._ground_truth_data[index],
            # TODO change this variable to training_data when refactoring
            incremental=(
                self._incremental_data[index - 1] if index < len(self._incremental_data) and index > 0 else None
            ),
            t_window=self._t_window[index],
        )
    else:
        if index != 0:
            raise IndexError("Non-sliding setting has only one split at index 0")
        if (
            isinstance(self._unlabeled_data, list)
            or isinstance(self._ground_truth_data, list)
            or isinstance(self._t_window, list)
        ):
            raise ValueError("Expected single data for non-sliding setting.")
        result = SplitResult(
            unlabeled=self._unlabeled_data,
            ground_truth=self._ground_truth_data,
            incremental=None,
            t_window=self._t_window,
        )

    return result
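For a non-sliding setting such as `LeaveNOutSetting`, only index 0 is valid and the `incremental` field is always `None`. A minimal sketch of that branch follows; `ToySplitResult` and `get_single_split` are hypothetical names, and modelling the result as a `TypedDict` is an assumption based on the "keys" wording in the docstring.

```python
from typing import Any, Optional, TypedDict


class ToySplitResult(TypedDict):
    # Hypothetical stand-in for SplitResult, using the four documented keys.
    unlabeled: Any
    ground_truth: Any
    t_window: Optional[int]
    incremental: Any


def get_single_split(
    unlabeled: Any, ground_truth: Any, t_window: Optional[int], index: int
) -> ToySplitResult:
    """Return the only split of a non-sliding setting, or raise for any other index."""
    if index != 0:
        raise IndexError("Non-sliding setting has only one split at index 0")
    return ToySplitResult(
        unlabeled=unlabeled,
        ground_truth=ground_truth,
        t_window=t_window,
        incremental=None,  # non-sliding settings carry no incremental data
    )


result = get_single_split(["u1 context"], ["u1 target"], t_window=None, index=0)
print(result["incremental"])  # None

try:
    get_single_split([], [], t_window=None, index=1)
    raised = False
except IndexError:
    raised = True
```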

`SingleTimePointSetting` ¶

Bases: Setting

Single time point setting for data split.

Splits an interaction dataset at a single timestamp into training data and evaluation data. The evaluation data can be further processed to produce unlabeled inputs and ground-truth targets for model evaluation.

Parameters:

Name	Type	Description	Default
`training_t`	`int`	Time point to split the data. The training split covers interactions with timestamps in `[0, training_t)`.	required
`n_seq_data`	`int`	Number of last sequential interactions to provide as input for prediction. Defaults to `1`.	`1`
`top_K`	`int`	Number of interactions per user to select for evaluation purposes. Defaults to `1`.	`1`
`t_upper`	`int`	Upper bound on the timestamp of interactions included in evaluation. Defaults to the maximum 32-bit integer value (acts like infinity).	`np.iinfo(np.int32).max`
`include_all_past_data`	`bool`	If True, include all past interactions when constructing input sequences. Defaults to False.	`False`
`seed`	`int`	Random seed for reproducible behavior. If None, a seed will be generated.	`42`
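To see why the default `t_upper` behaves like infinity, it helps to read it as a Unix timestamp. The snippet below assumes timestamps are epoch seconds, which the attribute docstrings suggest but this page does not state outright.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # same value as np.iinfo(np.int32).max
as_utc = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)

print(INT32_MAX)           # 2147483647
print(as_utc.isoformat())  # 2038-01-19T03:14:07+00:00
```

Under that assumption, any dataset whose timestamps fall before 2038 lies below the default bound, so the bound excludes nothing in practice.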

Source code in src/recnexteval/settings/strategy/single_time_point_setting.py

class SingleTimePointSetting(Setting):
    """Single time point setting for data split.

    Splits an interaction dataset at a single timestamp into training
    data and evaluation data. The evaluation data can be
    further processed to produce unlabeled inputs and ground-truth
    targets for model evaluation.

    Args:
        training_t: Time point to split the data. The training
            split covers interactions with timestamps in `[0, training_t)`.
        n_seq_data: Number of last sequential interactions
            to provide as input for prediction. Defaults to `1`.
        top_K: Number of interactions per user to select for
            evaluation purposes. Defaults to `1`.
        t_upper: Upper bound on the timestamp of
            interactions included in evaluation. Defaults to the maximum
            32-bit integer value (acts like infinity).
        include_all_past_data: If True, include all past
            interactions when constructing input sequences. Defaults to False.
        seed: Random seed for reproducible behavior.
            If None, a seed will be generated.
    """

    IS_BASE: bool = False

    def __init__(
        self,
        training_t: int,
        n_seq_data: int = 1,
        top_K: int = 1,
        t_upper: int = np.iinfo(np.int32).max,
        include_all_past_data: bool = False,
        seed: int = 42,
    ):
        super().__init__(seed=seed)
        self.t = training_t
        """Epoch timestamp value to be used for the training set."""
        self.t_upper = t_upper
        """Epoch value to be added to `t` as upper bound for evaluation data."""
        self.n_seq_data = n_seq_data
        self.top_K = top_K

        logger.info("Splitting data at time %s with t_upper interval %s", training_t, t_upper)

        self._training_data_splitter = TimestampSplitter(
            t=training_t,
            t_lower=None,
            t_upper=t_upper,
        )
        self._splitter = NLastInteractionTimestampSplitter(
            t=training_t,
            t_upper=t_upper,
            n_seq_data=n_seq_data,
            include_all_past_data=include_all_past_data,
        )
        self._t_window = training_t

    def _split(self, data: InteractionMatrix) -> None:
        """Split the dataset by timestamp into training and evaluation sets.

        The method raises :class:`TimestampAttributeMissingError` when the
        provided :class:`InteractionMatrix` does not contain timestamp
        information. It will warn if the chosen split time is before the
        earliest timestamp in the data.

        Args:
            data: Interaction matrix to split. Must have timestamps.

        Raises:
            TimestampAttributeMissingError: If `data` has no timestamp attribute.
        """
        if not data.has_timestamps:
            raise TimestampAttributeMissingError()
        if data.min_timestamp > self.t:
            warn(
                f"Splitting at time {self.t} is before the first timestamp"
                " in the data. No data will be in the training set."
            )

        self._training_data, _ = self._training_data_splitter.split(data)
        past_interaction, future_interaction = self._splitter.split(data)
        self._unlabeled_data, self._ground_truth_data = self.prediction_data_processor.process(
            past_interaction=past_interaction,
            future_interaction=future_interaction,
            top_K=self.top_K,
        )

        if len(self._training_data) == 0:
            logger.info("Training data is empty after splitting at time %s", self.t)
        if len(self._unlabeled_data) == 0:
            logger.info("Unlabeled data is empty after splitting at time %s", self.t)
        if len(self._ground_truth_data) == 0:
            logger.info("Ground truth data is empty after splitting at time %s", self.t)

        logger.info("Finished splitting data at time %s", self.t)

`name` `property` ¶

Name of the object's class.

Returns:

Type	Description
`str`	Name of the object's class.

`params` `property` ¶

Parameters of the object.

Returns:

Type	Description
`dict`	Parameters of the object.

`identifier` `property` ¶

Name of the setting.

`seed = seed` `instance-attribute` ¶

`prediction_data_processor = PredictionDataProcessor()` `instance-attribute` ¶

`num_split` `property` ¶

Get number of splits created from dataset.

Defaults to 1 for typical settings, which produce a single split. For SlidingWindowSetting, the value is greater than 1 when the sliding window produces multiple splits.

Returns:

Type	Description
`int`	Number of splits created from dataset.

`is_ready` `property` ¶

Check if setting is ready for evaluation.

Returns:

Type	Description
`bool`	True if the setting has been split and is ready to use.

`is_sliding_window_setting` `property` ¶

Check if setting is SlidingWindowSetting.

Returns:

Type	Description
`bool`	True if this is a SlidingWindowSetting instance.

`training_data` `property` ¶

Get background data for initial model training.

Returns:

Type	Description
`InteractionMatrix`	InteractionMatrix of training interactions.

`t_window` `property` ¶

Get the upper timestamp of the window in split.

In settings that respect the global timeline, returns a timestamp value. In SlidingWindowSetting, returns a list of timestamp values. In settings like LeaveNOutSetting, returns None.

Returns:

Type	Description
`Union[None, int, list[int]]`	Timestamp limit for the data (int, list of ints, or None).

`unlabeled_data` `property` ¶

Get unlabeled data for model predictions.

Contains the user/item ID for prediction along with previous sequential interactions. Used to make predictions on ground truth data.

Returns:

Type	Description
`InteractionMatrix | list[InteractionMatrix]`	Single InteractionMatrix or list of InteractionMatrix for sliding window setting.

`ground_truth_data` `property` ¶

Get ground truth data for model evaluation.

Contains the actual user-item interactions that the model should predict.

Returns:

Type	Description
`InteractionMatrix | list[InteractionMatrix]`	Single InteractionMatrix or list of InteractionMatrix for sliding window setting.

`incremental_data` `property` ¶

Get data for incrementally updating the model.

Only available for SlidingWindowSetting.

Returns:

Type	Description
`list[InteractionMatrix]`	List of InteractionMatrix objects for incremental updates.

Raises:

Type	Description
`AttributeError`	If setting is not SlidingWindowSetting.

`IS_BASE = False` `class-attribute` `instance-attribute` ¶

`t = training_t` `instance-attribute` ¶

Epoch timestamp value to be used for the training set.

`t_upper = t_upper` `instance-attribute` ¶

Epoch value to be added to t as upper bound for evaluation data.

`n_seq_data = n_seq_data` `instance-attribute` ¶

`top_K = top_K` `instance-attribute` ¶

`get_params()` ¶

Get the parameters of the setting.

Source code in src/recnexteval/settings/base.py

def get_params(self) -> dict[str, Any]:
    """Get the parameters of the setting."""
    # Get all instance attributes that don't start with underscore
    # and are not special attributes
    exclude_attrs = {"prediction_data_processor"}

    params = {}
    for attr_name, attr_value in vars(self).items():
        if not attr_name.startswith("_") and attr_name not in exclude_attrs:
            params[attr_name] = attr_value

    return params

`split(data)` ¶

Split data according to the setting.

Calling this method changes the state of the setting object to be ready for evaluation. The method splits data into training_data, ground_truth_data, and unlabeled_data.

Note

SlidingWindowSetting will have an additional attribute incremental_data.

Parameters:

Name	Type	Description	Default
`data`	`InteractionMatrix`	Interaction matrix to be split.	required

Source code in src/recnexteval/settings/base.py

def split(self, data: InteractionMatrix) -> None:
    """Split data according to the setting.

    Calling this method changes the state of the setting object to be ready
    for evaluation. The method splits data into training_data, ground_truth_data,
    and unlabeled_data.

    Note:
        SlidingWindowSetting will have an additional attribute incremental_data.

    Args:
        data: Interaction matrix to be split.
    """
    logger.debug("Splitting data...")
    self._num_full_interactions = data.num_interactions
    start = time.time()
    self._split(data)
    end = time.time()
    logger.info(f"{self.name} data split - Took {end - start:.3}s")

    logger.debug("Checking split attribute and sizes.")
    self._check_split()

    self._split_complete = True
    logger.info(f"{self.name} data split complete.")

`restore(n=0)` ¶

Restore last run.

Parameters:

Name	Type	Description	Default
`n`	`int`	Iteration number to restore to. The default of 0 restores to the beginning.	`0`

Source code in src/recnexteval/settings/base.py

def restore(self, n: int = 0) -> None:
    """Restore last run.

    Args:
        n: Iteration number to restore to. The default of 0 restores to the beginning.
    """
    logger.debug(f"Restoring setting to iteration {n}")
    self.current_index = n

`get_split_at(index)` ¶

Get the split data at a specific index.

Parameters:

Name	Type	Description	Default
`index`	`int`	The index of the split to retrieve.	required

Returns:

Type	Description
`SplitResult`	SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'.

Raises:

Type	Description
`IndexError`	If index is out of range.

Source code in src/recnexteval/settings/base.py

def get_split_at(self, index: int) -> SplitResult:
    """Get the split data at a specific index.

    Args:
        index: The index of the split to retrieve.

    Returns:
        SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'.

    Raises:
        IndexError: If index is out of range.
    """
    if index < 0 or index > self.num_split:
        raise IndexError(f"Index {index} out of range for {self.num_split} splits")

    if self._sliding_window_setting:
        if not (
            isinstance(self._unlabeled_data, list)
            and isinstance(self._ground_truth_data, list)
            and isinstance(self._t_window, list)
        ):
            raise ValueError("Expected list of InteractionMatrix for sliding window setting.")
        result = SplitResult(
            unlabeled=self._unlabeled_data[index],
            ground_truth=self._ground_truth_data[index],
            # TODO change this variable to training_data when refactoring
            incremental=(
                self._incremental_data[index - 1] if index < len(self._incremental_data) and index > 0 else None
            ),
            t_window=self._t_window[index],
        )
    else:
        if index != 0:
            raise IndexError("Non-sliding setting has only one split at index 0")
        if (
            isinstance(self._unlabeled_data, list)
            or isinstance(self._ground_truth_data, list)
            or isinstance(self._t_window, list)
        ):
            raise ValueError("Expected single data for non-sliding setting.")
        result = SplitResult(
            unlabeled=self._unlabeled_data,
            ground_truth=self._ground_truth_data,
            incremental=None,
            t_window=self._t_window,
        )

    return result
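In the sliding-window branch, the `incremental` field for split `index` is read from position `index - 1`, and split 0 receives `None`. The helper below isolates that conditional; `incremental_for` and the string placeholders are hypothetical and stand in for `InteractionMatrix` objects.

```python
from typing import Optional


def incremental_for(index: int, incremental_data: list[str]) -> Optional[str]:
    """Mirror the conditional used for the `incremental` field in the listing above."""
    if 0 < index < len(incremental_data):
        return incremental_data[index - 1]
    return None


windows = ["window_0", "window_1", "window_2"]
picked = [incremental_for(i, windows) for i in range(len(windows))]
print(picked)  # [None, 'window_0', 'window_1']
```

One plausible reading is that each split is paired with the data released during the previous window, so the first split has nothing to add beyond the training data; that interpretation is an assumption rather than something this page states.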

`SlidingWindowSetting` ¶

Bases: Setting

Sliding window setting for splitting data.

The data is split into a training set and an evaluation set. The evaluation set is defined by a sliding window of window_size seconds that moves over the data. It comprises the unlabeled data and the ground truth data, each stored as a list with one entry per window. The unlabeled data contains the last n_seq_data interactions of each user/item before the split point, along with masked interactions after the split point. The number of interactions per user/item is limited to top_K. The ground truth data consists of the interactions after the split point and spans t_ground_truth_window seconds (window_size by default).

Parameters:

Name	Type	Description	Default
`training_t`	`int`	Time point at which to split the data into training and evaluation data. The training split covers `[0, training_t)`.	required
`window_size`	`int`	Size of the window in seconds to slide over the data. Affects the incremental data being released to the model. If t_ground_truth_window is not provided, ground truth data will also use this window. Defaults to np.iinfo(np.int32).max.	`np.iinfo(np.int32).max`
`n_seq_data`	`int`	Number of last sequential interactions to provide as data for the model to make predictions. Defaults to 0.	`0`
`top_K`	`int`	Number of interactions per user that should be selected for evaluation purposes. Defaults to 10.	`10`
`t_upper`	`int`	Upper bound on the timestamp of interactions. Defaults to the maximum 32-bit integer value (acting as infinity).	`np.iinfo(np.int32).max`
`t_ground_truth_window`	`None | int`	Size of the window in seconds to slide over the data for ground truth data. If not provided, defaults to window_size during computation.	`None`
`seed`	`int`	Seed for random number generator. Defaults to 42.	`42`

Source code in src/recnexteval/settings/strategy/sliding_window_setting.py

class SlidingWindowSetting(Setting):
    """Sliding window setting for splitting data.

    The data is split into a training set and an evaluation set. The evaluation set is defined by a sliding window
    of window_size seconds that moves over the data. It comprises the unlabeled data and the ground truth data,
    each stored as a list with one entry per window. The unlabeled data contains the last n_seq_data interactions
    of each user/item before the split point, along with masked interactions after the split point. The number of
    interactions per user/item is limited to top_K.
    The ground truth data consists of the interactions after the split point and spans
    t_ground_truth_window seconds (window_size by default).

    Args:
        training_t: Time point to split the data into training and evaluation data. Split will be from [0, t).
        window_size: Size of the window in seconds to slide over the data.
            Affects the incremental data being released to the model. If
            t_ground_truth_window is not provided, ground truth data will also
            take this window. Defaults to np.iinfo(np.int32).max.
        n_seq_data: Number of last sequential interactions to provide as
            data for the model to make predictions. Defaults to 0.
        top_K: Number of interactions per user that should be selected for evaluation purposes.
            Defaults to 10.
        t_upper: Upper bound on the timestamp of interactions.
            Defaults to maximal integer value (acting as infinity).
        t_ground_truth_window: Size of the window in seconds to slide over the data for ground truth data.
            If not provided, defaults to window_size during computation.
        seed: Seed for random number generator. Defaults to 42.
    """

    IS_BASE: bool = False

    def __init__(
        self,
        training_t: int,
        window_size: int = np.iinfo(np.int32).max,  # in seconds
        n_seq_data: int = 0,
        top_K: int = 10,
        t_upper: int = np.iinfo(np.int32).max,
        t_ground_truth_window: None | int = None,
        seed: int = 42,
    ) -> None:
        super().__init__(seed=seed)
        self._sliding_window_setting = True
        self.t = training_t
        self.window_size = window_size
        """Window size in seconds for splitter to slide over the data."""
        self.n_seq_data = n_seq_data
        self.top_K = top_K
        self.t_upper = t_upper
        """Upper bound on the timestamp of interactions. Defaults to maximal integer value (acting as infinity)."""

        if t_upper and t_upper < training_t:
            raise ValueError("t_upper must be greater than training_t")

        if t_ground_truth_window is None:
            t_ground_truth_window = window_size

        self.t_ground_truth_window = t_ground_truth_window

        self._training_data_splitter = TimestampSplitter(t=training_t, t_lower=None, t_upper=self.t_upper)
        self._window_splitter = NLastInteractionTimestampSplitter(
            t=training_t,
            t_upper=t_ground_truth_window,
            n_seq_data=n_seq_data,
        )

    def _split(self, data: InteractionMatrix) -> None:
        if not data.has_timestamps:
            raise TimestampAttributeMissingError()
        if data.min_timestamp > self.t:
            warn(
                f"Splitting at time {self.t} is before the first "
                "timestamp in the data. No data will be in the training set."
            )
        if self.t_upper:
            data = data.timestamps_lt(self.t_upper)

        self._training_data, _ = self._training_data_splitter.split(data)
        self._ground_truth_data, self._unlabeled_data, self._t_window, self._incremental_data = (
            [],
            [],
            [],
            [],
        )

        # sub_time is the current split point, advanced by window_size as the splitter slides over the data
        sub_time = self.t
        max_timestamp = data.max_timestamp

        pbar = tqdm(total=int((max_timestamp - sub_time) / self.window_size))
        while sub_time <= max_timestamp:
            self._t_window.append(sub_time)
            # the set used for eval will always have a timestamp greater than
            # data released such that it is unknown to the model
            self._window_splitter.update_split_point(t=sub_time)
            past_interaction, future_interaction = self._window_splitter.split(data)

            # if past_interaction, future_interaction is empty, log an info message
            if len(past_interaction) == 0:
                logger.info(
                    "Split at time %s resulted in empty unlabelled testing samples.", sub_time
                )
            if len(future_interaction) == 0:
                logger.info("Split at time %s resulted in empty incremental data.", sub_time)

            unlabeled_set, ground_truth = self.prediction_data_processor.process(
                past_interaction=past_interaction,
                future_interaction=future_interaction,
                top_K=self.top_K,
            )
            self._unlabeled_data.append(unlabeled_set)
            self._ground_truth_data.append(ground_truth)

            self._incremental_data.append(future_interaction)

            sub_time += self.window_size
            pbar.update(1)
        pbar.close()

        self._num_split_set = len(self._unlabeled_data)
        logger.info(
            "Finished split with window size %s seconds. Number of splits: %s in total.",
            self.window_size,
            self._num_split_set,
        )

`name` `property` ¶

Name of the object's class.

Returns:

Type	Description
`str`	Name of the object's class.

`params` `property` ¶

Parameters of the object.

Returns:

Type	Description
`dict`	Parameters of the object.

`identifier` `property` ¶

Name of the setting.

`seed = seed` `instance-attribute` ¶

`prediction_data_processor = PredictionDataProcessor()` `instance-attribute` ¶

`num_split` `property` ¶

Get number of splits created from dataset.

Defaults to 1 for typical settings, which produce a single split. For SlidingWindowSetting, the value is greater than 1 when the sliding window produces multiple splits.

Returns:

Type	Description
`int`	Number of splits created from dataset.

`is_ready` `property` ¶

Check if setting is ready for evaluation.

Returns:

Type	Description
`bool`	True if the setting has been split and is ready to use.

`is_sliding_window_setting` `property` ¶

Check if setting is SlidingWindowSetting.

Returns:

Type	Description
`bool`	True if this is a SlidingWindowSetting instance.

`training_data` `property` ¶

Get background data for initial model training.

Returns:

Type	Description
`InteractionMatrix`	InteractionMatrix of training interactions.

`t_window` `property` ¶

Get the upper timestamp of the window in split.

In settings that respect the global timeline, returns a timestamp value. In SlidingWindowSetting, returns a list of timestamp values. In settings like LeaveNOutSetting, returns None.

Returns:

Type	Description
`Union[None, int, list[int]]`	Timestamp limit for the data (int, list of ints, or None).

`unlabeled_data` `property` ¶

Get unlabeled data for model predictions.

Contains the user/item ID for prediction along with previous sequential interactions. Used to make predictions on ground truth data.

Returns:

Type	Description
`InteractionMatrix | list[InteractionMatrix]`	Single InteractionMatrix or list of InteractionMatrix for sliding window setting.

`ground_truth_data` `property` ¶

Get ground truth data for model evaluation.

Contains the actual user-item interactions that the model should predict.

Returns:

Type	Description
`InteractionMatrix | list[InteractionMatrix]`	Single InteractionMatrix or list of InteractionMatrix for sliding window setting.

`incremental_data` `property` ¶

Get data for incrementally updating the model.

Only available for SlidingWindowSetting.

Returns:

Type	Description
`list[InteractionMatrix]`	List of InteractionMatrix objects for incremental updates.

Raises:

Type	Description
`AttributeError`	If setting is not SlidingWindowSetting.

`IS_BASE = False` `class-attribute` `instance-attribute` ¶

`t = training_t` `instance-attribute` ¶

`window_size = window_size` `instance-attribute` ¶

Window size in seconds for splitter to slide over the data.

`n_seq_data = n_seq_data` `instance-attribute` ¶

`top_K = top_K` `instance-attribute` ¶

`t_upper = t_upper` `instance-attribute` ¶

Upper bound on the timestamp of interactions. Defaults to maximal integer value (acting as infinity).

`t_ground_truth_window = t_ground_truth_window` `instance-attribute` ¶

`get_params()` ¶

Get the parameters of the setting.

Source code in src/recnexteval/settings/base.py

def get_params(self) -> dict[str, Any]:
    """Get the parameters of the setting."""
    # Get all instance attributes that don't start with underscore
    # and are not special attributes
    exclude_attrs = {"prediction_data_processor"}

    params = {}
    for attr_name, attr_value in vars(self).items():
        if not attr_name.startswith("_") and attr_name not in exclude_attrs:
            params[attr_name] = attr_value

    return params
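
To illustrate the filtering rule in the listing above, here is a small self-contained sketch. `ToySetting` is a hypothetical stand-in written for this example, not the library's `Setting` class. It reuses the same rule: public instance attributes are kept, while underscore-prefixed attributes and `prediction_data_processor` are dropped.

```python
from typing import Any


class ToySetting:
    """Hypothetical stand-in used only to demonstrate the filter."""

    def __init__(self, seed: int = 42, n_seq_data: int = 1) -> None:
        self.seed = seed
        self.n_seq_data = n_seq_data
        self.prediction_data_processor = object()  # excluded by name
        self._split_complete = False  # excluded: leading underscore

    def get_params(self) -> dict[str, Any]:
        exclude_attrs = {"prediction_data_processor"}
        return {
            name: value
            for name, value in vars(self).items()
            if not name.startswith("_") and name not in exclude_attrs
        }


params = ToySetting(seed=7).get_params()
print(params)  # {'seed': 7, 'n_seq_data': 1}
```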

`split(data)` ¶

Split data according to the setting.

Calling this method changes the state of the setting object to be ready for evaluation. The method splits data into training_data, ground_truth_data, and unlabeled_data.

Note

SlidingWindowSetting will have an additional attribute incremental_data.

Parameters:

Name	Type	Description	Default
`data`	`InteractionMatrix`	Interaction matrix to be split.	required

Source code in src/recnexteval/settings/base.py

def split(self, data: InteractionMatrix) -> None:
    """Split data according to the setting.

    Calling this method changes the state of the setting object to be ready
    for evaluation. The method splits data into training_data, ground_truth_data,
    and unlabeled_data.

    Note:
        SlidingWindowSetting will have an additional attribute incremental_data.

    Args:
        data: Interaction matrix to be split.
    """
    logger.debug("Splitting data...")
    self._num_full_interactions = data.num_interactions
    start = time.time()
    self._split(data)
    end = time.time()
    logger.info(f"{self.name} data split - Took {end - start:.3f}s")

    logger.debug("Checking split attribute and sizes.")
    self._check_split()

    self._split_complete = True
    logger.info(f"{self.name} data split complete.")
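
The listing above follows a template-method shape: the public `split` handles timing and state, while the subclass-specific `_split` does the actual partitioning. The sketch below reproduces that shape with toy classes over a plain list. `ToyBaseSetting` and `ToyLeaveLastOut` are hypothetical names, and the sketch omits the library's `_check_split` validation and logging.

```python
import time


class ToyBaseSetting:
    """Hypothetical base class mirroring the split/_split division of labour."""

    def __init__(self) -> None:
        self._split_complete = False
        self.training_data: list[int] = []
        self.ground_truth_data: list[int] = []

    def split(self, data: list[int]) -> None:
        start = time.time()
        self._split(data)  # subclass hook does the real work
        elapsed = time.time() - start
        print(f"{type(self).__name__} data split - took {elapsed:.3f}s")
        self._split_complete = True  # the object is now ready for evaluation

    def _split(self, data: list[int]) -> None:
        raise NotImplementedError


class ToyLeaveLastOut(ToyBaseSetting):
    """Toy strategy: hold out the last element as ground truth."""

    def _split(self, data: list[int]) -> None:
        self.training_data = data[:-1]
        self.ground_truth_data = data[-1:]


setting = ToyLeaveLastOut()
setting.split([1, 2, 3, 4])
print(setting.training_data, setting.ground_truth_data)
```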

`restore(n=0)` ¶

Restore last run.

Parameters:

Name	Type	Description	Default
`n`	`int`	Iteration number to restore to. Defaults to 0, which restores to the beginning.	`0`

Source code in src/recnexteval/settings/base.py

def restore(self, n: int = 0) -> None:
    """Restore last run.

    Args:
        n: Iteration number to restore to. Defaults to 0, which restores to the beginning.
    """
    logger.debug(f"Restoring setting to iteration {n}")
    self.current_index = n

`get_split_at(index)` ¶

Get the split data at a specific index.

Parameters:

Name	Type	Description	Default
`index`	`int`	The index of the split to retrieve.	required

Returns:

Type	Description
`SplitResult`	SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'.

Raises:

Type	Description
`IndexError`	If index is out of range.

Source code in src/recnexteval/settings/base.py

def get_split_at(self, index: int) -> SplitResult:
    """Get the split data at a specific index.

    Args:
        index: The index of the split to retrieve.

    Returns:
        SplitResult with keys: 'unlabeled', 'ground_truth', 't_window', 'incremental'.

    Raises:
        IndexError: If index is out of range.
    """
    if index < 0 or index >= self.num_split:
        raise IndexError(f"Index {index} out of range for {self.num_split} splits")

    if self._sliding_window_setting:
        if not (
            isinstance(self._unlabeled_data, list)
            and isinstance(self._ground_truth_data, list)
            and isinstance(self._t_window, list)
        ):
            raise ValueError("Expected list of InteractionMatrix for sliding window setting.")
        result = SplitResult(
            unlabeled=self._unlabeled_data[index],
            ground_truth=self._ground_truth_data[index],
            # TODO change this variable to training_data when refactoring
            incremental=(
                self._incremental_data[index - 1] if index < len(self._incremental_data) and index > 0 else None
            ),
            t_window=self._t_window[index],
        )
    else:
        if index != 0:
            raise IndexError("Non-sliding setting has only one split at index 0")
        if (
            isinstance(self._unlabeled_data, list)
            or isinstance(self._ground_truth_data, list)
            or isinstance(self._t_window, list)
        ):
            raise ValueError("Expected single data for non-sliding setting.")
        result = SplitResult(
            unlabeled=self._unlabeled_data,
            ground_truth=self._ground_truth_data,
            incremental=None,
            t_window=self._t_window,
        )

    return result
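
The conditional that selects `incremental` in the sliding-window branch is easy to misread, so here is a minimal sketch that isolates it. The function `incremental_for` and the string placeholders are hypothetical. Only the expression itself is taken from the listing above.

```python
from typing import Optional


def incremental_for(index: int, incremental: list[str]) -> Optional[str]:
    """Same selection rule as the sliding-window branch of get_split_at."""
    # Split 0 has no preceding increment; later splits use the previous entry.
    return incremental[index - 1] if index < len(incremental) and index > 0 else None


inc = ["inc0", "inc1", "inc2"]
for i in range(4):
    print(i, incremental_for(i, inc))
# 0 None
# 1 inc0
# 2 inc1
# 3 None
```

As written, the expression returns `None` both for index 0 and for any index equal to or greater than the length of the incremental list.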
