n_last_timestamp

logger = logging.getLogger(__name__) module-attribute

NLastInteractionTimestampSplitter

Bases: TimestampSplitter

Splits with n last interactions based on a timestamp.

Splits the data into unlabeled and ground truth data based on a timestamp t. The historical data contains the last n_seq_data interactions per user before t, and the future data contains the interactions at or after t.
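
The interval semantics can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `Interaction` tuple and the `split_by_timestamp` helper are hypothetical names, and timestamps are assumed to be integer seconds.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical stand-in for one row of an interaction matrix."""

    user: int
    item: int
    ts: int


def split_by_timestamp(rows, t, t_upper=None):
    """Partition rows into past [0, t) and future [t, t + t_upper) or [t, inf)."""
    past = [r for r in rows if r.ts < t]
    if t_upper is None:
        # No upper bound: everything at or after t is future data.
        future = [r for r in rows if r.ts >= t]
    else:
        # t_upper is a duration in seconds past t, not an absolute timestamp.
        future = [r for r in rows if t <= r.ts < t + t_upper]
    return past, future


rows = [
    Interaction(user=1, item=10, ts=5),
    Interaction(user=1, item=11, ts=10),
    Interaction(user=2, item=12, ts=14),
    Interaction(user=2, item=13, ts=20),
]
past, future = split_by_timestamp(rows, t=10, t_upper=10)
unbounded_past, unbounded_future = split_by_timestamp(rows, t=10)
```

With `t=10` and `t_upper=10`, the interaction at timestamp 20 falls outside the half-open window `[10, 20)`, while the interaction at exactly `t` belongs to the future side.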

Attributes:

  • past_interaction: List of unlabeled data. Interval is [0, t).
  • future_interaction: Data used for training the model. Interval is [t, t+t_upper) or [t, inf).
  • n_seq_data: Number of last interactions to provide as data for the model to make a prediction. These are past interactions from before the timestamp t.

Parameters:

  • t (int, required): Timestamp to split on, in seconds since epoch.
  • t_upper (None | int, default None): Seconds past t. Upper bound on the timestamp of interactions. None means no upper bound (infinity).
  • n_seq_data (int, default 1): Number of last interactions to provide as data for the model to make a prediction.
  • include_all_past_data (bool, default False): If True, include all past data in past_interaction.
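
How n_seq_data and include_all_past_data shape the past side can be illustrated with a small standalone sketch. The `select_past` helper and the `(user, item, ts)` tuple layout are assumptions made for this example; it also ignores the restriction to users present in the future data that the source applies through `user_in`.

```python
from collections import defaultdict


def select_past(rows, t, n_seq_data=1, include_all_past_data=False):
    """Return rows of (user, item, ts) with ts < t, ordered by timestamp.

    Hypothetical helper: keeps everything before t when
    include_all_past_data is True, otherwise only each user's
    last n_seq_data interactions before t.
    """
    before_t = sorted((r for r in rows if r[2] < t), key=lambda r: r[2])
    if include_all_past_data:
        return before_t
    # Group the time-ordered rows by user, then keep the tail of each group.
    per_user = defaultdict(list)
    for row in before_t:
        per_user[row[0]].append(row)
    kept = []
    for user_rows in per_user.values():
        kept.extend(user_rows[-n_seq_data:])
    return sorted(kept, key=lambda r: r[2])


rows = [(1, 10, 1), (1, 11, 2), (1, 12, 3), (2, 20, 2), (2, 21, 9)]
last_one = select_past(rows, t=5)
last_two = select_past(rows, t=5, n_seq_data=2)
everything = select_past(rows, t=5, include_all_past_data=True)
```

In this sketch, setting include_all_past_data to True makes n_seq_data irrelevant, which mirrors the branch order in the `split` source.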
Source code in src/recnexteval/settings/splitters/n_last_timestamp.py
class NLastInteractionTimestampSplitter(TimestampSplitter):
    """Splits with n last interactions based on a timestamp.

    Splits the data into unlabeled and ground truth data based on a timestamp.
    Historical data contains last `n_seq_data` interactions before the timestamp `t`
    and the future interaction contains interactions after the timestamp `t`.


    Attributes:
        past_interaction: List of unlabeled data. Interval is `[0, t)`.
        future_interaction: Data used for training the model.
            Interval is `[t, t+t_upper)` or `[t, inf)`.
        n_seq_data: Number of last interactions to provide as data for model to make prediction.
            These interactions are past interactions from before the timestamp `t`.

    Args:
        t: Timestamp to split on in seconds since epoch.
        t_upper: Seconds past t. Upper bound on the timestamp
            of interactions. Defaults to None (infinity).
        n_seq_data: Number of last interactions to provide as data
            for model to make prediction. Defaults to 1.
        include_all_past_data: If True, include all past data in the past_interaction.
            Defaults to False.
    """

    def __init__(
        self,
        t: int,
        t_upper: None | int = None,
        n_seq_data: int = 1,
        include_all_past_data: bool = False,
    ) -> None:
        super().__init__(t=t, t_lower=None, t_upper=t_upper)
        self.n_seq_data = n_seq_data
        self.include_all_past_data = include_all_past_data

    def update_split_point(self, t: int) -> None:
        logger.debug(f"{self.identifier} - Updating split point to t={t}")
        self.t = t

    def split(self, data: InteractionMatrix) -> tuple[InteractionMatrix, InteractionMatrix]:
        """Splits data such that the following definition holds:

        - past_interaction: List of unlabeled data. Interval is `[0, t)`.
        - future_interaction: Data used for training the model.
            Interval is `[t, t+t_upper)` or `[t, inf)`.

        Args:
            data: Interaction matrix to be split. Must contain timestamps.

        Returns:
            A 2-tuple containing the `past_interaction` and `future_interaction` matrices.
        """
        if self.t_upper is None:
            future_interaction = data.timestamps_gte(timestamp=self.t)
        else:
            future_interaction = data.timestamps_lt(timestamp=self.t + self.t_upper).timestamps_gte(timestamp=self.t)

        if self.include_all_past_data:
            past_interaction = data.timestamps_lt(timestamp=self.t)
        else:
            past_interaction = data.get_users_n_last_interaction(
                n_seq_data=self.n_seq_data, t_upper=self.t, user_in=future_interaction.user_ids
            )

        logger.debug(f"{self.identifier} has complete split")
        return past_interaction, future_interaction

n_seq_data = n_seq_data instance-attribute

include_all_past_data = include_all_past_data instance-attribute

name property

Return the class name of the splitter.

Returns:

  • str: The splitter class name.

identifier property

Return a string identifier including the splitter's parameters.

The identifier includes the class name and a comma-separated list of attribute name/value pairs from self.__dict__.

Returns:

  • str: Identifier string like Name(k1=v1,k2=v2).
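
One way such an identifier could be assembled from `self.__dict__` is sketched below. `SketchSplitter` is a hypothetical stand-in, not the library's base class, and the exact formatting used by the library is an assumption based on the `Name(k1=v1,k2=v2)` pattern described here.

```python
class SketchSplitter:
    """Hypothetical stand-in used only to illustrate the identifier format."""

    def __init__(self, t, t_upper=None, n_seq_data=1):
        self.t = t
        self.t_upper = t_upper
        self.n_seq_data = n_seq_data

    @property
    def name(self):
        # The class name doubles as the splitter name.
        return self.__class__.__name__

    @property
    def identifier(self):
        # Instance attributes appear in assignment order, joined by commas.
        pairs = ",".join(f"{key}={value}" for key, value in self.__dict__.items())
        return f"{self.name}({pairs})"


ident = SketchSplitter(t=10).identifier
```

Because the string is built from the instance dictionary, any attribute changed later (for example a new split point `t`) shows up in the identifier on the next access.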

t = t instance-attribute

t_lower = t_lower instance-attribute

t_upper = t_upper instance-attribute

update_split_point(t)

Source code in src/recnexteval/settings/splitters/n_last_timestamp.py
def update_split_point(self, t: int) -> None:
    logger.debug(f"{self.identifier} - Updating split point to t={t}")
    self.t = t
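
A plausible use of update_split_point is to advance the split point step by step and split again after each move. The sketch below assumes that usage; `MovingSplitPoint` and `count_future` are hypothetical names and do not exist in the library.

```python
class MovingSplitPoint:
    """Hypothetical stand-in that exposes only the split point `t`."""

    def __init__(self, t):
        self.t = t

    def update_split_point(self, t):
        # Same effect as in the source: overwrite the stored split point.
        self.t = t

    def count_future(self, timestamps):
        # Count interactions at or after the current split point.
        return sum(1 for ts in timestamps if ts >= self.t)


timestamps = [1, 4, 6, 9, 12]
splitter = MovingSplitPoint(t=0)
counts = []
for t in (5, 10):
    splitter.update_split_point(t)
    counts.append(splitter.count_future(timestamps))
```

Reusing one splitter object and moving `t` avoids rebuilding the object for every window, at the cost of the object being stateful between calls.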

split(data)

Splits data such that the following definition holds:

  • past_interaction: List of unlabeled data. Interval is [0, t).
  • future_interaction: Data used for training the model. Interval is [t, t+t_upper) or [t, inf).

Parameters:

  • data (InteractionMatrix, required): Interaction matrix to be split. Must contain timestamps.

Returns:

  • tuple[InteractionMatrix, InteractionMatrix]: A 2-tuple containing the past_interaction and future_interaction matrices.
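
When include_all_past_data is False, the source passes `user_in=future_interaction.user_ids`, which suggests that the past side only covers users who also appear in the future side. The sketch below illustrates that reading on plain `(user, item, ts)` tuples; `split_sketch` is a hypothetical helper, and the filtering behavior is an assumption inferred from the argument name.

```python
def split_sketch(rows, t, n_seq_data=1):
    """Return (past, future) for rows of (user, item, ts) tuples.

    Assumption: past rows are limited to users that also occur in future.
    """
    future = [r for r in rows if r[2] >= t]
    future_users = {r[0] for r in future}
    past = []
    for user in sorted(future_users):
        # Time-ordered history of this user strictly before t.
        history = sorted(
            (r for r in rows if r[0] == user and r[2] < t),
            key=lambda r: r[2],
        )
        past.extend(history[-n_seq_data:])
    return past, future


rows = [(1, 10, 1), (1, 11, 3), (2, 20, 2), (1, 12, 7), (3, 30, 8)]
past, future = split_sketch(rows, t=5)
```

In this example user 2 has history but no future interaction and is dropped from the past side, while user 3 has a future interaction but no history and contributes nothing to it.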

Source code in src/recnexteval/settings/splitters/n_last_timestamp.py
def split(self, data: InteractionMatrix) -> tuple[InteractionMatrix, InteractionMatrix]:
    """Splits data such that the following definition holds:

    - past_interaction: List of unlabeled data. Interval is `[0, t)`.
    - future_interaction: Data used for training the model.
        Interval is `[t, t+t_upper)` or `[t, inf)`.

    Args:
        data: Interaction matrix to be split. Must contain timestamps.

    Returns:
        A 2-tuple containing the `past_interaction` and `future_interaction` matrices.
    """
    if self.t_upper is None:
        future_interaction = data.timestamps_gte(timestamp=self.t)
    else:
        future_interaction = data.timestamps_lt(timestamp=self.t + self.t_upper).timestamps_gte(timestamp=self.t)

    if self.include_all_past_data:
        past_interaction = data.timestamps_lt(timestamp=self.t)
    else:
        past_interaction = data.get_users_n_last_interaction(
            n_seq_data=self.n_seq_data, t_upper=self.t, user_in=future_interaction.user_ids
        )

    logger.debug(f"{self.identifier} has complete split")
    return past_interaction, future_interaction