streamsight.metrics.PrecisionK

class streamsight.metrics.PrecisionK(K: int | None = 10, timestamp_limit: int | None = None, cache: bool = False)

Bases: ListwiseMetricK

Computes the fraction of top-K recommendations that correspond to true interactions.

Given the prediction and true interaction matrices in binary representation, the two are multiplied elementwise. In the resulting matrix, true positives are 1 and false positives are 0. The per-user sum of true positives is then divided by K to obtain the precision at the user level.

In simple terms, precision is the ratio of correctly predicted items to the total number of recommendations made.

Precision is computed per user as:

\[\text{Precision}(u) = \frac{\sum\limits_{i \in \text{Top-K}(u)} y^{true}_{u,i}}{K}\]

ref: RecPack
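
A minimal NumPy/SciPy sketch of this elementwise computation, assuming the predictions have already been reduced to a binary top-K matrix (the variable names and data here are illustrative, not part of the streamsight API):

    import numpy as np
    from scipy.sparse import csr_matrix

    K = 2

    # Binary true interactions and binary top-K recommendations for 3 users and 4 items.
    y_true = csr_matrix(np.array([[1, 0, 1, 0],
                                  [0, 1, 0, 0],
                                  [1, 1, 0, 1]]))
    y_pred_top_k = csr_matrix(np.array([[1, 1, 0, 0],
                                        [0, 1, 1, 0],
                                        [0, 0, 1, 1]]))

    # The elementwise product keeps only the hits (true positives) in the top-K list.
    hits = y_true.multiply(y_pred_top_k)

    # Per-user precision: number of hits divided by K.
    precision_per_user = np.asarray(hits.sum(axis=1)).ravel() / K
    print(precision_per_user)  # [0.5 0.5 0.5]

Averaging these per-user values gives the global (macro) result reported by the metric.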

Parameters:

  • K (int, optional) – Size of the recommendation list consisting of the Top-K item predictions. Defaults to 10.

  • timestamp_limit (int, optional) – The timestamp limit for the metric. Defaults to None.

  • cache (bool, optional) – Whether to cache the values of y_true and y_pred for later use (deprecated). Defaults to False.

__init__(K: int | None = 10, timestamp_limit: int | None = None, cache: bool = False)

Methods

__init__([K, timestamp_limit, cache])

cache_values(y_true, y_pred)

Cache the values of y_true and y_pred for later use.

calculate(y_true, y_pred)

Computes metric given true labels y_true and predicted scores y_pred.

calculate_cached()

Calculate the metric using the cached values of y_true and y_pred.

get_params()

Get the parameters of the metric.

prepare_matrix(y_true, y_pred)

Prepare the matrices for the metric calculation.

Attributes

DEFAULT_K

col_names

The names of the columns in the results DataFrame.

identifier

Name of the metric.

macro_result

Global metric value obtained by taking the average over all users.

micro_result

User level results for the metric.

name

Name of the metric.

num_items

Dimension of the item-space in both y_true and y_pred.

num_users

Dimension of the user-space in both y_true and y_pred after elimination of users without interactions in y_true.

params

Parameters of the metric.

timestamp_limit

The timestamp limit for the metric.

DEFAULT_K = 10
_calculate(y_true: csr_matrix, y_pred_top_K: csr_matrix) → None

Computes metric given true labels y_true and predicted scores y_pred. Only Top-K recommendations are considered.

To be implemented in the child class.

Parameters:
  • y_true (csr_matrix) – Expected interactions per user.

  • y_pred_top_K (csr_matrix) – Ranks of the top-K recommendations per user.

_eliminate_empty_users(y_true: csr_matrix, y_pred: csr_matrix) → Tuple[csr_matrix, csr_matrix]

Eliminate users that have no interactions in y_true.

Users with no interactions in y_true are eliminated from the prediction matrix y_pred. This avoids division by zero and reduces computational overhead.

Parameters:
  • y_true (csr_matrix) – True user-item interactions.

  • y_pred (csr_matrix) – Predicted affinity of users for items.

Returns:

(y_true, y_pred), with users that have no interactions removed.

Return type:

Tuple[csr_matrix, csr_matrix]
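
A sketch of how such filtering can be done with SciPy; this illustrates the idea and is not necessarily the library's internal code:

    import numpy as np
    from scipy.sparse import csr_matrix

    def eliminate_empty_users(y_true: csr_matrix, y_pred: csr_matrix):
        # Keep only the users (rows) with at least one interaction in y_true.
        keep = np.where(np.asarray(y_true.sum(axis=1)).ravel() > 0)[0]
        # Slice the same rows out of both matrices so they stay aligned.
        return y_true[keep], y_pred[keep]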

_false_negative: int

Number of false negatives computed. Used for caching to obtain macro results.

_false_positive: int

Number of false positives computed. Used for caching to obtain macro results.

property _indices

Indices in the prediction matrix for which scores were computed.

_map_users(users)

Map internal identifiers of users to actual user identifiers.

_scores: csr_matrix | None
_set_shape(y_true: csr_matrix) → None

Set the number of users and items in the metric.

The values of self._num_users and self._num_items are set to the number of users and items in y_true. This allows for the computation of the metric to be done in the correct shape.

Parameters:

y_true (csr_matrix) – Binary representation of user-item interactions.

_true_positive: int

Number of true positives computed. Used for caching to obtain macro results.

_value: float
_verify_shape(y_true: csr_matrix, y_pred: csr_matrix) → bool

Make sure the dimensions of y_true and y_pred match.

Parameters:
  • y_true (csr_matrix) – True user-item interactions.

  • y_pred (csr_matrix) – Predicted affinity of users for items.

Raises:

AssertionError – Shape mismatch between y_true and y_pred.

Returns:

True if dimensions match.

Return type:

bool
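
The check amounts to a shape assertion over both dimensions; a minimal sketch of the idea (not the library's exact code):

    from scipy.sparse import csr_matrix

    def verify_shape(y_true: csr_matrix, y_pred: csr_matrix) -> bool:
        # Both matrices must cover the same user and item dimensions.
        assert y_true.shape == y_pred.shape, (
            f"Shape mismatch: y_true {y_true.shape} vs y_pred {y_pred.shape}"
        )
        return True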

_y_pred: csr_matrix
_y_true: csr_matrix
cache_values(y_true: csr_matrix, y_pred: csr_matrix) → None

Cache the values of y_true and y_pred for later use.

Basic method to cache the values of y_true and y_pred for later use. This is useful when the metric can be calculated with the cumulative values of y_true and y_pred.

Note

This method should be overridden in the child class if the metric cannot be calculated from the cumulative values of y_true and y_pred. For example, in the case of Precision@K, the default behavior is to keep only the top-K ranks of y_pred and y_true, so cumulative values may be dropped.

Parameters:
  • y_true (csr_matrix) – True user-item interactions.

  • y_pred (csr_matrix) – Predicted affinity of users for items.

Raises:

ValueError – If caching is disabled for the metric.

Deprecated: caching values for the metric is no longer needed for core functionality due to a change in the compute method.

calculate(y_true: csr_matrix, y_pred: csr_matrix) → None

Computes metric given true labels y_true and predicted scores y_pred. Only Top-K recommendations are considered.

Detailed per-user results can be retrieved with micro_result; the global aggregate metric value is retrieved with macro_result (see the usage sketch below).

Parameters:
  • y_true (csr_matrix) – True user-item interactions.

  • y_pred (csr_matrix) – Predicted affinity of users for items.
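
A usage sketch, assuming y_true holds binary interactions and y_pred holds predicted affinity scores as aligned csr matrices (the data and parameter values below are illustrative):

    import numpy as np
    from scipy.sparse import csr_matrix

    from streamsight.metrics import PrecisionK

    y_true = csr_matrix(np.array([[1, 0, 1, 0],
                                  [0, 1, 0, 0]]))
    y_pred = csr_matrix(np.array([[0.9, 0.4, 0.1, 0.0],
                                  [0.2, 0.8, 0.7, 0.0]]))

    metric = PrecisionK(K=2)
    metric.calculate(y_true, y_pred)

    print(metric.macro_result)  # average precision@2 over all users
    print(metric.micro_result)  # per-user results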

calculate_cached()

Calculate the metric using the cached values of y_true and y_pred.

calculate() is called on the cached values of y_true and y_pred.

Note

This method should be overridden in the child class if the metric cannot be calculated from the cumulative values of y_true and y_pred.

Raises:

ValueError – If caching is disabled for the metric.

Deprecated: caching values for the metric is no longer needed for core functionality due to a change in the compute method.

property col_names

The names of the columns in the results DataFrame.

get_params()

Get the parameters of the metric.

property identifier

Name of the metric.

property macro_result: float | None

Global metric value obtained by taking the average over all users.

Raises:

ValueError – If the metric has not been calculated yet.

Returns:

The global metric value.

Return type:

float, optional

property micro_result: dict[str, ndarray]

User level results for the metric.

Contains an entry for every user.

Returns:

The user-level results as a dictionary with keys user_id and score.

Return type:

dict[str, ndarray]

property name

Name of the metric.

property num_items: int

Dimension of the item-space in both y_true and y_pred.

property num_users: int

Dimension of the user-space in both y_true and y_pred after elimination of users without interactions in y_true.

property params

Parameters of the metric.

prepare_matrix(y_true: csr_matrix, y_pred: csr_matrix) → Tuple[csr_matrix, csr_matrix]

Prepare the matrices for the metric calculation.

This method eliminates users without interactions in y_true and sets the shape of the matrices before the metric is calculated.

Parameters:
  • y_true (csr_matrix) – True user-item interactions.

  • y_pred (csr_matrix) – Predicted affinity of users for items.

Returns:

Tuple of the prepared matrices.

Return type:

Tuple[csr_matrix, csr_matrix]
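
Conceptually, this combines the user-elimination and shape-setting steps described above; a rough sketch, not the library's implementation:

    import numpy as np
    from scipy.sparse import csr_matrix

    def prepare(y_true: csr_matrix, y_pred: csr_matrix):
        # Drop users without any true interactions (see the _eliminate_empty_users
        # sketch above), then record the matrix dimensions the metric will use.
        keep = np.where(np.asarray(y_true.sum(axis=1)).ravel() > 0)[0]
        y_true, y_pred = y_true[keep], y_pred[keep]
        num_users, num_items = y_true.shape
        return y_true, y_pred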

property timestamp_limit

The timestamp limit for the metric.