streamsight.algorithms.ItemKNN

class streamsight.algorithms.ItemKNN(K=10)

Bases: Algorithm

Item K Nearest Neighbours model.

First described in ‘Item-based top-n recommendation algorithms.’ [DK04]

This code is adapted from RecPack [MVG22]

For each item, the K most similar items are computed during fit, using cosine similarity between item vectors.

Cosine similarity between item i and j is computed as

\[\mathrm{sim}(i,j) = \frac{X_i \cdot X_j}{\lVert X_i \rVert_2 \, \lVert X_j \rVert_2}\]

Parameters:

K (int, optional) – Number of neighbours to use per item. Pick a value below the number of columns of the matrix to fit on. Defaults to 10.
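As a worked illustration of the cosine formula, the sketch below computes the similarity between two item columns of a small toy matrix. It uses plain NumPy on a dense array; the function name `cosine_similarity` and the toy data are assumptions made for this example and are not part of streamsight.

```python
import numpy as np

# Toy binary interaction matrix: rows are users, columns are items.
X = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
], dtype=float)


def cosine_similarity(X, i, j):
    """Cosine similarity between item columns i and j of X (illustrative helper)."""
    x_i, x_j = X[:, i], X[:, j]
    norm = np.linalg.norm(x_i) * np.linalg.norm(x_j)
    # Items without any interactions get similarity 0 instead of a division by zero.
    return float(x_i @ x_j / norm) if norm > 0 else 0.0


# Items 0 and 1 share two users and each has three interactions: 2 / (sqrt(3) * sqrt(3)).
sim_01 = cosine_similarity(X, 0, 1)
```

Here `sim_01` evaluates to 2/3, since the dot product is 2 and both column norms are sqrt(3).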

__init__(K=10)

Methods

__init__([K])

fit(X)

Fit the model to the input interaction matrix.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predicts scores, given the interactions in X

set_params(**params)

Set the parameters of the estimator.

Attributes

ITEM_USER_BASED

identifier

Identifier of the object.

name

Name of the object's class.

ITEM_USER_BASED: ItemUserBasedEnum = 'item'

_abc_impl = <_abc._abc_data object>

classmethod _build_request_for_signature(router, method)

Build the MethodMetadataRequest for a method using its signature.

This method takes all arguments from the method signature and uses None as their default request value, except X, y, Y, Xt, yt, *args, and **kwargs.

Parameters

router : MetadataRequest

The parent object for the created MethodMetadataRequest.

method : str

The name of the method.

Returns

method_request : MethodMetadataRequest

The prepared request using the method’s signature.

_check_feature_names(X, *, reset)

Set or check the feature_names_in_ attribute.

Added in version 1.0.

Parameters

X : {ndarray, dataframe} of shape (n_samples, n_features)

The input samples.

reset : bool

Whether to reset the feature_names_in_ attribute. If False, the input will be checked for consistency with feature names of data provided when reset was last True.

Note: It is recommended to pass reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

_check_fit_complete()

Helper function to check if model was correctly fitted

Uses the sklearn check_is_fitted function, https://scikit-learn.org/stable/modules/generated/sklearn.utils.validation.check_is_fitted.html

_check_n_features(X, reset)

Set the n_features_in_ attribute, or check against it.

Parameters

X : {ndarray, sparse matrix} of shape (n_samples, n_features)

The input samples.

reset : bool

If True, the n_features_in_ attribute is set to X.shape[1]. If False and the attribute exists, then check that it is equal to X.shape[1]. If False and the attribute does not exist, then the check is skipped.

Note: It is recommended to pass reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

_fit(X: csr_matrix) → None

Fit a cosine similarity matrix from item to item. X is assumed to be a binary matrix of shape (n_users, n_items).
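The fitting step can be sketched as follows: build the full item-to-item cosine matrix from a binary interaction matrix, then keep only the K largest similarities per item. This is a dense NumPy sketch under stated assumptions, not the library's implementation, which operates on a csr_matrix. The function name `item_cosine_top_k` is hypothetical, and zeroing the diagonal so that an item is not its own neighbour is an assumption of this sketch.

```python
import numpy as np


def item_cosine_top_k(X, K):
    """Item-item cosine similarity, keeping the K largest entries per row (sketch)."""
    if K < 1:
        raise ValueError("K must be at least 1")
    X = (np.asarray(X) > 0).astype(float)      # binarize interactions
    norms = np.linalg.norm(X, axis=0)          # one L2 norm per item column
    norms[norms == 0] = 1.0                    # avoid division by zero for unseen items
    sim = (X.T @ X) / np.outer(norms, norms)   # (n_items, n_items) cosine matrix
    np.fill_diagonal(sim, 0.0)                 # assumption: drop self-similarity

    # Copy only the K largest similarities of each row into an otherwise empty matrix.
    keep = np.argsort(sim, axis=1)[:, -K:]
    pruned = np.zeros_like(sim)
    np.put_along_axis(pruned, keep, np.take_along_axis(sim, keep, axis=1), axis=1)
    return pruned
```

Pruning to K neighbours keeps the stored similarity matrix sparse, which is why K should stay below the number of items.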

classmethod _get_default_requests()

Collect default request values.

This method combines the information present in __metadata_request__* class attributes, as well as determining request keys from method signatures.

_get_doc_link()

Generates a link to the API documentation for a given estimator.

This method generates the link to the estimator’s documentation page by using the template defined by the attribute _doc_link_template.

Returns

url : str

The URL to the API documentation for this estimator. If the estimator does not belong to module _doc_link_module, the empty string (i.e. “”) is returned.

_get_metadata_request()

Get requested data properties.

Please check the User Guide on how the routing mechanism works.

Returns

request : MetadataRequest

A MetadataRequest instance.

classmethod _get_param_names()

Get parameter names for the estimator

_get_tags()

_more_tags()

_pad_predict(X_pred: csr_matrix, intended_shape: tuple, to_predict_frame: DataFrame) → csr_matrix

Pad the predictions with random items for users that are not in the training data.

Parameters:
  • X_pred (csr_matrix) – Predictions made by the algorithm

  • intended_shape (tuple) – The intended shape of the prediction matrix

  • to_predict_frame (pd.DataFrame) – DataFrame containing the user IDs to predict for

Returns:

The padded prediction matrix

Return type:

csr_matrix
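The idea of padding can be sketched as follows, assuming that unknown users are those whose row index falls outside the prediction matrix and that each of them receives a single random item with score 1. Both choices are assumptions made for illustration; how streamsight selects and scores the random items is not stated here. The helper name `pad_with_random_items` and its `seed` argument are hypothetical, and a dense array stands in for the csr_matrix.

```python
import numpy as np


def pad_with_random_items(scores, intended_shape, user_ids, seed=0):
    """Grow `scores` to `intended_shape` and give unknown users a random item (sketch)."""
    rng = np.random.default_rng(seed)
    n_items = intended_shape[1]
    padded = np.zeros(intended_shape)
    # Copy the predictions that the model was able to make.
    padded[: scores.shape[0], : scores.shape[1]] = scores
    for uid in user_ids:
        if uid >= scores.shape[0]:  # assumption: user was not in the training data
            padded[uid, rng.integers(n_items)] = 1.0
    return padded
```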

_predict(X: csr_matrix, predict_frame: DataFrame | None = None) → csr_matrix

Predict scores for nonzero users in X

Scores are computed by matrix multiplication of X with the stored similarity matrix.

Parameters:
  • X (csr_matrix) – csr_matrix with interactions

  • predict_frame (pd.DataFrame, optional) – DataFrame containing the user IDs to predict for

Returns:

csr_matrix with scores

Return type:

csr_matrix
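The scoring step described above reduces to one matrix product. The sketch below uses small dense NumPy arrays in place of sparse matrices; the similarity values are made up for illustration.

```python
import numpy as np

# Hypothetical item-item similarity matrix for three items, as produced by fitting.
similarity = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.2],
    [0.0, 0.2, 0.0],
])

# Interactions to predict from: user 0 interacted with item 0,
# user 1 interacted with items 1 and 2.
X = np.array([
    [1, 0, 0],
    [0, 1, 1],
])

# Each user's score for an item is the sum of that item's similarities
# to the items the user interacted with.
scores = X @ similarity  # shape (n_users, n_items)
```

For user 1, the score row is the sum of similarity rows 1 and 2, giving [0.5, 0.2, 0.2].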

property _repr_html_

HTML representation of estimator.

This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.

_repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

_repr_mimebundle_(**kwargs)

Mime bundle used by jupyter kernels to display estimator

_transform_fit_input(X: InteractionMatrix) → csr_matrix

Transform the training data to expected type

Data will be turned into a binary csr matrix.

Parameters:

X (InteractionMatrix) – User-item interaction matrix to fit the model to

Returns:

Transformed user-item interaction matrix to fit the model

Return type:

csr_matrix

_transform_predict_input(X: InteractionMatrix) → csr_matrix

Transform the input of predict to expected type

Data will be turned into a binary csr matrix.

Parameters:

X (InteractionMatrix) – User-item interaction matrix used as input to predict

Returns:

Transformed user-item interaction matrix used as input to predict

Return type:

csr_matrix

_validate_data(X='no_validation', y='no_validation', reset=True, validate_separately=False, cast_to_ndarray=True, **check_params)

Validate input data and set or check the n_features_in_ attribute.

Parameters

X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features), default='no_validation'

The input samples. If ‘no_validation’, no validation is performed on X. This is useful for meta-estimator which can delegate input validation to their underlying estimator(s). In that case y must be passed and the only accepted check_params are multi_output and y_numeric.

y : array-like of shape (n_samples,), default='no_validation'

The targets.

  • If None, check_array is called on X. If the estimator’s requires_y tag is True, then an error will be raised.

  • If ‘no_validation’, check_array is called on X and the estimator’s requires_y tag is ignored. This is a default placeholder and is never meant to be explicitly set. In that case X must be passed.

  • Otherwise, only y with _check_y or both X and y are checked with either check_array or check_X_y depending on validate_separately.

reset : bool, default=True

Whether to reset the n_features_in_ attribute. If False, the input will be checked for consistency with data provided when reset was last True.

Note: It is recommended to pass reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

validate_separately : False or tuple of dicts, default=False

Only used if y is not None. If False, call validate_X_y(). Else, it must be a tuple of kwargs to be used for calling check_array() on X and y respectively.

estimator=self is automatically added to these dicts to generate more informative error message in case of invalid input data.

cast_to_ndarray : bool, default=True

Cast X and y to ndarray with checks in check_params. If False, X and y are unchanged and only feature_names_in_ and n_features_in_ are checked.

**check_params : kwargs

Parameters passed to sklearn.utils.check_array() or sklearn.utils.check_X_y(). Ignored if validate_separately is not False.

estimator=self is automatically added to these params to generate more informative error message in case of invalid input data.

Returns

out : {ndarray, sparse matrix} or tuple of these

The validated input. A tuple is returned if both X and y are validated.

_validate_params()

Validate types and values of constructor parameters

The expected type and values must be defined in the _parameter_constraints class attribute, which is a dictionary param_name: list of constraints. See the docstring of validate_parameter_constraints for a description of the accepted constraints.

fit(X: InteractionMatrix) → Algorithm

Fit the model to the input interaction matrix.

The input data is transformed to the expected type using _transform_fit_input(). The fitting is done using the _fit() method. Finally the method checks that the fitting was successful using _check_fit_complete().

Parameters:

X (InteractionMatrix) – The interactions to fit the model on.

Returns:

Fitted algorithm

Return type:

Algorithm
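The three-step flow of fit() (transform, fit, check) can be sketched with a stand-in class. Everything below is hypothetical: `SketchAlgorithm` is not a streamsight class, the attribute name `similarity_matrix_` is an assumption, and a plain co-occurrence matrix stands in for the real fitted state.

```python
import numpy as np


class SketchAlgorithm:
    """Hypothetical stand-in that mirrors the transform / fit / check sequence."""

    def _transform_fit_input(self, X):
        # Turn raw interactions into a binary matrix.
        return (np.asarray(X) > 0).astype(float)

    def _fit(self, X):
        # Placeholder for the model-specific fitting logic.
        self.similarity_matrix_ = X.T @ X

    def _check_fit_complete(self):
        # Fail loudly if fitting did not produce the expected state.
        if not hasattr(self, "similarity_matrix_"):
            raise RuntimeError("model was not fitted")

    def fit(self, X):
        X = self._transform_fit_input(X)
        self._fit(X)
        self._check_fit_complete()
        return self  # returning self allows call chaining


model = SketchAlgorithm().fit([[1, 0], [2, 1]])
```

Subclasses in this pattern only override `_fit`, while the public `fit` keeps input handling and validation in one place.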

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

property identifier: str

Identifier of the object.

Identifier is made by combining the class name with the parameters passed at construction time.

Constructed by recreating the initialisation call. Example: Algorithm(param_1=value)

Returns:

Identifier of the object

Return type:

str
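A minimal sketch of how such an identifier could be assembled from the class name and constructor parameters. The class `Example` and its simplified `get_params` are hypothetical stand-ins, and the exact separator used by streamsight is an assumption.

```python
class Example:
    """Hypothetical stand-in for an algorithm with one constructor parameter."""

    def __init__(self, K=10):
        self.K = K

    def get_params(self):
        # Simplified: return the constructor parameters as a dict.
        return {"K": self.K}

    @property
    def name(self):
        return self.__class__.__name__

    @property
    def identifier(self):
        # Recreate the initialisation call, e.g. "Example(K=5)".
        args = ",".join(f"{key}={value}" for key, value in self.get_params().items())
        return f"{self.name}({args})"


example_id = Example(K=5).identifier
```

Because the identifier encodes the parameters, two instances of the same class with different settings stay distinguishable.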

property name: str

Name of the object’s class.

Returns:

Name of the object’s class

Return type:

str

predict(X: InteractionMatrix) → csr_matrix

Predicts scores, given the interactions in X

The input data is transformed to the expected type using _transform_predict_input(). The predictions are made using the _predict() method. Finally, the predictions are padded with random items for users that are not in the training data.

Parameters:

X (InteractionMatrix) – interactions to predict from.

Returns:

The recommendation scores in a sparse matrix format.

Return type:

csr_matrix

set_params(**params)

Set the parameters of the estimator.

Parameters:

params (dict) – Estimator parameters