streamsight.algorithms.ItemKNN

class streamsight.algorithms.ItemKNN(K=10)

Bases: Algorithm

Item K Nearest Neighbours model.

First described in ‘Item-based top-n recommendation algorithms.’ [DK04]

This code is adapted from RecPack [MVG22]

For each item, the K most similar items are computed during fit. Similarity between two items is measured with cosine similarity.

Cosine similarity between items i and j is computed as

\[\text{sim}(i,j) = \frac{X_i \cdot X_j}{\|X_i\|_2 \, \|X_j\|_2}\]

where X_i denotes the column of the interaction matrix X that belongs to item i.

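
As a minimal sketch of the formula above (not the library's implementation), the item-item cosine similarity of a small binary user-item matrix can be computed with NumPy. The helper name `cosine_item_similarity` and the toy matrix are illustrative assumptions.

```python
import numpy as np

def cosine_item_similarity(X: np.ndarray) -> np.ndarray:
    """Cosine similarity between the columns (items) of a user-item matrix."""
    X = np.asarray(X, dtype=float)
    # Dot products between every pair of item columns: shape (n_items, n_items).
    dot = X.T @ X
    # L2 norm of each item column; items without interactions have norm 0.
    norms = np.linalg.norm(X, axis=0)
    denom = np.outer(norms, norms)
    # Divide only where the denominator is non-zero to avoid NaNs.
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)

# Hypothetical toy data: 3 users (rows) x 3 items (columns).
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
])
sim = cosine_item_similarity(X)
```

Items 0 and 1 share one user and each has two interactions, so their similarity is 1 / (sqrt(2) * sqrt(2)) = 0.5; items 0 and 2 share no user, so their similarity is 0.
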
Parameters:

K (int, optional) – Number of neighbours to keep per item. Pick a value below the number of columns (items) of the matrix to fit on. Defaults to 10.

__init__(K=10)

Methods

__init__([K])

fit(X)

Fit the model to the input interaction matrix.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predicts scores, given the interactions in X

set_params(**params)

Set the parameters of the estimator.

Attributes

ITEM_USER_BASED

identifier

Identifier of the object.

name

Name of the object's class.

ITEM_USER_BASED: ItemUserBasedEnum = 'item'
_abc_impl = <_abc._abc_data object>
classmethod _build_request_for_signature(router, method)

Build the MethodMetadataRequest for a method using its signature.

This method takes all arguments from the method signature and uses None as their default request value, except X, y, Y, Xt, yt, *args, and **kwargs.

Parameters

router : MetadataRequest

The parent object for the created MethodMetadataRequest.

method : str

The name of the method.

Returns

method_request : MethodMetadataRequest

The prepared request using the method’s signature.

_check_feature_names(X, *, reset)

Set or check the feature_names_in_ attribute.

Added in version 1.0.

Parameters

X : {ndarray, dataframe} of shape (n_samples, n_features)

The input samples.

reset : bool

Whether to reset the feature_names_in_ attribute. If False, the input will be checked for consistency with feature names of data provided when reset was last True.

Note: It is recommended to call reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

_check_fit_complete()

Helper function to check if model was correctly fitted

Uses the scikit-learn check_is_fitted function: https://scikit-learn.org/stable/modules/generated/sklearn.utils.validation.check_is_fitted.html
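
The behaviour of scikit-learn's `check_is_fitted` can be illustrated with a small stand-in estimator. This is a sketch only: `ToyModel` and its `similarity_matrix_` attribute are hypothetical names, not part of streamsight. By default, `check_is_fitted` treats an estimator as fitted once it has at least one instance attribute ending in an underscore.

```python
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class ToyModel(BaseEstimator):
    """Hypothetical estimator used only to demonstrate the fitted check."""

    def fit(self, X):
        # Learned state is stored in an attribute with a trailing underscore.
        self.similarity_matrix_ = X
        return self

model = ToyModel()

# Before fit: no trailing-underscore attributes exist, so the check fails.
try:
    check_is_fitted(model)
    fitted_before = True
except NotFittedError:
    fitted_before = False

# After fit: the check passes silently.
model.fit([[1, 0], [0, 1]])
check_is_fitted(model)
```
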

_check_n_features(X, reset)

Set the n_features_in_ attribute, or check against it.

Parameters

X : {ndarray, sparse matrix} of shape (n_samples, n_features)

The input samples.

reset : bool

If True, the n_features_in_ attribute is set to X.shape[1]. If False and the attribute exists, then check that it is equal to X.shape[1]. If False and the attribute does not exist, then the check is skipped.

Note: It is recommended to call reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

_fit(X: csr_matrix) → None

Fit a cosine similarity matrix from item to item. We assume that X is a binary matrix of shape (n_users, n_items).
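
A fit step of this kind can be sketched as follows, assuming SciPy and NumPy. This is not the streamsight source: the function name `fit_item_knn` is hypothetical, and zeroing the diagonal (so that an item is never its own neighbour) is an assumption. The sketch uses a dense intermediate for readability and expects K to be at least 1.

```python
import numpy as np
from scipy.sparse import csr_matrix

def fit_item_knn(X: csr_matrix, K: int) -> csr_matrix:
    """Item-item cosine similarity with at most K neighbours kept per item."""
    X = csr_matrix(X, dtype=float)
    # Dot products between item columns.
    dot = (X.T @ X).toarray()
    # Column norms; replace zeros to avoid division by zero for unseen items.
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=0)).ravel())
    norms[norms == 0] = 1.0
    sim = dot / np.outer(norms, norms)
    # Assumption: an item does not count as its own neighbour.
    np.fill_diagonal(sim, 0.0)
    n_items = sim.shape[0]
    if K < n_items:
        # Indices of the (n_items - K) smallest entries in each row.
        drop = np.argpartition(sim, n_items - K, axis=1)[:, : n_items - K]
        # Zero them out so that only the K largest similarities remain.
        np.put_along_axis(sim, drop, 0.0, axis=1)
    return csr_matrix(sim)

# Hypothetical binary interactions: 4 users x 4 items.
X = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]))
S = fit_item_knn(X, K=1)
```

With K=1, each row of `S` keeps only the single most similar item, which is why K should stay below the number of items for the pruning to have any effect.
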

classmethod _get_default_requests()

Collect default request values.

This method combines the information present in __metadata_request__* class attributes, as well as determining request keys from method signatures.

_get_doc_link()

Generates a link to the API documentation for a given estimator.

This method generates the link to the estimator’s documentation page by using the template defined by the attribute _doc_link_template.

Returns

url : str

The URL to the API documentation for this estimator. If the estimator does not belong to module _doc_link_module, the empty string (i.e. “”) is returned.

_get_metadata_request()

Get requested data properties.

Please check User Guide on how the routing mechanism works.

Returns

request : MetadataRequest

A MetadataRequest instance.

classmethod _get_param_names()

Get parameter names for the estimator

_get_tags()
_more_tags()
_pad_predict(X_pred: csr_matrix, intended_shape: tuple, to_predict_frame: DataFrame) → csr_matrix

Pad the predictions with random items for users that are not in the training data.

Parameters:
  • X_pred (csr_matrix) – Predictions made by the algorithm

  • intended_shape (tuple) – The intended shape of the prediction matrix

  • to_predict_frame (pd.DataFrame) – DataFrame containing the user IDs to predict for

Returns:

The padded prediction matrix

Return type:

csr_matrix
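
The padding idea can be sketched as below. This is an illustration under stated assumptions, not the library's code: the function `pad_with_random_items`, its `n_random` and `seed` parameters, the constant score of 1.0, and the use of a plain list of user IDs instead of a DataFrame are all hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def pad_with_random_items(X_pred, intended_shape, user_ids, n_random=2, seed=42):
    """Resize X_pred to intended_shape and give score-less users random items."""
    rng = np.random.default_rng(seed)
    n_users, n_items = intended_shape
    coo = X_pred.tocoo()
    rows, cols, vals = list(coo.row), list(coo.col), list(coo.data)
    # Number of stored scores per user, sized to the intended number of users.
    scored = np.bincount(coo.row, minlength=n_users)
    for uid in user_ids:
        if scored[uid] == 0:
            # User has no predictions: pick distinct random items for them.
            picks = rng.choice(n_items, size=min(n_random, n_items), replace=False)
            for item in picks:
                rows.append(uid)
                cols.append(int(item))
                vals.append(1.0)
    return csr_matrix((vals, (rows, cols)), shape=intended_shape)

# Predictions exist for 2 known users; user 2 is new and user 1 has no scores.
X_pred = csr_matrix(np.array([[0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]))
padded = pad_with_random_items(X_pred, intended_shape=(3, 3), user_ids=[0, 1, 2])
```

User 0 keeps the original score, while users 1 and 2 each receive two randomly chosen items.
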

_predict(X: csr_matrix, predict_frame: DataFrame | None = None) → csr_matrix

Predict scores for nonzero users in X

Scores are computed by matrix multiplication of X with the stored similarity matrix.

Parameters:
  • X (csr_matrix) – csr_matrix with interactions

  • predict_frame (pd.DataFrame, optional) – DataFrame containing the user IDs to predict for

Returns:

csr_matrix with scores

Return type:

csr_matrix
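
The scoring step described for `_predict` amounts to a sparse matrix product. The sketch below uses made-up similarity values and interactions to show the effect; the variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical item-item similarity matrix for 3 items, as learned during fit.
similarity = csr_matrix(np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.7],
    [0.0, 0.7, 0.0],
]))

# Interactions: user 0 saw item 0, user 1 has no history,
# user 2 saw items 0 and 2.
X = csr_matrix(np.array([
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
]))

# A user's score for an item is the sum of the similarities between that item
# and every item the user interacted with.
scores = X @ similarity
```

User 2 scores item 1 with 0.5 + 0.7 = 1.2, and user 1, who has no interactions, receives an empty row.
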

property _repr_html_

HTML representation of estimator.

This is redundant with the logic of _repr_mimebundle_. The latter should be favored in the long term; _repr_html_ is only implemented for consumers who do not interpret _repr_mimebundle_.

_repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, “_repr_html_”) return True or False depending on get_config()[“display”].

_repr_mimebundle_(**kwargs)

Mime bundle used by jupyter kernels to display estimator

_transform_fit_input(X: InteractionMatrix) → csr_matrix

Transform the training data to expected type

Data will be turned into a binary csr matrix.

Parameters:

X (InteractionMatrix) – User-item interaction matrix to fit the model to

Returns:

Transformed user-item interaction matrix to fit the model

Return type:

csr_matrix
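
Turning interaction values into a binary matrix can be sketched as follows. The real method starts from an InteractionMatrix; this example assumes the interactions are already available as a CSR matrix of counts, and the helper name `to_binary_csr` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_binary_csr(values: csr_matrix) -> csr_matrix:
    """Return a copy in which every stored non-zero entry equals 1."""
    binary = csr_matrix(values, copy=True)
    # Drop explicitly stored zeros so they do not become ones.
    binary.eliminate_zeros()
    binary.data = np.ones_like(binary.data)
    return binary

# Hypothetical interaction counts: 2 users x 3 items.
counts = csr_matrix(np.array([[3, 0, 1], [0, 2, 0]]))
binary = to_binary_csr(counts)
```
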

_transform_predict_input(X: InteractionMatrix) → csr_matrix

Transform the input of predict to expected type

Data will be turned into a binary csr matrix.

Parameters:

X (InteractionMatrix) – User-item interaction matrix used as input to predict

Returns:

Transformed user-item interaction matrix used as input to predict

Return type:

csr_matrix

_validate_data(X='no_validation', y='no_validation', reset=True, validate_separately=False, cast_to_ndarray=True, **check_params)

Validate input data and set or check the n_features_in_ attribute.

Parameters

X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features), default=’no_validation’

The input samples. If ‘no_validation’, no validation is performed on X. This is useful for meta-estimators, which can delegate input validation to their underlying estimator(s). In that case y must be passed and the only accepted check_params are multi_output and y_numeric.

y : array-like of shape (n_samples,), default=’no_validation’

The targets.

  • If None, check_array is called on X. If the estimator’s requires_y tag is True, then an error will be raised.

  • If ‘no_validation’, check_array is called on X and the estimator’s requires_y tag is ignored. This is a default placeholder and is never meant to be explicitly set. In that case X must be passed.

  • Otherwise, only y with _check_y or both X and y are checked with either check_array or check_X_y depending on validate_separately.

reset : bool, default=True

Whether to reset the n_features_in_ attribute. If False, the input will be checked for consistency with data provided when reset was last True.

Note: It is recommended to call reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

validate_separately : False or tuple of dicts, default=False

Only used if y is not None. If False, call validate_X_y(). Else, it must be a tuple of kwargs to be used for calling check_array() on X and y respectively.

estimator=self is automatically added to these dicts to generate more informative error message in case of invalid input data.

cast_to_ndarray : bool, default=True

Cast X and y to ndarray with checks in check_params. If False, X and y are unchanged and only feature_names_in_ and n_features_in_ are checked.

**check_params : kwargs

Parameters passed to sklearn.utils.check_array() or sklearn.utils.check_X_y(). Ignored if validate_separately is not False.

estimator=self is automatically added to these params to generate more informative error message in case of invalid input data.

Returns

out : {ndarray, sparse matrix} or tuple of these

The validated input. A tuple is returned if both X and y are validated.

_validate_params()

Validate types and values of constructor parameters

The expected type and values must be defined in the _parameter_constraints class attribute, which is a dictionary param_name: list of constraints. See the docstring of validate_parameter_constraints for a description of the accepted constraints.

fit(X: InteractionMatrix) → Algorithm

Fit the model to the input interaction matrix.

The input data is transformed to the expected type using _transform_fit_input(). The fitting is done using the _fit() method. Finally, the method checks that the fitting was successful using _check_fit_complete().

Parameters:

X (InteractionMatrix) – The interactions to fit the model on.

Returns:

Fitted algorithm

Return type:

Algorithm
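
The three-step workflow described for `fit` follows the template-method pattern, which can be sketched in plain Python. The classes `SketchAlgorithm` and `ItemPopularity`, the nested-list input, and the simplified fitted check are all assumptions made for illustration; they are not streamsight classes.

```python
from abc import ABC, abstractmethod

class SketchAlgorithm(ABC):
    """Hypothetical stand-in for the base class, showing only the fit workflow."""

    def fit(self, X):
        X_transformed = self._transform_fit_input(X)  # 1. convert the input
        self._fit(X_transformed)                      # 2. learn the model
        self._check_fit_complete()                    # 3. verify fitted state
        return self                                   # returning self allows chaining

    def _transform_fit_input(self, X):
        # Assumption: binarise a nested list of interaction counts.
        return [[1 if value else 0 for value in row] for row in X]

    @abstractmethod
    def _fit(self, X):
        """Subclasses store what they learn in attributes ending with '_'."""

    def _check_fit_complete(self):
        # Simplified check: a fitted model has an attribute ending in '_'.
        fitted = [a for a in vars(self) if a.endswith("_") and not a.startswith("__")]
        if not fitted:
            raise RuntimeError(f"{type(self).__name__} was not fitted correctly.")

class ItemPopularity(SketchAlgorithm):
    def _fit(self, X):
        # Count how many users interacted with each item (column sums).
        self.item_counts_ = [sum(column) for column in zip(*X)]

model = ItemPopularity().fit([[2, 0, 1], [1, 0, 0]])
```

Subclasses only implement `_fit`; the public `fit` stays identical across algorithms, which keeps input handling and the fitted check in one place.
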

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.
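
The `get_params` behaviour documented here matches the one provided by scikit-learn's `BaseEstimator`. The sketch below uses a hypothetical `ToyKNN` class rather than ItemKNN itself, and assumes scikit-learn is installed.

```python
from sklearn.base import BaseEstimator

class ToyKNN(BaseEstimator):
    """Hypothetical estimator whose constructor mirrors ItemKNN's signature."""

    def __init__(self, K=10):
        # BaseEstimator expects each constructor argument to be stored
        # under an attribute with the same name.
        self.K = K

model = ToyKNN(K=25)

# Parameter names are read from the __init__ signature.
params = model.get_params()

# The returned dict is sufficient to build an equivalent, unfitted copy.
copy = ToyKNN(**params)
```
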

property identifier

Identifier of the object.

Identifier is made by combining the class name with the parameters passed at construction time.

Constructed by recreating the initialisation call. Example: Algorithm(param_1=value)
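
One way such an identifier could be built is sketched below. The classes `SketchBase` and `ToyKNN` are hypothetical, and the exact formatting that streamsight produces (separators, ordering) is not confirmed by this page.

```python
import inspect

class SketchBase:
    """Hypothetical base class that derives name and identifier."""

    @property
    def name(self):
        return self.__class__.__name__

    @property
    def identifier(self):
        # Constructor parameter names, read from the bound __init__ signature.
        param_names = inspect.signature(self.__init__).parameters
        # Assumption: each parameter is stored under an attribute of the same name.
        args = ",".join(f"{p}={getattr(self, p)}" for p in param_names)
        return f"{self.name}({args})"

class ToyKNN(SketchBase):
    def __init__(self, K=10):
        self.K = K

identifier = ToyKNN(K=5).identifier
```

Because the identifier recreates the initialisation call, two instances with different parameter values yield different identifiers.
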

property name

Name of the object’s class.

predict(X: InteractionMatrix) → csr_matrix

Predicts scores, given the interactions in X

The input data is transformed to the expected type using _transform_predict_input(). The predictions are made using the _predict() method. Finally, the predictions are padded with random items for users that are not in the training data.

Parameters:

X (InteractionMatrix) – interactions to predict from.

Returns:

The recommendation scores in a sparse matrix format.

Return type:

csr_matrix

set_params(**params)

Set the parameters of the estimator.

Parameters:

params (dict) – Estimator parameters