utils

get_top_K_ranks(X, K=None)

Returns a matrix of ranks assigned to the largest K values in X.

Selects K largest values for every row in X and assigns a rank to each.

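:param X: Matrix from which K values are selected in every row.
:type X: csr_matrix
:param K: Number of values to select per row. If None, all stored values in the row are ranked.
:type K: int, optional
:return: Matrix with up to K entries per row, holding the rank of each selected value (1 for the largest).
:rtype: csr_matrix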

Source code in src/streamsight/algorithms/utils.py
import numpy as np
from scipy.sparse import csr_matrix


def get_top_K_ranks(X: csr_matrix, K: None | int = None) -> csr_matrix:
    """Returns a matrix of ranks assigned to the largest K values in X.

    Selects K largest values for every row in X and assigns a rank to each.

    :param X: Matrix from which we will select K values in every row.
    :type X: csr_matrix
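    :param K: Number of values to select per row. If None, all stored values in the row are ranked.
    :type K: int, optional
    :return: Matrix with up to K entries per row, holding the rank of each selected value (1 for the largest).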
    :rtype: csr_matrix
    """
    U, I, V = [], [], []
    for row_ix, (le, ri) in enumerate(zip(X.indptr[:-1], X.indptr[1:])):
        K_row_pick = min(K, ri - le) if K is not None else ri - le

        if K_row_pick != 0:
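            # Listing every position in the last K_row_pick slots forces those
            # slots into ascending sorted order, so the largest value comes last.
            row_data = X.data[le:ri]
            top_positions = np.argpartition(row_data, list(range(-K_row_pick, 0)))[-K_row_pick:]
            top_k_row = X.indices[le + top_positions]

            # Reverse so that the largest value receives rank 1.
            for rank, col_ix in enumerate(reversed(top_k_row)):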
                U.append(row_ix)
                I.append(col_ix)
                V.append(rank + 1)
    # csr_matrix((data, (row, col))) with data=V, row=U, col=I
    X_top_K = csr_matrix((V, (U, I)), shape=X.shape)

    return X_top_K
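
The per-row selection step can be tried in isolation. The following is a minimal sketch, not the library's code: `top_k_columns` is a hypothetical helper and the matrix values are made up for illustration. It assumes a CSR matrix and uses `np.argpartition` with a list of the last `k` positions, as the source above does, so that those positions come back fully sorted.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative 2 x 4 matrix; zeros are not stored in the CSR structure.
X = csr_matrix(np.array([
    [0.0, 0.5, 0.9, 0.1],
    [0.3, 0.0, 0.0, 0.7],
]))


def top_k_columns(X: csr_matrix, row_ix: int, K: int) -> list[int]:
    """Hypothetical helper: columns of the K largest stored values in one row, largest first."""
    le, ri = X.indptr[row_ix], X.indptr[row_ix + 1]
    k = min(K, ri - le)  # a row may hold fewer than K stored values
    if k == 0:
        return []
    # Positions of the k largest values within the row slice, in ascending order.
    order = np.argpartition(X.data[le:ri], list(range(-k, 0)))[-k:]
    # Map slice positions to column indices and reverse so the largest is first.
    return [int(c) for c in X.indices[le + order][::-1]]


row0 = top_k_columns(X, 0, 2)  # columns holding 0.9 and 0.5
row1 = top_k_columns(X, 1, 2)  # columns holding 0.7 and 0.3
```

The position of a column in the returned list, plus one, corresponds to the rank that `get_top_K_ranks` stores for that column.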

get_top_K_values(X, K=None)

Returns a matrix of only the K largest values for every row in X.

Selects the top-K items for every user, which is equivalent to selecting the K nearest neighbours. In case of a tie for the last position, the item with the largest index among the tied items is used.

:param X: Matrix from which K values are selected in every row.
:type X: csr_matrix
:param K: Number of values to select per row. If None, all stored values in the row are kept.
:type K: int, optional
:return: Matrix with up to K values per row.
:rtype: csr_matrix

Source code in src/streamsight/algorithms/utils.py
def get_top_K_values(X: csr_matrix, K: None | int = None) -> csr_matrix:
    """Returns a matrix of only the K largest values for every row in X.

    Selects the top-K items for every user, which is equivalent to selecting the K nearest neighbours.
    In case of a tie for the last position, the item with the largest index among the tied items is used.

    :param X: Matrix from which we will select K values in every row.
    :type X: csr_matrix
    :param K: Number of values to select per row. If None, all stored values in the row are kept.
    :type K: int, optional
    :return: Matrix with up to K values per row.
    :rtype: csr_matrix
    """
    top_K_ranks = get_top_K_ranks(X, K)
    # Convert the position into binary values (1 if in top K, 0 otherwise)
    top_K_ranks[top_K_ranks > 0] = 1
    # Elementwise multiplication with the original matrix to recover the values
    return top_K_ranks.multiply(X)
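
The mask-and-multiply step can be illustrated on its own. This is a rough sketch under stated assumptions: the `ranks` matrix is written by hand to stand in for the output of `get_top_K_ranks(X, 2)`, the values in `X` are made up, and the mask is built by overwriting the stored entries via `.data` rather than by the boolean-index assignment used in the source.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative input matrix.
X = csr_matrix(np.array([
    [0.0, 0.5, 0.9, 0.1],
    [0.3, 0.0, 0.0, 0.7],
]))

# Hand-written rank matrix for K=2 (assumption: stands in for get_top_K_ranks(X, 2)).
ranks = csr_matrix(np.array([
    [0, 2, 1, 0],
    [2, 0, 0, 1],
]))

# Turn every stored rank into 1, giving a binary top-K mask.
mask = ranks.copy()
mask.data[:] = 1

# Elementwise product keeps the original values only where the mask is 1.
top_values = mask.multiply(X).toarray()
```

Here the value 0.1 in the first row is dropped because it is not among the two largest entries of that row, while the second row keeps both of its stored values.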