streamsight.datasets.MovieLens100K

class streamsight.datasets.MovieLens100K(filename: str | None = None, base_path: str | None = None, use_default_filters=False)

Bases: MovieLensDataset

__init__(filename: str | None = None, base_path: str | None = None, use_default_filters=False)
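
A minimal usage sketch, built only from the signatures documented on this page. It assumes the streamsight package is installed and that the MovieLens server is reachable when the file is not yet on disk. The helper name load_movielens_100k is hypothetical, and the import sits inside the function so that defining it has no side effects.

```python
def load_movielens_100k(base_path="data"):
    """Fetch MovieLens 100K if needed and return it as an InteractionMatrix."""
    # Imported lazily; assumes streamsight is installed.
    from streamsight.datasets import MovieLens100K

    dataset = MovieLens100K(base_path=base_path)
    # With the default force=False, an existing file is not downloaded again.
    dataset.fetch_dataset()
    # apply_filters=True is the documented default; it is spelled out for clarity.
    return dataset.load(apply_filters=True)
```

Calling load_movielens_100k() would then return the interaction matrix described under load() below.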

Methods

__init__([filename, base_path, ...])

add_filter(filter)

Add a filter to be applied when loading the data.

fetch_dataset([force])

Check whether the dataset is present; if not, download it.

load([apply_filters])

Loads data into an InteractionMatrix object.

Attributes

DATASETURL

DEFAULT_BASE_PATH

Default base path where the dataset will be stored.

DEFAULT_FILENAME

Default filename that will be used if it is not specified by the user.

ITEM_IX

Name of the column in the DataFrame that contains item identifiers.

RATING_IX

Name of the column in the DataFrame that contains the rating a user gave to the item.

REMOTE_FILENAME

Name of the file containing user ratings on the MovieLens server.

REMOTE_ZIPNAME

Name of the zip-file on the MovieLens server.

TIMESTAMP_IX

Name of the column in the DataFrame that contains time of interaction in seconds since epoch.

USER_IX

Name of the column in the DataFrame that contains user identifiers.

file_path

File path of the dataset.

name

Name of the object's class.

DATASETURL = 'http://files.grouplens.org/datasets/movielens'
DEFAULT_BASE_PATH = 'data'

Default base path where the dataset will be stored.

property DEFAULT_FILENAME: str

Default filename that will be used if it is not specified by the user.

ITEM_IX = 'movieId'

Name of the column in the DataFrame that contains item identifiers.

RATING_IX = 'rating'

Name of the column in the DataFrame that contains the rating a user gave to the item.

REMOTE_FILENAME = 'u.data'

Name of the file containing user ratings on the MovieLens server.

REMOTE_ZIPNAME = 'ml-100k'

Name of the zip-file on the MovieLens server.

TIMESTAMP_IX = 'timestamp'

Name of the column in the DataFrame that contains time of interaction in seconds since epoch.

USER_IX = 'userId'

Name of the column in the DataFrame that contains user identifiers.

_abc_impl = <_abc._abc_data object>
_check_safe()

Check if the directory is safe. If the directory does not exist, create it.

_dataframe_to_matrix(df: DataFrame) → InteractionMatrix

Converts a DataFrame to an InteractionMatrix.

Parameters:

df (pd.DataFrame) – DataFrame to convert

Returns:

InteractionMatrix object

Return type:

InteractionMatrix

property _default_filters: List[Filter]

The default filters for all datasets.

Concrete classes can override this property to add more filters.

Returns:

List of filters to be applied to the dataset

Return type:

List[Filter]

_download_dataset()

Downloads the dataset.

Downloads the zip file and extracts the ratings file to self.file_path.
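
The download-and-extract step can be sketched with the standard library alone. This is an illustration, not the library's implementation: it assumes the archive URL is DATASETURL plus REMOTE_ZIPNAME with a .zip suffix, and that the ratings file is stored in the archive as REMOTE_ZIPNAME/REMOTE_FILENAME. The helpers remote_zip_url and extract_ratings are hypothetical, and the demonstration builds a tiny local archive instead of contacting the server.

```python
import tempfile
import zipfile
from pathlib import Path

DATASETURL = "http://files.grouplens.org/datasets/movielens"
REMOTE_ZIPNAME = "ml-100k"
REMOTE_FILENAME = "u.data"


def remote_zip_url():
    # Assumption: the archive lives at <DATASETURL>/<REMOTE_ZIPNAME>.zip.
    return f"{DATASETURL}/{REMOTE_ZIPNAME}.zip"


def extract_ratings(zip_path, file_path):
    """Copy the ratings member out of the archive to file_path."""
    # Assumption: the member is stored as <REMOTE_ZIPNAME>/<REMOTE_FILENAME>.
    member = f"{REMOTE_ZIPNAME}/{REMOTE_FILENAME}"
    target = Path(file_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        target.write_bytes(archive.read(member))
    return str(target)


# Demonstration on a tiny local archive, so no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "ml-100k.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("ml-100k/u.data", "196\t242\t3\t881250949\n")
    extracted = Path(extract_ratings(zip_path, Path(tmp) / "data" / "ratings.csv"))
    extracted_text = extracted.read_text()
```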

_fetch_remote(url: str, filename: str) → str

Fetch data from a remote URL and save it locally.

Parameters:
  • url (str) – URL to fetch data from

  • filename (str) – Path to save the file to

Returns:

The filename where data was saved

Return type:

str

_load_dataframe() → DataFrame

Load the raw dataset from file, and return it as a pandas DataFrame.

Warning

This does not apply any preprocessing, and returns the raw dataset.

Returns:

Interactions with the minimal columns {user, item, timestamp}.

Return type:

pd.DataFrame
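
For orientation, the MovieLens 100K ratings file (u.data) is tab-separated with four fields per row: user id, item id, rating, and timestamp. The sketch below parses rows of that shape with the standard csv module and labels them with the column names from this page. The library itself returns a pandas DataFrame; parse_ratings is a hypothetical stand-in, not its implementation.

```python
import csv
import io

USER_IX, ITEM_IX, RATING_IX, TIMESTAMP_IX = "userId", "movieId", "rating", "timestamp"


def parse_ratings(handle):
    """Parse tab-separated rating rows into a list of dicts with integer values."""
    reader = csv.reader(handle, delimiter="\t")
    columns = [USER_IX, ITEM_IX, RATING_IX, TIMESTAMP_IX]
    # Skip empty lines; every field in this file format is an integer.
    return [dict(zip(columns, map(int, row))) for row in reader if row]


# Two sample rows in the u.data layout.
sample = io.StringIO("196\t242\t3\t881250949\n186\t302\t3\t891717742\n")
rows = parse_ratings(sample)
```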

add_filter(filter: Filter)

Add a filter to be applied when loading the data.

Uses the DataFramePreprocessor class to register filters for the dataset. Each registered filter is applied when load() is called and the data is loaded into an InteractionMatrix object.

Parameters:

filter (Filter) – Filter to be applied to the loaded DataFrame before it is converted to an interaction matrix.
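
The register-now, apply-on-load pattern described here can be illustrated with a small stand-alone sketch. The names FilterPipeline and min_rating are hypothetical and only mimic the idea; they are not streamsight's DataFramePreprocessor or Filter classes, and the rows are plain dicts rather than a DataFrame.

```python
class FilterPipeline:
    """Collects filters now and applies them, in order, later."""

    def __init__(self):
        self._filters = []

    def add_filter(self, filter_fn):
        # Nothing is filtered yet; the callable is only stored.
        self._filters.append(filter_fn)

    def apply(self, rows):
        # Filters run in the order in which they were added.
        for filter_fn in self._filters:
            rows = filter_fn(rows)
        return rows


def min_rating(threshold):
    """Hypothetical filter: keep rows whose rating is at least threshold."""
    return lambda rows: [row for row in rows if row["rating"] >= threshold]


pipeline = FilterPipeline()
pipeline.add_filter(min_rating(4))
rows = [
    {"userId": 1, "movieId": 10, "rating": 5},
    {"userId": 2, "movieId": 10, "rating": 3},
]
kept = pipeline.apply(rows)
```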

fetch_dataset(force=False) → None

Check whether the dataset is present; if not, download it.

Parameters:

force (bool, optional) – If True, dataset will be downloaded, even if the file already exists. Defaults to False.
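
The behaviour of the force flag can be sketched as follows. This is an illustration under assumptions, not the library's code: fetch_if_missing is a hypothetical helper, and the download callable stands in for the real download step so that the example runs without network access.

```python
import tempfile
from pathlib import Path


def fetch_if_missing(file_path, download, force=False):
    """Call download(path) when the file is absent or force is True.

    Returns True if a download happened, False otherwise.
    """
    path = Path(file_path)
    if force or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)
        return True
    return False


def fake_download(path):
    # Stand-in for a real download: just writes a placeholder file.
    path.write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "ratings.csv"
    first = fetch_if_missing(target, fake_download)   # file missing
    second = fetch_if_missing(target, fake_download)  # file present
    forced = fetch_if_missing(target, fake_download, force=True)
```

Here only the first and the forced call trigger the download step; the second call finds the file and does nothing.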

property file_path: str

File path of the dataset.

load(apply_filters=True) → InteractionMatrix

Loads data into an InteractionMatrix object.

Data is loaded into a DataFrame using the _load_dataframe() function. The resulting DataFrame is parsed into an InteractionMatrix object. If apply_filters is True, the registered filters are applied to the dataset and the user and item ids are remapped. This is advised even if no filter is set, as it ensures that the user and item ids increase in order of time.

Parameters:

apply_filters (bool, optional) – Whether to apply the registered filters and preprocessing. Defaults to True.

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix
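
What "ids incrementing in the order of time" means can be shown with a small sketch. It assumes that ids are assigned from 0 in order of first appearance after sorting by timestamp; the exact scheme used by the library is not stated on this page, and remap_ids_by_time is a hypothetical helper working on plain tuples.

```python
def remap_ids_by_time(interactions):
    """Map raw user and item ids to 0-based ids in order of first appearance.

    interactions: iterable of (user, item, timestamp) tuples.
    """
    user_map, item_map = {}, {}
    remapped = []
    # sorted() is stable, so rows with equal timestamps keep their input order.
    for user, item, timestamp in sorted(interactions, key=lambda row: row[2]):
        # setdefault assigns the next free id only for ids not seen before.
        uid = user_map.setdefault(user, len(user_map))
        iid = item_map.setdefault(item, len(item_map))
        remapped.append((uid, iid, timestamp))
    return remapped, user_map, item_map


raw = [(196, 242, 30), (186, 302, 10), (196, 302, 20)]
remapped, user_map, item_map = remap_ids_by_time(raw)
```

In this example user 186 interacts first and so receives id 0, while user 196 receives id 1; items are numbered the same way.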

property name

Name of the object’s class.