
movielens

MovieLensDatasetConfig dataclass

Bases: DatasetConfig

MovieLens base configuration.

Source code in src/recnexteval/datasets/config/movielens.py
@dataclass
class MovieLensDatasetConfig(DatasetConfig):
    """MovieLens base configuration."""

    user_ix: str = "userId"
    item_ix: str = "movieId"
    timestamp_ix: str = "timestamp"
    rating_ix: str = "rating"
    """Name of the column in the DataFrame that contains the rating a user gave to the item."""
    dataset_url: str = "https://files.grouplens.org/datasets/movielens"
    remote_zipname: str = "ml-100k"
    """Name of the zip-file on the MovieLens server."""
    remote_filename: str = "ratings.csv"
    """Name of the file containing user ratings on the MovieLens server."""
    default_base_path: str = DatasetConfig.default_base_path + "/movielens"

user_ix = 'userId' class-attribute instance-attribute

item_ix = 'movieId' class-attribute instance-attribute

timestamp_ix = 'timestamp' class-attribute instance-attribute

rating_ix = 'rating' class-attribute instance-attribute

Name of the column in the DataFrame that contains the rating a user gave to the item.

dataset_url = 'https://files.grouplens.org/datasets/movielens' class-attribute instance-attribute

remote_zipname = 'ml-100k' class-attribute instance-attribute

Name of the zip-file on the MovieLens server.

remote_filename = 'ratings.csv' class-attribute instance-attribute

Name of the file containing user ratings on the MovieLens server.
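The three remote fields describe where the ratings file lives. As a minimal sketch, assuming the archive sits at `<dataset_url>/<remote_zipname>.zip` and the ratings file sits in a folder named after the zip, the pieces could be combined as below. `RemoteFields`, `archive_url`, and `member_path` are hypothetical names used only for illustration; they are not part of recnexteval.

```python
from dataclasses import dataclass


@dataclass
class RemoteFields:
    """Hypothetical stand-in holding the three remote fields documented above."""

    dataset_url: str = "https://files.grouplens.org/datasets/movielens"
    remote_zipname: str = "ml-100k"
    remote_filename: str = "ratings.csv"


def archive_url(cfg: RemoteFields) -> str:
    """Assumed composition: <dataset_url>/<remote_zipname>.zip."""
    return f"{cfg.dataset_url}/{cfg.remote_zipname}.zip"


def member_path(cfg: RemoteFields) -> str:
    """Assumed location of the data file inside the extracted archive."""
    return f"{cfg.remote_zipname}/{cfg.remote_filename}"


cfg = RemoteFields(remote_filename="u.data")
print(archive_url(cfg))
print(member_path(cfg))
```

How the library's own downloader joins these fields is not shown on this page, so treat the composition above as an assumption.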

default_base_path = DatasetConfig.default_base_path + '/movielens' class-attribute instance-attribute

default_filename property

Derived filename from remote components.

MovieLens100KDatasetConfig dataclass

Bases: MovieLensDatasetConfig

MovieLens 100K specific configuration.

Source code in src/recnexteval/datasets/config/movielens.py
@dataclass
class MovieLens100KDatasetConfig(MovieLensDatasetConfig):
    """MovieLens 100K specific configuration."""

    remote_filename: str = "u.data"
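The subclass above changes a single default and inherits everything else. The following sketch reproduces that pattern with two stand-in dataclasses (`BaseConfig` and `Variant100K` are hypothetical names, and only two of the real fields are copied) to show how a redefined field behaves under dataclass inheritance.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseConfig:
    """Stand-in for MovieLensDatasetConfig (only two fields reproduced)."""

    remote_zipname: str = "ml-100k"
    remote_filename: str = "ratings.csv"


@dataclass
class Variant100K(BaseConfig):
    """Stand-in for MovieLens100KDatasetConfig: overrides one default."""

    remote_filename: str = "u.data"


base = BaseConfig()
variant = Variant100K()

# A redefined field keeps its original position in the field order.
field_names = [f.name for f in fields(Variant100K)]

print(base.remote_filename, variant.remote_filename, field_names)
```

Because the override is only a new default, callers can still pass a different value at construction time, for example `Variant100K(remote_filename="other.data")`.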

remote_filename = 'u.data' class-attribute instance-attribute

user_ix = 'userId' class-attribute instance-attribute

item_ix = 'movieId' class-attribute instance-attribute

timestamp_ix = 'timestamp' class-attribute instance-attribute

dataset_url = 'https://files.grouplens.org/datasets/movielens' class-attribute instance-attribute

default_base_path = DatasetConfig.default_base_path + '/movielens' class-attribute instance-attribute

remote_zipname = 'ml-100k' class-attribute instance-attribute

Name of the zip-file on the MovieLens server.

default_filename property

Derived filename from remote components.

rating_ix = 'rating' class-attribute instance-attribute

Name of the column in the DataFrame that contains the rating a user gave to the item.

MovieLens100kUserMetadataConfig dataclass

Bases: MetadataConfig, MovieLensDatasetConfig

MovieLens 100K User Metadata Configuration.

Handles configuration for user demographic data:

- User ID mapping
- Age information
- Gender information
- Occupation information
- Zipcode information

All properties are computed from base fields to ensure consistency.

Source code in src/recnexteval/datasets/config/movielens.py
@dataclass
class MovieLens100kUserMetadataConfig(MetadataConfig, MovieLensDatasetConfig):
    """
    MovieLens 100K User Metadata Configuration.

    Handles configuration for user demographic data:
    - User ID mapping
    - Age information
    - Gender information
    - Occupation information
    - Zipcode information

    All properties are computed from base fields to ensure consistency.
    """
    user_ix: str = "userId"
    """Name of the column containing user identifiers."""
    age_ix: str = "age"
    """Name of the column containing user age."""
    gender_ix: str = "gender"
    """Name of the column containing user gender."""
    occupation_ix: str = "occupation"
    """Name of the column containing user occupation."""
    zipcode_ix: str = "zipcode"
    """Name of the column containing user zipcode."""

    remote_filename: str = "u.user"
    """Filename of user metadata file in remote zip."""
    remote_zipname: str = "ml-100k"
    """Name of the zip file on remote server."""
    dataset_url: str = "https://files.grouplens.org/datasets/movielens"
    """URL to fetch the metadata from."""

    @property
    def column_names(self) -> list[str]:
        return [
            self.user_ix,
            self.age_ix,
            self.gender_ix,
            self.occupation_ix,
            self.zipcode_ix,
        ]

    @property
    def dtype_dict(self) -> dict:
        return {
            self.age_ix: np.int64,
            self.gender_ix: str,
            self.occupation_ix: str,
            self.zipcode_ix: str,
        }
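To illustrate what `column_names` and `dtype_dict` are for, here is a standard-library sketch that parses pipe-separated user rows with the same column order and type mapping. The constants are copied from the config above; the sample rows are made up, `parse_users` is a hypothetical helper, and plain `int` stands in for `np.int64`. With pandas, the equivalent values would presumably be passed to `read_csv` through its `names`, `dtype`, and `sep` parameters, but how the library's loader consumes them is not shown on this page.

```python
import csv
import io

# Values mirrored from the config; the loader itself is not reproduced here.
COLUMN_NAMES = ["userId", "age", "gender", "occupation", "zipcode"]
CONVERTERS = {"age": int, "gender": str, "occupation": str, "zipcode": str}
SEP = "|"

# Made-up rows in the u.user layout (not real MovieLens records).
SAMPLE = "1|30|F|engineer|00001\n2|45|M|writer|01234\n"


def parse_users(text: str) -> list[dict]:
    """Name each field by position, then apply the per-column converter."""
    rows = []
    for raw in csv.reader(io.StringIO(text), delimiter=SEP):
        row = dict(zip(COLUMN_NAMES, raw))
        for name, convert in CONVERTERS.items():
            row[name] = convert(row[name])
        rows.append(row)
    return rows


users = parse_users(SAMPLE)
print(users[1])
```

Keeping `zipcode` as a string preserves leading zeros, which an integer conversion would drop. Note that `userId` is absent from the type mapping, so in this sketch it stays an unconverted string.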

user_ix = 'userId' class-attribute instance-attribute

Name of the column containing user identifiers.

age_ix = 'age' class-attribute instance-attribute

Name of the column containing user age.

gender_ix = 'gender' class-attribute instance-attribute

Name of the column containing user gender.

occupation_ix = 'occupation' class-attribute instance-attribute

Name of the column containing user occupation.

zipcode_ix = 'zipcode' class-attribute instance-attribute

Name of the column containing user zipcode.

remote_filename = 'u.user' class-attribute instance-attribute

Filename of user metadata file in remote zip.

remote_zipname = 'ml-100k' class-attribute instance-attribute

Name of the zip file on remote server.

dataset_url = 'https://files.grouplens.org/datasets/movielens' class-attribute instance-attribute

URL to fetch the metadata from.

column_names property

dtype_dict property

item_ix = 'movieId' class-attribute instance-attribute

timestamp_ix = 'timestamp' class-attribute instance-attribute

default_base_path = DatasetConfig.default_base_path + '/movielens' class-attribute instance-attribute

default_filename property

Derived filename from remote components.

rating_ix = 'rating' class-attribute instance-attribute

Name of the column in the DataFrame that contains the rating a user gave to the item.

sep = '|' class-attribute instance-attribute

Column separator in the data file.

MovieLens100kItemMetadataConfig dataclass

Bases: MetadataConfig, MovieLensDatasetConfig

MovieLens 100K Item Metadata Configuration.

Handles configuration for movie metadata including:

- Movie ID mapping
- Title, release date, IMDB URL
- 19 binary genre indicator columns

All properties are computed from base fields to ensure consistency.

Source code in src/recnexteval/datasets/config/movielens.py
@dataclass
class MovieLens100kItemMetadataConfig(MetadataConfig, MovieLensDatasetConfig):
    """
    MovieLens 100K Item Metadata Configuration.

    Handles configuration for movie metadata including:
    - Movie ID mapping
    - Title, release date, IMDB URL
    - 19 binary genre indicator columns

    All properties are computed from base fields to ensure consistency.
    """

    item_ix: str = "movieId"
    """Name of the column containing movie identifiers."""
    title_ix: str = "title"
    """Name of the column containing movie title."""
    release_date_ix: str = "releaseDate"
    """Name of the column containing movie release date."""
    video_release_date_ix: str = "videoReleaseDate"
    """Name of the column containing video release date."""
    imdb_url_ix: str = "imdbUrl"
    """Name of the column containing IMDB URL."""
    genres: tuple[str, ...] = (
        "unknown",
        "action",
        "adventure",
        "animation",
        "children",
        "comedy",
        "crime",
        "documentary",
        "drama",
        "fantasy",
        "filmNoir",
        "horror",
        "musical",
        "mystery",
        "romance",
        "sciFi",
        "thriller",
        "war",
        "western",
    )
    """Tuple of 19 genre names in canonical order."""

    remote_filename: str = "u.item"
    remote_zipname: str = "ml-100k"
    dataset_url: str = "https://files.grouplens.org/datasets/movielens"
    encoding: str = "ISO-8859-1"
    """File encoding (ISO-8859-1 needed for special characters)."""

    @property
    def non_genre_columns(self) -> list[str]:
        """
        Column names for non-genre metadata.

        Returns:
            list[str]: [movie_id, title, release_date, video_release_date, imdb_url]

        Example:
            ["movieId", "title", "releaseDate", "videoReleaseDate", "imdbUrl"]
        """
        return [
            self.item_ix,
            self.title_ix,
            self.release_date_ix,
            self.video_release_date_ix,
            self.imdb_url_ix,
        ]

    @property
    def column_names(self) -> list[str]:
        return self.non_genre_columns + list(self.genres)

    @property
    def dtype_dict(self) -> dict:
        dtype_dict: dict[str, Any] = {
            self.title_ix: str,
            self.release_date_ix: str,
            self.video_release_date_ix: str,
            self.imdb_url_ix: str,
        }
        dtype_dict.update({genre: np.int64 for genre in self.genres})
        return dtype_dict
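The properties above yield 5 non-genre columns followed by 19 genre flags, 24 columns in total. The sketch below, which uses only the standard library, applies that layout to one made-up row encoded as ISO-8859-1. The constants are copied from the config, `parse_item` is a hypothetical helper, the row is not a real MovieLens record, and plain `int` stands in for `np.int64`.

```python
NON_GENRE = ["movieId", "title", "releaseDate", "videoReleaseDate", "imdbUrl"]
GENRES = (
    "unknown", "action", "adventure", "animation", "children", "comedy",
    "crime", "documentary", "drama", "fantasy", "filmNoir", "horror",
    "musical", "mystery", "romance", "sciFi", "thriller", "war", "western",
)
COLUMN_NAMES = NON_GENRE + list(GENRES)

# Made-up row: empty video release date, three genre flags set.
flags = ["0", "0", "0", "1", "1", "1"] + ["0"] * 13
line = "7|Café Example (1995)|01-Jan-1995||http://example.com/7|" + "|".join(flags)
raw = line.encode("ISO-8859-1")


def parse_item(raw_line: bytes) -> dict:
    """Decode one row, name its fields by position, and cast genre flags."""
    values = raw_line.decode("ISO-8859-1").rstrip("\n").split("|")
    if len(values) != len(COLUMN_NAMES):
        raise ValueError(f"expected {len(COLUMN_NAMES)} fields, got {len(values)}")
    row = dict(zip(COLUMN_NAMES, values))
    for genre in GENRES:
        row[genre] = int(row[genre])
    return row


item = parse_item(raw)
active = [genre for genre in GENRES if item[genre] == 1]
print(item["title"], active)
```

The accented character in the title is why the encoding matters: the same bytes are not valid UTF-8, so decoding them with the wrong codec would fail.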

item_ix = 'movieId' class-attribute instance-attribute

Name of the column containing movie identifiers.

title_ix = 'title' class-attribute instance-attribute

Name of the column containing movie title.

release_date_ix = 'releaseDate' class-attribute instance-attribute

Name of the column containing movie release date.

video_release_date_ix = 'videoReleaseDate' class-attribute instance-attribute

Name of the column containing video release date.

imdb_url_ix = 'imdbUrl' class-attribute instance-attribute

Name of the column containing IMDB URL.

genres = ('unknown', 'action', 'adventure', 'animation', 'children', 'comedy', 'crime', 'documentary', 'drama', 'fantasy', 'filmNoir', 'horror', 'musical', 'mystery', 'romance', 'sciFi', 'thriller', 'war', 'western') class-attribute instance-attribute

Tuple of 19 genre names in canonical order.

remote_filename = 'u.item' class-attribute instance-attribute

remote_zipname = 'ml-100k' class-attribute instance-attribute

dataset_url = 'https://files.grouplens.org/datasets/movielens' class-attribute instance-attribute

encoding = 'ISO-8859-1' class-attribute instance-attribute

File encoding (ISO-8859-1 needed for special characters).

non_genre_columns property

Column names for non-genre metadata.

Returns:

list[str]: [movie_id, title, release_date, video_release_date, imdb_url]

Example

["movieId", "title", "releaseDate", "videoReleaseDate", "imdbUrl"]

column_names property

dtype_dict property

user_ix = 'userId' class-attribute instance-attribute

timestamp_ix = 'timestamp' class-attribute instance-attribute

default_base_path = DatasetConfig.default_base_path + '/movielens' class-attribute instance-attribute

default_filename property

Derived filename from remote components.

rating_ix = 'rating' class-attribute instance-attribute

Name of the column in the DataFrame that contains the rating a user gave to the item.

sep = '|' class-attribute instance-attribute

Column separator in the data file.