Dataset construction
- class lightfm.data.Dataset(user_identity_features=True, item_identity_features=True)[source]
Bases: object
Tool for building interaction and feature matrices, taking care of the mapping between user/item ids and feature names and internal feature indices.
To create a dataset:
- Create an instance of the Dataset class.
- Call fit (or fit_partial), supplying the user/item ids and feature names that you want to use in your model. This creates internal mappings that translate the ids and feature names to the internal indices used by the LightFM model.
- Call build_interactions with an iterable of (user id, item id) or (user id, item id, weight) to build an interactions matrix and a weights matrix.
- Call build_user_features or build_item_features with iterables of (user/item id, [features]) or (user/item id, {feature: feature weight}) to build feature matrices.
To add new user/item ids or features, call fit_partial again. You will need to resize your LightFM model to be able to use the new features.
- Parameters
user_identity_features (bool, optional) – Create a unique feature for every user in addition to other features. If true (default), a latent vector will be allocated for every user. This is a reasonable default for most applications, but should be set to false if there is very little data for every user. For more details see the Notes in LightFM.
item_identity_features (bool, optional) – Create a unique feature for every item in addition to other features. If true (default), a latent vector will be allocated for every item. This is a reasonable default for most applications, but should be set to false if there is very little data for every item. For more details see the Notes in LightFM.
- build_interactions(data)[source]
Build an interaction matrix.
Two matrices will be returned: a (num_users, num_items) COO matrix with interactions, and a (num_users, num_items) matrix with the corresponding interaction weights.
- Parameters
data (iterable of (user_id, item_id) or (user_id, item_id, weight)) – An iterable of interactions. The user and item ids will be translated to internal model indices using the mappings constructed during the fit call. If weights are not provided they will be assumed to be 1.0.
- Returns
(interactions, weights) – Two COO matrices: the interactions matrix and the corresponding weights matrix.
- Return type
COO matrix, COO matrix
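The translation described above can be pictured with a small standard-library sketch. It is only an illustration of the idea, not the library's implementation: the helper `to_coo_triplets` and the two mappings are hypothetical, and plain lists stand in for the row, column and value arrays of a COO matrix.

```python
def to_coo_triplets(data, user_map, item_map):
    """Translate (user, item[, weight]) tuples into COO-style index lists."""
    rows, cols, weights = [], [], []
    for record in data:
        if len(record) == 2:
            user_id, item_id = record
            weight = 1.0  # a missing weight is assumed to be 1.0
        else:
            user_id, item_id, weight = record
        # A KeyError here means the id was never seen when the mappings were built.
        rows.append(user_map[user_id])
        cols.append(item_map[item_id])
        weights.append(float(weight))
    return rows, cols, weights


# Hypothetical mappings, shaped like those a fit call might produce.
user_map = {"alice": 0, "bob": 1}
item_map = {"book": 0, "film": 1, "song": 2}

rows, cols, weights = to_coo_triplets(
    [("alice", "film"), ("bob", "song", 0.5)], user_map, item_map
)
print(rows, cols, weights)
```

Both returned matrices share the same (row, column) coordinates. In this sketch they would differ only in the stored values.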
- build_item_features(data, normalize=True)[source]
Build an item features matrix out of an iterable of the form (item id, [list of feature names]) or (item id, {feature name: feature weight}).
- Parameters
data (iterable of the form) – (item id, [list of feature names]) or (item id, {feature name: feature weight}). Item and feature ids will be translated to internal indices constructed during the fit call.
normalize (bool, optional) – If true, will ensure that feature weights sum to 1 in every row.
- Returns
feature matrix – Matrix of item features.
- Return type
CSR matrix (num items, num features)
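To make the effect of normalize concrete, here is a minimal sketch of how one row could be weighted. It assumes the list form gives each feature a weight of 1.0 and that normalization divides each weight by the row sum. The function `feature_row` is a hypothetical helper, and identity features are left out for brevity.

```python
def feature_row(features, normalize=True):
    """Return {feature name: weight} for one item, optionally scaled to sum to 1."""
    if isinstance(features, dict):
        row = {name: float(weight) for name, weight in features.items()}
    else:
        # List form: every listed feature is assumed to get weight 1.0.
        row = {name: 1.0 for name in features}
    if normalize:
        total = sum(row.values())
        if total > 0:  # leave an all-zero row untouched
            row = {name: weight / total for name, weight in row.items()}
    return row


from_list = feature_row(["genre:drama", "year:1999"])
from_dict = feature_row({"genre:drama": 3.0, "year:1999": 1.0})
raw = feature_row({"genre:drama": 3.0, "year:1999": 1.0}, normalize=False)
print(from_list, from_dict, raw)
```

With normalize=False the supplied weights pass through unchanged, which may be preferable when their absolute scale carries meaning.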
- build_user_features(data, normalize=True)[source]
Build a user features matrix out of an iterable of the form (user id, [list of feature names]) or (user id, {feature name: feature weight}).
- Parameters
data (iterable of the form) – (user id, [list of feature names]) or (user id, {feature name: feature weight}). User and feature ids will be translated to internal indices constructed during the fit call.
normalize (bool, optional) – If true, will ensure that feature weights sum to 1 in every row.
- Returns
feature matrix – Matrix of user features.
- Return type
CSR matrix (num users, num features)
- fit(users, items, user_features=None, item_features=None)[source]
Fit the user/item id and feature name mappings.
Calling fit a second time resets the existing mappings.
- Parameters
users (iterable of user ids) –
items (iterable of item ids) –
user_features (iterable of user features, optional) –
item_features (iterable of item features, optional) –
- fit_partial(users=None, items=None, user_features=None, item_features=None)[source]
Fit the user/item id and feature name mappings.
Calling fit_partial again adds new entries to the existing mappings.
- Parameters
users (iterable of user ids, optional) –
items (iterable of item ids, optional) –
user_features (iterable of user features, optional) –
item_features (iterable of item features, optional) –
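The difference between resetting and extending a mapping can be sketched with plain dictionaries. This assumes each new id receives the next free index while known ids keep theirs. The helpers `fresh_mapping` and `extend_mapping` are hypothetical names used only for illustration.

```python
def extend_mapping(mapping, ids):
    """Add unseen ids to an existing mapping without changing known indices."""
    for key in ids:
        mapping.setdefault(key, len(mapping))
    return mapping


def fresh_mapping(ids):
    """Build a mapping from scratch, discarding anything fitted before."""
    return extend_mapping({}, ids)


user_map = fresh_mapping(["alice", "bob"])   # analogous to fit
extend_mapping(user_map, ["bob", "carol"])   # analogous to fit_partial
reset_map = fresh_mapping(["carol"])         # analogous to calling fit again

print(user_map)
print(reset_map)
```

Because extending never renumbers existing ids, matrices built earlier stay aligned with the mapping. They only become too small once new ids are added, which is why the model has to be resized.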
- item_features_shape()[source]
Return the shape of the item features matrix.
- Returns
(num item ids, num item features) – The shape.
- Return type
tuple of ints
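As a rough illustration of how identity features could affect this shape, the sketch below assumes that enabling item_identity_features adds one extra feature per item alongside the named features. The function `sketch_item_features_shape` is hypothetical and does not call the library.

```python
def sketch_item_features_shape(item_ids, feature_names, identity_features=True):
    """Estimate (num item ids, num item features) under the stated assumption."""
    unique_items = list(dict.fromkeys(item_ids))  # de-duplicate, keep order
    feature_map = {}
    if identity_features:
        # Assumed: each item contributes one feature named after its own id.
        for item_id in unique_items:
            feature_map.setdefault(item_id, len(feature_map))
    for name in feature_names:
        feature_map.setdefault(name, len(feature_map))
    return len(unique_items), len(feature_map)


items = ["book", "film", "song"]
features = ["genre:drama", "year:1999"]

with_identity = sketch_item_features_shape(items, features)
without_identity = sketch_item_features_shape(items, features, identity_features=False)
print(with_identity, without_identity)
```

Under this assumption the second dimension grows with the number of items whenever identity features are on.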
- mapping()[source]
Return the constructed mappings.
Invert these to map internal indices to external ids.
- Returns
(user id map, user feature map, item id map, item feature map)
- Return type
tuple of dictionaries
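Inverting a mapping is a one-line dictionary comprehension. The sketch below uses a hypothetical `item_id_map` in place of one of the returned dictionaries, and a made-up list of internal indices such as a ranking step might produce.

```python
# Hypothetical mapping from external item ids to internal indices.
item_id_map = {"book": 0, "film": 1, "song": 2}

# Swap keys and values to go from internal index back to external id.
index_to_item = {index: item_id for item_id, index in item_id_map.items()}

# Example: translate internal indices back to external ids.
top_indices = [2, 0]
top_items = [index_to_item[index] for index in top_indices]
print(top_items)
```

The inversion is lossless only when indices are unique, which holds for a mapping that assigns one index per id.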