Dataset splitting functions.

lightfm.cross_validation.random_train_test_split(interactions, test_percentage=0.2, random_state=None)

Randomly split interactions between training and testing.

This function takes an interaction set and splits it into two disjoint sets: a training set and a test set. Note that it makes no attempt to ensure that every user and item with interactions in the test set also has interactions in the training set; this may lead to a partial cold-start problem in the test set. To split a sample_weight matrix along the same lines, pass it to this function with the same integer random_state seed that was used to split the interactions.
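To make the splitting behaviour concrete, here is a minimal sketch of a random split over the stored entries of a COO matrix. `sketch_random_split` is a hypothetical stand-in written for illustration, not LightFM's implementation. It assumes that a sample_weight matrix stores its entries in exactly the same row/col order as the interactions matrix, which is what lets one integer seed split both identically. The last lines count test users and items with no training interactions, i.e. the partial cold-start case.

```python
import numpy as np
import scipy.sparse as sp


def sketch_random_split(interactions, test_percentage=0.2, random_state=None):
    """Hypothetical re-implementation for illustration; not LightFM's own code."""
    coo = interactions.tocoo()
    # Mirror the documented random_state handling: reuse an instance, else seed one.
    if not isinstance(random_state, np.random.RandomState):
        random_state = np.random.RandomState(random_state)
    # Shuffle the positions of the stored (user, item, value) triples.
    order = random_state.permutation(coo.nnz)
    rows, cols, data = coo.row[order], coo.col[order], coo.data[order]
    # The first (1 - test_percentage) share of the shuffled triples goes to train.
    cutoff = int((1.0 - test_percentage) * coo.nnz)
    train = sp.coo_matrix(
        (data[:cutoff], (rows[:cutoff], cols[:cutoff])), shape=coo.shape, dtype=coo.dtype
    )
    test = sp.coo_matrix(
        (data[cutoff:], (rows[cutoff:], cols[cutoff:])), shape=coo.shape, dtype=coo.dtype
    )
    return train, test


# 4 users x 5 items with 10 observed interactions, plus per-interaction weights
# stored with exactly the same row/col order.
rows = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
cols = np.array([0, 3, 1, 2, 4, 0, 2, 1, 3, 4])
interactions = sp.coo_matrix((np.ones(10), (rows, cols)), shape=(4, 5))
weights = sp.coo_matrix((np.arange(1.0, 11.0), (rows, cols)), shape=(4, 5))

# Same integer seed for both calls, so the weights are split along the same lines.
train, test = sketch_random_split(interactions, test_percentage=0.2, random_state=42)
w_train, w_test = sketch_random_split(weights, test_percentage=0.2, random_state=42)

# The two sets are disjoint and together cover every interaction.
train_pairs = set(zip(train.row.tolist(), train.col.tolist()))
test_pairs = set(zip(test.row.tolist(), test.col.tolist()))
print(train_pairs.isdisjoint(test_pairs), len(train_pairs | test_pairs))

# Partial cold start: test users and items that never appear in the training set.
cold_users = set(test.row.tolist()) - set(train.row.tolist())
cold_items = set(test.col.tolist()) - set(train.col.tolist())
print("cold-start users:", sorted(cold_users), "cold-start items:", sorted(cold_items))
```

Because the cut is taken over individual interactions rather than over users or items, nothing prevents all of a user's interactions from landing in the test set, which is the cold-start caveat described above.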

Parameters

  • interactions (a scipy sparse matrix containing interactions) – The interactions to split.

  • test_percentage (float, optional) – The fraction of interactions to place in the test set.

  • random_state (int or numpy.random.RandomState, optional) – Random seed used to initialize the numpy.random.RandomState random number generator. An existing numpy.random.RandomState instance is also accepted, for backwards compatibility.
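The practical difference between the two accepted forms of random_state follows from numpy.random.RandomState semantics. The sketch below uses hypothetical helpers (`init_rng`, `split_test_positions`) that are not part of LightFM, and assumes, as the parameter description suggests, that a passed-in instance is used directly rather than copied. Under that assumption an integer seed reproduces the same split on every call, whereas a shared RandomState instance keeps advancing, so a second call with it continues the stream instead of repeating the first split.

```python
import numpy as np


def init_rng(random_state=None):
    """Hypothetical helper: normalise a seed or RandomState into a generator."""
    if isinstance(random_state, np.random.RandomState):
        # Assumption: an existing instance is used as-is, so its state advances.
        return random_state
    # None draws fresh OS entropy; an int gives a reproducible stream.
    return np.random.RandomState(random_state)


def split_test_positions(n_interactions, test_percentage, random_state):
    """Hypothetical helper: sorted positions of interactions sent to the test set."""
    rng = init_rng(random_state)
    order = rng.permutation(n_interactions)
    cutoff = int((1.0 - test_percentage) * n_interactions)
    return np.sort(order[cutoff:])


# An integer seed gives the same test positions on every call.
a = split_test_positions(10, 0.2, random_state=7)
b = split_test_positions(10, 0.2, random_state=7)

# A shared RandomState instance is consumed by each call, so the second call
# continues the stream rather than repeating the first split.
shared = np.random.RandomState(7)
c = split_test_positions(10, 0.2, random_state=shared)
d = split_test_positions(10, 0.2, random_state=shared)

print(a, b, c, d)
```

For splitting interactions and sample weights in step, this suggests passing the same integer seed to both calls rather than one shared generator.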


Returns

(train, test) – A tuple of (train data, test data), each a scipy.sparse.COOMatrix.

Return type