generateModel()

Description

generateModel() trains Machine Learning models on the input systems to predict the phases of molecules. Unless told otherwise, the models are saved in a file. They are also returned as a variable that can be used directly.

Arguments, keywords and outputs

Input(s) / Argument(s)

Name | Flag | Type | Description
Systems | (none) | list of System instances | List of the instances of the System class to use for the training. Systems can be generated with the function openSystem(). Provide exactly one system per phase. The molecule types, number of molecules and neighbor ranking should be the same in every system. With the default phase names, the first system of the list should be the gel phase and the second one the fluid phase.
Phases | phases= | list of str | (Opt.) Name of the phase of each system given in Systems, in the same order. Default is gel/fluid.
File Path | file_path= | str | (Opt.) Path and name of the file to generate. The file extension should be .lpm. By default, the name is generated automatically as "date_moltype.lpm" (e.g. 20201201_DPPC.lpm).
Validation size | validationSize= | float | (Opt.) Proportion of the molecules from the systems set aside for the validation set. Default is 0.20 (20%).
Seed | seed= | int | (Opt.) Seed for the random shuffling of the input systems. Default is 7.
Number of splits | nSplits= | int | (Opt.) Number of times the training is repeated on the training set with random shuffling. Default is 10.
Save Model | save_model= | bool | (Opt.) Save the models in a file. Default is True.
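The defaults described in the arguments table can be summarised in a short sketch. The helper resolve_arguments() below is hypothetical and is not part of MLLPA: it only illustrates the documented rules (one phase name per system, gel/fluid as the default phase names, and a default file name of the form date_moltype.lpm). Raising an error for a mismatched number of phases or a wrong extension is an assumption of this sketch; MLLPA may handle these cases differently.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple


def resolve_arguments(
    n_systems: int,
    moltype: str,
    phases: Optional[Sequence[str]] = None,
    file_path: Optional[str] = None,
    today: Optional[date] = None,
) -> Tuple[List[str], str]:
    """Hypothetical helper (not MLLPA code) applying the documented defaults."""
    # Documented default: the first system is the gel phase, the second the fluid one.
    names = list(phases) if phases is not None else ["gel", "fluid"]
    # One system per phase: the number of names must match the number of systems.
    if len(names) != n_systems:
        raise ValueError(f"{n_systems} systems were given for {len(names)} phase names")
    if len(set(names)) != len(names):
        raise ValueError("each phase name must be used for exactly one system")
    if file_path is None:
        # Documented default name: date_moltype.lpm, e.g. 20201201_DPPC.lpm
        day = today if today is not None else date.today()
        file_path = f"{day.strftime('%Y%m%d')}_{moltype}.lpm"
    elif not file_path.endswith(".lpm"):
        # Assumption of this sketch: reject names without the .lpm extension.
        raise ValueError("the model file should use the .lpm extension")
    return names, file_path


print(resolve_arguments(2, "DPPC", today=date(2020, 12, 1)))
# (['gel', 'fluid'], '20201201_DPPC.lpm')
```

Passing the date explicitly only makes the example reproducible; the documented behaviour uses the current date.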

Output(s)

Name | Type | Description
Models | dict of models | Scikit-learn models trained on the input systems for molecule classification.

Examples

Automatically generate a model file for 2 phases

The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named A and B. The model file will automatically be named and saved in the current folder.

import mllpa

mllpa.generateModel([system_A, system_B], phases=['A', 'B'])

Generate a model file for 3 phases with a chosen name

The following example will use three instances of the System class, named system_A, system_B and system_C, to train the Machine Learning algorithms to recognise phases named A, B and C. The model file will be saved as new_model.lpm in the current folder.

mllpa.generateModel([system_A, system_B, system_C], phases=['A', 'B', 'C'], file_path='new_model.lpm')

Modify the training and validation settings

The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named gel and fluid by default. The training will use a validation size of 33%, be repeated 100 times and use a random seed of 10. The model file will automatically be named and saved in the current folder.

mllpa.generateModel([system_A, system_B], validationSize=0.33, seed=10, nSplits=100)
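The roles of validationSize, seed and nSplits can be pictured with the standard-library sketch below. It assumes that the molecules are shuffled with a fixed seed, that a proportion validationSize of them is held out for validation, and that nSplits behaves like the number of folds of a shuffled K-fold cross-validation on the remaining training set. The functions split_molecules() and kfold_splits() are hypothetical illustrations, not MLLPA's implementation, and they only produce molecule indices rather than training any model.

```python
import random
from typing import Iterator, List, Tuple


def split_molecules(
    n_molecules: int, validation_size: float = 0.20, seed: int = 7
) -> Tuple[List[int], List[int]]:
    """Hypothetical: shuffle molecule indices with a fixed seed, hold out a validation set."""
    if not 0 <= validation_size < 1:
        raise ValueError("validation_size must be in [0, 1)")
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    indices = list(range(n_molecules))
    rng.shuffle(indices)
    n_validation = round(n_molecules * validation_size)  # rounding is a choice of this sketch
    return indices[n_validation:], indices[:n_validation]  # (training, validation)


def kfold_splits(
    training: List[int], n_splits: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical: yield (fit, scored) index lists, scoring each fold once."""
    if not 2 <= n_splits <= len(training):
        raise ValueError("n_splits must be between 2 and the training-set size")
    # The training indices are already shuffled, so striding gives random folds.
    folds = [training[i::n_splits] for i in range(n_splits)]
    for k, scored in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield fit, scored


# Assuming 300 molecules in total, with the values of the example above.
training, validation = split_molecules(300, validation_size=0.33, seed=10)
print(len(training), len(validation))  # 201 99
for fit, scored in kfold_splits(training, n_splits=100):
    pass  # a classifier would be fit on `fit` and scored on `scored` here
```

Running the sketch twice with seed=10 gives exactly the same split, which is the point of exposing the seed: two trainings with the same inputs and settings see the same partition of the molecules.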

Save in a variable but not in a file

The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named gel and fluid by default. No model file will be saved; the models will instead be returned in the models variable.

models = mllpa.generateModel([system_A, system_B], save_model=False)
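If the models returned with save_model=False later need to be kept, one option outside of MLLPA is Python's standard pickle module, which the scikit-learn documentation lists among the ways to persist models. This is only a sketch: save_models() and load_models() are hypothetical helpers, the resulting file is a plain pickle rather than an .lpm model file, and it should only be loaded from trusted sources with a compatible version of scikit-learn.

```python
import pickle
from typing import Any, Dict


def save_models(models: Dict[Any, Any], path: str) -> None:
    """Hypothetical helper: write the returned models dictionary to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(models, handle)


def load_models(path: str) -> Dict[Any, Any]:
    """Hypothetical helper: read back a dictionary written by save_models().

    Only unpickle files from trusted sources, with a compatible scikit-learn version.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage with the example above (requires mllpa and the two systems):
# models = mllpa.generateModel([system_A, system_B], save_model=False)
# save_models(models, "my_models.pkl")
# models = load_models("my_models.pkl")
```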

Save in a variable and in a file

The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named gel and fluid by default. The model file will automatically be named and saved in the current folder, and the models will also be returned in the models variable.

models = mllpa.generateModel([system_A, system_B])

The following tutorial(s) describe the use of the generateModel() function in more detail: