generateModel()
Description
generateModel() trains Machine Learning models to predict the phases of molecules from the input systems. By default the models are saved in a file; they are also returned so that they can be used directly from a variable.
Arguments, keywords and outputs
Input(s) / Argument(s)
Name | Flag | Type | Description |
---|---|---|---|
Systems | | list of classes System | List of the instances of the System class that should be used for the training. Systems can be generated using the function openSystem(). Provide only one system per phase. The types, number of molecules and neighbor ranking should be the same in each system. By default, the first system of the list should be the gel one, and the second one the fluid one. |
Phases | phases= | list of str | (Opt.) Name of the phase of each system in the list above, given in the same order. Default is gel/fluid. |
File Path | file_path= | str | (Opt.) Path and name of the file to generate. File extension should be .lpm. By default, the name is autogenerated as “date_moltype.lpm” (e.g. 20201201_DPPC.lpm) |
Validation size | validationSize= | float | (Opt.) Proportion of molecules from the systems kept aside for the validation set. Default is 0.20 (20%) |
Seed | seed= | int | (Opt.) Seed for random shuffle of the input systems. Default is 7. |
Number of splits | nSplits= | int | (Opt.) Number of times the training should be repeated on the training set with random shuffling. Default is 10. |
Save Model | save_model= | bool | (Opt.) Save the model in a file. Default is True. |
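The validationSize= and seed= keywords describe a seeded hold-out split: the molecules are shuffled reproducibly and a fraction is set aside for validation. The sketch below illustrates that idea with the Python standard library only. It is not MLLPA's own implementation; the helper name hold_out_split and the rounding rule are assumptions made for illustration.

```python
import random


def hold_out_split(items, validation_size=0.20, seed=7):
    """Shuffle items with a fixed seed and set aside a validation fraction.

    Hypothetical helper: it mirrors the documented defaults (0.20 and 7),
    but it is not the code used inside generateModel().
    """
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be strictly between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    # Assumption: the validation count is rounded to the nearest integer.
    n_validation = round(len(shuffled) * validation_size)
    return shuffled[n_validation:], shuffled[:n_validation]


molecules = list(range(100))  # stand-in for 100 molecules
train, validation = hold_out_split(molecules)
print(len(train), len(validation))
```

Calling the helper twice with the same seed gives the same split, which is the reason for exposing a seed in the first place.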
Output(s)
Name | Type | Description |
---|---|---|
Models | dict of models | Scikit-learn models trained on the input systems for molecule classification. |
Examples
Automatically generate a model file for 2 phases
The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named A and B. The model file will automatically be named and saved in the current folder.
import mllpa
mllpa.generateModel([system_A, system_B], phases=['A', 'B'])
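To predict which file the call above creates, the documented "date_moltype.lpm" convention can be reproduced by hand. The sketch below is an assumption-based illustration: default_model_name is a hypothetical helper, and the date layout (YYYYMMDD) is inferred from the 20201201_DPPC.lpm example in the table.

```python
from datetime import date


def default_model_name(moltype, when=None):
    """Build a file name following the documented date_moltype.lpm pattern.

    Hypothetical helper, not part of the mllpa package.
    """
    when = when or date.today()
    # date objects accept strftime codes directly in a format spec.
    return f"{when:%Y%m%d}_{moltype}.lpm"


print(default_model_name("DPPC", date(2020, 12, 1)))  # 20201201_DPPC.lpm
```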
Automatically generate a model file for 3 phases with a chosen name
The following example will use three instances of the System class, named system_A, system_B and system_C, to train the Machine Learning algorithms to recognise phases named A, B and C. The model file named new_model.lpm will be saved in the current folder.
mllpa.generateModel([system_A, system_B, system_C], phases=['A', 'B', 'C'], file_path='new_model.lpm')
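Because systems and phase names are matched by position, a mismatch between the two lists is easy to introduce when more than two phases are used. The sketch below shows a simple check that can be run before calling generateModel(). The helper pair_phases is hypothetical, and whether generateModel() performs a similar check itself is not stated here; plain strings stand in for System instances.

```python
def pair_phases(systems, phases=None):
    """Pair each system with its phase name, by position.

    Hypothetical pre-flight check, not part of the mllpa package.
    """
    if phases is None:
        phases = ["gel", "fluid"]  # documented default order
    if len(systems) != len(phases):
        raise ValueError(
            f"{len(systems)} systems were given for {len(phases)} phase names"
        )
    if len(set(phases)) != len(phases):
        raise ValueError("each phase should be represented by one system only")
    return dict(zip(phases, systems))


# Strings stand in for System instances in this illustration.
pairs = pair_phases(["system_A", "system_B", "system_C"], ["A", "B", "C"])
print(pairs)
```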
Modify the training and scoring methods
The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named gel and fluid by default. The training will be done with a validation size of 33%, repeated 100 times, and with the random seed set to 10. The model file will automatically be named and saved in the current folder.
mllpa.generateModel([system_A, system_B], validationSize=0.33, seed=10, nSplits=100)
Save in variables but not in a file
The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named gel and fluid by default. No model file will be saved; the models will instead be returned in the models variable.
models = mllpa.generateModel([system_A, system_B], save_model=False)
Save in variables and in a file
The following example will use two instances of the System class, named system_A and system_B, to train the Machine Learning algorithms to recognise phases named gel and fluid by default. The model file will automatically be named and saved in the current folder, and the models will also be returned in the models variable.
models = mllpa.generateModel([system_A, system_B])
Related tutorials
The following tutorial(s) detail further the use of the generateModel() function: