A data analysis and visualization platform based on machine learning


  • Overview

  •     Step 1: Data Input

  •     Step 2: Method Selection

  •     Step 3: Model Generation

  •     Step 4: Feature Browse

  •     Step 5: Model Evaluation

  •     Step 6: Feature Filtration

  •     Step 7: Model Application



  • The machine learning analysis function of the database can perform binary and multi-class classification analysis based on 15 algorithms, and survival analysis based on 11 algorithms.

  • The functional modules include "Model Generation", "Feature Browse", "Model Evaluation" and "Model Application", as illustrated in the figure on the left, and the specific steps are shown in the navigation bar. (The navigation bar only displays the current step; it cannot be used to turn pages.)

  • Click the "Next Step" and "Last Step" buttons at the bottom to turn pages; click the other buttons to run the corresponding analyses.

  • Each step and module offers multiple visualization approaches and free download services.

  • The "Model Application" module is an optional function. When drawing a nomogram, there is no need to upload the prediction set.

  • Users can upload the Training Set and Validation Set separately, or upload a comprehensive dataset in "Step 1: Data Input" and set a proportion for random division into a Training Set and a Validation Set.


Basic Parameters



*Analysis Type:

*Generation method for the validation set:

*Missing Value Treatment:



Data Set



*Upload Matrix File:

*Upload Group Information File:   


*Proportion of Division:   training : validation =  7 : 3    


  • After clicking the "Submit" button, the randomly divided dataset can be downloaded so that the results can be reproduced when submitting again.


Training Set



*Upload Matrix File:

*Upload Group Information File:   




Validation Set



*Upload Matrix File:

*Upload Group Information File:   




  • Please make sure the input file format is correct; an incorrect format can cause errors that produce no results.

  • The contents of the files should be tab-separated. If the IDs contain special characters, those characters will be automatically replaced with underscores.

  • In the uploaded matrix file, each row should be a feature and each column should be a sample (see the sketch after this list). Feature IDs and sample IDs should not be duplicated; otherwise, there may be no results. All values in the matrix should be numerical.

  • For binary and multi-class classification analysis, the group information file should have two columns with the headers "sample" and "condition". For binary classification analysis, the value of "condition" should be "case" or "control".

  • For survival analysis, the group information file should have three columns with the headers "sample", "time" and "status". The "time" column should be in days, and the value of the "status" column should be "0" or "1".

  • Too many features and samples will slow the run, and some ML algorithms are time-consuming; please be patient.

  • If some ML algorithms do not produce results in "Model Generation" or "Model Evaluation", users are advised to adjust the parameters and try again.

  • Please note that these algorithms may not be able to model all data structures successfully. If the uploaded data are formatted correctly, focus on the algorithms that model the data successfully.
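
For reference, the expected layouts are sketched below with hypothetical IDs and values (columns are aligned with spaces here for readability; real files must be tab-separated):

    # Matrix file: rows are features, columns are samples
    ID        S1      S2      S3
    geneA     1.23    0.87    2.41
    geneB     0.05    0.11    0.02

    # Group information file (binary / multi-class classification)
    sample    condition
    S1        case
    S2        control
    S3        case

    # Group information file (survival analysis; time in days)
    sample    time    status
    S1        365     1
    S2        1200    0
    S3        87      1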





   

Algorithm and Parameter Selection   Select All    


  XGBoost

Model:

Eta: 0.

Max depth: 

Objective:
   binary:logistic
   multi:softmax
   count:poisson

The number of decision trees to display:

Show the top features

Extreme Gradient Boosting, an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016). XGBoost includes an efficient linear model solver and tree learning algorithms. The package can automatically run parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is designed to be extensible, so users can also easily define their own objectives.
Detail.
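
As a rough illustration of how the parameters above (Eta, Max depth, Objective) map onto the open-source xgboost library, here is a minimal Python sketch; the toy data are hypothetical and this is not the platform's own code:

    import numpy as np
    import xgboost as xgb

    # Hypothetical toy data: 100 samples x 20 features, binary labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = rng.integers(0, 2, size=100)

    dtrain = xgb.DMatrix(X, label=y)
    params = {
        "eta": 0.3,                      # "Eta" (learning rate)
        "max_depth": 6,                  # "Max depth"
        "objective": "binary:logistic",  # or "multi:softmax", "count:poisson"
    }
    booster = xgb.train(params, dtrain, num_boost_round=50)
    pred = booster.predict(dtrain)       # predicted probabilities of the positive class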


  LightGBM

Model:

Learning Rate: 0.

Objective: 

Show the top features

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient with the following advantages:
  • Faster training speed and higher efficiency.
  • Lower memory usage.
  • Better accuracy.
  • Support of parallel, distributed, and GPU learning.
  • Capable of handling large-scale data.
  • For further details, please refer to Features.
Benefiting from these advantages, LightGBM is widely used in many winning solutions of machine learning competitions.
Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks in both efficiency and accuracy, with significantly lower memory consumption. Moreover, distributed learning experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.
Detail.
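
The same parameters can be exercised directly through the open-source lightgbm library; a minimal Python sketch with hypothetical toy data (not the platform's own code):

    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = rng.integers(0, 2, size=100)

    train_set = lgb.Dataset(X, label=y)
    params = {
        "learning_rate": 0.1,   # "Learning Rate"
        "objective": "binary",  # "Objective"; e.g. "multiclass" for multi-class
        "verbosity": -1,
    }
    booster = lgb.train(params, train_set, num_boost_round=50)
    pred = booster.predict(X)   # predicted probability of the positive class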


  GBM

Model:

 bernoulli
 gaussian

Number of trees:

Learning rate (0.001~0.1):

Show the top features


  • A dataset that is too small may produce no results.
  • A smaller learning rate typically requires more trees.
GBM (Gradient Boosting Machine) is a boosting algorithm. The main idea is that multiple weak learners are generated sequentially, and each weak learner is trained to fit the negative gradient of the loss function of the previously accumulated model, so that after the weak learner is added, the cumulative model's loss decreases along the negative gradient. The base learners are combined linearly with different weights, so that the excellent learners can be reused. The most common base learner is the tree model.
Detail.
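
To make the two knobs above concrete, here is a minimal sketch using scikit-learn's GradientBoostingClassifier as a stand-in for the platform's GBM backend (an assumption; the toy data are hypothetical):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # "Number of trees" -> n_estimators; "Learning rate" -> learning_rate.
    # As noted above, a smaller learning rate typically needs more trees.
    gbm = GradientBoostingClassifier(n_estimators=500, learning_rate=0.01,
                                     random_state=0)
    gbm.fit(X, y)
    importances = gbm.feature_importances_   # basis for a top-feature ranking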


  Random Forest

Number of Trees:  

Random forest (RF) is an ensemble classifier consisting of many decision trees (DTs), much as a forest is a collection of many trees. DTs that are grown very deep often overfit the training data, resulting in high variation in the classification outcome for a small change in the input data. They are very sensitive to their training data, which makes them error-prone on the test dataset. The different DTs of an RF are trained on different parts of the training dataset. To classify a new sample, its input vector is passed down each DT of the forest. Each DT then considers a different part of that input vector and gives a classification outcome. The forest then chooses the class having the most 'votes' (for a discrete classification outcome) or the average over all trees in the forest (for a numeric outcome). Because the RF algorithm considers the outcomes from many different DTs, it can reduce the variance that results from considering a single DT on the same dataset.
Detail.
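
A minimal sketch of the voting behaviour described above, using scikit-learn's RandomForestClassifier as a stand-in (hypothetical toy data):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # "Number of Trees" -> n_estimators; each tree sees a bootstrap sample
    # and a random subset of features, and the forest takes the majority vote.
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X, y)
    votes = rf.predict(X[:3])               # majority-vote class labels
    importances = rf.feature_importances_   # for the top-feature ranking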


  CatBoost

Model:

Number of iterations:

Loss function:

Show the top features

CatBoost is a fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression and other ML tasks.
Detail.
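
A minimal Python sketch showing how the two parameters above appear in the open-source catboost package (hypothetical toy data; not the platform's own code):

    from catboost import CatBoostClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # "Number of iterations" -> iterations; "Loss function" -> loss_function.
    model = CatBoostClassifier(iterations=200, loss_function="Logloss",
                               verbose=False)
    model.fit(X, y)
    importances = model.get_feature_importance()   # for the top-feature plot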


  AdaBoost

Number of iterations:

Show the top features

Implements Freund and Schapire's AdaBoost.M1 algorithm and Breiman's Bagging algorithm, using classification trees as individual classifiers. Once these classifiers have been trained, they can be used to predict on new data. Cross-validation estimation of the error can also be performed.
Detail.
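
As a rough stand-in for the AdaBoost.M1 procedure described above, here is a minimal sketch with scikit-learn's AdaBoostClassifier (an assumption: scikit-learn's boosting variant, not necessarily the platform's backend; toy data hypothetical):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # "Number of iterations" -> n_estimators: weak tree classifiers are
    # trained sequentially, each one up-weighting the samples the previous
    # ones misclassified.
    ada = AdaBoostClassifier(n_estimators=100, random_state=0)
    ada.fit(X, y)
    pred = ada.predict(X)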


  Decision Tree

Show the top features

A decision tree models decision logic, i.e., tests and corresponding outcomes, for classifying data items into a tree-like structure. The nodes of a DT normally have multiple levels, where the first or top-most node is called the root node. All internal nodes (i.e., nodes having at least one child) represent tests on input variables or attributes. Depending on the test outcome, the classification algorithm branches towards the appropriate child node, where the process of testing and branching repeats until it reaches a leaf node. The leaf or terminal nodes correspond to the decision outcomes. When traversing the tree to classify a sample, the outcomes of all tests at each node along the path provide sufficient information to infer its class.
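
The root-to-leaf test-and-branch process can be inspected directly; a minimal sketch with scikit-learn's DecisionTreeClassifier (hypothetical toy data):

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # Each internal node tests one input variable; following the branches
    # from the root down to a leaf yields the predicted class.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X, y)
    print(export_text(tree))   # textual view of the tests at each node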


  Lasso

Model:

 binomial
 gaussian
 poisson

Cross Validation:

 fold

Show the top features

Compared with the quadratic (L2) penalty of ridge regression, the lasso's L1 penalty can not only shrink the coefficients βj of uninformative predictors exactly to 0, but also retain the valuable predictors (those with large |βj|). This is because, compared with the quadratic penalty of ridge regression, the L1 penalty shrinks large coefficients βj to a lesser degree, so the lasso can select a more accurate model.
Detail.
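
A minimal sketch of L1-penalised selection with cross-validation, using scikit-learn's LogisticRegressionCV for the binomial case as a stand-in for the platform's backend (an assumption; toy data hypothetical):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # L1 (lasso) penalty with 10-fold CV over the penalty strength;
    # coefficients of uninformative features are shrunk exactly to zero.
    lasso = LogisticRegressionCV(penalty="l1", solver="saga", cv=10,
                                 max_iter=5000, random_state=0)
    lasso.fit(X, y)
    selected = np.flatnonzero(lasso.coef_[0])   # indices of retained features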


  Elastic Network

Model:

 binomial
 gaussian
 poisson

Cross Validation:

 fold

Alpha:

0.

Show the top features

The penalty function of the elastic net is a convex linear combination of the ridge regression penalty and the lasso penalty. When α=0, elastic net regression reduces to ridge regression; when α=1, it reduces to lasso regression. Elastic net regression therefore combines the advantages of lasso and ridge regression: it can perform variable selection while also exhibiting a good grouping effect.
Detail.
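
The role of α is easiest to see in code; a minimal sketch with scikit-learn's LogisticRegressionCV as a stand-in (an assumption; toy data hypothetical). Setting l1_ratio to 0 or 1 recovers the Ridge and Lasso cards, respectively:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # l1_ratio plays the role of alpha above: 0 -> ridge, 1 -> lasso,
    # values in between -> elastic net.
    enet = LogisticRegressionCV(penalty="elasticnet", solver="saga",
                                l1_ratios=[0.5], cv=10, max_iter=5000,
                                random_state=0)
    enet.fit(X, y)
    selected = np.flatnonzero(enet.coef_[0])   # features with non-zero weight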


  Ridge

Model:

 binomial
 gaussian
 poisson

Cross Validation:

 fold

Show the top features

Ridge regression is a biased estimation regression method suited to the analysis of collinear data; in essence, it is an improved least-squares estimator. By giving up the unbiasedness of ordinary least squares, ridge regression obtains more realistic and reliable regression coefficients at the cost of losing some information and accuracy, and it fits ill-conditioned data better than the least-squares method.
Detail.


  PLS

Show the top features

PLS (partial least squares) regression uses the principles of principal component analysis to condense multiple X variables and multiple Y variables into components (X corresponds to the component U, Y to the component V). With the help of canonical correlation, the relationships between X and U and between Y and V can be analysed; combined with multiple linear regression, the relationship between X and V can then be analysed, so as to study the relationship between X and Y.
Detail.
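
A minimal sketch of the component-condensation idea, using scikit-learn's PLSRegression as a stand-in (toy data hypothetical):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = rng.integers(0, 2, size=100).astype(float)

    # Condense X (and y) into a few latent components chosen to maximise
    # their covariance, then regress y on those components.
    pls = PLSRegression(n_components=2)
    pls.fit(X, y)
    scores = pls.transform(X)        # sample coordinates on the components
    y_hat = pls.predict(X).ravel()   # predictions from the latent space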


  GLM

Model:

 binomial    (Logistic Regression)
 gaussian   (Linear Regression)
 poisson     (Poisson Regression)

Show the top features

In the GLM (generalized linear model) module, logistic regression is used for binary classification analysis, and linear regression is used for multi-class analysis.
Detail.
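
The three "Model" options map onto three standard GLM families; a minimal scikit-learn sketch of the correspondence (a stand-in for the platform's backend; toy data hypothetical):

    import numpy as np
    from sklearn.linear_model import (LinearRegression, LogisticRegression,
                                      PoissonRegressor)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y_binary = rng.integers(0, 2, size=100)

    logit = LogisticRegression()    # binomial -> logistic regression
    linear = LinearRegression()     # gaussian -> linear regression
    poisson = PoissonRegressor()    # poisson  -> Poisson regression

    logit.fit(X, y_binary)          # e.g. the binary-classification case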


  Neural Network

Model:

Hidden: 

Activation Function: 

Algorithm: 

Show the top features

Neural networks (NNs) attempt to use multiple layers of calculations to imitate how the human brain interprets and draws conclusions from information. NNs are essentially mathematical models designed to deal with complex and disparate information, and the nomenclature of this algorithm comes from its use of 'nodes' akin to synapses in the brain. The learning process of an NN can be either supervised or unsupervised. A neural net is said to learn in a supervised manner if the desired output is already targeted and introduced to the network via training data, whereas unsupervised NNs have no such pre-identified target outputs, and the goal is to group similar units close together in certain areas of the value range. The learning process used in MLBiomarker is supervised.
Detail.
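
A minimal supervised sketch mapping the three fields above onto scikit-learn's MLPClassifier (the parameter names are scikit-learn's, an assumption rather than the platform's exact options; toy data hypothetical):

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # "Hidden" -> hidden_layer_sizes; "Activation Function" -> activation;
    # "Algorithm" -> solver. Supervised: the labels y are the target outputs.
    nn = MLPClassifier(hidden_layer_sizes=(10, 5), activation="relu",
                       solver="adam", max_iter=2000, random_state=0)
    nn.fit(X, y)
    pred = nn.predict(X)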


  SVM-RFE

Cross Validation:

K fold:       N fold: 

Show the top features

SVM (support vector machine) first maps each data item into an n-dimensional feature space, where n is the number of features. It then identifies the hyperplane that separates the data items into two classes while maximising the marginal distance for both classes and minimising the classification errors. The marginal distance for a class is the distance between the decision hyperplane and its nearest instance belonging to that class. More formally, each data point is first plotted as a point in an n-dimensional space (where n is the number of features), with the value of each feature being the value of a specific coordinate. To perform the classification, we then need to find the hyperplane that separates the two classes by the maximum margin.
Detail.
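
SVM-RFE combines the SVM described above with recursive feature elimination; a minimal sketch with scikit-learn's RFECV and a linear SVM as a stand-in (toy data hypothetical):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFECV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=30, random_state=0)

    # Repeatedly fit a linear SVM, drop the least important features, and
    # use k-fold cross-validation to decide how many features to keep.
    selector = RFECV(SVC(kernel="linear"), step=1, cv=5)
    selector.fit(X, y)
    kept = selector.support_   # boolean mask of the selected features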


  SuperPC

Show the top features

SuperPC performs prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach (Bair et al., 2006).
Superpc is especially useful for high-dimensional data in which the number of features p dominates the number of samples n (the p >> n paradigm), as generated, for instance, by high-throughput technologies.
Detail.
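
To illustrate the supervised principal component idea for a regression outcome (the platform's survival version differs), here is a hand-rolled sketch; the screening threshold is fixed here, whereas superpc chooses it by cross-validation (toy data hypothetical):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 500))                  # p >> n, as in the text
    y = X[:, :5].sum(axis=1) + rng.normal(size=100)

    # 1) Score each feature by its univariate association with the outcome.
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = scores > np.quantile(scores, 0.95)        # fixed threshold (sketch)

    # 2) Take principal components of the screened features and use the
    #    leading component as the predictor.
    pc1 = PCA(n_components=1).fit_transform(X[:, keep])
    model = LinearRegression().fit(pc1, y)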


  XGBoost Cox

Model:

Eta: 0.

Max depth: 

Objective:  survival:cox

The number of decision trees to display:

Show the top features

Extreme Gradient Boosting, an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016). XGBoost includes an efficient linear model solver and tree learning algorithms. The package can automatically run parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is designed to be extensible, so users can also easily define their own objectives.
Detail.
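
The survival:cox objective is available in the open-source xgboost library, which encodes right censoring through the sign of the label; a minimal sketch (toy data hypothetical):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    time = rng.uniform(30, 2000, size=100)    # survival time in days
    event = rng.integers(0, 2, size=100)      # 1 = event, 0 = censored

    # For survival:cox, positive labels are event times and negative labels
    # are censoring times.
    label = np.where(event == 1, time, -time)
    dtrain = xgb.DMatrix(X, label=label)
    params = {"eta": 0.1, "max_depth": 4, "objective": "survival:cox"}
    booster = xgb.train(params, dtrain, num_boost_round=50)
    risk = booster.predict(dtrain)            # hazard ratios; higher = riskier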


  GBM

Model:

Number of trees:

Learning rate (0.001~0.1):

Show the top features


  • A dataset that is too small may produce no results.
  • A smaller learning rate typically requires more trees.
GBM (Gradient Boosting Machine) is a boosting algorithm. The main idea is that multiple weak learners are generated sequentially, and each weak learner is trained to fit the negative gradient of the loss function of the previously accumulated model, so that after the weak learner is added, the cumulative model's loss decreases along the negative gradient. The base learners are combined linearly with different weights, so that the excellent learners can be reused. The most common base learner is the tree model.
Detail.


  Random Survival Forest

Number of Trees:  

Random survival forest (RSF) is a random forest method for analyzing right-censored survival data. It introduces new survival splitting rules for growing survival trees, and new missing-data algorithms for estimating missing data.
RSF also introduces a conservation-of-events principle for survival forests and uses it to define ensemble mortality, a simple and interpretable measure of mortality that can be used as a predicted outcome.
Detail.
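
A minimal sketch using scikit-survival's RandomSurvivalForest as a stand-in for the platform's backend (an assumption; toy data hypothetical):

    import numpy as np
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.util import Surv

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = Surv.from_arrays(event=rng.integers(0, 2, size=100).astype(bool),
                         time=rng.uniform(30, 2000, size=100))

    # "Number of Trees" -> n_estimators; each tree is grown on a bootstrap
    # sample using survival splitting rules.
    rsf = RandomSurvivalForest(n_estimators=500, random_state=0)
    rsf.fit(X, y)
    risk = rsf.predict(X)   # ensemble risk scores (mortality-like)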


  CoxBoost

Show the top features

CoxBoost provides routines for fitting Cox models by likelihood-based boosting, for a single endpoint or in the presence of competing risks.
Detail.


  StepCox

Model:

 bidirection
 forward
 backward

Show the top features

Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. Three types of stepwise regression can be chosen, i.e. stepwise linear regression, stepwise logistic regression, and stepwise Cox regression.
Detail.
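
A hand-rolled sketch of the forward direction, selecting covariates greedily by AIC with the lifelines package (an illustration of the procedure, not the platform's implementation; toy data hypothetical):

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
    df["time"] = rng.uniform(30, 2000, size=200)
    df["status"] = rng.integers(0, 2, size=200)

    # Forward stepwise: at each step, add the covariate that most improves
    # the AIC of the Cox model; stop when no addition helps.
    chosen, candidates, best_aic = [], list("abcde"), np.inf
    while candidates:
        trials = {v: CoxPHFitter()
                      .fit(df[chosen + [v, "time", "status"]], "time", "status")
                      .AIC_partial_
                  for v in candidates}
        v, aic = min(trials.items(), key=lambda kv: kv[1])
        if aic >= best_aic:
            break
        best_aic = aic
        chosen.append(v)
        candidates.remove(v)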


  Lasso Cox

Model:

 cox

Cross Validation:

 fold

Show the top features

Compared with the quadratic (L2) penalty of ridge regression, the lasso's L1 penalty can not only shrink the coefficients βj of uninformative predictors exactly to 0, but also retain the valuable predictors (those with large |βj|). This is because, compared with the quadratic penalty of ridge regression, the L1 penalty shrinks large coefficients βj to a lesser degree, so the lasso can select a more accurate model.
Detail.
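
A minimal sketch of penalised Cox regression with scikit-survival's CoxnetSurvivalAnalysis as a stand-in (an assumption; toy data hypothetical). Its l1_ratio parameter also covers the Elastic Network Cox and Ridge Cox cards below (1 gives lasso, intermediate values elastic net, values near 0 a ridge-like fit):

    import numpy as np
    from sksurv.linear_model import CoxnetSurvivalAnalysis
    from sksurv.util import Surv

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))
    y = Surv.from_arrays(event=rng.integers(0, 2, size=100).astype(bool),
                         time=rng.uniform(30, 2000, size=100))

    # l1_ratio = 1.0 -> lasso Cox; fits a whole path of penalty strengths.
    coxnet = CoxnetSurvivalAnalysis(l1_ratio=1.0)
    coxnet.fit(X, y)
    nonzero = np.flatnonzero(coxnet.coef_[:, -1])   # features kept at the
                                                    # weakest penalty on the path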


  Elastic Network Cox

Model:

 cox

Cross Validation:

 fold

Alpha:

0.

Show the top features

The penalty function of the elastic net is a convex linear combination of the ridge regression penalty and the lasso penalty. When α=0, elastic net regression reduces to ridge regression; when α=1, it reduces to lasso regression. Elastic net regression therefore combines the advantages of lasso and ridge regression: it can perform variable selection while also exhibiting a good grouping effect.
Detail.


  Ridge Cox

Model:

 cox

Cross Validation:

 fold

Show the top features

Ridge regression is a biased estimation regression method suited to the analysis of collinear data; in essence, it is an improved least-squares estimator. By giving up the unbiasedness of ordinary least squares, ridge regression obtains more realistic and reliable regression coefficients at the cost of losing some information and accuracy, and it fits ill-conditioned data better than the least-squares method.
Detail.


  plsRcox

Model:

Number of Components:  

Cross Validation:

 fold

Show the top features


  • Too high a number of components may produce no results.
plsRcox implements partial least squares regression and various regular, sparse, or kernel techniques for fitting Cox models in high-dimensional settings. The cross-validation criteria were studied in Bertrand's research.
Detail.


  SuperPC

Show the top features

SuperPC performs prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach (Bair et al., 2006).
Superpc is especially useful for high-dimensional data in which the number of features p dominates the number of samples n (the p >> n paradigm), as generated, for instance, by high-throughput technologies.
Detail.


  Cox (Univariate / Multivariate)

Model:

(Univariate Cox) P value < 0.

(Multivariate Cox) P value < 0.

Show the top features

The main purpose of survival analysis is to study the relationship between covariates (independent variables) X and the observed survival function S(t,X). When S(t,X) is affected by covariates, the traditional approach would be regression analysis, i.e., modelling the influence of the covariates on S(t,X). However, because survival data contain censored observations, such problems are difficult to handle with ordinary regression analysis. An important part of survival analysis is therefore to explore the risk factors that affect survival time or survival rate; these factors act on the survival rate by affecting the risk of death at each time point, i.e., the hazard rate. The hazard function differs between populations and over time, and is usually expressed as the product of a baseline hazard function and a function of the corresponding covariates. In 1972, the British biostatistician D. Cox proposed a method for estimating the model parameters when the baseline hazard function is unknown. This model later became known as the Cox proportional hazards regression model, or Cox regression for short.
Detail.
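
A minimal sketch of the univariate-then-multivariate workflow implied by the two p-value fields above, using the lifelines package as a stand-in (an assumption; toy data hypothetical):

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
    df["time"] = rng.uniform(30, 2000, size=200)
    df["status"] = rng.integers(0, 2, size=200)

    # Univariate Cox: screen each feature by its p value.
    keep = []
    for v in "abcde":
        fit = CoxPHFitter().fit(df[[v, "time", "status"]], "time", "status")
        if fit.summary.loc[v, "p"] < 0.05:
            keep.append(v)

    # Multivariate Cox on the screened features.
    if keep:
        multi = CoxPHFitter().fit(df[keep + ["time", "status"]], "time", "status")
        print(multi.summary[["coef", "p"]])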



   


  • Click a button in the right navigation bar to open the result window of the corresponding ML algorithm. The result window contains sub-pages including "Importance Scores" and other visualizations.
  • The numbers in the right navigation bar represent the number of top/all features in each model.
  • The top features are displayed on the "Importance Score" sub-page in the result window of each ML algorithm, and the number shown can be adjusted by the user.
  • For some algorithms, parameters must be selected in the result window based on the intermediate results; the final results are obtained after submission.
  • In the subsequent steps, "Feature Browse" will display the intersection and union of the top features.
  • In the "Model Evaluation" step, all features are used for an overall assessment of the model, and the top features are evaluated at the feature level.



  • XGBoost
  • LightGBM
  • GBM
  • Random Forest
  • CatBoost
  • AdaBoost
  • Decision Tree
  • Lasso
  • Elastic Network
  • Ridge
  • PLS
  • GLM
  • Neural Network
  • SVM-RFE
  • SuperPC
  • XGBoost Cox
  • GBM
  • Random Survival Forest
  • CoxBoost
  • StepCox
  • Lasso Cox
  • Elastic Network Cox
  • Ridge Cox
  • plsRcox
  • SuperPC
  • Cox