SuperPred3 offers a knowledge-based method, built on machine learning models, for predicting the ATC codes and targets of your compounds.
The data is stored in a relational MySQL database hosted on the Charité IT system. Chemical information in the database was handled with the Python package RDKit and ChemAxon software. The website back-end runs on a lab-based LAMP (Linux/Apache/MySQL/PHP) server, with PHP as the back-end language. The database is accessed through the MySQL interface, and data reaches the front-end through a mixture of HTML from submission responses and AJAX requests. Website functionality is implemented in JavaScript and the jQuery library. The CSS framework Bootstrap 4 is also used. Tables on the website were created with the jQuery plugin DataTables and its absolute sorting extension. The chemistry interface uses the JavaScript library ChemDoodle Web Components. A JavaScript-capable browser is required; the server was tested on the most recent versions of Google Chrome and Mozilla Firefox.
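The Anatomical Therapeutic Chemical (ATC) classification system, published by the World Health Organization (WHO), is used to classify drugs. Drugs are grouped according to their therapeutic and chemical characteristics. Each ATC code is divided into five levels: the anatomical main group (1st level), a therapeutic or pharmacological subgroup (2nd level), chemical, pharmacological or therapeutic subgroups (3rd and 4th levels) and the chemical substance (5th level).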
A substance or combination of substances at the 5th level refers to a single indication, so drugs with more than one indication are assigned more than one ATC code. Aspirin, for example, has three ATC codes.
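The level structure can be read directly from the characters of a code. The following is a minimal Python sketch. It assumes standard 7-character codes in which the five levels end after the 1st, 3rd, 4th, 5th and 7th characters. The function name `atc_levels` is hypothetical and not part of SuperPred3. The three example codes are the ones the WHO index lists for acetylsalicylic acid (aspirin).

```python
# Character positions at which each ATC level ends (hypothetical constant),
# e.g. N | N02 | N02B | N02BA | N02BA01 for a 7-character code.
LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list[str]:
    """Return the five nested ATC levels contained in a full 7-character code."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    # Each level is simply a prefix of the full code
    return [code[:n] for n in LEVEL_LENGTHS]


# Aspirin's three codes fall into three different anatomical main groups
for atc_code in ("A01AD05", "B01AC06", "N02BA01"):
    print(atc_code, atc_levels(atc_code))
```

The levels are plain prefixes, so two codes agree at a given level exactly when they share that prefix. This is a convenient way to compare a predicted code with an expected one at coarser levels.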
To predict the ATC class and targets, load a molecular structure into the ChemDoodle web interface. You can obtain a structure by entering a PubChem name or a SMILES string, by loading a structure file, or by drawing it with the provided tools (see below). Once a structure is loaded, you can modify it further. When you are satisfied with the result, click "Start Calculation" to start the predictions.
ATC codes were obtained from the WHO and filtered as described in the statistics. For a more detailed look at the dataset and the machine learning accuracy, you can download the CSV file. It lists the substance name and the expected and predicted ATC codes of each training sample.
Predictions are made by logistic regression models based on Morgan fingerprints of length 2048. The training data was filtered in multiple steps (for details, see the statistics). Model performance was evaluated using 10-fold cross-validation for the target prediction and leave-one-out cross-validation for the ATC code prediction.
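The following Python sketch shows how sample indices could be split under the two validation schemes. It is a plain, unstratified k-fold split with a fixed shuffle seed. The text does not say whether SuperPred3 stratified or shuffled its folds, so those details are assumptions, and the function names are hypothetical. Leave-one-out is simply the special case in which k equals the number of samples.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Split sample indices into k folds; each fold serves exactly once as the test set."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices out round-robin so fold sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set = every index from all other folds
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


def leave_one_out_indices(n_samples: int) -> list[tuple[list[int], list[int]]]:
    """Leave-one-out is k-fold cross-validation with k equal to the number of samples."""
    return k_fold_indices(n_samples, n_samples)


for fold, (train, test) in enumerate(k_fold_indices(25, 10)):
    print(f"fold {fold}: {len(train)} training, {len(test)} test samples")
```

In practice, scikit-learn's `KFold` and `LeaveOneOut` classes in `sklearn.model_selection` produce equivalent splits. The hand-written version only makes the mechanics explicit.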
The performance values for each class, including sensitivity and precision, are available in the corresponding CSV file.
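Sensitivity and precision follow from the counts of true positives (TP), false positives (FP) and false negatives (FN): sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The following is a short Python sketch for a single class. It assumes binary labels (1 = belongs to the class, 0 = does not). The function names, and the choice to return 0.0 when a denominator is zero, are illustrative assumptions and not taken from the SuperPred3 files.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual class members that were predicted as members (also called recall)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp: int, fp: int) -> float:
    """Share of predicted class members that actually belong to the class."""
    return tp / (tp + fp) if tp + fp else 0.0


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0]
tp, fp, fn = confusion_counts(y_true, y_pred)
print(f"sensitivity={sensitivity(tp, fn):.2f}, precision={precision(tp, fp):.2f}")
```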
When predicting targets, two scores are reported: "probability" and "model accuracy". The probability is the likelihood that the input structure binds to the specific target, as estimated by that target's machine learning model. Model performance varies between targets, so the second score, "model accuracy", shows the 10-fold cross-validation score of the respective logistic regression model.
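For a logistic regression model on a binary fingerprint, the probability is the logistic (sigmoid) function of a weighted sum of the set bits. The sketch below assumes the fingerprint is given as the set of on-bit indices in a 2048-bit vector. It also assumes each target model is a hypothetical tuple of trained weights, bias and cross-validation accuracy. The names, weights and data layout are illustrative only, not SuperPred3's actual implementation.

```python
import math

N_BITS = 2048  # Morgan fingerprint length stated in the text


def predict_probability(on_bits: set[int], weights: list[float], bias: float) -> float:
    """Logistic regression: p = sigmoid(w . x + b) for a 0/1 fingerprint x."""
    if len(weights) != N_BITS:
        raise ValueError(f"expected {N_BITS} weights, got {len(weights)}")
    # For a binary fingerprint, w . x reduces to the sum of the weights of the set bits
    z = bias + sum(weights[i] for i in on_bits)
    # Numerically stable sigmoid: avoids math.exp overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_targets(on_bits: set[int], models: dict[str, tuple[list[float], float, float]]):
    """Return (target, probability, model accuracy) rows, highest probability first.

    `models` is a hypothetical mapping: target name -> (weights, bias, 10-fold CV accuracy).
    """
    rows = [
        (target, predict_probability(on_bits, weights, bias), cv_accuracy)
        for target, (weights, bias, cv_accuracy) in models.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical models with made-up weights, only to show the two reported scores
weights_a = [0.0] * N_BITS
weights_a[42] = 1.5
models = {"target A": (weights_a, -0.5, 0.91), "target B": ([0.0] * N_BITS, -2.0, 0.78)}
for target, prob, acc in rank_targets({42, 1000}, models):
    print(f"{target}: probability={prob:.2f}, model accuracy={acc:.2f}")
```

The sketch keeps the cross-validation accuracy next to each probability, mirroring the two scores described above. A high probability from a model with low cross-validation accuracy may deserve less trust than the same probability from a well-performing model.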