SuperPred3 offers a knowledge-based method, built on machine learning models, for predicting the ATC codes and targets of your compounds.
The Anatomical Therapeutic Chemical (ATC) classification system, published by the World Health Organization (WHO), is used to classify drugs. Drugs are assigned to groups according to their therapeutic and chemical characteristics. Each ATC code is divided into 5 levels: anatomical main group (1st level), therapeutic subgroup (2nd level), pharmacological subgroup (3rd level), chemical subgroup (4th level) and chemical substance (5th level).
Substances or combinations of substances at the 5th level refer to a single indication, so drugs with more than one indication are assigned more than one ATC code. Aspirin, for example, has 3 ATC codes.
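To make the five-level structure concrete, the following Python sketch (not part of SuperPred3) splits a 5th-level code into its prefixes for levels 1 to 5. It assumes the usual 7-character layout of a full ATC code; the names `atc_levels`, `ATC_PATTERN` and `LEVEL_LENGTHS` are hypothetical. The three codes listed are those under which acetylsalicylic acid (aspirin) appears in the WHO index.

```python
import re

# Layout of a full (5th-level) ATC code such as "N02BA01":
# letter, two digits, letter, letter, two digits.
ATC_PATTERN = re.compile(r"^[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}$")

# Number of leading characters that make up levels 1 to 5.
LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list[str]:
    """Split a 5th-level ATC code into its prefixes for levels 1 to 5."""
    code = code.strip().upper()
    if not ATC_PATTERN.match(code):
        raise ValueError(f"not a 5th-level ATC code: {code!r}")
    return [code[:n] for n in LEVEL_LENGTHS]


# Acetylsalicylic acid is listed under three 5th-level codes, one per
# indication, each in a different anatomical main group.
ASPIRIN_CODES = ["A01AD05", "B01AC06", "N02BA01"]

if __name__ == "__main__":
    for atc in ASPIRIN_CODES:
        print(atc, atc_levels(atc))
```

For example, `atc_levels("N02BA01")` returns `['N', 'N02', 'N02B', 'N02BA', 'N02BA01']`, and the first-level letters of the three aspirin codes differ (A, B and N), which is why a multi-indication drug can belong to several top-level classes at once.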
To predict the ATC class and targets, load a molecular structure into the ChemDoodle web interface. A structure can be obtained by entering a PubChem name or a SMILES string, by loading a structure file, or by drawing it with the provided tools (see below). Once loaded, the structure can be modified further. When you are satisfied with the result, click "Start Calculation" to start the predictions.
ATC codes were obtained from the WHO and filtered as described in the statistics. For a closer look at the dataset and the accuracy of the machine learning models, you can download the CSV file, which contains the substance name and the expected and predicted ATC code for each training sample.
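One way to inspect the downloaded file is sketched below in Python. The column names `name`, `expected` and `predicted` are assumptions (check the header row of the actual file and adjust the constants), and the sample rows are invented for illustration only; they are not taken from the real dataset.

```python
import csv
import io

# Hypothetical column names; check the header row of the downloaded file.
NAME_COL = "name"
EXPECTED_COL = "expected"
PREDICTED_COL = "predicted"


def load_predictions(handle):
    """Read (substance, expected code, predicted code) tuples from a CSV handle."""
    reader = csv.DictReader(handle)
    return [
        (row[NAME_COL], row[EXPECTED_COL].strip(), row[PREDICTED_COL].strip())
        for row in reader
    ]


def exact_match_accuracy(rows):
    """Fraction of samples whose predicted code equals the expected code."""
    if not rows:
        raise ValueError("no samples loaded")
    hits = sum(expected == predicted for _, expected, predicted in rows)
    return hits / len(rows)


# Illustrative rows only; these are not taken from the real file.
SAMPLE_CSV = """name,expected,predicted
substance_a,N,N
substance_b,C,C
substance_c,J,A
"""

if __name__ == "__main__":
    rows = load_predictions(io.StringIO(SAMPLE_CSV))
    print(f"{exact_match_accuracy(rows):.2f}")
```

For the real file, open it with `open(path, newline="")` and pass the handle to `load_predictions`; if the file uses a different delimiter, pass `delimiter=` to `csv.DictReader`.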
Predictions are made by logistic regression models based on Morgan fingerprints with a length of 2048 bits. The training data was filtered in multiple steps (for details, see statistics), and model performance was evaluated using 10-fold cross-validation for the target prediction and leave-one-out cross-validation for the ATC code prediction.
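Both evaluation schemes are instances of k-fold cross-validation: leave-one-out is the special case where k equals the number of samples, so each model is trained on all samples but one. The Python sketch below illustrates the splitting only. It assumes contiguous folds without shuffling or stratification, which may differ from how SuperPred3 actually builds its folds; in practice one would typically shuffle first or use scikit-learn's `KFold` and `LeaveOneOut`. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, test) index lists for k-fold cross-validation.

    The samples are cut into k contiguous folds whose sizes differ by at
    most one; each fold serves exactly once as the test set. Setting k
    equal to n_samples gives leave-one-out cross-validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    n = 25
    ten_fold = list(k_fold_indices(n, 10))      # scheme used for the target models
    leave_one_out = list(k_fold_indices(n, n))  # scheme used for the ATC models
    print(len(ten_fold), [len(test) for _, test in ten_fold])
    print(len(leave_one_out), {len(test) for _, test in leave_one_out})
```

The trade-off is cost versus data use: leave-one-out trains as many models as there are samples but lets each model see almost the entire training set, while 10-fold trains only ten models per target.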
For detailed performance values for each class (including sensitivity and precision), see the corresponding CSV file.
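As a reminder of what the per-class values mean, sensitivity is TP / (TP + FN) and precision is TP / (TP + FP), counted separately for each class. The Python sketch below computes both from (expected, predicted) pairs. It assumes exactly one predicted class per sample, which may not match how SuperPred3 handles compounds with several ATC codes; the function name and the example labels are hypothetical.

```python
from collections import Counter


def per_class_metrics(pairs):
    """Compute per-class sensitivity and precision.

    pairs: iterable of (expected, predicted) labels, one predicted label per sample.
    Returns {label: (sensitivity, precision)}; a value is None when its
    denominator is zero (class never expected, or never predicted).
    """
    tp, fn, fp = Counter(), Counter(), Counter()
    labels = set()
    for expected, predicted in pairs:
        labels.update((expected, predicted))
        if expected == predicted:
            tp[expected] += 1
        else:
            fn[expected] += 1   # the true class was missed
            fp[predicted] += 1  # another class was wrongly predicted
    result = {}
    for label in sorted(labels):
        positives = tp[label] + fn[label]
        predicted_as = tp[label] + fp[label]
        sensitivity = tp[label] / positives if positives else None
        precision = tp[label] / predicted_as if predicted_as else None
        result[label] = (sensitivity, precision)
    return result


if __name__ == "__main__":
    example = [("N", "N"), ("N", "C"), ("C", "C"), ("J", "N")]
    for label, (sens, prec) in per_class_metrics(example).items():
        print(label, sens, prec)
```

In the example, class J is never predicted, so its precision is undefined and reported as `None` rather than as 0.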
When predicting targets, two scores are reported: "probability" and "model accuracy". The first is the probability, as estimated by the machine learning model for that target, that the input structure binds to the target. Because model performance varies between targets, the 10-fold cross-validation score of the respective logistic regression model is displayed as well.
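For a binary logistic regression model, the probability is the logistic function applied to the intercept plus the weights of the fingerprint bits that are set. The Python sketch below shows that calculation on a 2048-bit fingerprint stored as the set of its on-bit indices, and pairs the result with a cross-validation score as in the two reported values. It assumes one standard binary model per target; the weights, intercept, target name and accuracy value are hypothetical, and generating the Morgan fingerprint itself (for example with RDKit) is not shown.

```python
import math

FINGERPRINT_LENGTH = 2048  # Morgan fingerprint length used by the models


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binding_probability(on_bits, weights, bias):
    """Probability from a logistic regression model on a binary fingerprint.

    on_bits: indices of the bits set to 1 (a sparse view of the 2048-bit vector).
    weights: one coefficient per fingerprint bit; bias: the intercept.
    For a 0/1 vector x this equals sigmoid(bias + sum_i weights[i] * x[i]).
    """
    if len(weights) != FINGERPRINT_LENGTH:
        raise ValueError("expected one weight per fingerprint bit")
    z = bias
    for i in set(on_bits):  # a bit is either on or off, so ignore duplicates
        if not 0 <= i < FINGERPRINT_LENGTH:
            raise ValueError(f"bit index out of range: {i}")
        z += weights[i]
    return sigmoid(z)


def target_report(target, on_bits, weights, bias, cv_accuracy):
    """Pair the compound-specific probability with the model's CV score."""
    return {
        "target": target,
        "probability": binding_probability(on_bits, weights, bias),
        "model accuracy": cv_accuracy,
    }


if __name__ == "__main__":
    toy_weights = [0.0] * FINGERPRINT_LENGTH
    toy_weights[5] = math.log(3)  # only bit 5 carries weight in this toy model
    print(target_report("target_x", {5, 100}, toy_weights, 0.0, 0.9))
```

Here the probability is 0.75, since sigmoid(ln 3) = 3/4. Keeping the two numbers separate reflects the point above: a high probability from a model with a poor cross-validation score deserves less trust than the same probability from a well-performing model.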