Chemical Descriptors


Software for calculating molecular descriptors and fingerprints. It currently calculates 813 descriptors (679 1D and 2D descriptors and 134 3D descriptors) and 10 types of fingerprints. The descriptors and fingerprints are calculated using the Chemistry Development Kit, with some additional descriptors and fingerprints. These additions include atom-type electrotopological state descriptors, McGowan volume, molecular linear free energy relation descriptors, ring counts, counts of chemical substructures identified by Laggner, and binary fingerprints and counts of chemical substructures identified by Klekota and Roth.



The MOLE db – Molecular Descriptors Data Base is a free on-line database comprising 1124 molecular descriptors calculated for 234773 molecules. At the present moment, 9166 queries have been made on the database. The database is intended as a research and teaching tool.


Collection of Substituents and Spacers Extracted from Bioactive Molecules

The Virtual Computational Chemistry Laboratory provides free on-line tools for computational chemistry, ADME/T and chemoinformatics tasks, including the building and visualisation of chemical structures, the calculation of molecular properties and the analysis of relationships between chemical structure and properties. All these tools are developed, provided and supported by the VCCLAB partners. The on-line software ALOGPS 2.1 is an accurate program for predicting the lipophilicity and aqueous solubility of molecules.



ASNN represents a combination of an ensemble of feed-forward neural networks and the k-nearest neighbour technique. This method uses the correlation between ensemble responses as a measure of distance between the analysed cases for the nearest neighbour technique. This improves prediction by correcting the bias of the neural network ensemble. An associative neural network has a memory that can coincide with the training set. If new data become available, the network further improves its predictive ability and provides a reasonable approximation of the unknown function without the need to retrain the neural network ensemble. This feature of the method dramatically improves its predictive ability over traditional neural networks and k-nearest neighbour techniques. Another important feature of ASNN is the possibility of interpreting neural network results by analysing correlations between data cases in the space of models.
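The bias correction described above can be illustrated with a short Python sketch. It is only a minimal reading of the description, not the ASNN program itself: the function names, the argument layout and the choice of a plain average of the neighbours' residuals are assumptions made for illustration, and the ensemble outputs are taken as given rather than produced by real neural networks.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant response vector carries no similarity information
    return cov / sqrt(va * vb)

def asnn_predict(memory_responses, memory_targets, query_response, k=3):
    """Correct the ensemble mean for one query case (hypothetical helper).

    memory_responses: one list per stored case, holding the output of every
                      ensemble member for that case
    memory_targets:   known target value of every stored case
    query_response:   output of every ensemble member for the query case
    """
    # Similarity in the space of models: correlation of ensemble responses.
    sims = [pearson(r, query_response) for r in memory_responses]
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Average error of the ensemble mean on the most similar stored cases.
    bias = mean([memory_targets[i] - mean(memory_responses[i]) for i in nearest])
    return mean(query_response) + bias

# Two stored cases, four ensemble members (made-up numbers).
memory_responses = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
memory_targets = [3.5, 1.5]
query = [1.1, 2.1, 2.9, 4.2]
corrected = asnn_predict(memory_responses, memory_targets, query, k=1)

# New data extend the memory; the ensemble itself is not retrained.
memory_responses.append(query)
memory_targets.append(2.0)
updated = asnn_predict(memory_responses, memory_targets, query, k=1)
```

In this sketch the memory is just a pair of lists, so adding a newly measured case changes later predictions without touching the ensemble members.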



Open Babel is a “molecular structure information interchange hub”. Babel was started by Pat Walters and Matt Stahl under the direction of Professor Daniel P. Dolata at the University of Arizona. The current project is open-source, GPL-licensed software. E-BABEL is an on-line interactive version of it. For more information, check the Open Babel website.



The Polynomial Neural Network (PNN) algorithm [1,2] is also known as the Iterational Algorithm of the Group Method of Data Handling (GMDH). GMDH was originally proposed by Prof. A.G. Ivakhnenko. PNN correlates input and target variables using (non)linear regression. In this particular software the user can define the desired properties of the solution, such as the number of terms and the maximum degree of the polynomials, using an approach proposed by Prof. Yu.P. Yurachkovsky [3].
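As a rough illustration of how a GMDH-type method builds polynomial models, the Python sketch below fits one small polynomial per pair of inputs and ranks the candidates on data held back from the fit. This is a generic single-layer sketch and not the PNN software: the polynomial form (1, u, v, u*v), the split into a fitting set and a checking set, and all function names are assumptions, and the term and degree limits attributed above to Prof. Yurachkovsky are not reproduced.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is non-singular; a singular system raises ZeroDivisionError.
    """
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def terms(u, v):
    """Terms of the two-input polynomial used here: 1, u, v, u*v."""
    return [1.0, u, v, u * v]

def fit(rows, y):
    """Least-squares coefficients from the normal equations."""
    k = len(rows[0])
    A = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    rhs = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    return solve(A, rhs)

def evaluate(coef, u, v):
    return sum(c * t for c, t in zip(coef, terms(u, v)))

def gmdh_layer(X_fit, y_fit, X_check, y_check, keep=2):
    """Fit one polynomial per input pair; rank by error on the checking set."""
    candidates = []
    for i, j in combinations(range(len(X_fit[0])), 2):
        coef = fit([terms(r[i], r[j]) for r in X_fit], y_fit)
        err = sum((evaluate(coef, r[i], r[j]) - t) ** 2
                  for r, t in zip(X_check, y_check))
        candidates.append((err, (i, j), coef))
    candidates.sort(key=lambda c: c[0])
    return candidates[:keep]

# Made-up data: the target depends on inputs 0 and 2 only.
X_fit = [[1, 5, 2], [2, 3, 1], [3, 8, 4], [4, 1, 3], [5, 7, 1], [6, 2, 5]]
y_fit = [1 + 2 * r[0] + 3 * r[0] * r[2] for r in X_fit]
X_check = [[2, 4, 3], [3, 6, 2], [5, 2, 4]]
y_check = [1 + 2 * r[0] + 3 * r[0] * r[2] for r in X_check]
best_error, best_pair, best_coef = gmdh_layer(X_fit, y_fit, X_check, y_check)[0]
```

In a multi-layer version, the outputs of the retained candidates would become the inputs of the next layer, which is how higher polynomial degrees arise.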



Parameter Client provides an interface to different programs that calculate several groups of indices, more than 3,000 in total. Molecules can be input in SMILES or SDF format. 2D and 3D SDF files are distinguished. SMILES can be input with the help of the JME molecular editor. If 3D atom coordinates are needed for parameter calculation, they are obtained with the help of CORINA, provided by Molecular Networks GmbH.



E-DRAGON is the electronic remote version of the well-known software DRAGON, an application for the calculation of molecular descriptors developed by the Milano Chemometrics and QSAR Research Group of Prof. R. Todeschini. These descriptors can be used to evaluate molecular structure-activity or structure-property relationships, as well as for similarity analysis and high-throughput screening of molecule databases.



The PLSR statistical analysis module performs model construction and prediction of activity/property using the Partial Least Squares (PLS) regression technique. It is based on a linear transformation from a large number of original descriptors to a small number of orthogonal factors (latent variables), providing the optimal linear model in terms of predictivity (characterized by the Q² value).
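The step from many descriptors to a few orthogonal factors can be sketched in Python with the textbook NIPALS scheme for a single response (PLS1). This is a generic illustration under that assumption, not the code of the PLSR module; the function name and the tiny data set are made up, and the selection of the number of factors by cross-validation is left out.

```python
from math import sqrt

def pls1(X, y, n_factors):
    """Extract latent variables with the NIPALS PLS1 scheme.

    X is a list of rows (objects x descriptors), y a list of responses.
    Both are mean-centred internally. Returns the score vectors (one per
    factor) and the fitted y values.
    """
    n, m = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - col_means[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    scores, fitted = [], [y_mean] * n
    for _ in range(n_factors):
        # Weight vector: direction in descriptor space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in X that co-varies with y
        w = [v / norm for v in w]
        # Scores: projection of every object onto the weight vector.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation removes what this factor explains, which makes the
        # next score vector orthogonal to this one.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        fitted = [fitted[i] + q * t[i] for i in range(n)]
        scores.append(t)
    return scores, fitted

# Five objects, three descriptors (made-up numbers).
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0],
     [4.0, 3.0, 2.5], [5.0, 6.0, 2.0]]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
scores, fitted = pls1(X, y, n_factors=2)
```

A Q² value would then be obtained by repeating the fit with objects left out and comparing the predictions for those objects with their measured values.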



Unsupervised Forward Selection (UFS) is a data reduction algorithm that selects from a data matrix a maximal linearly independent set of columns with a minimal amount of multiple correlation. UFS was designed for use in the development of Quantitative Structure-Activity Relationship (QSAR) models, where the m by n data matrix contains the values of n variables (typically molecular properties) for m objects (typically compounds). QSAR data sets often contain redundancy (exact linear dependencies between subsets of the variables) and multicollinearity (high multiple correlations between subsets of the variables). Both of these features inhibit the development of QSAR models with the ability to generalise successfully to new objects. UFS produces a reduced data set that contains no redundancy and a minimal amount of multicollinearity.
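A forward selection of this kind can be sketched in Python as follows. The sketch is a simplified reading of the description above and not the UFS program: the starting rule (the least correlated pair of columns), the threshold name r2_max and the helper names are assumptions made for illustration.

```python
from math import sqrt

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalise(col):
    """Centre a column and scale it to unit length (None if constant)."""
    mean = sum(col) / len(col)
    centred = [v - mean for v in col]
    norm = sqrt(_dot(centred, centred))
    return [v / norm for v in centred] if norm > 1e-12 else None

def ufs(columns, r2_max=0.99):
    """Return the indices of the selected columns, in order of selection.

    r2_max (assumed to be below 1) is the largest squared multiple
    correlation with the already selected columns that is tolerated.
    """
    cols = {}
    for j, col in enumerate(columns):
        unit = _normalise(col)
        if unit is not None:  # constant columns carry no information
            cols[j] = unit
    if len(cols) < 2:
        return sorted(cols)
    # Start from a member of the least correlated pair of columns.
    ids = sorted(cols)
    pairs = [(i, j) for i in ids for j in ids if i < j]
    first, _ = min(pairs, key=lambda p: _dot(cols[p[0]], cols[p[1]]) ** 2)
    selected, basis = [first], [cols.pop(first)]
    while cols:
        # For unit-length centred columns, the squared multiple correlation
        # with the selected set is the squared length of the projection
        # onto an orthonormal basis of that set.
        r2 = {j: sum(_dot(c, b) ** 2 for b in basis) for j, c in cols.items()}
        cols = {j: c for j, c in cols.items() if r2[j] <= r2_max}
        if not cols:
            break
        best = min(cols, key=lambda j: r2[j])
        col = cols.pop(best)
        # Gram-Schmidt step: keep only the part of the column that is new.
        resid = list(col)
        for b in basis:
            proj = _dot(col, b)
            resid = [r - proj * v for r, v in zip(resid, b)]
        norm = sqrt(_dot(resid, resid))
        basis.append([v / norm for v in resid])
        selected.append(best)
    return selected

# Column 1 is a multiple of column 0; column 3 is the sum of columns 0 and 2.
columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, -1, 0, -1, 1], [2, 1, 3, 3, 6]]
kept = ufs(columns)
```

On this made-up matrix the selection keeps one of the two proportional columns together with the uncorrelated column 2, and rejects column 3 because it is an exact linear combination of columns already chosen.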


Molecular Descriptors website

The aim of this website is to collect information related to molecular descriptors, thus helping researchers in their daily work. Software, books, links, events and news are collected in condensed form to allow quick and easy consultation. Moreover, this website could be an incentive to open a forum on molecular descriptors, where experts can initiate discussions on different topics related to molecular descriptors, collect lists of bibliographic references about descriptors, or discuss their interpretation.



Dragon is an application for the calculation of molecular descriptors. These descriptors can be used to evaluate molecular structure-activity or structure-property relationships, as well as for similarity analysis and high-throughput screening of molecule databases. The same website also offers MobyDigs, a software package (available for evaluation) for the calculation of regression models, which uses genetic algorithms for variable selection to obtain an optimal subset of predictive models.


QSAR and Modeling Society

The QSAR and Modeling Society provides several resources on databases, software, supercomputing centers and web services.