Molecular Descriptors & Ligand Efficiency Metrics Table
A collection of molecular descriptors commonly used in medicinal chemistry and drug discovery is listed in the table below. The definition for each is given along with comments on composite ligand efficiency metrics. Recommended values (where available) from the literature are shown together with the average value found in drugs for that descriptor (obtained from Leeson 2021). Links to relevant publications are also provided.
We stress that the values should be regarded as guidelines rather than followed dogmatically as they will vary according to chemical series, biological target and route of administration.
Some of the efficiency metrics are strongly correlated so it is best to select those that are most suitable for the project rather than attempt to use all of them.
The descriptors we have found most useful are lipophilicity (clogP/D), hydrogen bond donors (HBDs), number of aromatic rings (nAr), ligand efficiency (LE), ligand lipophilic efficiency Astex (LLEAT) and molecular weight (MWt). We also recommend considering the ionisation state of the molecule at the relevant pH.
The guide was created by: Tim Ritchie of Zerlavanz Consulting Ltd; Rick Cousins of Cinnabar Consulting Ltd; Simon Macdonald of RGDscience Ltd; Richard Hatley of RGDscience Ltd.
|Average drug values
|MWt (Molecular weight)
|Molecular weight is a measure of the sum of the atomic weight values of the atoms in a molecule.
To fall within Lipinski's rule of five, a small molecule therapeutic must have a molecular mass of no more than 500 Da.
The Ghose drug-likeness filter sets molecular weight for a small molecule as 180 to 480 Da, whereas Veber’s Rule ignores the molecular weight cut-off parameter.
Retrospective analyses indicate the molecular weight of drugs is increasing over time. If other physicochemical properties such as lipophilicity are controlled, molecular weight may not need to be limited by these guidelines, although a higher molecular weight can reduce the ligand efficiency of the molecule through the increase in heavy atom count.
|HA (Number of heavy atoms)
|The heavy atom count of a molecule is the total number of non-hydrogen ('heavy') atoms within the chemical structure.
The use of molecular-size measures such as heavy atom count in ligand efficiency metrics has some caveats. All non-hydrogen component atoms, such as carbon, nitrogen, oxygen, sulphur and halogen are treated equally even though their sizes and binding properties are different, and some atoms in a molecule may not participate in receptor binding interactions.
To achieve a Ligand Efficiency (LE) value of 0.3 for pIC50 of 8.0, a molecule would have a HAC of ~37.
|clogP (calculated octanol-water partition coefficient)
|LogP is one of the most important molecular properties, having a significant influence on many ADMET-related parameters and overall compound 'quality'. Reference
The average clogP in marketed drugs has not changed over time Reference.
|clogD7.4 (calculated octanol-water distribution coefficient at pH7.4)
|Takes into account ionisable groups in molecules. Log D values in the range 1 to 3 are more likely to avoid issues associated with excessive lipophilicity (poor solubility, high metabolic clearance, hERG inhibition, toxicity, promiscuity, CYP450 inhibition) or low lipophilicity (poor absorption, renal clearance).
|HBA (Number of H-bond acceptor atoms)
|The total number of N or O atoms (from Lipinski's rules). Reference
For Leads (RO3) <=3 Reference
The average number of HBA in marketed drugs is increasing over time Reference
|HBD (Number of H-bond donor atoms)
|The total number of NH and OH bonds (from Lipinski's rules). Reference
For Leads (RO3) <=3 Reference
The average number of HBD in marketed drugs has not changed over time Reference
|tPSA (topological polar surface area)
|≤ 140 Å2
|A 2-dimensional approximation of the surface sum over all polar atoms in molecules, primarily oxygen and nitrogen, also including their attached hydrogen atoms. It can be used in combination with <=10 rotatable bonds (Veber's rule for orally active compounds in rat).
|nRotB (Number of rotatable bonds)
|Any single non-ring bond, attached to a non-terminal, non-hydrogen atom. Amide C-N bonds are not counted because of their high barrier to rotation. Can be used in combination with <=140 Å2 tPSA (Veber's rule for orally active compounds in rat). Reference
The mean value for rotatable bonds per molecule in drugs is 6. Reference
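As a rough illustration, the two Veber criteria quoted above (tPSA <= 140 Å2 together with <= 10 rotatable bonds) can be combined into a single check. This is a minimal sketch: the function name `passes_veber` and the example values are our own assumptions, and the descriptor values are assumed to have been calculated already.

```python
def passes_veber(tpsa: float, n_rot_b: int) -> bool:
    """Return True when both Veber criteria quoted in the table are met."""
    return tpsa <= 140.0 and n_rot_b <= 10


# Hypothetical compound: tPSA of 95 Å2 and 6 rotatable bonds
print(passes_veber(95.0, 6))
# Hypothetical compound that exceeds the tPSA guideline
print(passes_veber(150.0, 6))
```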
|nAr (Number of aromatic rings)
Remove or replace carboaromatic rings
|The mean aromatic ring count of compounds in development tends to be lower in later clinical phases, and lower still in marketed drugs. A higher number of aromatic rings has a negative impact on solubility, whilst increasing protein binding, CYP450 inhibition, and hERG inhibition, which is not simply due to changes in size or lipophilicity. Reducing the number of carboaromatic rings, by replacing them with heteroaromatic rings or aliphatic rings, significantly increases ADMET developability.
|Fsp3 (Fraction of carbon atoms that are sp3 hybridised)
|Fsp3, the fraction of carbon atoms that are sp3 hybridised (also known as the Aliphatic Indicator), is a 2-D descriptor used as a surrogate for three-dimensionality. It is expressed as a value between zero and one. Compounds with higher values of Fsp3 appear to exhibit higher solubility, lower melting points, less promiscuity, less protein binding, and less CYP450 inhibition. Fsp3 is negatively correlated with nAr (r = -0.61).
|LE (Ligand Efficiency = 1.4(-logIC50)/Heavy Atom Count)
|The Ligand Efficiency (LE) concept was derived from the observation that the maximum affinity achievable by ligands is −1.5 kcal per mole per non-hydrogen atom ('heavy' atom), ignoring simple cations and anions, and from studies examining functional group binding energy.
Ligand efficiency (LE) metrics are calculated in a simple way and can be applied at all stages of drug discovery to evaluate fragments, screening hits, leads and candidate drugs. The use of molecular-size measures such as heavy atom count (HAC) in LE metrics treats all component atoms equally even though their sizes and binding properties are different, and some atoms in a molecule may not participate in receptor binding interactions.
The historical analysis of the use of ligand efficiency metrics in drug discovery provides a strong case for optimising molecules using LE to identify compounds that meet the project objectives with LE values as high as possible for the target class. The absolute LE value of a drug candidate may therefore be of less importance.
A guide not a prerequisite - an LE of 0.3 for pIC50 of 8.0 would be for a molecule with a HAC of ~37 and a molecular weight of 518 for a mean HA weight of 14.
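The arithmetic behind this guide value can be sketched as follows, using the LE definition given in the table (1.4 × pIC50 ÷ heavy atom count). The function names are our own, and the mean heavy atom weight of 14 is the assumption stated above.

```python
def ligand_efficiency(p_ic50: float, heavy_atoms: float) -> float:
    """LE = 1.4 * pIC50 / heavy atom count, as defined in the table."""
    return 1.4 * p_ic50 / heavy_atoms


def heavy_atoms_for_le(p_ic50: float, target_le: float) -> float:
    """Rearranged LE definition: heavy atom count that gives target_le."""
    return 1.4 * p_ic50 / target_le


hac = heavy_atoms_for_le(8.0, 0.3)   # roughly 37.3 heavy atoms
approx_mwt = round(hac) * 14         # 37 heavy atoms x 14 = 518
print(round(hac, 1), approx_mwt)
```

Rearranging the definition in this way makes it easy to see how the size 'budget' shrinks as the potency target falls or the LE target rises.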
|BEI (Binding efficiency index = [pKd × 1000] ÷ MWt)
|Idealised reference value = 27. The closer to this value the better.
|BEI has strong correlation with LE. BEI reflects the value of each atom's contribution of a ligand to binding potency through inclusion of a molecular weight term to provide an easy and effective ranking of molecules. Can be used in conjunction with SEI.
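A minimal sketch of the BEI calculation is given below. The function name and the example compound (pKd of 9.0, MWt of 333) are hypothetical and chosen only to show a value close to the idealised reference of 27.

```python
def bei(p_kd: float, mwt: float) -> float:
    """Binding efficiency index = (pKd * 1000) / molecular weight."""
    return p_kd * 1000.0 / mwt


# Hypothetical compound: pKd 9.0 (1 nM) with a molecular weight of 333
print(round(bei(9.0, 333.0), 1))
```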
|SILE (Size-independent LE = pKd ÷ HA^0.3)
|No value recommended by metric originator
|SILE is similar to LE and BEI but is designed to overcome the negative correlation with heavy atom count seen for LE and BEI. SILE has strong correlation with FQ.
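The size correction in SILE can be seen in a short sketch. The function name and the two hypothetical compounds are our own assumptions; only the formula (pKd ÷ HA^0.3) comes from the table.

```python
def sile(p_kd: float, heavy_atoms: float) -> float:
    """Size-independent ligand efficiency = pKd / HA**0.3."""
    return p_kd / heavy_atoms ** 0.3


# Hypothetical fragment (pKd 5, 12 heavy atoms) and lead (pKd 8, 37 heavy atoms)
fragment = sile(5.0, 12)
lead = sile(8.0, 37)
print(round(fragment, 2), round(lead, 2))
```

Because the heavy atom count is raised to a power below one, larger molecules are penalised less steeply than they are by LE.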
|FQ (Fit quality = [pKd ÷ HA] ÷ [0.0715 + (7.5328 ÷ HA) + (25.7079 ÷ HA^2) − (361.4722 ÷ HA^3)])
|No value recommended by metric originator, but values of 1 and above have "exceptional efficiencies". "Compounds with the best ligand efficiency...have FQ scores falling around 1.0".
|FQ is similar to LE and BEI but is designed to overcome the negative correlation with heavy atom count seen for LE and BEI. FQ has a strong correlation with SILE.
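The FQ formula is easier to follow when the denominator is computed separately. The sketch below does this, reading the HA terms in the table as HA squared and HA cubed; the function names and the example compound are our own assumptions.

```python
def le_scale(heavy_atoms: float) -> float:
    """Size-dependent scaling term from the FQ definition in the table."""
    ha = heavy_atoms
    return 0.0715 + 7.5328 / ha + 25.7079 / ha ** 2 - 361.4722 / ha ** 3


def fit_quality(p_kd: float, heavy_atoms: float) -> float:
    """FQ = (pKd / HA) / le_scale(HA)."""
    return (p_kd / heavy_atoms) / le_scale(heavy_atoms)


# Hypothetical compound: pKd 8.0 with 37 heavy atoms
print(round(fit_quality(8.0, 37), 2))
```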
|LLE or LipE (Lipophilic Ligand Efficiency = pIC50 − cLogP)
|Preferred 5-7 - higher value if possible
|A guide not a prerequisite - an alternative ligand efficiency metric that uses logP as the normalising quantity rather than the number of heavy atoms. Lipophilicity may perhaps be more important to control than molecular weight.
Composite metrics are generally applied in two different situations. Firstly, to enable comparison between different chemical series, for example from a screening campaign, that are different in size or lipophilicity. Secondly, to assess how changes within a chemotype relate to the lipophilicity or size changes made.
LLE assesses the potency relative to lipophilicity as opposed to molecule size assessed in LE, which uses HAC as a surrogate for molecular weight. In medicinal chemistry the lead optimisation of a chemical series is most often focused on controlling or reducing lipophilicity whilst increasing or at least maintaining potency. Most ADMET properties are negatively correlated to lipophilicity, although permeability will be optimal within a range of lipophilicity where the lower limit needs to be defined for each series.
Therefore, the LLE metric may be more helpful in supporting lead optimisation, as HAC does not cater for the change in lipophilicity that heteroatoms will provide compared to a carbon atom. However, it is preferred that measured values of lipophilicity (logP, logD or chromatographic lipophilicity value) are obtained and used in the calculation of LLE values. Calculated values such as cLogP often include an error of more than one log unit, rendering the calculated LLE value meaningless. Reference.
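The sensitivity of LLE to errors in the lipophilicity value can be illustrated with a short sketch. The function name and the numbers are hypothetical; the point is only that an error of one log unit in logP moves LLE by one full unit.

```python
def lle(p_ic50: float, log_p: float) -> float:
    """Lipophilic ligand efficiency = pIC50 - logP."""
    return p_ic50 - log_p


# Hypothetical compound: pIC50 8.0 with a measured logP of 3.0
measured = lle(8.0, 3.0)
# The same compound if the calculated logP were wrong by one log unit
too_high, too_low = lle(8.0, 4.0), lle(8.0, 2.0)
print(measured, too_high, too_low)
```

Here the same compound would appear to sit at the bottom, below, or in the middle of the preferred 5-7 range depending solely on the lipophilicity value used.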
|LLEAT (LLE adjusted for HA count = 0.111 + [(1.37× LLE) ÷ HA])
|Lipophilicity is amongst the most important properties to control in addition to potency and size, and the composite metric LLE attempts to enable comparison of molecules with respect to potency and lipophilicity. However, for fragments and early lead optimisation the application of the LLE metric is less helpful.
To address this, the efficiency index (LLEAT) was created to combine lipophilicity, size and potency. The index is intuitively defined, and has been designed to have the same target value and dynamic range as LE with a desired value of >0.3, making it easily interpretable by medicinal chemists. It is proposed that monitoring both LE and LLEAT should help both in the selection of more promising fragment hits, and controlling molecular weight and lipophilicity during optimisation.
Reference 1; Reference 2
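A minimal sketch of the LLEAT calculation, using the formula given in the table, is shown below. The function name and the example compound are our own assumptions.

```python
def lle_at(p_ic50: float, log_p: float, heavy_atoms: float) -> float:
    """LLEAT = 0.111 + (1.37 * LLE) / HA, where LLE = pIC50 - logP."""
    lle = p_ic50 - log_p
    return 0.111 + 1.37 * lle / heavy_atoms


# Hypothetical compound: pIC50 8.0, logP 3.0, 37 heavy atoms
value = lle_at(8.0, 3.0, 37)
print(round(value, 2), value > 0.3)
```

The result can be read against the same >0.3 target as LE, which is the design intent described above.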
|LELP (LE price paid in lipophilicity = ALogP ÷ LE)
|A further metric developed to enable assessment of fragments and early stage hit optimisation is ligand-efficiency-dependent lipophilicity, or LELP. LELP attempts to address the shortcoming of the LE metric through the incorporation of lipophilicity. However, as log P approaches 0, LELP becomes less sensitive to potency and size. LELP has so far not been taken up widely by the drug discovery community compared to other ligand efficiency metrics.
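The loss of sensitivity near log P = 0 can be seen directly from the formula. The sketch below uses hypothetical values and our own function name.

```python
def lelp(log_p: float, le: float) -> float:
    """LELP = logP / LE."""
    return log_p / le


# At logP 3 the metric separates an efficient and an inefficient ligand
print(lelp(3.0, 0.4), lelp(3.0, 0.2))
# At logP 0 both ligands score zero, whatever their LE
print(lelp(0.0, 0.4), lelp(0.0, 0.2))
```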
|SEI (Surface-binding efficiency index =[pKd × 100] ÷ PSA)
|Ref. value = 18
|SEI is defined as the pKi, pKd, or pIC50 per PSA (where 100 Å2 is used as a normalizing factor for PSA values). For example, a compound with an affinity of 1 nM (pKd = 9) and a PSA of 50 Å2 will have an SEI of 18.
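The worked example above can be reproduced in a few lines. The function name is our own; the conversion from a 1 nM affinity to a pKd of 9 is the usual negative base-10 logarithm of the molar concentration.

```python
import math


def sei(p_kd: float, psa: float) -> float:
    """Surface-binding efficiency index = (pKd * 100) / PSA."""
    return p_kd * 100.0 / psa


p_kd = -math.log10(1e-9)   # 1 nM affinity corresponds to pKd 9
print(round(sei(p_kd, 50.0), 1))
```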
|AEI (ADME efficiency index = ([pKd − |ALogP|] ÷ PSA) × 100)
|> 7 good; < 4 bad
|A modification of LLE that incorporates a polarity measure (PSA). High AEI minimises the risk of transporter interactions, and often results in lower daily doses.
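A minimal sketch of the AEI calculation is shown below. The function name and the example compound are hypothetical; note that the absolute value of ALogP is used, so very polar compounds with negative ALogP are penalised in the same way as lipophilic ones.

```python
def aei(p_kd: float, a_log_p: float, psa: float) -> float:
    """ADME efficiency index = ((pKd - |ALogP|) / PSA) * 100."""
    return (p_kd - abs(a_log_p)) / psa * 100.0


# Hypothetical compound: pKd 9.0, ALogP 2.0, PSA 80 Å2
value = aei(9.0, 2.0, 80.0)
print(round(value, 2), value > 7)
```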
|QED (Quantitative estimate of drug-likeness)
|0.5-1.0 better; 0.0-0.5 worse
|A composite score related to the molecular properties of oral drugs, using MWt, clogP, HBA, HBD, tPSA, nRotB, nAr, & structural alerts. Compounds with higher values of QED have properties more similar to those of existing oral drugs.
|PFI (property forecast index = clogD + nAr)
|< 5 good; > 7 bad
|High PFI is associated with poor ADMET outcomes, such as low solubility, low permeability, high protein binding, high CYP450 inhibition, high clearance, hERG inhibition, and off-target promiscuity. The sum of clogD (or clogP) and nAr is often more predictive than the separate descriptors. Reference
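The PFI guideline values in the table can be applied with a simple banding function. This is a sketch only: the function names are our own, and the "intermediate" label for values between 5 and 7 is an assumption, since the table only names the good and bad ends.

```python
def pfi(clogd: float, n_ar: int) -> float:
    """Property forecast index = clogD + number of aromatic rings."""
    return clogd + n_ar


def pfi_band(value: float) -> str:
    """Band a PFI value using the guideline thresholds (< 5 good, > 7 bad)."""
    if value < 5:
        return "good"
    if value > 7:
        return "bad"
    return "intermediate"  # label assumed; not named in the table


# Hypothetical compounds: (clogD, nAr)
for clogd, n_ar in [(2.5, 2), (3.5, 3), (4.5, 4)]:
    score = pfi(clogd, n_ar)
    print(score, pfi_band(score))
```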
|AB-MPS (|LogD7.4 − 3| + nAr + nRotB)
|For molecules that fall outside Lipinski Rule of 5 space and particularly those with a molecular weight of > 500, this is a helpful guide. Compliance with AB-MPS gives "an increased probability of higher oral bioavailability" and it correlates with rat bioavailability in bRo5 (beyond Rule-of-5) molecules.
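The AB-MPS score can be sketched directly from the formula in the table. The function name and the two example compounds are hypothetical; no threshold is applied here because none is given in the table.

```python
def ab_mps(log_d: float, n_ar: int, n_rot_b: int) -> float:
    """AB-MPS = |logD7.4 - 3| + aromatic ring count + rotatable bond count."""
    return abs(log_d - 3.0) + n_ar + n_rot_b


# Two hypothetical bRo5 compounds: (logD7.4, nAr, nRotB)
compact = ab_mps(3.5, 3, 8)
flexible = ab_mps(5.5, 5, 14)
print(compact, flexible)
```

The absolute term means the score rises as logD moves away from 3 in either direction, so both very lipophilic and very polar compounds are penalised.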
|Ro5 (also known as Lipinski's Rules, or the Rule of 5)
|MWt ≤ 500, clogP <5, H-bond donors ≤ 5, H-bond acceptors ≤ 10
|Lipinski's rules are probably the most famous heuristic for oral drug design in the last 25 years. They are widely used and catalysed the development of many heuristics described here and elsewhere. Originally, Lipinski stated that a molecule that complied with any three of the rules was more likely to be soluble (in water) and/or permeable, but this is widely interpreted to also apply to oral bioavailability. There are caveats to Lipinski's rules and they should be applied with careful thought in the context of the molecules being studied.
Reference 1; Reference 2
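A simple way to apply the rules is to list which criteria a compound breaks, using the thresholds given in the table. This is a sketch with our own function name and a hypothetical compound; compliance with any three rules corresponds to at most one violation.

```python
def ro5_violations(mwt: float, clogp: float, hbd: int, hba: int) -> list[str]:
    """Return the Rule of 5 criteria (as written in the table) that are broken."""
    checks = {
        "MWt <= 500": mwt <= 500,
        "clogP < 5": clogp < 5,
        "HBD <= 5": hbd <= 5,
        "HBA <= 10": hba <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical compound slightly over the molecular weight guideline
violations = ro5_violations(mwt=520.0, clogp=4.2, hbd=2, hba=8)
print(violations)
print(len(violations) <= 1)
```

Returning the broken criteria, rather than a single pass or fail flag, keeps the caveats visible when the rules are applied to a specific series.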
|An analysis by Kihlberg in 2014 suggests that oral drugs “are found far bRo5 and properties such as intramolecular hydrogen bonding, macrocyclization, dosage, and formulations can be used to improve bRo5 bioavailability.” They find that whilst the molecular weight can be > or >> 500, the number of HBDs is usually ≤ 6 Reference. This is echoed by an analysis by Schultz in 2019 that indicated that whilst most Lipinski parameters for drugs have varied temporally – for example molecular weight has increased – lipophilicity (experimentally determined logP) and the number of HBDs have stayed about the same Reference.
AB-MPS described above is a useful heuristic for this space.
bRo5 is currently a topic of great interest particularly in PROTAC drug discovery with an analysis from AstraZeneca due for publication in 2022.
|Central Nervous System (CNS) drugs need to penetrate the Blood-Brain Barrier (BBB) in order to access their target. Analysis of active CNS drugs shows that their physicochemical properties occupy a smaller subset of the larger property space of orally available drugs.
Wager and his co-researchers at Pfizer built a CNS Multiparameter Optimization (MPO) algorithm that generates a score from 6 common physicochemical properties: molecular weight (MW), logP, logD at pH 7.4, topological polar surface area (TPSA), the number of H-bond donors (HBD) and the pKa of the most basic centre of the molecule.
The research revealed that CNS drugs already on the market and potential candidates show high MPO score values, with a desirability score value of ≥ 4.0 on a scale between 0 and 6, making this a useful tool at the design stage.
Since the publication of the CNS MPO Score, it has become a generally accepted algorithm in the CNS-drug medicinal chemistry arena.
|Weaver and co-authors propose a new model that predicts BBB penetration using a combination of commonly used molecular descriptors: number of aromatic rings, number of non-hydrogen atoms, molecular weight, polar surface area, pKa, and the number of hydrogen bonding acceptors and donors.
The authors found this new method, BBB Score, outperforms existing state of the art methods on the datasets employed by the authors.
The authors provide a Microsoft Excel implementation of the method, which can be used to calculate the BBB score during the design of potential compounds prior to synthesis.