MolScreen

MolScreen Contains a Panel of >2500 High Quality 2D and 3D Models.

MolScreen is a set of high-quality 2D fingerprint and 3D pharmacophore models for a broad range of pharmacology and toxicology targets. The models can be used for lead discovery or counter screening. They are built with MolSoft's 2D QSAR/Fingerprint and 3D Atomic Property Fields (Totrov 2008) methods. There are currently approximately 2500 models for 1200 targets.

The models can be screened directly using MolSoft's ICM-Pro + VLS software. Alternatively, we can screen a set of chemicals for you via our contract research services. Please contact us for more information about how to use MolScreen.

MolScreen Applications

MolScreen can be used for:

Target Identification - Search chemicals against a set of protein targets.
Lead Identification - Identify chemicals that can bind to a protein target.
Profiling - Screen multiple protein targets against multiple chemicals.
Drug Repurposing - Use MolScreen to search for new protein targets for available drugs.

Available Models

You can download and view the available models using the links below (updated 3/8/2019). Each model name starts with the three-letter abbreviation of the model type, as described below, followed by the gene name.

All Targets - Download Excel
Approved Drug Targets - Download Excel
Drug Toxicity Targets - Download Excel
Endocrine Targets - Download Excel
Non-Mammalian Targets - Download Excel

Model Types

There are two categories of models:

1. ADMET Prediction Models (mcp)

CACO2, hERG, HALFLIFE, LD50, CYP, Tox21, etc.
Regression and Classification Models as well as fully connected neural networks.

2. Different types of Activity Models for a large Panel of Drug Targets

Approximately 2500 models against 1200 targets.
Machine Learning (kcc), Ligand Field - 3D Atomic Property Field (dfz), 4D Docking/3D-QSAR (dpc), 3D APF/3D-QSAR (dfa) and Neural Network Chemical Classification (ncc)
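The naming convention described above (a three-letter model type followed by the gene name) can be illustrated with a small lookup. This is only a sketch: the exact separator between prefix and gene name is not stated here, so the function assumes the first three characters are the prefix and strips any leading separator from the rest. The function name and the example model name are hypothetical.

```python
# Model-type prefixes and descriptions as listed in this document.
MODEL_TYPES = {
    "mcp": "ADMET prediction model",
    "kcc": "Machine learning model",
    "dfz": "Ligand field (3D Atomic Property Field) model",
    "dpc": "4D docking/3D-QSAR model",
    "dfa": "3D APF/3D-QSAR model",
    "ncc": "Neural network chemical classification model",
}


def parse_model_name(name):
    """Split a model name into (prefix, type description, gene name).

    Assumption: the first three characters are the type prefix and the
    remainder, minus any leading separator, is the gene name.
    """
    prefix, rest = name[:3].lower(), name[3:]
    if prefix not in MODEL_TYPES:
        raise ValueError(f"unknown model type prefix: {prefix!r}")
    gene = rest.lstrip("_-. ")
    return prefix, MODEL_TYPES[prefix], gene


# "kcc_EGFR" is a made-up name used only for illustration.
print(parse_model_name("kcc_EGFR"))
```

A lookup like this can help group entries from the downloaded spreadsheets by model type before screening.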

About the Models

Machine Learning Models - Hybrid 2D QSAR/Fingerprint Models kcc(+kca)

kcc(+kca): Kernel regression Chemical fingerprint Classification/Activity prediction

Currently: 999 mammalian models
Training set: ChEMBL Ki, IC50, EC50
Report kcc(Classification) score and kca(pKd regression) score
Median training set: 370 ligands
Median external test set AUC: 96%
Median external test set Q2: 0.5
Extremely fast (thousands of cpds in min)
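The AUC and Q2 figures quoted for these models are standard validation metrics. The sketch below shows one common way to compute each on an external test set; it is an illustration of the metrics, not MolSoft's evaluation code, and it assumes Q2 is computed against the mean of the test-set observations. The function names and the toy numbers are hypothetical.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def q2(observed, predicted):
    """Predictive squared correlation: 1 - SS_res / SS_tot on held-out data."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy numbers, made up for illustration only.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
fit = q2([6.0, 7.0, 8.0], [6.5, 7.0, 7.5])
print(f"AUC = {auc:.3f}, Q2 = {fit:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from decoys; a Q2 of 1.0 means the predicted activities match the observed ones exactly.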

Training:

Cluster Actives by fingerprint
Add 40k ChEMBL actives as decoys
Kernel function to each cluster -> probability score (kcc/MolClass Score)
Partial Least Squares Regression for each cluster + Kernel Regression (kca/MolpKd Score)
MolScore: combine MolpKd and MolSimilarity to known binders
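The idea of turning fingerprint similarity to known actives into a class score can be sketched as follows. This is a deliberately simplified illustration, not the kcc algorithm: it uses a plain Tanimoto kernel, treats all actives as a single cluster, and skips the regression step. The fingerprints are represented as sets of on-bit indices, and all names and data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def kernel_class_score(query, actives, decoys):
    """Share of the mean kernel similarity that comes from the actives.

    Returns a value between 0 (decoy-like) and 1 (active-like).
    Assumption: one cluster of actives and an unweighted Tanimoto kernel.
    """
    k_act = sum(tanimoto(query, fp) for fp in actives) / len(actives)
    k_dec = sum(tanimoto(query, fp) for fp in decoys) / len(decoys)
    total = k_act + k_dec
    return 0.5 if total == 0 else k_act / total


# Toy fingerprints, made up for illustration only.
actives = [{1, 2, 3}, {1, 2, 4}]
decoys = [{7, 8, 9}, {8, 9, 10}]
for query in ({1, 2, 5}, {8, 9, 11}, {1, 8}):
    print(sorted(query), kernel_class_score(query, actives, decoys))
```

In a per-cluster scheme such as the one outlined above, a score of this kind would be computed against each cluster of actives separately rather than against the pooled set.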

Ligand Field Docking Models - 3D Atomic Property Field Models (dfz)

dfz: Docking to ligand Field Z-score prediction model

Built using Atomic Property Fields
Currently: 504 mammalian models
Pocketome ligands/custom alignment as APF template
ChEMBL cpds for validation
Median AUC: 92%, 139 cpds vs decoy
Fast-ish (single template cluster ~5 sec per cpd)
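A Z-score expresses how far a compound's score lies from a reference distribution, in units of that distribution's standard deviation. The reference distribution used by the dfz models is not stated here, so the sketch below assumes it is a set of decoy scores and that higher scores are better; both are assumptions, and the function name and numbers are hypothetical.

```python
from statistics import mean, stdev


def z_score(score, reference_scores):
    """Standardize a score against a reference (e.g. decoy) distribution.

    Assumption: higher raw scores are better, so a large positive Z-score
    means the compound scores well above the reference compounds.
    """
    return (score - mean(reference_scores)) / stdev(reference_scores)


# Toy decoy scores, made up for illustration only.
decoy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(z_score(6.0, decoy_scores), 3))
```

Standardizing in this way makes scores from different templates or targets easier to compare, because each is expressed relative to its own background.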

Pocket Docking 3D QSAR Models

dpc: Docking to Pocket Classification/Activity

Currently: 343 mammalian models w/ AUC> 80%
Training set: ChEMBL Ki, IC50, EC50, Drugbank assignment
Median size: 307 ligands
Median external Q2: 0.53
Median external AUC: 95%

Training:

Pocketome -> Clustering of pocket residues
4D Docking w/ co-crystallized ligand as APF template
Docking Score -> Probability score (dpc/MolClass score)
3D QSAR training of Activity -> (dpa/MolpKd score)
MolScore: combine MolpKd and MolSimilarity to known binders

Hybrid 4D/2D - Hybrid Models (dfa)

dfa: Docking to ligand Field Activity prediction

Currently: 612 mammalian models w/ AUC > 80%
Training set: ChEMBL Ki, IC50, EC50, Drugbank assignment
Median size: 270 ligands
Median external Q2: 0.65
Median external AUC: 96%

Training:

Also from Pocketome -> 4D Docking + Ligand APF template
Cpd align to ligand template -> cluster by 3D poses
APF Score -> Probability Score (dfc/MolClass score)
3D-QSAR training for each cpd cluster (dfa/MolpKd score)
MolScore: combine MolpKd and MolSimilarity to known binders

Neural Network - 2D Fingerprint Neural Network Classifier (ncc)

ncc: Neural Network Chemical fingerprint Classification.

Currently: 6 Target Families each with 12-234 targets and 3K to 144K ligands.
All Models are validated with 25% set aside as external test set.
Median external AUC: 99.5%

Training for each family:

Data: Targets(m) x Compounds(n)
Input Layer: ECFP
Fully Connected Neural Net with 2-3 hidden layers
Output Layer: m Targets
Multitask Prediction
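The architecture outlined above (fingerprint input, a few fully connected hidden layers, one output per target) can be sketched as a forward pass in plain Python. This is a structural illustration only: the weights are random rather than trained, the layer sizes are toy values, and the ReLU and sigmoid activations are assumptions, since the activations are not specified here. All names are hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def predict(fingerprint, hidden_layers, output_layer):
    """Forward pass: shared hidden layers, then one probability per target."""
    h = fingerprint
    for layer in hidden_layers:
        h = relu(dense(h, layer))
    return sigmoid(dense(h, output_layer))


rng = random.Random(0)
N_BITS, N_TARGETS = 64, 12  # toy sizes; real fingerprints are much longer
hidden = [make_layer(N_BITS, 32, rng), make_layer(32, 16, rng)]
output = make_layer(16, N_TARGETS, rng)

# A made-up sparse bit vector standing in for an ECFP fingerprint.
fp = [float(rng.random() < 0.1) for _ in range(N_BITS)]
probs = predict(fp, hidden, output)
print(len(probs), "target probabilities")
```

Because the hidden layers are shared across all outputs, every target in a family is predicted from the same learned representation, which is what makes the prediction multitask.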

ADMET Models

mcp: Miscellaneous Chemical Property Models

Currently 38 models, mostly from PubChem data
All validated by external test set (20% of data set aside)
Regression Models (mean external test set Q2: 0.7): CACO2, PAMPA permeability, LD50 (mg/kg), Half-life (hr)
Classification Models (median external test set AUC: 84%): hERG, PGPinhibitor, PGPsubstrate, PAINS; Cytochrome P450 1A2, 2C19, 2C9, 2D6, 3A4; 25 Tox21 classifiers, including Estrogen Agonist/Antagonist, Genotoxicity, Aromatase, etc.
Fully connected Neural Network models.
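Setting aside part of the data as an external test set, as described above, can be sketched with a simple random hold-out split. How the split is actually made (random, by scaffold, by date) is not stated here, so the random shuffle is an assumption, and the function name is hypothetical.

```python
import random


def holdout_split(items, test_fraction, seed=0):
    """Shuffle the items and set aside a fraction as an external test set.

    Assumption: a purely random split with a fixed seed for repeatability.
    Returns (training items, test items).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 100 placeholder compound indices, with 20% held out.
train, test = holdout_split(range(100), 0.20)
print(len(train), "training compounds,", len(test), "test compounds")
```

The test compounds are never seen during training, so metrics computed on them estimate how a model behaves on new chemicals.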