MolSoft Giga-Search Engine
New Method for Substructure Search in Large Chemical Space
The Giga-Search method was first described by Eugene Raush (Principal Developer, MolSoft LLC) at MolSoft's ICM User Group Meeting held on November 8-9, 2018 in San Diego, CA.
The method lets you run a substructure search across billions of chemicals in seconds.
There are currently no other available methods on the market which can perform substructure search in such an efficient way.
You can see MolSoft's Giga-Search engine in action on the Enamine REAL database website (>5 billion chemicals).
Implementation
The method adds fingerprint bit statistics to the MolCart search engine, which makes it possible to filter out molecules very quickly based on the input chemical pattern.
The method also provides a new, efficient way of storing chemical fingerprints that minimizes the amount of data to be scanned on the server side.
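To make the idea of fingerprint filtering concrete, here is a minimal sketch of a generic fingerprint screen that uses bit statistics. It is not MolSoft's implementation, whose details are not given here. The function names, the use of Python integers as bit sets, and the tiny 4-bit fingerprints are all assumptions made for illustration. The principle is the standard one: a molecule can contain the query substructure only if every bit set in the query fingerprint is also set in the molecule fingerprint, and testing the rarest query bits first rejects most molecules early.

```python
def bit_frequencies(fingerprints, n_bits):
    """Count how many database fingerprints have each bit set."""
    counts = [0] * n_bits
    for fp in fingerprints:
        for bit in range(n_bits):
            if (fp >> bit) & 1:
                counts[bit] += 1
    return counts


def screen(query_fp, fingerprints, counts):
    """Return indices of molecules that MAY contain the query substructure.

    Hypothetical helper: fingerprints are plain ints used as bit sets.
    """
    # Order the query bits rarest first, so that all() below
    # short-circuits on the most selective bit for most molecules.
    query_bits = sorted(
        (b for b in range(len(counts)) if (query_fp >> b) & 1),
        key=lambda b: counts[b],
    )
    return [
        idx
        for idx, fp in enumerate(fingerprints)
        if all((fp >> b) & 1 for b in query_bits)
    ]


# Toy database of four 4-bit fingerprints (illustrative values only).
database = [0b1011, 0b0110, 0b1111, 0b0001]
counts = bit_frequencies(database, 4)
candidates = screen(0b1010, database, counts)
print(counts)      # [3, 3, 2, 2]
print(candidates)  # [0, 2]
```

A screen of this kind only removes molecules that cannot match. The surviving candidates would still need an atom-by-atom substructure match to confirm a hit.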
Currently available databases (August 2023)
- Enamine REAL ~5B
- ZINC 1.79B
- Mcule 80M
Application of Giga-Search
Chemical databases are growing exponentially, so the ability to mine them effectively is increasingly important. Some applications of this method include:
- Efficiently mine many billions of chemicals.
- Build target-specific libraries.
- Find chemical homologs.
- Drive your SAR search.
- Find real chemical derivatives of lead compounds.
- Virtual screening of chemicals with similar substructure.
- Output your search to fully interactive chemical spreadsheets in ICM-Chemist for further analysis.
- Fully scriptable for batch high-throughput searches.
Search using SMILES and SMARTS
Giga-Search allows you to search using a SMILES string, or using SMARTS notation to specify chemical patterns and wildcards. Some of the supported notations include:
- any atom
- aromatic
- aliphatic
- aliphatic carbon
- aromatic carbon
- any carbon
- the number of heavy neighbors
- number of attached hydrogens
- the number of rings the atom belongs to
- the size of smallest ring the atom belongs to
- valence, sum of bond orders of all neighbors
- the number of all neighbors including hydrogens
- negative and positive charge
- sp, sp2 and sp3 hybridization
- and more...
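For reference, the descriptions in the list above correspond closely to the standard Daylight SMARTS atomic primitives. The sketch below pairs each description with its standard symbol and gives a few example queries. It assumes that Giga-Search follows standard SMARTS for these primitives, which is not confirmed here. Hybridization is left out because standard Daylight SMARTS has no primitive for it and the symbol varies between toolkits. The names `SMARTS_PRIMITIVES`, `EXAMPLE_QUERIES` and `describe` are illustrative only.

```python
# Standard Daylight SMARTS atomic primitives (assumed, not confirmed,
# to be the symbols Giga-Search accepts). <n> stands for an integer.
SMARTS_PRIMITIVES = {
    "*": "any atom",
    "a": "aromatic",
    "A": "aliphatic",
    "C": "aliphatic carbon",
    "c": "aromatic carbon",
    "#6": "any carbon",
    "D<n>": "number of heavy neighbors (explicit connections)",
    "H<n>": "number of attached hydrogens",
    "R<n>": "number of rings the atom belongs to",
    "r<n>": "size of the smallest ring the atom belongs to",
    "v<n>": "valence, sum of bond orders of all neighbors",
    "X<n>": "number of all neighbors including hydrogens",
    "-<n>": "negative charge",
    "+<n>": "positive charge",
}

# A few example query strings built from the primitives above.
EXAMPLE_QUERIES = {
    "c1ccccc1": "benzene ring (a SMILES string that is also valid SMARTS)",
    "[#6][OH]": "any carbon bonded to an oxygen carrying one hydrogen",
    "[CH3]": "aliphatic carbon with three attached hydrogens",
    "[NX3;H2]": "nitrogen with three neighbors in total, two of them hydrogens",
    "[R2]": "atom that belongs to two rings",
    "[r5]": "atom whose smallest ring has five members",
}


def describe(symbol):
    """Look up the plain-language meaning of a SMARTS primitive."""
    return SMARTS_PRIMITIVES.get(symbol, "not in this table")


for query, meaning in EXAMPLE_QUERIES.items():
    print(f"{query:10s} {meaning}")
```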
How can I get Access to this Method?
- The latest version of ICM-Pro or ICM-Chemist (see screenshot below) provides a graphical user interface to the Giga-Search engine hosted at MolSoft.
Use the 'Chemical Search' button and pick the 'Online Databases' tab. Currently we provide an interface to two releases of the Enamine REAL database.
- For MolCart customers the service can be hosted locally on a custom database.
Questions?
Please contact us by email or phone with any questions.
Screenshots of Giga Search in ICM-Chemist
Giga Search Results and Analysis
Giga Search Window