MolSoft GigaSearch Engine
GigaSearch v2.0 - Efficient Massive Scale Substructure Chemical Searching
GigaSearch v2.0 by MolSoft represents a significant advancement in searching ultra-large chemical spaces, moving away from full enumeration and dramatically reducing storage requirements while maintaining high performance.
With GigaSearch v2.0 you can search ultra-large chemical libraries in just 5 to 10 seconds using advanced substructure queries based on 1D (SMILES/SMARTS) or 2D topological structures.
Recent additions to the GigaSearch database collection include:
- 2025 Enamine Space >80 Billion Chemicals
- 2025 Chemspace Freedom Space >140 Billion Chemicals
How to use GigaSearch
Option 1 - Access via MolSoft Server
Current users of ICM-Pro or ICM-Chemist (v3.9-4a or later) can directly access the GigaSearch databases hosted on MolSoft’s secure servers.
- Includes Enamine Space (80B) and Chemspace Freedom Space (140B) databases.
- Ideal for rapid substructure searching and analog retrieval.
- Limited to a defined number of returned hits per search.
- No special hardware or installation required - simply log in from your ICM session.
This option is designed for convenience and speed, giving ICM users immediate access to massive chemical spaces without the need for local data management.
Option 2 - Local Installation (Licensed)
A local GigaSearch installation is available under a separate license for users requiring on-premise deployment.
- Enables unrestricted search capacity and full control over proprietary or confidential chemical libraries.
- Suitable for organizations with strict data security or IT policy requirements.
- Allows integration of private in-house databases in addition to MolSoft’s hosted collections.
- Requires a Linux workstation or server with >64 GB RAM (128 GB recommended) and a modern multicore CPU (>16 cores).
This setup is ideal for companies or institutions wishing to run GigaSearch fully within their own infrastructure, maintaining control over local datasets and computational performance.
Application of GigaSearch v2.0
Chemical databases are expanding exponentially and now contain from billions to trillions of commercially available and virtual compounds. The ability to efficiently mine these massive datasets and extract meaningful chemical insights is critical for modern drug discovery and chemical informatics. GigaSearch v2.0 enables you to:
- Efficiently search billions of compounds to identify potential hits or analogs.
- Build target-specific virtual libraries tailored to your protein or binding site of interest.
- Identify chemical homologs and scaffold analogs across ultra-large collections.
- Guide structure-activity relationship (SAR) exploration through rapid analog retrieval and filtering.
- Find real, synthetically accessible derivatives of lead compounds for follow-up studies.
- Perform virtual screening by substructure to discover new chemical series.
- Export results directly to interactive ICM-Chemist spreadsheets for visualization, clustering, and further analysis.
- Automate and scale up searches with full scripting support for batch and high-throughput workflows.
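As an illustration of the batch workflow mentioned in the last bullet, here is a minimal Python sketch of a driver that runs several named queries with a per-query hit cap and collects the results as CSV. Everything in it is hypothetical: `run_batch`, `toy_search`, and the `(query, max_hits)` backend signature are illustrative names, not part of ICM or GigaSearch, and the toy backend uses plain substring matching as a placeholder rather than real substructure search.

```python
import csv
import io
from typing import Callable

# Hypothetical backend signature: (query, max_hits) -> list of hit SMILES.
SearchFn = Callable[[str, int], list[str]]

# Tiny in-memory stand-in for a chemical library (illustrative only).
TOY_LIBRARY = ["c1ccccc1O", "CCO", "c1ccccc1N", "CC(=O)O"]


def toy_search(query: str, max_hits: int) -> list[str]:
    """Placeholder backend: plain substring match, NOT substructure search."""
    return [smi for smi in TOY_LIBRARY if query in smi][:max_hits]


def run_batch(queries: dict[str, str], search: SearchFn, max_hits: int = 100) -> str:
    """Run each named query through `search` and return the hits as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["query_name", "query", "hit"])
    for name, query in queries.items():
        for hit in search(query, max_hits):
            writer.writerow([name, query, hit])
    return buf.getvalue()


report = run_batch(
    {"benzene_ring": "c1ccccc1", "carbonyl": "C(=O)"},
    toy_search,
    max_hits=1,
)
print(report)
```

In a real workflow the `toy_search` placeholder would be replaced by a call into whatever scripting interface your GigaSearch installation provides; the driver itself only depends on the backend returning a list of hits.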
Key Features of GigaSearch v2.0
- Speed: Searches typically take 5 to 10 seconds on the 80B Enamine Space and 140B Chemspace Freedom Space databases.
- Storage Efficiency: GigaSearch v2.0 handles much larger datasets with far less storage, e.g., ~86 GB for the 80B 2025 Enamine Space.
- Support for SMILES and SMARTS Search: Advanced substructure queries allow chemical patterns with atomic and bonding properties, including atom type, aromaticity, ring membership, hybridization, valence, charge, and connectivity.
- Scalability: Fully utilizes multicore systems for rapid, large-scale searching.
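To put the storage figure in perspective, the following back-of-the-envelope sketch works out the bytes per compound implied by ~86 GB for 80 billion compounds. It assumes decimal gigabytes (1 GB = 10^9 bytes) and, purely for comparison, a hypothetical average of 50 bytes per compound if the space were stored as fully enumerated SMILES. Neither assumption comes from MolSoft, and the quoted compound count is a lower bound.

```python
# Back-of-the-envelope storage estimate from the figures quoted above.
# Assumptions (not from MolSoft): decimal gigabytes, and an average of
# 50 bytes per compound if the space were stored as enumerated SMILES.

GB = 1e9  # bytes per decimal gigabyte (assumption)

n_compounds = 80e9         # 2025 Enamine Space, quoted as >80 billion
index_size_gb = 86         # quoted GigaSearch v2.0 footprint (~86 GB)
assumed_smiles_bytes = 50  # hypothetical average size of one SMILES record

# Storage actually used per compound in the quoted footprint.
bytes_per_compound = index_size_gb * GB / n_compounds

# Storage an enumerated SMILES file would need under the assumption above.
enumerated_tb = n_compounds * assumed_smiles_bytes / 1e12

# How many times smaller the quoted footprint is than the enumerated file.
ratio = enumerated_tb * 1e12 / (index_size_gb * GB)

print(f"{bytes_per_compound:.3f} bytes per compound")
print(f"{enumerated_tb:.1f} TB if stored as enumerated SMILES")
print(f"roughly {ratio:.0f}x smaller than enumeration")
```

Under these assumptions the quoted footprint comes to about one byte per compound, which is consistent with the statement that v2.0 avoids full enumeration.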
Licensing and Contact
GigaSearch v2.0 can be used with a license for MolSoft ICM-Pro or ICM-Chemist and runs on any platform (Windows, Mac, or Linux). For unrestricted search capabilities, a GigaSearch license and local hosting are required.
For more information about licensing GigaSearch v2.0, please contact info@molsoft.com or call 858-625-2000 x108.
GigaSearch v1.0 - New Method for Substructure Search in Large Chemical Space
The GigaSearch method was first described by Eugene Raush (Principal Developer, MolSoft LLC) at MolSoft's ICM User Group Meeting, held on November 8-9, 2018, in San Diego, CA.
The method enables you to run substructure searches across billions of chemicals in seconds.
There are currently no other available methods on the market which can perform substructure search in such an efficient way.
You can see MolSoft's GigaSearch engine in action on the Enamine REAL database website (>5 billion chemicals).
Search using SMILES and SMARTS
GigaSearch lets you search with SMILES strings and SMARTS notation to specify chemical patterns and wildcards. Supported notations include:
- any atom
- aromatic
- aliphatic
- aliphatic carbon
- aromatic carbon
- any carbon
- the number of heavy neighbors
- number of attached hydrogens
- the number of rings the atom belongs to
- the size of the smallest ring the atom belongs to
- valence, sum of bond orders of all neighbors
- the number of all neighbors including hydrogens
- negative and positive charge
- sp1, sp2, sp3 hybridization
- and more...
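The descriptions above line up with the atomic primitives of standard Daylight SMARTS. As a sketch, the snippet below lists those standard symbols and composes bracketed atom queries from them. It assumes GigaSearch accepts the standard symbols unchanged, which this page does not confirm. Hybridization is omitted because it has no standard Daylight primitive, and `atom_query` is a hypothetical helper for illustration, not part of ICM.

```python
# Standard Daylight SMARTS symbols for the atom classes listed above.
# Assumption: GigaSearch accepts these symbols as-is (not confirmed here).
BASE_ATOMS = {
    "any atom": "*",
    "aromatic": "a",
    "aliphatic": "A",
    "aliphatic carbon": "C",
    "aromatic carbon": "c",
    "any carbon": "#6",
}

# Counted primitives: keyword -> SMARTS symbol that is followed by a number.
COUNTED = {
    "degree": "D",        # number of heavy (explicit) neighbors
    "hydrogens": "H",     # number of attached hydrogens
    "rings": "R",         # number of rings the atom belongs to
    "ring_size": "r",     # size of the smallest ring containing the atom
    "valence": "v",       # sum of bond orders
    "connectivity": "X",  # all neighbors, hydrogens included
}


def atom_query(base="any atom", charge=0, **counts):
    """Build a bracketed SMARTS atom expression (hypothetical helper)."""
    parts = [BASE_ATOMS[base]]
    # Keyword order is preserved, so constraints appear as they were passed.
    for key, value in counts.items():
        parts.append(f"{COUNTED[key]}{value}")
    if charge:
        sign = "+" if charge > 0 else "-"
        parts.append(f"{sign}{abs(charge)}")
    # ';' is the low-precedence AND operator in SMARTS.
    return "[" + ";".join(parts) + "]"


aromatic_ch = atom_query("aromatic carbon", hydrogens=1)
branched_chain_carbon = atom_query("any carbon", degree=3, rings=0)
cationic_aliphatic = atom_query("aliphatic", connectivity=4, charge=1)
print(aromatic_ch, branched_chain_carbon, cationic_aliphatic)
```

Each returned string is a single-atom pattern that can be embedded in a larger SMARTS query; an unknown keyword raises `KeyError`, which keeps typos from silently producing an unconstrained query.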
Screenshot of GigaSearch in ICM-Chemist