We construct the fingerprint by enumerating linear and some non-linear fragments, plus some information about rings. The atom type mapping varies with the chain/fragment size. All of this is hashed into a counted or non-counted (binary) vector. Counted vectors are mostly used in prediction models.
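The enumerate-and-hash idea can be illustrated with a minimal sketch. It is not the actual implementation: it covers linear chains only, ignores bond types and ring information, and uses CRC32 as a stand-in hash. The function names `enumerate_linear_paths` and `hashed_fingerprint` and the adjacency-list input format are assumptions made for this example.

```python
import zlib
from typing import Dict, List, Tuple


def enumerate_linear_paths(
    adjacency: Dict[int, List[int]], min_len: int = 1, max_len: int = 6
) -> List[Tuple[int, ...]]:
    """Return every simple path of min_len..max_len atoms, each reported once."""
    paths = set()

    def extend(path: List[int]) -> None:
        if len(path) >= min_len:
            # A path and its reverse are the same fragment: keep one orientation.
            paths.add(min(tuple(path), tuple(reversed(path))))
        if len(path) == max_len:
            return
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:
                extend(path + [neighbor])

    for start in adjacency:
        extend([start])
    return sorted(paths)


def hashed_fingerprint(
    adjacency: Dict[int, List[int]],
    atom_types: Dict[int, str],
    size: int = 1536,
    counted: bool = False,
    min_len: int = 1,
    max_len: int = 6,
) -> List[int]:
    """Hash each linear fragment into a fixed-size counted or binary vector."""
    vector = [0] * size
    for path in enumerate_linear_paths(adjacency, min_len, max_len):
        labels = [atom_types[i] for i in path]
        # Make the key independent of the direction the chain is read in.
        key = min("-".join(labels), "-".join(reversed(labels)))
        slot = zlib.crc32(key.encode("utf-8")) % size  # stand-in hash (assumption)
        if counted:
            vector[slot] += 1
        else:
            vector[slot] = 1
    return vector


# Heavy-atom graph of ethanol (C-C-O), typed by element code only.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
atom_types = {0: "C", 1: "C", 2: "O"}

binary_fp = hashed_fingerprint(adjacency, atom_types, counted=False)
counted_fp = hashed_fingerprint(adjacency, atom_types, counted=True)
```

In the counted vector each slot holds the number of fragments that hashed to it, while the binary vector records only presence, so the binary form is the counted form clipped to 1.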
Fingerprint properties:
- Result vector size (1536 for the default chemical similarity/distance fingerprint)
- Minimum/maximum fragment size (chain length for linear fingerprints); default: 1-6
- Flexible atom typing, built from a combination of the following features:
  - cd : element code
  - hyb : hybridization
  - a : aromaticity
  - h : number of connected hydrogens
  - hp : number of connected hydrogens for polar atoms (0 for other atoms)
  - r : ring/chain attribute
  - b : number of heavy neighbors
  - rs : size of the smallest ring the atom belongs to (0 for non-ring atoms)
- Flexible bond typing:
  - bt : bond order
  - ~r : ring/chain attribute
  - ~rt : rotatable attribute
- Enumeration method:
  - linear chains, rings and small branched fragments (this is the default for chemical distance/similarity)
  - ecfp (non-linear fragment enumeration)
  - 2D ph4 (pharmacophore)
- binary / counted
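Flexible atom typing can be pictured as choosing which of the listed features go into an atom's label. The sketch below is hypothetical throughout: the `Atom` class, the `TYPING_BY_LENGTH` table, the integer encoding of `hyb` and the label format are all assumptions for illustration. The actual feature sets used at each chain length are not stated here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Atom:
    cd: str   # element code
    hyb: int  # hybridization (assumed encoding: 1 = sp, 2 = sp2, 3 = sp3)
    a: bool   # aromaticity
    h: int    # number of connected hydrogens
    hp: int   # connected hydrogens for polar atoms, 0 for others
    r: bool   # ring/chain attribute
    b: int    # number of heavy neighbors
    rs: int   # size of the smallest ring, 0 for non-ring atoms


# Hypothetical table: detailed typing for short chains, coarser for long ones.
TYPING_BY_LENGTH: Dict[int, Tuple[str, ...]] = {
    1: ("cd", "hyb", "a", "h", "b", "rs"),
    2: ("cd", "hyb", "a", "hp", "r"),
    3: ("cd", "a", "r"),
}
DEFAULT_TYPING: Tuple[str, ...] = ("cd",)


def atom_type(atom: Atom, chain_length: int) -> str:
    """Build the atom label from the features selected for this chain length."""
    features = TYPING_BY_LENGTH.get(chain_length, DEFAULT_TYPING)
    return ",".join(f"{name}={getattr(atom, name)}" for name in features)


# An aromatic CH carbon in benzene.
benzene_carbon = Atom(cd="C", hyb=2, a=True, h=1, hp=0, r=True, b=2, rs=6)
short_label = atom_type(benzene_carbon, 1)
long_label = atom_type(benzene_carbon, 6)
```

With this table the same atom gets a detailed label inside a one-atom fragment and only its element code inside a six-atom chain, which keeps the number of distinct long fragments manageable.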
For chemical distance/similarity and clustering, the default fingerprint type is 1: length 1536, maximum linear chain length 6, and atom typing that depends on the chain length.
Custom fingerprints can be built using the Descriptor function or in prediction models.