ICM Manual v.3.9
by Ruben Abagyan,Eugene Raush and Max Totrov
Copyright © 2020, Molsoft LLC
Nov 14 2024

Parsing XML example: DrugBank.

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 6826 drug entries including 1431 FDA-approved small molecule drugs, 133 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5211 experimental drugs. Additionally, 4435 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data fields, with half of the information devoted to drug/chemical data and the other half devoted to drug target or protein data.

The most complete drug information (target, transporter, carrier, and enzyme information) is provided in XML format. Chemical structures are provided separately in SDF format.

The following example will demonstrate how to deal with such data in ICM.

  1. Read the XML data directly from the website
    
    read xml "http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip" name="drugbank"
    
    The command above creates a collection object named "drugbank".

  2. Examine the content
    
    icm/def> Name( drugbank )
    #>S string_array
    drugs
    
    This shows that the collection contains a single root node called "drugs".

  3. Going further gives the following:
    
    icm/def> Name( drugbank["drugs"] )
    #>S string_array
    drug
    partners
    xmlns
    xmlns:xs
    xs:schemaLocation
    icm/def> Type(  drugbank["drugs","drug"] )
    array
    icm/def> Type(  drugbank["drugs","partners"] )
    collection
    icm/def> Name(  drugbank["drugs","partners"] )
    #>S string_array
    partner
    icm/def> Type(  drugbank["drugs","partners","partner"] )
    array
    
    This means that drugbank["drugs","drug"] is an array in which each entry holds the information about a particular drug. In addition, there is another array, drugbank["drugs","partners","partner"], which contains additional information about targets.
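
As a rough analogy only (this is Python, not ICM, and not a description of how read xml is implemented), the sketch below converts a tiny made-up XML document into nested dictionaries in which repeated child elements become lists. It illustrates why a node such as "drug" shows up as an array while "partners" shows up as a collection. The sample data and the helper name to_collection are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document that mimics the layout described above (not real DrugBank data).
SAMPLE = """
<drugs>
  <drug><name>Drug A</name></drug>
  <drug><name>Drug B</name></drug>
  <partners>
    <partner id="1"><name>Partner X</name></partner>
    <partner id="2"><name>Partner Y</name></partner>
  </partners>
</drugs>
"""

def to_collection(elem):
    """Convert an element into nested dicts; repeated child tags become lists."""
    node = dict(elem.attrib)
    for child in elem:
        if len(child) or child.attrib:
            value = to_collection(child)          # nested element
        else:
            value = (child.text or "").strip()    # plain text leaf
        if child.tag in node:
            # Second occurrence of the same tag: promote to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node

root = ET.fromstring(SAMPLE)
drugbank = {root.tag: to_collection(root)}

print(list(drugbank))                                   # top-level names
print(sorted(drugbank["drugs"]))                        # names one level down
print(type(drugbank["drugs"]["drug"]).__name__)         # repeated element
print(type(drugbank["drugs"]["partners"]).__name__)     # single nested element
```

Note that in this sketch a tag that occurs only once stays a plain value rather than a one-element list, which is the same situation that step 6 addresses with the ':' operation.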

  4. Examine individual entries
    
    drugbank["drugs","drug"][1]
    drugbank["drugs","drug"][2]
    drugbank["drugs","partners","partner"][1]
    drugbank["drugs","partners","partner"][2]
    
    The default output format for displaying a collection is JSON, which gives nicely formatted, easy-to-read text. Looking at the output, it is easy to find the fields of interest.

    WARNING: do not try to print the entire array to the terminal window, because it will take a very long time and you will most likely need to kill the window.

  5. Fetching individual fields

    Let's create a table with a single column containing an array with drug cards.

    
    add column drugs drugbank["drugs","drug"]
    

    Hint: In the GUI you can resize all rows simultaneously by holding the 'CTRL' key while resizing an individual row.

    A single field can be extracted by providing a dot-separated path to it. Note that field names that contain non-alphanumeric characters must be quoted.
    • A.drugbank - OK
    • A.'drugbank-id' - must be quoted
    
    # extracts drugbank-id into separate column
    add column drugs function="A.'drugbank-id'[1]['$']" name="drugbank_id"
    
    # extracts name into separate column
    add column drugs function="A.name" name="name"  
    

  6. Fetching multi-value fields

    Multiple properties will be extracted as an array for each drug entry.

    
    # display targets information for the second entry
    drugs.A[2]["targets","target"]
    # extract array of partner IDs for each drug into separate column
    add column drugs function= "A.targets.target.partner" name="partner_id"  
    Type( drugs.partner_id[2] )  # array
    
    This way of extracting multiple properties has one problem: for entries with only one property, the result is not an array but an individual value (e.g. Type( drugs.partner_id[1] )). This prevents uniform access to the column later on. In such cases it is recommended to use the ':' operation instead of '.'. The result of this operation is always an array (even for single entries).
    
    delete drugs.partner_id
    add column drugs function="A.targets.target:partner" name="partner_id"  # will create an array for all entries.
    Type( drugs.partner_id[1] )  # array (even for single entries)
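
The difference between '.' and ':' described in step 6 can be pictured with a small Python analogue. This is only a sketch of the behaviour as described here, not ICM code; the functions dot and colon and the partner values are hypothetical.

```python
# Two made-up drug cards: one with a single target, one with several.
drug_one = {"targets": {"target": {"partner": "54"}}}
drug_many = {"targets": {"target": [{"partner": "54"}, {"partner": "17"}]}}

def dot(node, key):
    """'.'-like access: maps over a list, but a lone entry yields a lone value."""
    if isinstance(node, list):
        return [item[key] for item in node]
    return node[key]

def colon(node, key):
    """':'-like access: always returns a list, even for a lone entry."""
    items = node if isinstance(node, list) else [node]
    return [item[key] for item in items]

def targets(card):
    return dot(dot(card, "targets"), "target")

print(dot(targets(drug_one), "partner"))     # a single string
print(dot(targets(drug_many), "partner"))    # a list of strings
print(colon(targets(drug_one), "partner"))   # a one-element list
print(colon(targets(drug_many), "partner"))  # a list of strings
```

With the colon-style access every row has the same type, so later code can iterate over the column without checking for the single-value case.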
    

  7. Querying XML fields

    Suppose you want to extract the value of a property whose name starts with "logP". This can be done in a way similar to the ICM table filtering operations. The only difference is that a colon ':' (instead of a dot) must be used to separate the field name.

    The general filtering syntax:
    
    <field1>.<field2>:<queryField> <op> <value>
    

    The following operations are supported in array filtering: ==,!=,>,<,>=,<=,~,!~

    Example:
    
    # query and extract logP property
    add column drugs function="(A.'experimental-properties'.property:kind ~ '^logP').value[1]" name="logP"  
    
    Note that some entries contain text information ('0.61 [HANSCH,C ET AL. (1995)]'), so the resulting column will not be automatically converted to an rarray. You can convert it explicitly:
    
    # empty or 'bad' entries will be marked as 'ND'
    add column drugs Rarray( drugs.logP ) name="logPNum" 
    delete drugs.logP
    
    Another example extracts RxList links:
    
    add column drugs \
      function="(A.'external-links'.'external-link':resource == 'RxList')[1].url"\
      name    ="rxlist"
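
The filtering syntax from step 7 can be mimicked in Python to make the logic explicit. The sketch below is an analogy, not ICM code: it assumes that '~' is a regular-expression match (as the '^logP' pattern suggests), implements only four of the listed operations, and uses a made-up drug card with placeholder URLs. The names OPS and query are hypothetical.

```python
import re

# Only a subset of the operations listed above; '~' is assumed to be a regex match.
OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "~": lambda a, b: re.search(b, a) is not None,
    "!~": lambda a, b: re.search(b, a) is None,
}

def query(items, field, op, value):
    """Keep the entries of an array whose field satisfies '<field> <op> <value>'."""
    items = items if isinstance(items, list) else [items]
    return [it for it in items if field in it and OPS[op](it[field], value)]

# Made-up drug card with the same nesting as the fields used above.
card = {
    "experimental-properties": {"property": [
        {"kind": "Melting Point", "value": "100"},
        {"kind": "logP", "value": "0.61"},
    ]},
    "external-links": {"external-link": [
        {"resource": "Wikipedia", "url": "http://example.org/wiki"},
        {"resource": "RxList", "url": "http://example.org/rxlist"},
    ]},
}

# Analogue of (A.'experimental-properties'.property:kind ~ '^logP').value[1]
hits = query(card["experimental-properties"]["property"], "kind", "~", "^logP")
logp = hits[0]["value"] if hits else None   # Python is 0-based, ICM's [1] is 1-based

# Analogue of (A.'external-links'.'external-link':resource == 'RxList')[1].url
links = query(card["external-links"]["external-link"], "resource", "==", "RxList")
rxlist = links[0]["url"] if links else None

print(logp, rxlist)
```

Returning None when nothing matches is a choice made for this sketch; how ICM represents a missing match is not covered here.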
    
  8. Joining with information from drugbank["drugs","partners","partner"]

    For each drug entry we have a list of partner IDs that refer to entries in the drugbank["drugs","partners","partner"] array. To join them, we need to put this array into another table and extract the fields that will be used in the join.

    
    # create a table and put the partner entries there
    add column partners drugbank["drugs","partners","partner"] 
    # extract ID column which will be used to join with drugs.partner_id
    add column partners function= "A.id" name="id"   
    # extract uniprot-id from the "external-identifiers" array using query functions
    add column partners \
     function= '(A."external-identifiers"."external-identifier":resource ~ "UniProtKB")."identifier"[1]' \
     name    = "uniprot_id"  
    
    Finally we need to join drugs.partner_id with partners.id.
    
    join drugs.partner_id partners.id column ="drugs.*,partners.uniprot_id" name="drugs"
    
    Note that since drugs.partner_id contains multiple entries for each row, the resulting drugs.uniprot_id will also contain multiple entries for each row. You can use the set format command to define a special action that is executed when a particular uniprot entry is clicked.
    
    # load sequence 
    set format drugs.uniprot_id \
     "<!--icmscript name=\"1\"\nread sequence swiss \"http://www.uniprot.org/uniprot/%1.txt\"\n--><a href=#_>%1</a>" 
    # or simply go to the website
    set format drugs.uniprot_id "<a href=http://www.expasy.org/uniprot/%1>%1</a>"
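
The effect of the join in step 8 on a multi-valued key column can be pictured with a short Python analogue. This is an illustration under assumptions, not ICM code: all rows and identifiers are made up, and the sketch keeps every drug row even when no partner matches, which may differ from what the ICM join command does.

```python
# Made-up rows standing in for the 'drugs' and 'partners' tables.
drugs = [
    {"name": "Drug A", "partner_id": ["1", "2"]},
    {"name": "Drug B", "partner_id": ["2"]},
    {"name": "Drug C", "partner_id": []},
]
partners = [
    {"id": "1", "uniprot_id": "UP_X"},
    {"id": "2", "uniprot_id": "UP_Y"},
]

# Index the partner table by its join key.
lookup = {p["id"]: p["uniprot_id"] for p in partners}

# Each drug row has several partner IDs, so it also receives several uniprot IDs.
for row in drugs:
    row["uniprot_id"] = [lookup[pid] for pid in row["partner_id"] if pid in lookup]

for row in drugs:
    print(row["name"], row["uniprot_id"])
```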
    
  9. Joining with chemical structures

    The final step is to add the chemical structure information.
    
    # read SDF from the website
    read table mol "http://www.drugbank.ca/system/downloads/current/structures/all.sdf.zip" name="drugs_chem"
    # join 'mol' column
    join drugs.drugbank_id drugs_chem.DRUGBANK_ID  column="drugs.*,drugs_chem.mol" name="drugs"
    
    A few more rearrangements and your table is ready to be exported to an SDF file.
    
    move drugs.mol 1  # move structure column to the first position
    delete drugs.A    # delete drug-card information
    delete drugs.partner_id # delete partner id information
    write table mol drugs "mydrugs.sdf"
    

See also: collection, read xml



Copyright© 1989-2024, Molsoft,LLC - All Rights Reserved. This document contains proprietary and confidential information of Molsoft, LLC. The content of this document may not be disclosed to third parties, copied or duplicated in any form, in whole or in part, without the prior written permission from Molsoft, LLC.