The records, or rows, of any table can be clustered into a hierarchical tree, and one or several trees
associated with this table can be stored with it, displayed and edited in the ICM GUI, and deleted.
A tree is created with the make tree command.
You can choose 1) the tree type and 2) the distance function between two table rows, and set
a number of other arguments. The tree object is then added to the header of the table and is stored
together with the table. The table gets a new
column with the tree order and, optionally, two new elements: a column with the branch number at a certain level
(option split) and the distance matrix (option matrix).
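To make the roles of the distance matrix and the branch-number column concrete, here is a minimal sketch written in Python rather than ICM script. It does not reproduce how ICM builds its trees: it assumes Euclidean distances and single-linkage merging, the five points are illustrative, and single_linkage and split_by_threshold are hypothetical helper names introduced only for this illustration.

```python
import math

def single_linkage(dist):
    """Merge clusters bottom-up; return a list of (height, left, right) steps."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage (assumption): closest pair of members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, list(clusters[a]), list(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

def split_by_threshold(n, merges, threshold):
    """Cluster number per row, keeping only merges at or below the threshold."""
    label = list(range(n))
    for height, left, right in merges:
        if height <= threshold:
            for i in right:
                label[i] = label[left[0]]
    # renumber clusters 1..k in order of first appearance
    numbering = {}
    return [numbering.setdefault(lab, len(numbering) + 1) for lab in label]

# five illustrative rows described by three coordinates each
points = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0.1, 0.1)]
dist = [[math.dist(p, q) for q in points] for p in points]

merges = single_linkage(dist)
root_height = merges[-1][0]          # distance at the root node
cluster_numbers = split_by_threshold(len(points), merges, root_height / 2)
print(cluster_numbers)
```

Cutting at half the root distance keeps only the merge of the second and fifth rows, which lie about 0.14 apart, so those two rows share a cluster number and every other row stays in a cluster of its own. A different linkage rule would generally give different merge heights and possibly a different split.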
The related commands and functions:
| make tree | create tree object and attach it to the table |
| Split | function to split cluster by threshold or number of clusters |
| split | command to change the position of tree cursor (separator) and recalculate new cluster numbers |
| Name( table.cluster i_tree [index,label,matrix,sort,split] ) | names of important table columns |
| Max( table.cluster ) | the distance of the root node |
| | Distance of the cluster splitting level |
| Nof( table.cluster tree ) | clusters |
| | Centers of clusters |
Example:
# create a distance matrix
m=Matrix(5,3)
m[2,1:3]={1. 0. 0.}
m[3,1:3]={1. 1. 0.}
m[4,1:3]={1. 1. 1.}
m[5,1:3]={1. 0.1 0.1}
D = Distance( m )
# create a table and move distance matrix into header
group table t { "a" "b" "c" "d" "e" } "label" {1. 2. 2. 1. 4. } "val"
group table t append header D "dm"
make tree t distance = "dm" # uses external distance matrix for clustering
# get cluster number with threshold set to the middle
cl = Split( t.cluster, Max( t.cluster )/2 )
add column t cl name="cl"
# group by cluster and take rows by smallest value of "val" column
group t.cl t.val "min" all "refmin" name="t1"
Selecting cluster representatives involves several steps:
- creating a tree and a table column with cluster numbers
- selecting cluster representatives according to a certain threshold in the cluster tree
Example:
read table mol s_icmhome + "drug_groups.sdf"
make tree drug_groups
I = Index( drug_groups.cluster center 0.4 ) # divide at threshold 0.4
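The idea of picking one representative per cluster can be pictured with a small Python sketch, again outside ICM. It assumes that a representative is the medoid, that is, the member with the smallest total distance to the other members of its cluster. Whether ICM defines a cluster center in exactly this way is not stated here, and cluster_centers is a hypothetical helper name.

```python
def cluster_centers(dist, labels):
    """Return {cluster number: row index of its medoid}."""
    members = {}
    for row, lab in enumerate(labels):
        members.setdefault(lab, []).append(row)
    centers = {}
    for lab, rows in members.items():
        # medoid (assumption): smallest summed distance to cluster mates
        centers[lab] = min(rows, key=lambda i: sum(dist[i][j] for j in rows))
    return centers

# illustrative one-dimensional positions and precomputed cluster numbers
positions = [0.0, 0.1, 0.2, 1.0, 1.1, 1.3]
labels = [1, 1, 1, 2, 2, 2]
dist = [[abs(p - q) for q in positions] for p in positions]

centers = cluster_centers(dist, labels)
print(centers)
```

In each cluster the middle item has the smallest summed distance to its neighbours, so rows 1 and 4 (zero-based) are returned as representatives.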