Molecule mining

Updated on Dec 08, 2024

This page describes mining for molecules. Since molecules may be represented by molecular graphs, this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules in a way that still discriminates between data instances. One way to do this is to use chemical similarity metrics, which have a long tradition in the field of cheminformatics.

Typical approaches to calculating chemical similarity use chemical fingerprints, but these lose the underlying information about the molecule's topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem, which is preferable for vectorial mappings.
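As a minimal sketch of a fingerprint-based similarity metric, the Python below hashes every labeled simple path of a molecular graph (up to a few atoms) into a fixed-width bit vector and compares two such vectors with the Tanimoto coefficient: the number of shared bits divided by the number of bits set in either. The graph encoding (atom labels plus adjacency lists), the path-length limit, the 64-bit width and the use of `zlib.crc32` as the hash are illustrative assumptions, not the scheme of any particular cheminformatics package. Only the set of hashed paths survives, so the molecule's topology cannot be recovered from the bits, and distinct substructures can even collide on the same bit.

```python
import zlib


def path_fingerprint(labels, adj, max_atoms=4, n_bits=64):
    """Hash every simple labeled path of up to max_atoms atoms into an n_bits-wide int."""
    bits = 0

    def walk(path):
        nonlocal bits
        seq = [labels[a] for a in path]
        # Read the path in its lexicographically smaller direction so C-O and O-C match.
        key = "-".join(min(seq, seq[::-1]))
        bits |= 1 << (zlib.crc32(key.encode()) % n_bits)
        if len(path) < max_atoms:
            for nb in adj[path[-1]]:
                if nb not in path:
                    walk(path + [nb])

    for start in range(len(labels)):
        walk([start])
    return bits


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: popcount(A AND B) / popcount(A OR B)."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(fp_a & fp_b).count("1") / union


# Heavy-atom graphs (hydrogens omitted): ethanol C-C-O and ethylamine C-C-N.
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ethylamine = (["C", "C", "N"], [[1], [0, 2], [1]])
fp_eth = path_fingerprint(*ethanol)
fp_amine = path_fingerprint(*ethylamine)
print(tanimoto(fp_eth, fp_eth))  # 1.0
print(tanimoto(fp_eth, fp_amine))
```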

Kernel methods

Marginalized graph kernel

Optimal assignment kernel

Pharmacophore kernel

A C++ (and R) implementation combining:

the marginalized graph kernel between labeled graphs

extensions of the marginalized kernel

Tanimoto kernels

graph kernels based on tree patterns

kernels based on pharmacophores for 3D structure of molecules
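The marginalized graph kernel compares two molecular graphs by summing a label kernel over pairs of equal-length random walks, one drawn from each graph, weighted by the walks' probabilities; a Tanimoto kernel can then be obtained from any base kernel k as k_T(x, y) = k(x, y) / (k(x, x) + k(y, y) - k(x, y)). The pure-Python sketch below combines the two under simplifying assumptions: vertex labels only (no bond labels), a delta kernel on labels, uniform start probabilities, a constant stopping probability `p_stop` and uniform moves to neighbours. The recursion over pairs of current vertices is solved by fixed-point iteration rather than a linear solver, and all names are hypothetical; this is not the code of the implementation listed above.

```python
def marginalized_kernel(g1, g2, p_stop=0.3, tol=1e-12, max_iter=10_000):
    """Marginalized graph kernel with a delta kernel on vertex labels.

    Each graph is (labels, adjacency lists) with at least one vertex. A walk
    starts at a uniformly chosen vertex, stops with probability p_stop (always
    at an isolated vertex), and otherwise steps to a uniformly chosen neighbour.
    """
    (lab1, adj1), (lab2, adj2) = g1, g2

    def stop(adj, i):
        return p_stop if adj[i] else 1.0

    def step(adj, i):
        # Probability of moving from i to one particular neighbour.
        return (1.0 - p_stop) / len(adj[i]) if adj[i] else 0.0

    n1, n2 = len(lab1), len(lab2)
    # R[i][j]: sum over pairs of equal-length walk continuations from vertex i
    # of g1 and vertex j of g2 of (joint probability) * (label kernel product).
    R = [[0.0] * n2 for _ in range(n1)]
    for _ in range(max_iter):
        new_R = [[0.0] * n2 for _ in range(n1)]
        change = 0.0
        for i in range(n1):
            for j in range(n2):
                if lab1[i] != lab2[j]:
                    continue  # delta kernel: mismatched labels contribute nothing
                onward = sum(R[a][b] for a in adj1[i] for b in adj2[j])
                new_R[i][j] = stop(adj1, i) * stop(adj2, j) + step(adj1, i) * step(adj2, j) * onward
                change = max(change, abs(new_R[i][j] - R[i][j]))
        R = new_R
        if change < tol:
            break
    # Uniform start distributions contribute the factor 1 / (n1 * n2).
    return sum(map(sum, R)) / (n1 * n2)


def tanimoto_kernel(kernel, x, y):
    """Tanimoto normalization of a base kernel: k(x,y) / (k(x,x) + k(y,y) - k(x,y)).

    The denominator is positive here because k(x, x) > 0 for every graph and,
    by Cauchy-Schwarz, k(x, y) <= (k(x, x) + k(y, y)) / 2.
    """
    kxy = kernel(x, y)
    return kxy / (kernel(x, x) + kernel(y, y) - kxy)


ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ethylamine = (["C", "C", "N"], [[1], [0, 2], [1]])
print(marginalized_kernel(ethanol, ethylamine))  # smaller than with itself: O never matches N
print(tanimoto_kernel(marginalized_kernel, ethanol, ethanol))  # 1.0
print(tanimoto_kernel(marginalized_kernel, ethanol, ethylamine))
```

As a sanity check on the iteration: for the two-atom graph C-C with p_stop = 0.3, every vertex pair satisfies r = 0.3² + 0.7² r, so the kernel value is 0.09 / 0.51 ≈ 0.176.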

Maximum Common Graph methods

MCS-HSCS: Highest Scoring Common Substructure (HSCS) ranking strategy for single MCS

Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules. The MCS can be used to derive a similarity or distance between two molecules. MCS search is also used to screen drug-like compounds by retrieving hit molecules that share a common subgraph (substructure).
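To make the MCS-based similarity concrete, the sketch below finds a maximum common induced subgraph of two small labeled graphs by exhaustive backtracking and converts its size into the score n_mcs / (n1 + n2 - n_mcs), which is one common choice among several. It compares atom labels only (bond orders are ignored), allows the common subgraph to be disconnected and runs in exponential time in the worst case; the graph encoding and function names are assumptions for illustration, not the SMSD API.

```python
def max_common_induced_subgraph(g1, g2):
    """Largest atom mapping (dict: g1 index -> g2 index) in which mapped atoms share
    labels and two mapped atoms are bonded in g1 exactly when their images are
    bonded in g2. The common subgraph may be disconnected."""
    (lab1, adj1), (lab2, adj2) = g1, g2
    nbrs1 = [set(a) for a in adj1]
    nbrs2 = [set(a) for a in adj2]
    best = {}

    def search(i, mapping, used):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        # Prune: even mapping every remaining g1 atom cannot beat the best found.
        if i == len(lab1) or len(mapping) + (len(lab1) - i) <= len(best):
            return
        for j in range(len(lab2)):
            if j in used or lab1[i] != lab2[j]:
                continue
            # Induced condition: adjacency to every already-mapped atom must agree.
            if all((k in nbrs1[i]) == (mapping[k] in nbrs2[j]) for k in mapping):
                mapping[i] = j
                used.add(j)
                search(i + 1, mapping, used)
                del mapping[i]
                used.discard(j)
        search(i + 1, mapping, used)  # also try leaving atom i unmapped

    search(0, {}, set())
    return best


def mcs_similarity(g1, g2):
    n = len(max_common_induced_subgraph(g1, g2))
    total = len(g1[0]) + len(g2[0]) - n
    return n / total if total else 0.0


ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ethylamine = (["C", "C", "N"], [[1], [0, 2], [1]])
print(mcs_similarity(ethanol, ethylamine))  # 0.5: only the C-C fragment is shared
```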

Molecular query methods

Warmr

AGM

PolyFARM

FSG

MolFea

MoFa/MoSS

Gaston

LAZAR

ParMol (contains MoFa, FFSM, gSpan, and Gaston)

optimized gSpan

SMIREP

DMax

SAm/AIm/RHC

AFGen

gRed

G-Hash
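Several of the tools above search for substructures whose support, the number of molecules containing them, reaches a user-given threshold: MolFea, for example, targets linear fragments, while gSpan, Gaston and FSG mine general subgraphs. The sketch below is a simplified level-wise miner restricted to linear fragments (labeled simple paths). It grows paths one atom at a time and only extends occurrences of fragments that are already frequent, which is safe because a fragment's support can only shrink as it grows. It does not reproduce the candidate generation or canonical codes of any listed system; the encoding and function names are assumptions.

```python
from collections import defaultdict


def canonical(seq):
    """A linear fragment reads the same in both directions; keep the smaller reading."""
    return min(seq, seq[::-1])


def mine_frequent_paths(molecules, min_support, max_atoms):
    """Level-wise mining of frequent linear fragments (labeled simple paths).

    molecules: list of (atom labels, adjacency lists).
    Returns {fragment (tuple of labels): support}, where support is the number
    of molecules containing the fragment at least once.
    """
    # Occurrences are directed paths stored as (molecule index, atom-index tuple).
    occurrences = [(m, (a,)) for m, (labels, _) in enumerate(molecules) for a in range(len(labels))]
    frequent = {}
    for _ in range(max_atoms):
        support = defaultdict(set)
        for m, path in occurrences:
            labels = molecules[m][0]
            support[canonical(tuple(labels[a] for a in path))].add(m)
        level = {frag: len(mols) for frag, mols in support.items() if len(mols) >= min_support}
        if not level:
            break
        frequent.update(level)
        # Grow only occurrences of frequent fragments: extending a fragment can
        # never raise its support, so infrequent ones need not be extended.
        grown = []
        for m, path in occurrences:
            labels, adj = molecules[m]
            if canonical(tuple(labels[a] for a in path)) in level:
                grown.extend((m, path + (nb,)) for nb in adj[path[-1]] if nb not in path)
        occurrences = grown
    return frequent


dataset = [
    (["C", "C", "O"], [[1], [0, 2], [1]]),  # ethanol (heavy atoms)
    (["C", "C", "N"], [[1], [0, 2], [1]]),  # ethylamine
    (["C", "O"], [[1], [0]]),               # methanol
]
print(mine_frequent_paths(dataset, min_support=2, max_atoms=3))
# {('C',): 3, ('O',): 2, ('C', 'C'): 2, ('C', 'O'): 2}
```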

Methods based on special architectures of neural networks

BPZ

ChemNet

CCS

MolNet

Graph machines
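The architectures listed here differ in detail, but a recurring idea is to let the network's structure follow the molecular graph, with weights shared across atoms so that the output does not depend on how the atoms are numbered. The sketch below is a generic message-passing forward pass with random, untrained weights; it is not the architecture of BPZ, ChemNet, CCS, MolNet or graph machines, and all names, the dimensions and the tanh update rule are assumptions for illustration.

```python
import math
import random


def init_params(n_labels, dim, seed=0):
    """Random, untrained weights (a real model would learn these from data)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"embed": mat(dim, n_labels), "self": mat(dim, dim),
            "nbr": mat(dim, dim), "out": mat(1, dim)[0]}


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def forward(labels, adj, params, vocab, steps=2):
    """Predict one scalar for a molecule given as atom labels plus adjacency lists."""
    dim = len(params["out"])
    # Atom states start as an embedding of the one-hot atom label.
    h = [matvec(params["embed"], [1.0 if lab == v else 0.0 for v in vocab]) for lab in labels]
    for _ in range(steps):
        new_h = []
        for i in range(len(labels)):
            # Sum neighbour states; the same weight matrices are used at every atom.
            msg = [sum(h[j][k] for j in adj[i]) for k in range(dim)]
            pre = [a + b for a, b in zip(matvec(params["self"], h[i]), matvec(params["nbr"], msg))]
            new_h.append([math.tanh(x) for x in pre])
        h = new_h
    # Sum pooling over atoms makes the output independent of atom order.
    pooled = [sum(state[k] for state in h) for k in range(dim)]
    return sum(w * x for w, x in zip(params["out"], pooled))


vocab = ["C", "N", "O"]
params = init_params(len(vocab), dim=4)
# Ethanol with two different atom numberings gives the same output.
print(forward(["C", "C", "O"], [[1], [0, 2], [1]], params, vocab))
print(forward(["O", "C", "C"], [[1], [2, 0], [1]], params, vocab))
```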

References

Molecule mining Wikipedia

(Text) CC BY-SA
