SVM Compounds

Compound Files for Jorissen and Gilson SVM Paper

This page provides access to the SDfiles for the compounds used in Robert Jorissen and Mike Gilson's paper on compound screening with a Support Vector Machine (SVM) method. These files are freely available for academic, commercial, or personal use. We ask that you cite the following reference in any publication that uses this information: Jorissen & Gilson, Virtual Screening of Molecular Databases Using a Support Vector Machine, J. Chem. Inf. Model. 45:549-561, 2005, DOI 10.1021/ci049641u.

This download contains:

    • compounds_1ST.sdf
    • compounds_2ND.sdf
    • compounds_EVEN.sdf
    • compounds_ODD.sdf
    • readme.txt

Each SDfile begins with 125 known binders: 25 compounds for each of five protein targets. The targets are as follows:

    • CDK2: cyclin-dependent kinase 2
    • COX2: cyclooxygenase 2
    • FXa: coagulation factor Xa
    • PDE5: phosphodiesterase 5
    • A1A: alpha-1A adrenoceptor

The remaining compounds in each file are "decoys" from the National Cancer Institute (NCI) diversity set. The decoys in compounds_1ST.sdf are the same as those in compounds_ODD.sdf, and the decoys in compounds_2ND.sdf are the same as those in compounds_EVEN.sdf.
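
The layout described above (125 known binders first, decoys after) can be separated with a short script. The following is a minimal sketch, not part of the download: it assumes the standard SDfile convention that each record ends with a line containing only "$$$$", and the function names split_sd_records and split_binders_decoys are hypothetical. It does not attempt to assign binders to individual targets, since the order of targets within the first 125 records is not stated here.

```python
from typing import List, Tuple

# 5 targets x 25 known binders, as described on this page.
N_BINDERS = 125


def split_sd_records(text: str) -> List[str]:
    """Split SDfile text into records; each record ends with a '$$$$' line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a final record that lacks a closing delimiter, but ignore
    # trailing blank lines after the last '$$$$'.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def split_binders_decoys(
    records: List[str], n_binders: int = N_BINDERS
) -> Tuple[List[str], List[str]]:
    """Return (binders, decoys), assuming the binders come first in the file."""
    return records[:n_binders], records[n_binders:]
```

For example, reading compounds_1ST.sdf into a string, passing it to split_sd_records, and then calling split_binders_decoys on the result should yield a list of 125 binder records and a list of decoy records, provided the file follows the layout described above.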

The NCI diversity set compounds were filtered and prepared as follows:

    1. For each compound (connection table) in the SDfile, retain only the connected molecule with the largest number of atoms.
    2. Omit molecules containing atoms other than C, N, O, H, P, S, and the halogens.
    3. Remove hydrogen atoms.
    4. Identify molecules with improper valences and fix them.
    5. Omit one molecule found to have a positively charged oxygen.
    6. Add back hydrogen atoms appropriate to pH ~ 7 using a locally modified (Jorissen) version of the program Babel (Pat Walters and Matt Stahl).
    7. Generate new 3D geometries using the "predock" (2D-to-3D) setting of a pre-release version of the program Vconf (VeraChem, LLC).
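
Steps 1 and 2 of the preparation can be illustrated with a small sketch. This is not the code used to prepare the files; it is an illustration under stated assumptions. A molecule is represented as a list of element symbols plus a list of bonds given as pairs of zero-based atom indices (a hypothetical in-memory form, not an SDfile parser). The halogens are taken to be F, Cl, Br, and I, and ties between equally large fragments are broken in favor of the first one encountered; neither choice is specified in the procedure above.

```python
from collections import deque
from typing import List, Sequence, Tuple

# Assumption: "the halogens" is read as F, Cl, Br, and I.
ALLOWED_ELEMENTS = {"C", "N", "O", "H", "P", "S", "F", "Cl", "Br", "I"}


def largest_fragment(
    elements: Sequence[str], bonds: Sequence[Tuple[int, int]]
) -> List[int]:
    """Step 1: return the atom indices of the connected fragment with the most atoms."""
    adjacency: List[List[int]] = [[] for _ in elements]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = [False] * len(elements)
    best: List[int] = []
    for start in range(len(elements)):
        if seen[start]:
            continue
        # Breadth-first search collects one connected fragment.
        seen[start] = True
        component: List[int] = []
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for neighbor in adjacency[atom]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    queue.append(neighbor)
        # Strict '>' keeps the first fragment found when sizes tie (an assumption).
        if len(component) > len(best):
            best = component
    return sorted(best)


def passes_element_filter(elements: Sequence[str], atom_indices: Sequence[int]) -> bool:
    """Step 2: True if every retained atom is an allowed element."""
    return all(elements[i] in ALLOWED_ELEMENTS for i in atom_indices)


# Sodium acetate without hydrogens: the acetate fragment is kept, Na is dropped.
elements = ["C", "C", "O", "O", "Na"]
bonds = [(0, 1), (1, 2), (1, 3)]
kept = largest_fragment(elements, bonds)
keep_molecule = passes_element_filter(elements, kept)
```

Because the fragment selection runs before the element filter, a counter-ion such as the sodium in this example is discarded in step 1 and does not cause the molecule to be omitted in step 2.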