Using SQL within CLSD

This page is presents some examples of using SQL within CLSD. For a very brief introduction to SQL, see SQL and the relational algebra. For more detailed information about SQL, see these online tutorials:

For detailed information about using SQL to build queries for dbSNP data see Using dbSNP via SQL queries.

Note that within DB2, CLSD appears as a collection of "schemas", each of which includes a collection of "tables", and that many of the tables are actually "virtual". That is, they are not physically present on local hardware, but are accessed over the network as needed.

Getting general information about the CLSD database

  • To get a list of all tables available within the CLSD instance of DB2:
    select tabname from syscat.tables
    

  • To get all the information within syscat.tables about each table:
    select * from syscat.tables
    

  • To find other information about the CLSD instance, consider this material taken from the Selector web site:

    There are 3 default tablespaces (TABLESPACES):

    • SYSCATSPACE
    • TEMPSPACE
    • USERSPACE1

    Example:

    select * from SYSCAT.TABLESPACES

    There are 4 schemas within SYSCATSPACE

    • SYSIBM - system tables (read only)
    • SYSCAT - views of system tables (read only)
    • SYSFUN - stored functions
    • SYSSTAT - system use statistics

    Here is a table of some useful tables. Note that for each entry in the SYSIBM column there is a corresponding entry in the SYSCAT column.

    Database Object SYSIBM system table SYSCAT system table view SYSSTAT updateable view
    Tablespace SYSTABLESPACES TABLESPACES  
    Table SYSTABLES TABLES TABLES
    Schema SYSSCHEMATA SCHEMATA  
    View SYSVIEWS VIEWS  
    Column SYSCOLUMNS COLUMNS COLUMNS

    Examples from SYSIBM system table area:

         select * from SYSIBM.TABLES
         select * from SYSIBM.SYSTABLES
    
    and examples from the SYSCAT area:
         select * from SYSCAT.SCHEMATA
         select * from SYSCAT.TABLES
         select typename from SYSCAT.DATATYPES
    

  • To get a list of all tables or tablenames in schema dbSNP126_SHARED:
    select * from syscat.tables where tabschema like '%DBSNP126%'
    select tabname from syscat.tables where tabschema='DBSNP126_SHARED'
    
    Note that "like" operands are case-sensitive.

  • To get a list of rowcounts for all tables in schema dbSNP125_human:
    select * from dbsnp125_human.DN_TABLE_ROWCOUNT
    

  • To get a description of each column in each table within schema dbSNP126_shared:
    select * from syscat.tables where tabschema = 'DBSNP126_SHARED'
    

Getting specific information from the database

This section shows some queries that display table contents rather than table "metadata".

  • To get all columns of the alleles table of schema dbSNP126_shared:

    select * from dbsnp126_shared.allele
    
    (Note that this contain more rows than allowed by the default maximum row count of 5000 employed by the reference web interface. If you want to see all the entries, you will have to adjust the maximum row count parameter on the web-based form.)

    On the other hand, to limit the number of rows returned by the SQL command itself, you can use:

    select * from dbsnp126_shared.allele fetch first 100 rows only
    
  • To count the number of entries in the allele database:

    select count(*) from dbSNP126_shared.allele

  • To see the entire HapMap classification code table (9 entries):

    select * from dbsnp126_human.hapmapclasscode

  • To get a list of just the codes and their abbreviations:

    select code,abbrev from dbsnp126_human.hapmapclasscode

  • To run a a BLASTN_NT search:
    select GB_ACC_NUM,DESCRIPTION from NCBI.BLASTN_NT where 
    BLASTSEQ='AGTACTAGCTAGCTAGCTACTAGCTGACTGACTGACTGATGCATCGATGATGC'
    

    Note that the string "BLASTN" includes a trailing "N", and that query nucleotide and AA sequences must be at least 15 bytes long.

    See Using BLAST via CLSD for more information.

  • To get a list of BLAST results filtered by a Prosite pattern: select gb_acc_num,HSP_H_SEQ from ncbi.blastp_nr where blastseq='MSQICKRGLLISNRLAPAALRCKSTWFSEVQMGPPDAILGVTEAFKKDTNPKKI \ NLGAGAYRDDNTQPFVLPSVREAEKRVVSRSLDKEYATIIGIPEFYNKAIELALGKGSKRLAA \ KHNVTAQSISGTGALRIGAAFLAKFWQGNREIYIPSPSWGNHVAIFEHAGLPVNRYRYYDKDT' and DB2LS.LSPatternMatch(HSP_H_SEQ, db2ls.LSPrositePattern('[GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN].') ) > 0 Note the use of the period (.) to terminate the Prosite pattern, and that the LSPatternMatch function returns the character position of the left-most substring matching the pattern, or zero if there is no match.

  • To get a list of genes containing "brain" in their LOCUS_NAME in dbSNP126_shared:
    select * from DBSNP126_SHARED.GENEIDTONAME 
    where locus_name like '%brain%'
    
    Note that "brain" is case specific.

  • To get a list of Bind Genes and their species:
    select GeneNameA,Organism from bind.bind_interaction
    

  • To get a list of genes mentioning "HUMAN" in their descriptions in KEGG:
    select * from KEGG.GENE where description like '%HUMAN%'
    
    Note that "like" operands are case-sensitive.

  • To get a list of all columns in the "REAC%" tables:
    select * from syscat.columns where TABNAME like 'REAC%'
    

  • To get a list of each reaction (showing reactants, enzymes and products from the Ligand database) where the enzyme codes have been replaced with the enzyme names, as held in the Enzyme database, for each pathway recorded in the Ligand database:
    select 
       pw.PATH_ID, pw.REAC_ID, re.COMP_ID, sx.DESCRIPTION, pr.COMP_ID 
    from 
       Ligand.PATHREAC pw, Ligand.REACREAC re, Ligand.REACENZY en, 
       Ligand.REACPROD pr, Enzyme.general sx 
    where 
       en.REAC_ID=re.REAC_ID AND en.REAC_ID=pw.REAC_ID 
       AND en.REAC_ID=pr.REAC_ID 
       AND sx.EC_ID=en.EC_ID 
    order by 
       pw.PATH_ID,pw.REAC_ID
    

  • To get some info from PubMed:
    select PMID, ArticleTitle FROM NCBI.pmarticles 
    where entrez.contains (ArticleTitle, 'granulation') = 1 
       AND entrez.contains (PubDate, '1992') = 1 
    

  • To get a list of sequences involving JAK3 from NCBI Nucleotide:
    SELECT PRIMARYACCESSION, SEQLENGTH, DEFINITION, SEQUENCE 
    FROM NCBI.GBSEQ
    WHERE ( ENTREZ.CONTAINS( ORGANISM, 'HUMAN') = 1 ) 
          AND 
          ( ENTREZ.CONTAINS( DEFINITION, 'JAK3' ) = 1 )
    

Additional information

If you have received SQL error messages from this script, please consult this IBM Wanrning, error and completion messages information.

This index into the IBM help files might also be helpful.

If you have questions about this page, please contact Michael Grobe at dgrobe @ iupui.edu.