This is a discussion on indexing within the SQL Server forums, part of the Microsoft SQL Server category; --> Hello, I need some help understanding why my indexes do not seem to be affecting my searches. I would ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, I need some help understanding why my indexes do not seem to be affecting my searches. I would really appreciate help understanding what indexes I need to make this query run faster. I realize that I use wildcards when searching for g1.gene_name, but is there anything I can do to make that less of a problem? I ran EXPLAIN on the search I wanted to optimize and got the following: EXPLAIN SELECT c1.SFID FROM Gene g1, cDNA c1, Transcript t1, Refseq r1 WHERE (c1.SFID = t1.cDNA_SFID AND t1.gene_SFID = g1.SFID AND (g1.gene_sym = 'hh' OR g1.genbank_acc = 'hh' OR g1.gene_name LIKE '%hh%')) OR (c1.genbank_acc = 'hh' OR c1.SUID = 'hh') OR (c1.SFID = t1.cDNA_SFID AND t1.gene_SFID = g1.SFID AND g1.locuslink_id = r1.locuslink_id AND (r1.mRNA_acc = 'hh')); +-------+-------+--------------------------+------+---------+------+-------- +-------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+-------+--------------------------+------+---------+------+-------- +-------------------------+ | r1 | index | mRNA_acc,llid,rma,rllid | rma | 25 | NULL | 20093 | Using index | | g1 | ALL | PRIMARY,llid,ggs,gga,gll | NULL | NULL | NULL | 190475 | | | c1 | ALL | PRIMARY,cga,cs | NULL | NULL | NULL | 43714 | where used | | t1 | index | gene_SFID,gS,cS,tg,tc | gS | 4 | NULL | 47238 | where used; Using index | +-------+-------+--------------------------+------+---------+------+-------- +-------------------------+ I have the following indexes (which were all added after the database was populated): ALTER TABLE cDNA ADD INDEX cga (genbank_acc, SFID); ALTER TABLE cDNA ADD INDEX co (organism, SFID); ALTER TABLE cDNA ADD INDEX cs (SUID, SFID); ALTER TABLE Gene ADD INDEX ggs (gene_sym, SFID); ALTER TABLE Gene ADD INDEX gga (genbank_acc, SFID); ALTER TABLE Gene ADD INDEX ggn (gene_name, SFID); ALTER TABLE Gene ADD INDEX go (organism, SFID); ALTER TABLE Gene ADD INDEX gll (locuslink_id, SFID); ALTER TABLE Gene ADD INDEX gui (unigene_id, SFID); ALTER TABLE Transcript ADD INDEX tg (gene_SFID, cDNA_SFID); ALTER TABLE Transcript ADD INDEX tc (cDNA_SFID); ALTER TABLE Refseq ADD INDEX rma (mRNA_acc, locuslink_id); ALTER TABLE Refseq ADD INDEX rllid (locuslink_id); |
| |||
| "superfly2" <darius_fatakia@yahoo.com> wrote in message news:cb030a$11i$1@news.Stanford.EDU... > Hello, > > I need some help understanding why my indexes do not seem to be affecting my > searches. I would really appreciate help understanding what indexes I need > to make this query run faster. I realize that I use wildcards when searching > for g1.gene_name, but is there anything I can do to make that less of a > problem? I ran EXPLAIN on the search I wanted to optimize and got the > following: > > EXPLAIN SELECT c1.SFID FROM Gene g1, cDNA c1, Transcript t1, Refseq r1 WHERE > (c1.SFID = t1.cDNA_SFID AND t1.gene_SFID = g1.SFID AND (g1.gene_sym = 'hh' > OR g1.genbank_acc = 'hh' OR g1.gene_name LIKE '%hh%')) OR (c1.genbank_acc = > 'hh' OR c1.SUID = 'hh') OR (c1.SFID = t1.cDNA_SFID AND t1.gene_SFID = > g1.SFID AND g1.locuslink_id = r1.locuslink_id AND (r1.mRNA_acc = 'hh')); > +-------+-------+--------------------------+------+---------+------+-------- > +-------------------------+ > | table | type | possible_keys | key | key_len | ref | rows > | Extra | > +-------+-------+--------------------------+------+---------+------+-------- > +-------------------------+ > | r1 | index | mRNA_acc,llid,rma,rllid | rma | 25 | NULL | 20093 > | Using index | > | g1 | ALL | PRIMARY,llid,ggs,gga,gll | NULL | NULL | NULL | 190475 > | | > | c1 | ALL | PRIMARY,cga,cs | NULL | NULL | NULL | 43714 > | where used | > | t1 | index | gene_SFID,gS,cS,tg,tc | gS | 4 | NULL | 47238 > | where used; Using index | > +-------+-------+--------------------------+------+---------+------+-------- > +-------------------------+ > > > > I have the following indexes (which were all added after the database was > populated): > > ALTER TABLE cDNA ADD INDEX cga > (genbank_acc, SFID); > > ALTER TABLE cDNA ADD INDEX co > (organism, SFID); > > ALTER TABLE cDNA ADD INDEX cs > (SUID, SFID); > > > ALTER TABLE Gene ADD INDEX ggs > (gene_sym, SFID); > > ALTER TABLE Gene ADD INDEX gga > (genbank_acc, SFID); > > ALTER TABLE Gene ADD INDEX ggn > (gene_name, SFID); > > ALTER TABLE Gene ADD INDEX go > (organism, SFID); > > ALTER TABLE Gene ADD INDEX gll > (locuslink_id, SFID); > > ALTER TABLE Gene ADD INDEX gui > (unigene_id, SFID); > > > ALTER TABLE Transcript ADD INDEX tg > (gene_SFID, cDNA_SFID); > > ALTER TABLE Transcript ADD INDEX tc > (cDNA_SFID); > > > ALTER TABLE Refseq ADD INDEX rma > (mRNA_acc, locuslink_id); > > ALTER TABLE Refseq ADD INDEX rllid > (locuslink_id); > > I believe that EXPLAIN is a MySQL statement - this is a Microsoft SQL Server nesgroup, so I guess you'll get a better answer in a MySQL forum. Simon |
| ||||
| [posted and mailed, please reply in news] superfly2 (darius_fatakia@yahoo.com) writes: > EXPLAIN SELECT c1.SFID FROM Gene g1, cDNA c1, Transcript t1, Refseq r1 > WHERE (c1.SFID = t1.cDNA_SFID AND t1.gene_SFID = g1.SFID AND > (g1.gene_sym = 'hh' OR g1.genbank_acc = 'hh' OR g1.gene_name LIKE > '%hh%')) OR (c1.genbank_acc = 'hh' OR c1.SUID = 'hh') OR (c1.SFID = > t1.cDNA_SFID AND t1.gene_SFID = g1.SFID AND g1.locuslink_id = > r1.locuslink_id AND (r1.mRNA_acc = 'hh')); Obviously you are in the wrong newsgroup, beause there is no EXPLAIN command in SQL Server, and you don't use ALTER TABLE to add indexes. Nevertheless, I might be able to give some input. You have this condition: g1.gene_name LIKE '%hh%' To resolve this condition, the DB engine cannot use the index for a quick look up. Compare with looking through the index of a book and try to find all keywords with 'hh' in them. You would have to scan the entire index. I don't know about your DBMS, but MS SQL Server could use the index for a scan, and it would do it, if the index includes all columns in the table that aer referred to. If I see right, you would have to add locuslink_id to ALTER TABLE Gene ADD INDEX ggs (gene_sym, SFID); Not that I know if this would help on your engine, but most DBMS's are fond of using covering indexes. Then again, I have no idea how your DBMS handle all those OR. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |