vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| We have a new query, with a where-clause: where a < b (a & b are date fields) and c = 0 (c can have values from 0 to 270) and d = "2" (d can have 4 different values) What is the best way to write this where clause, and which are the best indexes to use ? The table has a total of 32 million records. My own thoughts were: where d = "2" (get rid of most of the records) and c = 0 (reduce the above subset to about 25%) and a < b (leave the sequential scan for this remaining data set) And create indexes on c & d (which brings up another question - composite or stand alone ?) sending to informix-list |
| |||
| Dirk Moolman wrote: It does not matter. The IDS optimizer uses costs to determine the best query plan. The order of the filters and join conditions in the WHERE and ON clauses do not affect the choices that the optimizer makes. You CAN affect the optimizer's decisions my maintaining sufficiently detailed statistics in the database by running UPDATE STATISTICS according to the recommendations in the Performance Guide (or as implemented in my dostats utility) and by providing appropriate indexes. To that know that IDS will only use ONE index per table so if you create singleton indexes on a, c, & d they will not be combined to filter this query, only the one providing the best filter value will be used. A composite index containing all three will help greatly though. Art S. Kagel > We have a new query, with a where-clause: > > where a < b (a & b are date fields) > and c = 0 (c can have values from 0 to 270) > and d = "2" (d can have 4 different values) > > > What is the best way to write this where clause, and which are the best > indexes to use ? > > The table has a total of 32 million records. > > > > > My own thoughts were: > > where d = "2" (get rid of most of the records) > and c = 0 (reduce the above subset to about 25%) > and a < b (leave the sequential scan for this remaining data > set) > > > And create indexes on c & d (which brings up another question - > composite or stand alone ?) > > > > sending to informix-list |
| ||||
| Dirk Moolman wrote: > We have a new query, with a where-clause: > > where a < b (a & b are date fields) > and c = 0 (c can have values from 0 to 270) > and d = "2" (d can have 4 different values) > > > What is the best way to write this where clause, and which are the best > indexes to use ? > > The table has a total of 32 million records. > The 'a < b' equation can only be solved by a sequential scan. Now we do have methods for reducing the amount of sequential scan. One way would be as you already are looking at: > where d = "2" (get rid of most of the records) > and c = 0 (reduce the above subset to about 25%) If this reduces the number of rows (or tuples), you could do two queries with the first creating a temporary table (into temp). Then run the 'a < b' on the temp table. 'd' can have 4 values ("0", "1", "2", "3") You could partion the table using ... fragment by expression when d = "0" then dbs0 when d = "1" then dbs1 when d = "2" then dbs2 when d = "3" then dbs3 A query with > where a < b (a & b are date fields) > and c = 0 (c can have values from 0 to 270) > and d = "2" (d can have 4 different values) would then eliminate 3 of the fragments and only scan the 'dbs2' partition. If you want to have many (small) partitions you could use column c, fx ... fragment by expression when and c < 10 and c >= 0 then dbs0 when and c < 20 and c >= 10 then dbs1 when and c < 30 and c >= 20 then dbs2 [snip] when c >= 260 then dbs26 Another approach would be to run update statistics high on table(c, d) and the examine the distribution values to find out which of the two columns are most significant. If the 'c'-values are evenly distributed, then a composite index "c, d" will be very efficient. |
| Thread Tools | |
| Display Modes | |
|
|