vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a table with 17M rows, and a B-tree index with 13M unique values, and clustering factor almost as large as # of unique values. Oracle bypasses the index when querying on the field, and if I force it to use the index will run forever. i also have 2 other indexes i needed to maintain. 2) bitmap index on another field, only 12K unique values, clustering factor of 18K 3) index on a subset of field above, only 90 unique values, clustering factor of 4M. queries on the 2 indexes above run in reasonable time my questions: I understand I have almost the worst possible clustering factor, but, shouldn't the high selectiveness, almost like a PK, get me reasonable performance? What if that had been a PK with same very high clustering factor? Would performance be as bad? I have read this definition of clustering factor: "A count of how many visits to the table you would have to make if each entry in the index was read in turn, with consecutive visits to the same table block not being counted". But if you have a PK or a highly selective index, why would oracle need to read each entry in the index ? What optimization options are available? Will using bitmap help for the big clustering factor index? If I have to rebuild the table to match the index order, what will happen to the other 2 indexes? Will using bitmap indexes help? |
| |||
| "yoram.ayalon@structuredweb.com" <yoram.ayalon@structuredweb.com> wrote: > I have a table with 17M rows, and a B-tree index with 13M unique > values, So is that 13M values with one row each and 1 value with 4M rows? Or is it 9M values with 1 row each and 4M values with 2 rows each? > and clustering factor almost as large as # of unique values. > Oracle bypasses the index when querying on the field, and if I force it > to use the index will run forever. > > i also have 2 other indexes i needed to maintain. > 2) bitmap index on another field, only 12K unique values, clustering > factor of 18K > 3) index on a subset of field above, only 90 unique values, clustering > factor of 4M. > > queries on the 2 indexes above run in reasonable time > > my questions: > > I understand I have almost the worst possible clustering factor, but, > shouldn't the high selectiveness, almost like a PK, get me reasonable > performance? That would depend on the query, eh? If the index has good selectivity but the query does not (i.e. range query, cartesian join, etc), then what do you get for your trouble? > What if that had been a PK with same very high clustering > factor? Would performance be as bad? That would depend on the query. > I have read this definition of > clustering factor: "A count of how many visits to the table you would > have to make if each entry in the index was read in turn, with > consecutive visits to the same table block not being counted". But if > you have a PK or a highly selective index, why would oracle need to > read each entry in the index ? That would depend on what the query actually is. > What optimization options are available? That would depend on what the query is. .... Xho -- -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GB |
| |||
| thanks for the answer. >>So is that 13M values with one row each and 1 value with 4M rows? >>Or is it 9M values with 1 row each and 4M values with 2 rows each? its the second scenario. most values have 1 row, some values have a low number of rows, less than 10 is the norm. >>That would depend on the query I am using a very simply query. SELECT * FROM TABLE WHERE X = value; X is the field indexed, VARCHAR2(9). my test query returns 4 rows |
| |||
| "yoram.ayalon@structuredweb.com" <yoram.ayalon@structuredweb.com> wrote: > thanks for the answer. > > >>So is that 13M values with one row each and 1 value with 4M rows? > >>Or is it 9M values with 1 row each and 4M values with 2 rows each? > > its the second scenario. most values have 1 row, some values have a low > number of rows, less than 10 is the norm. > > >>That would depend on the query > > I am using a very simply query. > > SELECT * FROM TABLE WHERE X = value; > > X is the field indexed, VARCHAR2(9). my test query returns 4 rows What is the explain plan for it? Is value a varchar2? Maybe type conversion is preventing effective use of the index. -- -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GB |
| |||
| I feel so stupid... I was using number instead of varchar2. now its working MUCH faster but what does this say about my original question on clustering factor and uniqueness? what is deemed the best practice in terms of type of index to use, bitmap or btree, and the need to rebuild table? |
| |||
| <yoram.ayalon@structuredweb.com> wrote in message news:1143668115.350817.120590@g10g2000cwb.googlegr oups.com... >I feel so stupid... I was using number instead of varchar2. now its > working MUCH faster > > but what does this say about my original question on clustering factor > and uniqueness? what is deemed the best practice in terms of type of > index to use, bitmap or btree, and the need to rebuild table? > Hi Yoram, The clustering factor is simply used by the cost based optimizer to calculate the cost of accessing the table via the use of the specific index. Simple example: A table has 50,000 rows that fit in 50 data blocks (1000 rows per block) A query is calculated by the CBO to access 10% (or 5,000) of rows in this table. The cost of accessing the table via the index is therefore (roughly) the I/O in reading the index itself (in this example assuming a 2 level index, 1 I/O for the index root block + 10% of index leaf blocks) + the cost of reading the table. The cost of reading the table is simply 10% of the clustering factor. In the case of a "really good" CF of 50 (i.e. close to the blocks in the table meaning the data is pretty well sorted in the same manner as the index) that would by 10% of 50 = 5 blocks. In the case of a "really bad" CF of 50,000 (i.e. close to the number of rows in the table meaning the data is scattered all over the place in relation to the index and each read of the table takes us to a different table block from the previous read) that would be 10% of 50,000 = 5,000 blocks. In the case of "an average" CF of 500, that would be 10% of 500 or 50 blocks (meaning on average having 100 rows of interest in each and every block). Therefore the CBO is much more likely to use the index in the first example and not so likely to in the second or third examples. In the case of a single lookup via a PK, the CF is redundant as we only need to access the one table block. As Xho suggested, the CF becomes a bigger issue/differentiator as the cardinality of the query against the table increases. Hopes this makes sense and helps. Cheers Richard |
| |||
| thanks for the support followup - I rebuilt the index as a bitmap, and also corrected the bug, now i access it thru its correct type. I only caught ther bug as I was already rebuilding the index as a bitmap... the access is really fast now, even though the CF is of course the same. this tells me Oracle understands that since it has almost as many distinct keys as a PK it can ignore the high CF this table is updated in batches, in the OLTP is only selected, thus a bitmap idnex is feasible. now I have to rebuilt the index as a b-tree and check the difference. I wil post the results in case anyone is still interested I don't think I will rebuilt the table as along as I get results as of now. At least I have learned a lot about indexes, CF and the all the other good stuff. thanks again everone |
| |||
| <yoram.ayalon@structuredweb.com> wrote in message news:1143752412.171160.252210@e56g2000cwe.googlegr oups.com... > thanks for the support > > followup - I rebuilt the index as a bitmap, and also corrected the bug, > now i access it thru its correct type. I only caught ther bug as I was > already rebuilding the index as a bitmap... > > the access is really fast now, even though the CF is of course the > same. this tells me Oracle understands that since it has almost as many > distinct keys as a PK it can ignore the high CF > > this table is updated in batches, in the OLTP is only selected, thus a > bitmap idnex is feasible. > > now I have to rebuilt the index as a b-tree and check the difference. I > wil post the results in case anyone is still interested > > > > > I don't think I will rebuilt the table as along as I get results as of > now. At least I have learned a lot about indexes, CF and the all the > other good stuff. thanks again everone > For bitmap indexes, the clustering_factor column is irrelevant - it simply echoes the num_rows column of user_indexes. The data distribution pattern is assumed to be 80% packed, 20% scattered for the purposes of calculating the effective clustering of bitmapped data. (See Oracle Wait Interface by Richmond Shee et. al.) -- Regards Jonathan Lewis http://www.oracle.com/technology/com...ce1.html#lewis The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/faq/ind_faq.html Cost Based Oracle: Fundamentals http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html |
| ||||
| On Thu, 30 Mar 2006 13:12:08 +0000, Richard Foote wrote: > > The clustering factor is simply used by the cost based optimizer to > calculate the cost of accessing the table via the use of the specific index. There is an excellent, if somewhat complex presentation on MiracleAS site: http://www.miracleas.dk/images/uploa...rd%20Foote.pdf This document explains clustering factor, among other things. It also shows how to dump index blocks and interpret those dumps. I've been long waiting for an opportunity to ask author to update it with the situations from ASSM tablespaces and 10g database (the article deals with 9.2). I'd love to see more detailed explanation of Oracle dumps and statuses. Please, consider yourself asked. -- http://www.mgogala.com |