This is a discussion on indexing a column with only 2 or 3 values within the Oracle Database forums, part of the Database Server Software category; --> Richard Foote wrote: > "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message > news:5hg5aiF3kia4nU1@mid.individual.net... >> ciapecki wrote: >>> Hi, >>> >>> ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Richard Foote wrote: > "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message > news:5hg5aiF3kia4nU1@mid.individual.net... >> ciapecki wrote: >>> Hi, >>> >>> Does indexing a very big table (about 5Mio records) on the columnA >>> which can hold only values Y,N,<NULL> make sense? >>> >> Yes, that's what bitmap indexes were made for. >> > > Hi Thomas > > A Bitmap index is of no use if both Y and N are roughly evenly distributed > and you have no other predicate in the query. > > Returning approximately 2.5 millions rows through a bitmap index would be > dramatically slower than a full table scan. > > A single bitmap index only would be useless in this scenario, even more so > if the table is subjected to any transactional based DML load. > Hi Richard, thanks for the pointing this out. I wasn't aware of that, but it does sound reasonable. But after all the Concepts manual says: "If the number of distinct values of a column is less than 1% of the number of rows in the table, or if the values in a column are repeated more than 100 times, then the column is a candidate for a bitmap index." Actually a bit further down in the Concepts manual there is an example very similar to the OP's situation: "There are only three possible values for marital status and region, two possible values for gender, and four for income level. Therefore, it is appropriate to create bitmap indexes on these columns" Cheers Thomas |
| |||
| ciapecki wrote: > BTW. how can you index only N's for example? > > chris With a function based index. Go to Morgan's Library at www.psoug.org and scroll down to "Function Based Indexes." -- Daniel A. Morgan University of Washington damorgan@x.washington.edu (replace x with u to respond) Puget Sound Oracle Users Group www.psoug.org |
| |||
| ciapecki wrote: > Hi, > > Does indexing a very big table (about 5Mio records) on the columnA > which can hold only values Y,N,<NULL> make sense? > > thanks > chris I would strongly urge you to ignore the advice you received from Thomas Kellerer. Well intentioned by quite likely inaccurate. The answer is "depends" and the links posted by Charles Hooper will tell you what it depends upon. In almost all cases a bitmap index does not belong in an OLTP system. -- Daniel A. Morgan University of Washington damorgan@x.washington.edu (replace x with u to respond) Puget Sound Oracle Users Group www.psoug.org |
| |||
| "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message news:5hgnqqF3l1n69U1@mid.individual.net... > Richard Foote wrote: >> "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message >> news:5hg5aiF3kia4nU1@mid.individual.net... >>> ciapecki wrote: >>>> Hi, >>>> >>>> Does indexing a very big table (about 5Mio records) on the columnA >>>> which can hold only values Y,N,<NULL> make sense? >>>> >>> Yes, that's what bitmap indexes were made for. >>> >> >> Hi Thomas >> >> A Bitmap index is of no use if both Y and N are roughly evenly >> distributed and you have no other predicate in the query. >> >> Returning approximately 2.5 millions rows through a bitmap index would be >> dramatically slower than a full table scan. >> >> A single bitmap index only would be useless in this scenario, even more >> so if the table is subjected to any transactional based DML load. >> > > Hi Richard, > > thanks for the pointing this out. > I wasn't aware of that, but it does sound reasonable. > > But after all the Concepts manual says: > > "If the number of distinct values of a column is less than 1% of the > number of rows in the table, or if the values in a column are repeated > more than 100 times, then the column is a candidate for a bitmap index." > > Actually a bit further down in the Concepts manual there is an example > very similar to the OP's situation: > > "There are only three possible values for marital status and region, two > possible values for gender, and four for income level. Therefore, it is > appropriate to create bitmap indexes on these columns" > Hi Thomas There are a number of classic myths associated with bitmap indexes and yes, Oracle is as guilty as anyone in propagating them. One is that bitmap indexes are only useful for low cardinality columns. The above definition is better than many I've read but it's still one of those rules of thumbs that is not entirely accurate as column values outside of the definition could possibly be candidates for a bitmap index. The other classic myth is that a *single* bitmap index on a very low cardinality column (as in the OPs example) can be very efficiently utilised to retrieve the required number of rows. But if this means retrieving 50% or 33% or 25% etc of all rows in a huge table, then it not going to be very efficient at all when compared to the poor old full table scan. Note in the above example, it mentions 4 different columns, not one column. These four columns when *combined* could possibly reduce the final result set to a small enough subset of required rows that would make retrieving them one at a time through the rowids a possibly attractive option. For example, there may not be that many single males that live in Canberra that have a really low income ... At the end of the day, it comes back to the overall selectivity of predicates and can combinations of bitmaps when and/or/not together produce a small enough set of rowids to make it all worthwhile to read the bitmap blocks, perform the set logic and retrieve the resultant rowids one at a time when compared with other alternatives (such as the full table scan). The answer is almost certainly a big no for a single bitmap index on a low cardinality column. Cheers Richard |
| |||
| "Richard Foote" <richard.foote@bigpond.nospam.com> wrote in message news > "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message > news:5hgnqqF3l1n69U1@mid.individual.net... >> Richard Foote wrote: >>> "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message >>> news:5hg5aiF3kia4nU1@mid.individual.net... >>>> ciapecki wrote: >>>>> Hi, >>>>> >>>>> Does indexing a very big table (about 5Mio records) on the columnA >>>>> which can hold only values Y,N,<NULL> make sense? >>>>> >>>> Yes, that's what bitmap indexes were made for. >>>> >>> >>> Hi Thomas >>> >>> A Bitmap index is of no use if both Y and N are roughly evenly >>> distributed and you have no other predicate in the query. >>> >>> Returning approximately 2.5 millions rows through a bitmap index would >>> be dramatically slower than a full table scan. >>> >>> A single bitmap index only would be useless in this scenario, even more >>> so if the table is subjected to any transactional based DML load. >>> >> >> Hi Richard, >> >> thanks for the pointing this out. >> I wasn't aware of that, but it does sound reasonable. >> >> But after all the Concepts manual says: >> >> "If the number of distinct values of a column is less than 1% of the >> number of rows in the table, or if the values in a column are repeated >> more than 100 times, then the column is a candidate for a bitmap index." >> >> Actually a bit further down in the Concepts manual there is an example >> very similar to the OP's situation: >> >> "There are only three possible values for marital status and region, two >> possible values for gender, and four for income level. Therefore, it is >> appropriate to create bitmap indexes on these columns" >> > > Hi Thomas > > There are a number of classic myths associated with bitmap indexes and > yes, Oracle is as guilty as anyone in propagating them. One is that bitmap > indexes are only useful for low cardinality columns. The above definition > is better than many I've read but it's still one of those rules of thumbs > that is not entirely accurate as column values outside of the definition > could possibly be candidates for a bitmap index. > > The other classic myth is that a *single* bitmap index on a very low > cardinality column (as in the OPs example) can be very efficiently > utilised to retrieve the required number of rows. But if this means > retrieving 50% or 33% or 25% etc of all rows in a huge table, then it not > going to be very efficient at all when compared to the poor old full table > scan. > > Note in the above example, it mentions 4 different columns, not one > column. These four columns when *combined* could possibly reduce the final > result set to a small enough subset of required rows that would make > retrieving them one at a time through the rowids a possibly attractive > option. For example, there may not be that many single males that live in > Canberra that have a really low income ... > > At the end of the day, it comes back to the overall selectivity of > predicates and can combinations of bitmaps when and/or/not together > produce a small enough set of rowids to make it all worthwhile to read the > bitmap blocks, perform the set logic and retrieve the resultant rowids one > at a time when compared with other alternatives (such as the full table > scan). > > The answer is almost certainly a big no for a single bitmap index on a low > cardinality column. > Hi Thomas Ops, forgot to highlight my other original point that's ignored in the Concept recommendations you quoted in that bitmap indexes are going to be problematic in a transactional environment, primarily due to the likely locking and contention issues that would result. Although improved in latter releases, bitmap indexes can also become structurally less efficient under DML load. Therefore Bitmap indexes and transactional systems do not mix well at all ... Cheers Richard |
| |||
| On Aug 3, 9:47 am, ciapecki <ciape...@gmail.com> wrote: > On 3 Aug., 10:21, Helma <bad_elef...@hotmail.com> wrote: > > > > > >>> Does indexing a very big table (about 5Mio records) on the columnA > > > > >>> which can hold only values Y,N,<NULL> make sense? > > > > >> Yes, that's what bitmap indexes were made for. > > > FYI , you can also index only the N ( or Y) value's in the column. > > This is done if e.g. the boolean is an indicator whether a row has > > gone through a processing run or not, and the application just need to > > find the N values. > > Bitmap indexes are not ok if the table is updated intensively. > > > Helma > > Hi Helma, > > It will be actually a view and will be refreshed everyday (once a > day). > But thanks for the warning. > > BTW. how can you index only N's for example? > > chris Ideally you might arrange the design such that the small number of rows you typically want to identify are flagged with a 'Y' (say) and the rest left null. Then a standard btree index on that column would be compact and efficient. This might also work for Y/N/NULL if the null values formed the vast majority, although I would be interested to know what nulls represent in a Yes/No column. |
| |||
| > > Ideally you might arrange the design such that the small number of > rows you typically want to identify are flagged with a 'Y' (say) and > the rest left null. Then a standard btree index on that column would > be compact and efficient. Redesign? What's wrong with my suggestion to create an index on only the Y value's? H. |
| |||
| Helma wrote: >> Ideally you might arrange the design such that the small number of >> rows you typically want to identify are flagged with a 'Y' (say) and >> the rest left null. Then a standard btree index on that column would >> be compact and efficient. > > Redesign? What's wrong with my suggestion to create an index on only > the Y value's? > > H. Nothing ... your original idea is superior in several respects. NULLs are not information ... they are the lack of it. -- Daniel A. Morgan University of Washington damorgan@x.washington.edu (replace x with u to respond) Puget Sound Oracle Users Group www.psoug.org |
| |||
| On Aug 4, 4:51 pm, Helma <bad_elef...@hotmail.com> wrote: > > Ideally you might arrange the design such that the small number of > > rows you typically want to identify are flagged with a 'Y' (say) and > > the rest left null. Then a standard btree index on that column would > > be compact and efficient. > > Redesign? What's wrong with my suggestion to create an index on only > the Y value's? > > H. Nothing. It can be awkward to remember to use a particular NULLIF/ DECODE/CASE expression each time you need to use that column in a WHERE clause, although of course you might embed the expression in a view, code it in a procedure or define it in Business Objects or whatever and never have to care about it again, so it's not necessarily a big deal. If the purpose of the column is to flag rows with a particular status or condition ('needs processing' etc) then storing the opposite value for all the other rows might be somewhat redundant (you might never use "where needs_processing = 'N'). |
| ||||
| On Aug 3, 4:42 am, Helma <bad_elef...@hotmail.com> wrote: > > BTW. how can you index only N's for example? > > Hello Chris, > > I hope i am not unkind to you, but for syntax matters, may i redirect > you to the manuals? > > H. > Hint: you need to look at function based indexes, creating a function > that only returns a value for the recordvalues you need to index. If you look at function based indexes, be sure and also look at where the optimizer has problems, search the metalink bug database for: optimizer function based indexes Always state exactly which versions of Oracle and your platform. jg -- @home.com is bogus. http://dbasrus.blogspot.com/2007/07/...and-final.html |