Unix Technical Forum

indexing a column with only 2 or 3 values

This is a discussion on indexing a column with only 2 or 3 values within the Oracle Database forums, part of the Database Server Software category; --> Richard Foote wrote: > "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message > news:5hg5aiF3kia4nU1@mid.individual.net... >> ciapecki wrote: >>> Hi, >>> >>> ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

 

LinkBack Thread Tools Display Modes
  #11 (permalink)  
Old 02-26-2008, 07:26 AM
Thomas Kellerer
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

Richard Foote wrote:
> "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message
> news:5hg5aiF3kia4nU1@mid.individual.net...
>> ciapecki wrote:
>>> Hi,
>>>
>>> Does indexing a very big table (about 5Mio records) on the columnA
>>> which can hold only values Y,N,<NULL> make sense?
>>>

>> Yes, that's what bitmap indexes were made for.
>>

>
> Hi Thomas
>
> A Bitmap index is of no use if both Y and N are roughly evenly distributed
> and you have no other predicate in the query.
>
> Returning approximately 2.5 millions rows through a bitmap index would be
> dramatically slower than a full table scan.
>
> A single bitmap index only would be useless in this scenario, even more so
> if the table is subjected to any transactional based DML load.
>


Hi Richard,

thanks for the pointing this out.
I wasn't aware of that, but it does sound reasonable.

But after all the Concepts manual says:

"If the number of distinct values of a column is less than 1% of the
number of rows in the table, or if the values in a column are repeated
more than 100 times, then the column is a candidate for a bitmap index."

Actually a bit further down in the Concepts manual there is an example
very similar to the OP's situation:

"There are only three possible values for marital status and region, two
possible values for gender, and four for income level. Therefore, it is
appropriate to create bitmap indexes on these columns"


Cheers
Thomas
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 02-26-2008, 07:26 AM
DA Morgan
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

ciapecki wrote:

> BTW. how can you index only N's for example?
>
> chris


With a function based index. Go to Morgan's Library at www.psoug.org
and scroll down to "Function Based Indexes."
--
Daniel A. Morgan
University of Washington
damorgan@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 02-26-2008, 07:26 AM
DA Morgan
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

ciapecki wrote:
> Hi,
>
> Does indexing a very big table (about 5Mio records) on the columnA
> which can hold only values Y,N,<NULL> make sense?
>
> thanks
> chris


I would strongly urge you to ignore the advice you received from
Thomas Kellerer. Well intentioned by quite likely inaccurate.

The answer is "depends" and the links posted by Charles Hooper
will tell you what it depends upon.

In almost all cases a bitmap index does not belong in an OLTP system.
--
Daniel A. Morgan
University of Washington
damorgan@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 02-26-2008, 07:26 AM
Richard Foote
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

"Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message
news:5hgnqqF3l1n69U1@mid.individual.net...
> Richard Foote wrote:
>> "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message
>> news:5hg5aiF3kia4nU1@mid.individual.net...
>>> ciapecki wrote:
>>>> Hi,
>>>>
>>>> Does indexing a very big table (about 5Mio records) on the columnA
>>>> which can hold only values Y,N,<NULL> make sense?
>>>>
>>> Yes, that's what bitmap indexes were made for.
>>>

>>
>> Hi Thomas
>>
>> A Bitmap index is of no use if both Y and N are roughly evenly
>> distributed and you have no other predicate in the query.
>>
>> Returning approximately 2.5 millions rows through a bitmap index would be
>> dramatically slower than a full table scan.
>>
>> A single bitmap index only would be useless in this scenario, even more
>> so if the table is subjected to any transactional based DML load.
>>

>
> Hi Richard,
>
> thanks for the pointing this out.
> I wasn't aware of that, but it does sound reasonable.
>
> But after all the Concepts manual says:
>
> "If the number of distinct values of a column is less than 1% of the
> number of rows in the table, or if the values in a column are repeated
> more than 100 times, then the column is a candidate for a bitmap index."
>
> Actually a bit further down in the Concepts manual there is an example
> very similar to the OP's situation:
>
> "There are only three possible values for marital status and region, two
> possible values for gender, and four for income level. Therefore, it is
> appropriate to create bitmap indexes on these columns"
>


Hi Thomas

There are a number of classic myths associated with bitmap indexes and yes,
Oracle is as guilty as anyone in propagating them. One is that bitmap
indexes are only useful for low cardinality columns. The above definition is
better than many I've read but it's still one of those rules of thumbs that
is not entirely accurate as column values outside of the definition could
possibly be candidates for a bitmap index.

The other classic myth is that a *single* bitmap index on a very low
cardinality column (as in the OPs example) can be very efficiently utilised
to retrieve the required number of rows. But if this means retrieving 50% or
33% or 25% etc of all rows in a huge table, then it not going to be very
efficient at all when compared to the poor old full table scan.

Note in the above example, it mentions 4 different columns, not one column.
These four columns when *combined* could possibly reduce the final result
set to a small enough subset of required rows that would make retrieving
them one at a time through the rowids a possibly attractive option. For
example, there may not be that many single males that live in Canberra that
have a really low income ...

At the end of the day, it comes back to the overall selectivity of
predicates and can combinations of bitmaps when and/or/not together produce
a small enough set of rowids to make it all worthwhile to read the bitmap
blocks, perform the set logic and retrieve the resultant rowids one at a
time when compared with other alternatives (such as the full table scan).

The answer is almost certainly a big no for a single bitmap index on a low
cardinality column.

Cheers

Richard


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #15 (permalink)  
Old 02-26-2008, 07:26 AM
Richard Foote
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

"Richard Foote" <richard.foote@bigpond.nospam.com> wrote in message
newsCGsi.15616$4A1.14750@news-server.bigpond.net.au...
> "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message
> news:5hgnqqF3l1n69U1@mid.individual.net...
>> Richard Foote wrote:
>>> "Thomas Kellerer" <JUAXQOSZFGQQ@spammotel.com> wrote in message
>>> news:5hg5aiF3kia4nU1@mid.individual.net...
>>>> ciapecki wrote:
>>>>> Hi,
>>>>>
>>>>> Does indexing a very big table (about 5Mio records) on the columnA
>>>>> which can hold only values Y,N,<NULL> make sense?
>>>>>
>>>> Yes, that's what bitmap indexes were made for.
>>>>
>>>
>>> Hi Thomas
>>>
>>> A Bitmap index is of no use if both Y and N are roughly evenly
>>> distributed and you have no other predicate in the query.
>>>
>>> Returning approximately 2.5 millions rows through a bitmap index would
>>> be dramatically slower than a full table scan.
>>>
>>> A single bitmap index only would be useless in this scenario, even more
>>> so if the table is subjected to any transactional based DML load.
>>>

>>
>> Hi Richard,
>>
>> thanks for the pointing this out.
>> I wasn't aware of that, but it does sound reasonable.
>>
>> But after all the Concepts manual says:
>>
>> "If the number of distinct values of a column is less than 1% of the
>> number of rows in the table, or if the values in a column are repeated
>> more than 100 times, then the column is a candidate for a bitmap index."
>>
>> Actually a bit further down in the Concepts manual there is an example
>> very similar to the OP's situation:
>>
>> "There are only three possible values for marital status and region, two
>> possible values for gender, and four for income level. Therefore, it is
>> appropriate to create bitmap indexes on these columns"
>>

>
> Hi Thomas
>
> There are a number of classic myths associated with bitmap indexes and
> yes, Oracle is as guilty as anyone in propagating them. One is that bitmap
> indexes are only useful for low cardinality columns. The above definition
> is better than many I've read but it's still one of those rules of thumbs
> that is not entirely accurate as column values outside of the definition
> could possibly be candidates for a bitmap index.
>
> The other classic myth is that a *single* bitmap index on a very low
> cardinality column (as in the OPs example) can be very efficiently
> utilised to retrieve the required number of rows. But if this means
> retrieving 50% or 33% or 25% etc of all rows in a huge table, then it not
> going to be very efficient at all when compared to the poor old full table
> scan.
>
> Note in the above example, it mentions 4 different columns, not one
> column. These four columns when *combined* could possibly reduce the final
> result set to a small enough subset of required rows that would make
> retrieving them one at a time through the rowids a possibly attractive
> option. For example, there may not be that many single males that live in
> Canberra that have a really low income ...
>
> At the end of the day, it comes back to the overall selectivity of
> predicates and can combinations of bitmaps when and/or/not together
> produce a small enough set of rowids to make it all worthwhile to read the
> bitmap blocks, perform the set logic and retrieve the resultant rowids one
> at a time when compared with other alternatives (such as the full table
> scan).
>
> The answer is almost certainly a big no for a single bitmap index on a low
> cardinality column.
>


Hi Thomas

Ops, forgot to highlight my other original point that's ignored in the
Concept recommendations you quoted in that bitmap indexes are going to be
problematic in a transactional environment, primarily due to the likely
locking and contention issues that would result. Although improved in latter
releases, bitmap indexes can also become structurally less efficient under
DML load. Therefore Bitmap indexes and transactional systems do not mix well
at all ...

Cheers

Richard


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #16 (permalink)  
Old 02-26-2008, 07:27 AM
William Robertson
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

On Aug 3, 9:47 am, ciapecki <ciape...@gmail.com> wrote:
> On 3 Aug., 10:21, Helma <bad_elef...@hotmail.com> wrote:
>
> > > > >>> Does indexing a very big table (about 5Mio records) on the columnA
> > > > >>> which can hold only values Y,N,<NULL> make sense?
> > > > >> Yes, that's what bitmap indexes were made for.

>
> > FYI , you can also index only the N ( or Y) value's in the column.
> > This is done if e.g. the boolean is an indicator whether a row has
> > gone through a processing run or not, and the application just need to
> > find the N values.
> > Bitmap indexes are not ok if the table is updated intensively.

>
> > Helma

>
> Hi Helma,
>
> It will be actually a view and will be refreshed everyday (once a
> day).
> But thanks for the warning.
>
> BTW. how can you index only N's for example?
>
> chris


Ideally you might arrange the design such that the small number of
rows you typically want to identify are flagged with a 'Y' (say) and
the rest left null. Then a standard btree index on that column would
be compact and efficient. This might also work for Y/N/NULL if the
null values formed the vast majority, although I would be interested
to know what nulls represent in a Yes/No column.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #17 (permalink)  
Old 02-26-2008, 07:27 AM
Helma
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values


>
> Ideally you might arrange the design such that the small number of
> rows you typically want to identify are flagged with a 'Y' (say) and
> the rest left null. Then a standard btree index on that column would
> be compact and efficient.


Redesign? What's wrong with my suggestion to create an index on only
the Y value's?

H.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #18 (permalink)  
Old 02-26-2008, 07:27 AM
DA Morgan
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

Helma wrote:
>> Ideally you might arrange the design such that the small number of
>> rows you typically want to identify are flagged with a 'Y' (say) and
>> the rest left null. Then a standard btree index on that column would
>> be compact and efficient.

>
> Redesign? What's wrong with my suggestion to create an index on only
> the Y value's?
>
> H.


Nothing ... your original idea is superior in several respects. NULLs
are not information ... they are the lack of it.
--
Daniel A. Morgan
University of Washington
damorgan@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #19 (permalink)  
Old 02-26-2008, 07:28 AM
William Robertson
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

On Aug 4, 4:51 pm, Helma <bad_elef...@hotmail.com> wrote:
> > Ideally you might arrange the design such that the small number of
> > rows you typically want to identify are flagged with a 'Y' (say) and
> > the rest left null. Then a standard btree index on that column would
> > be compact and efficient.

>
> Redesign? What's wrong with my suggestion to create an index on only
> the Y value's?
>
> H.


Nothing. It can be awkward to remember to use a particular NULLIF/
DECODE/CASE expression each time you need to use that column in a
WHERE clause, although of course you might embed the expression in a
view, code it in a procedure or define it in Business Objects or
whatever and never have to care about it again, so it's not
necessarily a big deal.

If the purpose of the column is to flag rows with a particular status
or condition ('needs processing' etc) then storing the opposite value
for all the other rows might be somewhat redundant (you might never
use "where needs_processing = 'N').

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #20 (permalink)  
Old 02-26-2008, 07:30 AM
joel garry
 
Posts: n/a
Default Re: indexing a column with only 2 or 3 values

On Aug 3, 4:42 am, Helma <bad_elef...@hotmail.com> wrote:
> > BTW. how can you index only N's for example?

>
> Hello Chris,
>
> I hope i am not unkind to you, but for syntax matters, may i redirect
> you to the manuals?
>
> H.
> Hint: you need to look at function based indexes, creating a function
> that only returns a value for the recordvalues you need to index.


If you look at function based indexes, be sure and also look at where
the optimizer has problems, search the metalink bug database for:
optimizer function based indexes

Always state exactly which versions of Oracle and your platform.

jg
--
@home.com is bogus.
http://dbasrus.blogspot.com/2007/07/...and-final.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:43 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com