This is a discussion on Re: data distribution question within the Informix forums, part of the Database Server Software category; --> Thank you for the reply. Let me try and understand. Taking bin 3, out of 1.9 million rows in ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Thank you for the reply. Let me try and understand. Taking bin 3, out of 1.9 million rows in it, there are only 9.9k rows that are unique. The rest of the rows are duplicate. The range for that bin contains values between 25763 and 35887. Since I don't see any values in that range, in the overflow, then I guess Ican assume that there are no major "skewers" of the value in that bin, andthat there is just a very high percentage of values that are duplicated. Is that right so far ? So doesn't that mean that the index would have to do a lot of extra reads to find a value ? Thanks for your patience. ======================== -<<Floyd Wellershaus>>- Database Administrator Unix Administrator email: fwellers@yahoo.com Home: 703-430-0805 Cell: 703-477-6045 ======================== http://www.one.org/ ----- Original Message ---- From: Andrew Ford <andrewford@austin.rr.com> To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org Sent: Wednesday, August 2, 2006 10:50:32 PM Subject: Re: data distribution question By 25% duplicate values do you mean you have 100,000 rows in the table and 25,000 have a value of "A", 25,000 have a value of "B", 25,000 have a valueof "C" and 25,000 have a value of "D"? If so, I would think that an index on that column might not serve much purpose unless it was for a foreign key. If you mean you have 75% unique values (or almost unique values) and 25% ofthe data has the same value then the optimizer will (hopefully) choose an index when you are looking for a few rows living in the 75% of the data that is unique and might choose a scan if you're looking for a few rows in the25% of the data that is a duplicate and probably will choose a scan if youare looking for a lot of rows in the table. Based on your distributions it looks like your data has some good uniqueness. If the value for that column isn't defined in the overflow buckets you can be pretty sure you only have 0.5% duplicates based on the data from bin#3. You are right, the bins don't have anything to do with index traversal. The optimizer uses the data distributions to make a guess as to which access method will be the best: one of the many indexes you have created or a sequential scan. Once the decision is made you either traverse the index tree looking for data or sequentially scan the table. Andrew ----- Original Message ----- From: Floyd Wellershaus Newsgroups: comp.databases.informix To: Ford, Andrew G ; informix-list@iiug.org Sent: Wednesday, August 02, 2006 7:29 PM Subject: Re: data distribution question Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still a bit confused though. I was taught that a general rule of thumb is that if a table has more than about 25% duplicate values, it would take too many index reads to make the index worthwhile. In your scenario, the index would seem to narrow down to 199 values, but isn't that only if the index knows to go to the correct bin in the first place ? Maybe they do, but I don't know. I didn't think the bins had anything to do with index traversal to find the value. ======================== -<<Floyd Wellershaus>>- Database Administrator Unix Administrator email: fwellers@yahoo.com Home: 703-430-0805 Cell: 703-477-6045 ======================== http://www.one.org/ ----- Original Message ---- From: "Ford, Andrew G" <Andrew_G_Ford@homedepot.com> To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org Sent: Wednesday, August 2, 2006 6:38:34 PM Subject: RE: data distribution question bin# # rows in bucket # unique values highest value in this bin 2: ( 1973055, 9364, 25763) 3: ( 1973055, 9915, 35887) Doesn't look too bad. You've got 1973055 rows in bin #3 representing values between 25764 and 35887 and have 9915 unique values in this range. Calculate this out and you have 1973055 / 9915 = 199 rows for each value between 25764 and 35887. If I'm looking for a row with value of 28742 this index would narrow the search down to approx. 199 rows. Now, if I am looking for a row with even numbered values a sequential scan would be better. It all depends on what info you are trying to get to. Check out this link for some good info on distributions and update statistics by some guy, http://www-1.ibm.com/partnerworld/pw...mix_08mar.html (Did I get it right John?) Andrew Ford -----Original Message----- From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus Sent: Wednesday, August 02, 2006 6:08 PM To: informix-list@iiug.org Subject: data distribution question With the below values for the mbntfhst_token column, which is being joined to in the query, I would think these distributions show way too much duplication for an index to be effective. Yet, when I force a scan on that table,the cost goes way way up. What am I missing here ? Thank you. Floyd 1: ( 1973055, 13906, 14381) 2: ( 1973055, 9364, 25763) 3: ( 1973055, 9915, 35887) 4: ( 1973055, 5803, 41746) 5: ( 1973055, 4832, 46654) 6: ( 1973055, 5731, 52437) 7: ( 1973055, 5548, 58053) 8: ( 1973055, 6483, 65065) 9: ( 1973055, 11890, 77038) 10: ( 1973055, 10660, 88268) 11: ( 1973055, 5761, 94404) 12: ( 1973055, 4753, 99639) 13: ( 1973055, 8261, 107909) 14: ( 1973055, 7536, 116822) 15: ( 1973055, 4875, 123111) 16: ( 1973055, 3135, 127073) 17: ( 1973055, 5571, 134148) … … Also, take a look at the overflow bucket for that column. 1: ( 1509940, 206221) 2: ( 499152, 249386) 3: ( 568126, 250016) 4: ( 522169, 276801) 5: ( 495874, 277207) 6: ( 564699, 305435) 7: ( 584721, 305658) 8: ( 507669, 305889) 9: ( 496176, 324590) 10: ( 545764, 335032) 11: ( 575454, 337654) 12: ( 662279, 338354) 13: ( 655045, 357904) 14: ( 1352666, 359863) 15: ( 684749, 368087) 16: ( 521739, 368452) 17: ( 687000, 371474) 18: ( 528608, 371973) 19: ( 679561, 388584) 20: ( 538942, 392313) 21: ( 494174, 393604) 22: ( 546620, 403774) … … ======================== -<<Floyd Wellershaus>>- Database Administrator Unix Administrator email: fwellers@yahoo.com Home: 703-430-0805 Cell: 703-477-6045 ======================== http://www.one.org/ |
| ||||
| No, the index is pretty good. The index isn't constructed like the distribution bins. They just help the optimizer see how many records a select will return and how many it will have to go through. The index is constructed similar to a binary tree. Do you know the number guessing game? If you need to guess a number between 0 and 2^32 it should only take 33 guesses (with each guess you get rid of half the remaining numbers). Translating that to an index you only have to check 33 nodes to drill down to the right number. Real database indexes though are even faster because they are multi-way trees so you ask about many number ranges at one time. (It is a function of the key length and page size.) Floyd Wellershaus wrote: > Thank you for the reply. > Let me try and understand. > Taking bin 3, out of 1.9 million rows in it, there are only 9.9k rows that are unique. The rest of the rows are duplicate. > The range for that bin contains values between 25763 and 35887. > Since I don't see any values in that range, in the overflow, then I guess I can assume that there are no major "skewers" of the value in that bin, and that there is just a very high percentage of values that are duplicated. > > Is that right so far ? > > So doesn't that mean that the index would have to do a lot of extra reads to find a value ? > > Thanks for your patience. > > > > > > > > > > ======================== > -<<Floyd Wellershaus>>- > Database Administrator > Unix Administrator > > > > email: fwellers@yahoo.com > > > Home: 703-430-0805 > > > Cell: 703-477-6045 > ======================== > > > http://www.one.org/ > > > > ----- Original Message ---- > From: Andrew Ford <andrewford@austin.rr.com> > To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org > Sent: Wednesday, August 2, 2006 10:50:32 PM > Subject: Re: data distribution question > > > By 25% duplicate values do you mean you have 100,000 rows in the table and 25,000 have a value of "A", 25,000 have a value of "B", 25,000 have a value of "C" and 25,000 have a value of "D"? > > If so, I would think that an index on that column might not serve much purpose unless it was for a foreign key. > > If you mean you have 75% unique values (or almost unique values) and 25% of the data has the same value then the optimizer will (hopefully) choose an index when you are looking for a few rows living in the 75% of the data that is unique and might choose a scan if you're looking for a few rows in the 25% of the data that is a duplicate and probably will choose a scan if you are looking for a lot of rows in the table. > > Based on your distributions it looks like your data has some good uniqueness. If the value for that column isn't defined in the overflow buckets you can be pretty sure you only have 0.5% duplicates based on the data from bin #3. > > You are right, the bins don't have anything to do with index traversal. The optimizer uses the data distributions to make a guess as to which access method will be the best: one of the many indexes you have created or a sequential scan. Once the decision is made you either traverse the index tree looking for data or sequentially scan the table. > > Andrew > ----- Original Message ----- > From: Floyd Wellershaus > Newsgroups: comp.databases.informix > To: Ford, Andrew G ; informix-list@iiug.org > Sent: Wednesday, August 02, 2006 7:29 PM > Subject: Re: data distribution question > > > Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still a bit confused though. I was taught that a general rule of thumb is that if a table has more than about 25% duplicate values, it would take too many index reads to make the index worthwhile. > In your scenario, the index would seem to narrow down to 199 values, but isn't that only if the index knows to go to the correct bin in the first place ? Maybe they do, but I don't know. I didn't think the bins had anything to do with index traversal to find the value. > > > > > > > > > > > > ======================== > -<<Floyd Wellershaus>>- > Database Administrator > Unix Administrator > > > > email: fwellers@yahoo.com > > > Home: 703-430-0805 > > > Cell: 703-477-6045 > ======================== > > > http://www.one.org/ > > > > ----- Original Message ---- > From: "Ford, Andrew G" <Andrew_G_Ford@homedepot.com> > To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org > Sent: Wednesday, August 2, 2006 6:38:34 PM > Subject: RE: data distribution question > > > > bin# # rows in bucket # unique values highest value in this bin > 2: ( 1973055, 9364, 25763) > 3: ( 1973055, 9915, 35887) > Doesn't look too bad. You've got 1973055 rows in bin #3 representing values between 25764 and 35887 and have 9915 unique values in this range. Calculate this out and you have 1973055 / 9915 = 199 rows for each value between 25764 and 35887. > > If I'm looking for a row with value of 28742 this index would narrow the search down to approx. 199 rows. > > Now, if I am looking for a row with even numbered values a sequential scan would be better. It all depends on what info you are trying to get to. > > Check out this link for some good info on distributions and update statistics by some guy, http://www-1.ibm.com/partnerworld/pw...mix_08mar.html > > > (Did I get it right John?) > > > > > Andrew Ford > > > -----Original Message----- > From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus > Sent: Wednesday, August 02, 2006 6:08 PM > To: informix-list@iiug.org > Subject: data distribution question > > > With the below values for the mbntfhst_token column, which is being joined to in the query, I would think these distributions show way too much duplication for an index to be effective. Yet, when I force a scan on that table, the cost goes way way up. > > What am I missing here ? > > Thank you. > > Floyd > > > 1: ( 1973055, 13906, 14381) > 2: ( 1973055, 9364, 25763) > 3: ( 1973055, 9915, 35887) > 4: ( 1973055, 5803, 41746) > 5: ( 1973055, 4832, 46654) > 6: ( 1973055, 5731, 52437) > 7: ( 1973055, 5548, 58053) > 8: ( 1973055, 6483, 65065) > 9: ( 1973055, 11890, 77038) > 10: ( 1973055, 10660, 88268) > 11: ( 1973055, 5761, 94404) > 12: ( 1973055, 4753, 99639) > 13: ( 1973055, 8261, 107909) > 14: ( 1973055, 7536, 116822) > 15: ( 1973055, 4875, 123111) > 16: ( 1973055, 3135, 127073) > 17: ( 1973055, 5571, 134148) > ... > ... > > Also, take a look at the overflow bucket for that column. > 1: ( 1509940, 206221) > 2: ( 499152, 249386) > 3: ( 568126, 250016) > 4: ( 522169, 276801) > 5: ( 495874, 277207) > 6: ( 564699, 305435) > 7: ( 584721, 305658) > 8: ( 507669, 305889) > 9: ( 496176, 324590) > 10: ( 545764, 335032) > 11: ( 575454, 337654) > 12: ( 662279, 338354) > 13: ( 655045, 357904) > 14: ( 1352666, 359863) > 15: ( 684749, 368087) > 16: ( 521739, 368452) > 17: ( 687000, 371474) > 18: ( 528608, 371973) > 19: ( 679561, 388584) > 20: ( 538942, 392313) > 21: ( 494174, 393604) > 22: ( 546620, 403774) > ... > ... > > > > > > > > > > > ======================== > -<<Floyd Wellershaus>>- > Database Administrator > Unix Administrator > > > > email: fwellers@yahoo.com > > > Home: 703-430-0805 > > > Cell: 703-477-6045 > ======================== > > > http://www.one.org/ > --0-2048181933-1154595450=:47202 > Content-Type: text/html; charset=utf-8 > Content-Transfer-Encoding: quoted-printable > X-Google-AttachSize: 24744 > > <html><head><style type="text/css"><!-- DIV {margin:0px} --></style></head><body><div style="font-family:times new roman, new york, times, serif;font-size:12pt"><DIV></DIV> > <DIV>Thank you for the reply.</DIV> > <DIV>Let me try and understand.</DIV> > <DIV>Taking bin 3, out of 1.9 million rows in it, there are only 9.9k rows that are unique. The rest of the rows are duplicate.</DIV> > <DIV>The range for that bin contains values between 25763 and 35887.</DIV> > <DIV>Since I don't see any values in that range, in the overflow, then I guess I can assume that there are no major "skewers" of the value in that bin, and that there is just a very high percentage of values that are duplicated. </DIV> > <DIV> </DIV> > <DIV>Is that right so far ?</DIV> > <DIV> </DIV> > <DIV>So doesn't that mean that the index would have to do a lot of extra reads to find a value ?</DIV> > <DIV> </DIV> > <DIV>Thanks for your patience.<BR> </DIV> > <DIV><BR> > <DIV><BR> > <DIV><BR> > <DIV><BR> > <DIV>========================<BR>-<<Floyd Wellershaus>>-<BR>Database Administrator<BR>Unix Administrator</DIV><BR> > <DIV><BR>email:   ; <A href="mailto:fwellers@yahoo.com">fwellers@yahoo.co m</A></DIV><BR> > <DIV>Home: &nbs p; 703-430-0805</DIV><BR> > <DIV>Cell: &nbs p; 703-477-6045<BR>========================</DIV><BR> > <DIV><A href="http://www.one.org/">http://www.one.org/</A></DIV></DIV></DIV></DIV></DIV> > <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR><BR> > <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">----- Original Message ----<BR>From: Andrew Ford <andrewford@austin.rr.com><BR>To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org<BR>Sent: Wednesday, August 2, 2006 10:50:32 PM<BR>Subject: Re: data distribution question<BR><BR> > <STYLE type=text/css>DIV { > MARGIN:0px;} > </STYLE> > > <DIV> > <DIV><FONT face=Arial size=2>By 25% duplicate values do you mean you have 100,000 rows in the table and 25,000 have a value of "A", 25,000 have a value of "B", 25,000 have a value of "C" and 25,000 have a value of "D"?</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>If so, I would think that an index on that column might not serve much purpose unless it was for a foreign key.</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>If you mean you have 75% unique values (or almost unique values) and 25% of the data has the same value then the optimizer will (hopefully) choose an index when you are looking for a few rows living in the 75% of the data that is unique and might choose a scan if you're looking for a few rows in the 25% of the data that is a duplicate and probably will choose a scan if you are looking for a lot of rows in the table.</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>Based on your distributions it looks like your data has some good uniqueness. If the value for that column isn't defined in the overflow buckets you can be pretty sure you only have 0.5% duplicates based on the data from bin #3.</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>You are right, the bins don't have anything to do with index traversal. The optimizer uses the data distributions to make a guess as to which access method will be the best: one of the many indexes you have created or a sequential scan. Once the decision is made you either traverse the index tree looking for data or sequentially scan the table.</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>Andrew</FONT></DIV></DIV> > <BLOCKQUOTE dir=ltr style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px"> > <DIV style="FONT: 10pt arial">----- Original Message ----- </DIV> > <DIV style="BACKGROUND: #e4e4e4; FONT: 10pt arial"><B>From:</B> <A id=bodyLinks title=fwellers@yahoo.com href="mailto:fwellers@yahoo.com" target=_blank rel=nofollow>Floyd Wellershaus</A> </DIV> > <DIV style="FONT: 10pt arial"><B>Newsgroups:</B> comp.databases.informix</DIV> > <DIV style="FONT: 10pt arial"><B>To:</B> <A id=bodyLinks title=Andrew_G_Ford@homedepot.com href="mailto:Andrew_G_Ford@homedepot.com" target=_blank rel=nofollow>Ford, Andrew G</A> ; <A id=bodyLinks title=informix-list@iiug.org href="mailto:informix-list@iiug.org" target=_blank rel=nofollow>informix-list@iiug.org</A> </DIV> > <DIV style="FONT: 10pt arial"><B>Sent:</B> Wednesday, August 02, 2006 7:29 PM</DIV> > <DIV style="FONT: 10pt arial"><B>Subject:</B> Re: data distribution question</DIV> > <DIV><BR></DIV> > <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> > <DIV></DIV> > <DIV>Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still a bit confused though. I was taught that a general rule of thumb is that if a table has more than about 25% duplicate values, it would take too many index reads to make the index worthwhile.</DIV> > <DIV>In your scenario, the index would seem to narrow down to 199 values, but isn't that only if the index knows to go to the correct bin in the first place ? Maybe they do, but I don't know. I didn't think the bins had anything to do with index traversal to find the value.</DIV> > <DIV> </DIV> > <DIV><BR> </DIV> > <DIV><BR> > <DIV><BR> > <DIV><BR> > <DIV><BR> > <DIV>========================<BR>-<<Floyd Wellershaus>>-<BR>Database Administrator<BR>Unix Administrator</DIV><BR> > <DIV><BR>email:   ; <A id=bodyLinks href="mailto:fwellers@yahoo.com" target=_blank rel=nofollow>fwellers@yahoo.com</A></DIV><BR> > <DIV>Home: &nbs p; 703-430-0805</DIV><BR> > <DIV>Cell: &nbs p; 703-477-6045<BR>========================</DIV><BR> > <DIV><A id=bodyLinks href="http://www.one.org/" target=_blank rel=nofollow>http://www.one.org/</A></DIV></DIV></DIV></DIV></DIV> > <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"><BR><BR> > <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif">----- Original Message ----<BR>From: "Ford, Andrew G" <Andrew_G_Ford@homedepot.com><BR>To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org<BR>Sent: Wednesday, August 2, 2006 6:38:34 PM<BR>Subject: RE: data distribution question<BR><BR> > <STYLE type=text/css><!-- DIV {margin:0px;}--></STYLE> > > <DIV> > <P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><FONT face=Arial size=2></FONT></SPAN> </P> > <P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><SPAN class=734131722-02082006><FONT face=Courier>bin# &nb sp; # rows in bucket # unique values highest value in this bin</FONT></SPAN></SPAN></P> > <P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><FONT face=Courier> 2: ( <SPAN class=734131722-02082006> </SPAN>1973055, & nbsp;<SPAN class=734131722-02082006>   ; </SPAN>9364, &nbs p; <SPAN class=734131722-02082006>   ; </SPAN>25763)</FONT></SPAN></P> > <P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"><FONT face=Courier> 3: ( <SPAN class=734131722-02082006> </SPAN>1973055, < SPAN class=734131722-02082006>   ; </SPAN> 9915, &nbs p; <SPAN class=734131722-02082006>   ; </SPAN>35887)</FONT></SPAN></P></DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2>Doesn't look too bad. You've got 1973055 rows in bin #3 representing values between 25764 and 35887 and have 9915 unique values in this range. Calculate this out and you have 1973055 / 9915 = 199 rows for each value between 25764 and 35887.</FONT></SPAN></DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2>If I'm looking for a row with value of 28742 this index would narrow the search down to approx. 199 rows.</FONT></SPAN></DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2>Now, if I am looking for a row with even numbered values a sequential scan would be better. It all depends on what info you are trying to get to.</FONT></SPAN></DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2>Check out this link for some good info on distributions and update statistics by some guy, <A id=bodyLinks href="http://www-1.ibm.com/partnerworld/pwhome.nsf/pi/eac_index_bizinformix_08mar.html" target=_blank rel=nofollow>http://www-1.ibm.com/partnerworld/pwhome.nsf/pi/eac_index_bizinformix_08mar.html</A></FONT></SPAN></DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2>(Did I get it right John?)</FONT></SPAN></DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV><SPAN class=734131722-02082006><FONT face=Courier size=2></FONT></SPAN> </DIV> > <DIV align=left><FONT face=Arial size=2>Andrew Ford</FONT></DIV> > <DIV align=left><FONT face=Arial size=2></FONT> </DIV> > <DIV align=left> </DIV> > <DIV></DIV> > <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left><FONT face=Tahoma size=2>-----Original Message-----<BR><B>From:</B> informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] <B>On Behalf Of </B>Floyd Wellershaus<BR><B>Sent:</B> Wednesday, August 02, 2006 6:08 PM<BR><B>To:</B> informix-list@iiug.org<BR><B>Subject:</B> data distribution question<BR><BR></FONT></DIV> > <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> > <DIV></DIV> > <DIV> > <DIV><FONT size=2><FONT face=Arial>With the below values for the mbntfhst_token column, which is being joined to in the query, I would think these distributions show way too much duplication for an index to be effective. Yet, when I force a scan on that table, the cost goes way way up.</FONT></FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>What am I missing here ?</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>Thank you.</FONT></DIV> > <DIV><FONT face=Arial size=2></FONT> </DIV> > <DIV><FONT face=Arial size=2>Floyd</FONT></DIV> > <DIV> </DIV> > <DIV><FONT size=2><FONT face=Arial> </FONT></FONT></DIV> > <DIV><FONT size=2><FONT face=Arial>1: ( 1973055, 13906, 14381)</FONT></FONT></DIV> > <P class=MsoNormal><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 2: ( 1973055, 9364, 25763)</SPAN></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 3: ( 1973055, 9915, 35887)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 4: ( 1973055, 5803, 41746)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 5: ( 1973055, 4832, 46654)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 6: ( 1973055, 5731, 52437)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 7: ( 1973055, 5548, 58053)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 8: ( 1973055, 6483, 65065)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 9: ( 1973055, 11890, 77038)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 10: ( 1973055, 10660, 88268)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 11: ( 1973055, 5761, 94404)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 12: ( 1973055, 4753, 99639)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 13: ( 1973055, 8261, 107909)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 14: ( 1973055, 7536, 116822)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 15: ( 1973055, 4875, 123111)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 16: ( 1973055, 3135, &nb sp; 127073)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 17: ( 1973055, 5571, 134148)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">...</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">...</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"></SPAN></FONT> </P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Also, take a look at the overflow bucket for that column.</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"></SPAN></FONT><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 1: ( 1509940, &n bsp; 206221)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 2: ( 499152, & nbsp; &nb sp; 249386)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 3: ( 568126, & nbsp; &nb sp; 250016)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 4: ( 522169, & nbsp; &nb sp; 276801)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 5: ( 495874, & nbsp; &nb sp; 277207)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 6: ( 564699, & nbsp; &nb sp; 305435)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 7: ( 584721, & nbsp; &nb sp; 305658)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 8: ( 507669, & nbsp; &nb sp; 305889)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 9: ( 496176, & nbsp; &nb sp; 324590)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 10: ( 545764, & nbsp; &nb sp; 335032)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 11: ( 575454, & nbsp; &nb sp; 337654)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 12: ( 662279, & nbsp; &nb sp; 338354)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 13: ( 655045, & nbsp; &nb sp; 357904)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 14: ( 1352666, &n bsp; 359863)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 15: ( 684749, & nbsp; &nb sp; 368087)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 16: ( 521739, & nbsp; &nb sp; 368452)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 17: ( 687000, &n bsp; 371474)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 18: ( 528608, & nbsp; &nb sp; 371973)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 19: ( 679561, & nbsp; &nb sp; 388584)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 20: ( 538942, & nbsp; &nb sp; 392313)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 21: ( 494174, & nbsp; &nb sp; 393604)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 22: ( 546620, & nbsp; &nb sp; 403774)</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">...</SPAN></FONT></P> > <P class=MsoNormal><FONT face=Arial size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">...</SPAN></FONT></P><BR> </DIV> > <DIV><BR> > <DIV><BR> > <DIV><BR> > <DIV><BR> > <DIV>========================<BR>-<<Floyd Wellershaus>>-<BR>Database Administrator<BR>Unix Administrator</DIV><BR> > <DIV><BR>email:   ; <A id=bodyLinks href="mailto:fwellers@yahoo.com" target=_blank rel=nofollow>fwellers@yahoo.com</A></DIV><BR> > <DIV>Home: &nbs p; 703-430-0805</DIV><BR> > <DIV>Cell: &nbs p; 703-477-6045<BR>========================</DIV><BR> > <DIV><A id=bodyLinks href="http://www.one.org/" target=_blank rel=nofollow>http://www.one.org/</A></DIV></DIV></DIV></DIV></DIV> > <DIV></DIV></DIV></DIV><BR></DIV></DIV></BLOCKQUOTE></DIV><BR></DIV></div></body></html> > --0-2048181933-1154595450=:47202-- |