This is a discussion on Re: data distribution question within the Informix forums, part of the Database Server Software category; --> Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still a bit confused though. I was taught that a general rule of thumb is that if a table has more than about 25% duplicate values, it would take too many index reads to make the index worthwhile. In your scenario, the index would seem to narrow down to 199 values, but isn't that only if the index knows to go to the correct bin in the first place ? Maybe they do, but I don't know. I didn't think the bins had anything to do with index traversal to find the value. ======================== -<<Floyd Wellershaus>>- Database Administrator Unix Administrator email: fwellers@yahoo.com Home: 703-430-0805 Cell: 703-477-6045 ======================== http://www.one.org/ ----- Original Message ---- From: "Ford, Andrew G" <Andrew_G_Ford@homedepot.com> To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org Sent: Wednesday, August 2, 2006 6:38:34 PM Subject: RE: data distribution question bin# # rows in bucket # unique values highest value in this bin 2: ( 1973055, 9364, 25763) 3: ( 1973055, 9915, 35887) Doesn't look too bad. You've got 1973055 rows in bin #3 representing values between 25764 and 35887 and have 9915 unique values in this range. Calculate this out and you have 1973055 / 9915 = 199 rows for each value between 25764 and 35887. If I'm looking for a row with value of 28742 this index would narrow the search down to approx. 199 rows. Now, if I am looking for a row with even numbered values a sequential scan would be better. It all depends on what info you are trying to get to. Check out this link for some good info on distributions and update statistics by some guy, http://www-1.ibm.com/partnerworld/pw...mix_08mar.html (Did I get it right John?) Andrew Ford -----Original Message----- From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus Sent: Wednesday, August 02, 2006 6:08 PM To: informix-list@iiug.org Subject: data distribution question With the below values for the mbntfhst_token column, which is being joined to in the query, I would think these distributions show way too much duplication for an index to be effective. Yet, when I force a scan on that table,the cost goes way way up. What am I missing here ? Thank you. Floyd 1: ( 1973055, 13906, 14381) 2: ( 1973055, 9364, 25763) 3: ( 1973055, 9915, 35887) 4: ( 1973055, 5803, 41746) 5: ( 1973055, 4832, 46654) 6: ( 1973055, 5731, 52437) 7: ( 1973055, 5548, 58053) 8: ( 1973055, 6483, 65065) 9: ( 1973055, 11890, 77038) 10: ( 1973055, 10660, 88268) 11: ( 1973055, 5761, 94404) 12: ( 1973055, 4753, 99639) 13: ( 1973055, 8261, 107909) 14: ( 1973055, 7536, 116822) 15: ( 1973055, 4875, 123111) 16: ( 1973055, 3135, 127073) 17: ( 1973055, 5571, 134148) … … Also, take a look at the overflow bucket for that column. 1: ( 1509940, 206221) 2: ( 499152, 249386) 3: ( 568126, 250016) 4: ( 522169, 276801) 5: ( 495874, 277207) 6: ( 564699, 305435) 7: ( 584721, 305658) 8: ( 507669, 305889) 9: ( 496176, 324590) 10: ( 545764, 335032) 11: ( 575454, 337654) 12: ( 662279, 338354) 13: ( 655045, 357904) 14: ( 1352666, 359863) 15: ( 684749, 368087) 16: ( 521739, 368452) 17: ( 687000, 371474) 18: ( 528608, 371973) 19: ( 679561, 388584) 20: ( 538942, 392313) 21: ( 494174, 393604) 22: ( 546620, 403774) … … ======================== -<<Floyd Wellershaus>>- Database Administrator Unix Administrator email: fwellers@yahoo.com Home: 703-430-0805 Cell: 703-477-6045 ======================== http://www.one.org/ |