Unix Technical Forum

Re: data distribution question

This is a discussion on Re: data distribution question within the Informix forums, part of the Database Server Software category; --> Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still ...


Go Back   Unix Technical Forum > Database Server Software > Informix

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-20-2008, 11:43 AM
Floyd Wellershaus
 
Posts: n/a
Default Re: data distribution question

Hmm. Thanks. I like the way you explain it. It helps me to see it more clearly. I'm still a bit confused though. I was taught that a general rule of thumb is that if a table has more than about 25% duplicate values, it would take too many index reads to make the index worthwhile.
In your scenario, the index would seem to narrow down to 199 values, but isn't that only if the index knows to go to the correct bin in the first place ? Maybe they do, but I don't know. I didn't think the bins had anything to do with index traversal to find the value.











========================
-<<Floyd Wellershaus>>-
Database Administrator
Unix Administrator



email: fwellers@yahoo.com


Home: 703-430-0805


Cell: 703-477-6045
========================


http://www.one.org/



----- Original Message ----
From: "Ford, Andrew G" <Andrew_G_Ford@homedepot.com>
To: Floyd Wellershaus <fwellers@yahoo.com>; informix-list@iiug.org
Sent: Wednesday, August 2, 2006 6:38:34 PM
Subject: RE: data distribution question



bin# # rows in bucket # unique values highest value in this bin
2: ( 1973055, 9364, 25763)
3: ( 1973055, 9915, 35887)
Doesn't look too bad. You've got 1973055 rows in bin #3 representing values between 25764 and 35887 and have 9915 unique values in this range. Calculate this out and you have 1973055 / 9915 = 199 rows for each value between 25764 and 35887.

If I'm looking for a row with value of 28742 this index would narrow the search down to approx. 199 rows.

Now, if I am looking for a row with even numbered values a sequential scan would be better. It all depends on what info you are trying to get to.

Check out this link for some good info on distributions and update statistics by some guy, http://www-1.ibm.com/partnerworld/pw...mix_08mar.html


(Did I get it right John?)




Andrew Ford


-----Original Message-----
From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] On Behalf Of Floyd Wellershaus
Sent: Wednesday, August 02, 2006 6:08 PM
To: informix-list@iiug.org
Subject: data distribution question


With the below values for the mbntfhst_token column, which is being joined to in the query, I would think these distributions show way too much duplication for an index to be effective. Yet, when I force a scan on that table,the cost goes way way up.

What am I missing here ?

Thank you.

Floyd


1: ( 1973055, 13906, 14381)
2: ( 1973055, 9364, 25763)
3: ( 1973055, 9915, 35887)
4: ( 1973055, 5803, 41746)
5: ( 1973055, 4832, 46654)
6: ( 1973055, 5731, 52437)
7: ( 1973055, 5548, 58053)
8: ( 1973055, 6483, 65065)
9: ( 1973055, 11890, 77038)
10: ( 1973055, 10660, 88268)
11: ( 1973055, 5761, 94404)
12: ( 1973055, 4753, 99639)
13: ( 1973055, 8261, 107909)
14: ( 1973055, 7536, 116822)
15: ( 1973055, 4875, 123111)
16: ( 1973055, 3135, 127073)
17: ( 1973055, 5571, 134148)



Also, take a look at the overflow bucket for that column.
1: ( 1509940, 206221)
2: ( 499152, 249386)
3: ( 568126, 250016)
4: ( 522169, 276801)
5: ( 495874, 277207)
6: ( 564699, 305435)
7: ( 584721, 305658)
8: ( 507669, 305889)
9: ( 496176, 324590)
10: ( 545764, 335032)
11: ( 575454, 337654)
12: ( 662279, 338354)
13: ( 655045, 357904)
14: ( 1352666, 359863)
15: ( 684749, 368087)
16: ( 521739, 368452)
17: ( 687000, 371474)
18: ( 528608, 371973)
19: ( 679561, 388584)
20: ( 538942, 392313)
21: ( 494174, 393604)
22: ( 546620, 403774)












========================
-<<Floyd Wellershaus>>-
Database Administrator
Unix Administrator



email: fwellers@yahoo.com


Home: 703-430-0805


Cell: 703-477-6045
========================


http://www.one.org/
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:38 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com