This is a discussion on Re: Calculate optimal value for DS_POOLSIZE within the Informix forums, part of the Database Server Software category; --> Actually I believe the confusion is betwen the DS cache values with the DDcache variables. DD_HASHSIZE and DD_HASHMAX mean ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Actually I believe the confusion is betwen the DS cache values with the DDcache variables. DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number ofentries (soft limit) per buck or list. DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. DS_HASHSIZE is the number of buckets or lists for the Cache. Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. DD_HASHSIZE 400 DD_HASHMAX 10 However DS would be configured differently and could look like DS_POOLSIZE 4000 DS_HASHSIZE 400 When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. ----- Original Message ---- From: Art S. Kagel <art..kagel@gmail.com> To: informix-list@iiug.org Sent: Monday, October 15, 2007 8:49:03 AM Subject: Re: Calculate optimal value for DS_POOLSIZE On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: > Version: IDS 10 > > I am trying to determine the optimal value for DS_POOLSIZE and >DS_HASHSIZE. I looked at the onstat -g dsc. I have following question: > > 1. How do I determine the correct size of DS_POOLSIZE. Should I look > at our update stats and see number of unique columns that we run >update statistics on and that would become the value of DS_POOLSIZE. > 2.. How do I determine correct value of number of Hash Buckets ? How > manyentries per bucket is good to have ? > 3. When I look at onstat -g dsc Isee number of distribution entries, > should this be always lower than DS_POOLSIZE IB that Mark and David are misinterpreting the use and meaning of DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.. DS_HASHSIZE is the number of hash buckets for distributions and DS_POOLSIZE is the maximum number of slots in each bucket for when multipledistributions hash to the same bucket. DS_HASHSIZE must be a prime number, DS_POOLSIZE should be relatively small (I've also been told that it'sbetter if it is non-prime). Columns are hashed on table and column name so if you have lots of databases with the same tables and columns in them then you will need a larger value for DS_POOLSIZE to match the level of hash clashes that you have. Ideally - for optimum performance of the hash - each distribution should hash to it's own bucket and the number ofslots per bucket should be very low. Since we cannot adjust the hash algorithm itself, the only control we have is DS_HASHSIZE. Keeping David's experience with Linux and large values in mind (David what is a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime that's largerthan the number of unique table names to minimize clashes and set DS_POOLSIZE large enough that the product of the two is about 1.5 times the number of distributions in SYSDISTRIB for all databases - allowing a bit more if you know the number of databases/ tables/columns will be growing over time to put off retuning for a while. Art S. Kagel _______________________________________________ Informix-list mailing list Informix-list@iiug.org http://www.iiug.org/mailman/listinfo/informix-list |
| |||
| On Oct 15, 6:49 pm, Mark Jamison <maj...@yahoo.com> wrote: > Actually I believe the confusion is betwen the DS cache values with the DD cache variables. > > DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list. > > DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. > DS_HASHSIZE is the number of buckets or lists for the Cache. >From the Administrator's Reference: 'The DS_POOLSIZE parameter specifies the maximum number of entries in each hash bucket in the data-distribution cache ...' 'The DS_HASHSIZE parameter specifies the number of hash buckets in the data-distribution cache ...' DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute total. If there's confusion it's in the manuals and it's been there since the beginning. > > Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. > > Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. > > DD_HASHSIZE 400 > DD_HASHMAX 10 > > However DS would be configured differently and could look like > > DS_POOLSIZE 4000 > DS_HASHSIZE 400 Nope: DS_HASHSIZE 400 DS_POOLSIZE 10 Art S. Kagel > When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. > > Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. > > ----- Original Message ---- > From: Art S. Kagel <art.ka...@gmail.com> > To: informix-l...@iiug.org > Sent: Monday, October 15, 2007 8:49:03 AM > Subject: Re: Calculate optimal value for DS_POOLSIZE > > On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: > > Version: IDS 10 > > > I am trying to determine the optimal value for DS_POOLSIZE and > > DS_HASHSIZE. I looked at the onstat -g dsc. I have following > question: > > > 1. How do I determine the correct size of DS_POOLSIZE. Should I look > > at our update stats and see number of unique columns that we run > > update statistics on and that would become the value of DS_POOLSIZE. > > 2. How do I determine correct value of number of Hash Buckets ? How > > many entries per bucket is good to have ? > > 3. When I look at onstat -g dsc I see number of distribution entries, > > should this be always lower than DS_POOLSIZE > > IB that Mark and David are misinterpreting the use and meaning of > DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly. > > DS_HASHSIZE is the number of hash buckets for distributions and > DS_POOLSIZE is the maximum number of slots in each bucket for when > multiple distributions hash to the same bucket. DS_HASHSIZE must be a > prime number, DS_POOLSIZE should be relatively small (I've also been > told that it's better if it is non-prime). > > Columns are hashed on table and column name so if you have lots of > databases with the same tables and columns in them then you will need > a larger value for DS_POOLSIZE to match the level of hash clashes that > you have. Ideally - for optimum performance of the hash - each > distribution should hash to it's own bucket and the number of slots > per bucket should be very low. Since we cannot adjust the hash > algorithm itself, the only control we have is DS_HASHSIZE. Keeping > David's experience with Linux and large values in mind (David what is > a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime > that's larger than the number of unique table names to minimize > clashes and set DS_POOLSIZE large enough that the product of the two > is about 1.5 times the number of distributions in SYSDISTRIB for all > databases - allowing a bit more if you know the number of databases/ > tables/columns will be growing over time to put off retuning for a > while. > > Art S. Kagel > > _______________________________________________ > Informix-list mailing list > Informix-l...@iiug.orghttp://www.iiug.org/mailman/listinfo/informix-list |
| |||
| Art S. Kagel wrote: > On Oct 15, 6:49 pm, Mark Jamison <maj...@yahoo.com> wrote: >> Actually I believe the confusion is betwen the DS cache values with the DD cache variables. >> >> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list. >> >> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. >> DS_HASHSIZE is the number of buckets or lists for the Cache. > >>From the Administrator's Reference: > > 'The DS_POOLSIZE parameter specifies the maximum number of entries in > each hash bucket in the data-distribution cache ...' > > 'The DS_HASHSIZE parameter specifies the number of hash buckets in the > data-distribution cache ...' > > DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute > total. > > If there's confusion it's in the manuals and it's been there since the > beginning. The manual is wrong. It works as Mark described it. The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref: <quote> The following formula determines the number of column distributions that can be stored in one bucket. Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE To modify the number of distributions per bucket, change either the DS_POOLSIZE or DS_HASHSIZE configuration parameter. For example, with the default values of 127 for DS_POOLSIZE and 31 for DS_HASHSIZE, you can potentially store distributions for about 127 columns in the data-distribution cache. The cache has 31 hash buckets, and each hash bucket can have an average of 4 entries. </> > >> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. >> >> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. >> >> DD_HASHSIZE 400 >> DD_HASHMAX 10 >> >> However DS would be configured differently and could look like >> >> DS_POOLSIZE 4000 >> DS_HASHSIZE 400 > > Nope: > > DS_HASHSIZE 400 > DS_POOLSIZE 10 The cache seems to be cleaned to 50%. So, if you want 4000 entries, then you'd need to have: DS_HASHSIZE 400 DS_POOLSIZE 8000 Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime. > > Art S. Kagel > >> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. >> >> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. >> >> ----- Original Message ---- >> From: Art S. Kagel <art.ka...@gmail.com> >> To: informix-l...@iiug.org >> Sent: Monday, October 15, 2007 8:49:03 AM >> Subject: Re: Calculate optimal value for DS_POOLSIZE >> >> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: >>> Version: IDS 10 >>> I am trying to determine the optimal value for DS_POOLSIZE and >>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following >> question: >> >>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look >>> at our update stats and see number of unique columns that we run >>> update statistics on and that would become the value of DS_POOLSIZE. >>> 2. How do I determine correct value of number of Hash Buckets ? How >>> many entries per bucket is good to have ? >>> 3. When I look at onstat -g dsc I see number of distribution entries, >>> should this be always lower than DS_POOLSIZE >> IB that Mark and David are misinterpreting the use and meaning of >> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly. >> >> DS_HASHSIZE is the number of hash buckets for distributions and >> DS_POOLSIZE is the maximum number of slots in each bucket for when >> multiple distributions hash to the same bucket. DS_HASHSIZE must be a >> prime number, DS_POOLSIZE should be relatively small (I've also been >> told that it's better if it is non-prime). >> >> Columns are hashed on table and column name so if you have lots of >> databases with the same tables and columns in them then you will need >> a larger value for DS_POOLSIZE to match the level of hash clashes that >> you have. Ideally - for optimum performance of the hash - each >> distribution should hash to it's own bucket and the number of slots >> per bucket should be very low. Since we cannot adjust the hash >> algorithm itself, the only control we have is DS_HASHSIZE. Keeping >> David's experience with Linux and large values in mind (David what is >> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime >> that's larger than the number of unique table names to minimize >> clashes and set DS_POOLSIZE large enough that the product of the two >> is about 1.5 times the number of distributions in SYSDISTRIB for all >> databases - allowing a bit more if you know the number of databases/ >> tables/columns will be growing over time to put off retuning for a >> while. >> >> Art S. Kagel >> >> _______________________________________________ >> Informix-list mailing list >> Informix-l...@iiug.orghttp://www.iiug.org/mailman/listinfo/informix-list > > |
| |||
| For what it's worth, these are my settings # Dictionary Cache Size DD_HASHSIZE 499 # Number of dictionary hash buckets DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket # Distribution Cache Size DS_HASHSIZE 503 # Number of distribution hash buckets DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache # UDR Cache Size PC_HASHSIZE 127 # Number of Procedure hash buckets PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache I had some discussions with UK Tech Support about the settings and their respective values. This is their response: Reproduction: # Created 1500 tables, populated each with 500 rows # Ran update stats high on all tables # Queried each table to force the population of the distributions into the cache. # The structure of interest, hangs off of rhead_t ... onstat -g dmp 0xa000000 hlQA | grep rhead onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist onstat -g dmp 0x15588018 shcache_t # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ... DS_POOLSIZE 4000 DS_HASHSIZE 1003 0000: sh_hashsize = 1003 0004: sh_usage = 1 0008: sh_uid = 1508 000c: sh_cleaned = 0 0010: sh_cachesize = 4000 0014: sh_curcount = 1508 0018: sh_inusecount = 0 001c: sh_lowthresh = 50 0020: sh_highthresh = 75 0024: sh_pool = 0x6d172c 0028: sh_entry = 0x15588050 002c: sh_lru = 0x15a2ec30 0030: sh_mru = 0x1558af88 0034: sh_mutex = 0x15585328 } DS_POOLSIZE 3000 DS_HASHSIZE 503 *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ 0000: sh_hashsize = 503 0004: sh_usage = 1 0008: sh_uid = 1500 000c: sh_cleaned = 1 0010: sh_cachesize = 3000 0014: sh_curcount = 1499 0018: sh_inusecount = 0 001c: sh_lowthresh = 50 0020: sh_highthresh = 75 0024: sh_pool = 0x6d172c 0028: sh_entry = 0x155862c0 002c: sh_lru = 0x15802830 0030: sh_mru = 0x175d4030 0034: sh_mutex = 0x15585328 } DS_HASHSIZE 503 DS_POOLSIZE 2000 *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ 0000: sh_hashsize = 503 0004: sh_usage = 1 0008: sh_uid = 1500 000c: sh_cleaned = 501 0010: sh_cachesize = 2000 0014: sh_curcount = 999 0018: sh_inusecount = 0 001c: sh_lowthresh = 50 0020: sh_highthresh = 75 0024: sh_pool = 0x6d172c 0028: sh_entry = 0x155862c0 002c: sh_lru = 0x16215830 0030: sh_mru = 0x16209be0 0034: sh_mutex = 0x15585328 } The tests show the influencing parameter is DS_POOLSIZE, and the reason theinteger SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the dist pool cache current value to half the pools size. I can't find any where in the code which suggests why we look to be keepingthis cache within the 50-75 % threshold. The functions to manipulate thesevalues look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem. There is one place where this counter is incremented and decremented: retrieve_cache() / cleanup() >From the customers output of onstat -g dsc, the difference in current value(999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, andit would soon settle back to 999. So the actual poolsize is indeed 2000. Something else that was noticed by Advanced Support is the output of onstat-g dsc in later IDS 9.40 versions is different, the 'Number of entries' isno longer shown: Informix Dynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09 -- 117120 Kbytes Distribution Cache: Number of lists : 31 DS_POOLSIZE : 127 Distribution Cache Entries: list# id ref_cnt dropped? heap_ptr distribution name ----------------------------------------------------------------- = = = 8< = = = Total number of distribution entries: 42. Number of entries in use : 0 Compare this to the onstat -g dsc output you provided: Informix Dynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes Distribution Cache: Number of lists : 503 DS_POOLSIZE : 2000 Number of entries : 999 Number of entries in use : 0 Distribution Cache Entries: list# id ref_cnt dropped? heap_ptr distribution name ----------------------------------------------------------------- = = = 8< = = = Total number of distribution entries: 1004. I cannot find any known issues or bugs that suggest why this change was made, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000. Regards Colin There are 10 types of people in the world, those that understand binary andthose that don't > Date: Tue, 16 Oct 2007 11:16:49 +0100 > From: richard.harnden@lineone.net > Subject: Re: Calculate optimal value for DS_POOLSIZE > To: informix-list@iiug.org > > Art S. Kagel wrote: >> On Oct 15, 6:49 pm, Mark Jamison wrote: >>> Actually I believe the confusion is betwen the DS cache values with theDD cache variables. >>> >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list. >>> >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. >>> DS_HASHSIZE is the number of buckets or lists for the Cache. >> >>>From the Administrator's Reference: >> >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in >> each hash bucket in the data-distribution cache ...' >> >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the >> data-distribution cache ...' >> >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute >> total. >> >> If there's confusion it's in the manuals and it's been there since the >> beginning. > > The manual is wrong. It works as Mark described it. > > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref: > > The following formula determines the number of column distributions > that can be stored in one bucket. > > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE > > To modify the number of distributions per bucket, change either the > DS_POOLSIZE or DS_HASHSIZE configuration parameter. > For example, with the default values of 127 for DS_POOLSIZE and 31 for > DS_HASHSIZE, you can potentially store distributions for about 127 columns > in the data-distribution cache. The cache has 31 hash buckets, and each hash > bucket can have an average of 4 entries. > > >> >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. >>> >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. >>> >>> DD_HASHSIZE 400 >>> DD_HASHMAX 10 >>> >>> However DS would be configured differently and could look like >>> >>> DS_POOLSIZE 4000 >>> DS_HASHSIZE 400 >> >> Nope: >> >> DS_HASHSIZE 400 >> DS_POOLSIZE 10 > > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then > you'd need to have: > > DS_HASHSIZE 400 > DS_POOLSIZE 8000 > > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime. > >> >> Art S. Kagel >> >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. >>> >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. >>> >>> ----- Original Message ---- >>> From: Art S. Kagel >>> To: informix-l...@iiug.org >>> Sent: Monday, October 15, 2007 8:49:03 AM >>> Subject: Re: Calculate optimal value for DS_POOLSIZE >>> >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: >>>> Version: IDS 10 >>>> I am trying to determine the optimal value for DS_POOLSIZE and >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following >>> question: >>> >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look >>>> at our update stats and see number of unique columns that we run >>>> update statistics on and that would become the value of DS_POOLSIZE. >>>> 2. How do I determine correct value of number of Hash Buckets ? How >>>> many entries per bucket is good to have ? >>>> 3. When I look at onstat -g dsc I see number of distribution entries, >>>> should this be always lower than DS_POOLSIZE >>> IB that Mark and David are misinterpreting the use and meaning of >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly. >>> >>> DS_HASHSIZE is the number of hash buckets for distributions and >>> DS_POOLSIZE is the maximum number of slots in each bucket for when >>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a >>> prime number, DS_POOLSIZE should be relatively small (I've also been >>> told that it's better if it is non-prime). >>> >>> Columns are hashed on table and column name so if you have lots of >>> databases with the same tables and columns in them then you will need >>> a larger value for DS_POOLSIZE to match the level of hash clashes that >>> you have. Ideally - for optimum performance of the hash - each >>> distribution should hash to it's own bucket and the number of slots >>> per bucket should be very low. Since we cannot adjust the hash >>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping >>> David's experience with Linux and large values in mind (David what is >>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime >>> that's larger than the number of unique table names to minimize >>> clashes and set DS_POOLSIZE large enough that the product of the two >>> is about 1.5 times the number of distributions in SYSDISTRIB for all >>> databases - allowing a bit more if you know the number of databases/ >>> tables/columns will be growing over time to put off retuning for a >>> while. >>> >>> Art S. Kagel >>> >>> _______________________________________________ >>> Informix-list mailing list >>> Informix-l...@iiug.orghttp://www.iiug.org/mailman/listinfo/informix-list >> >> > _______________________________________________ > Informix-list mailing list > Informix-list@iiug.org > http://www.iiug.org/mailman/listinfo/informix-list __________________________________________________ _______________ Get free emoticon packs and customisation from Windows Live. http://www.pimpmylive.co.uk |
| |||
| On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote: > For what it's worth, these are my settings > > # Dictionary Cache Size > DD_HASHSIZE 499 # Number of dictionary hash buckets > DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket > > # Distribution Cache Size > DS_HASHSIZE 503 # Number of distribution hash buckets > DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache > > # UDR Cache Size > PC_HASHSIZE 127 # Number of Procedure hash buckets > PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache > > I had some discussions with UK Tech Support about the settings and their respective values. This is their response: > > Reproduction: > > # Created 1500 tables, populated each with 500 rows > > # Ran update stats high on all tables > > # Queried each table to force the population of the distributions into the cache. > > # The structure of interest, hangs off of rhead_t ... > > onstat -g dmp 0xa000000 hlQA | grep rhead > > onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist > > onstat -g dmp 0x15588018 shcache_t > > # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ... > > DS_POOLSIZE 4000 > > DS_HASHSIZE 1003 > > 0000: sh_hashsize = 1003 > > 0004: sh_usage = 1 > > 0008: sh_uid = 1508 > > 000c: sh_cleaned = 0 > > 0010: sh_cachesize = 4000 > > 0014: sh_curcount = 1508 > > 0018: sh_inusecount = 0 > > 001c: sh_lowthresh = 50 > > 0020: sh_highthresh = 75 > > 0024: sh_pool = 0x6d172c > > 0028: sh_entry = 0x15588050 > > 002c: sh_lru = 0x15a2ec30 > > 0030: sh_mru = 0x1558af88 > > 0034: sh_mutex = 0x15585328 > > } > > DS_POOLSIZE 3000 > > DS_HASHSIZE 503 > > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ > > 0000: sh_hashsize = 503 > > 0004: sh_usage = 1 > > 0008: sh_uid = 1500 > > 000c: sh_cleaned = 1 > > 0010: sh_cachesize = 3000 > > 0014: sh_curcount = 1499 > > 0018: sh_inusecount = 0 > > 001c: sh_lowthresh = 50 > > 0020: sh_highthresh = 75 > > 0024: sh_pool = 0x6d172c > > 0028: sh_entry = 0x155862c0 > > 002c: sh_lru = 0x15802830 > > 0030: sh_mru = 0x175d4030 > > 0034: sh_mutex = 0x15585328 > > } > > DS_HASHSIZE 503 > > DS_POOLSIZE 2000 > > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ > > 0000: sh_hashsize = 503 > > 0004: sh_usage = 1 > > 0008: sh_uid = 1500 > > 000c: sh_cleaned = 501 > > 0010: sh_cachesize = 2000 > > 0014: sh_curcount = 999 > > 0018: sh_inusecount = 0 > > 001c: sh_lowthresh = 50 > > 0020: sh_highthresh = 75 > > 0024: sh_pool = 0x6d172c > > 0028: sh_entry = 0x155862c0 > > 002c: sh_lru = 0x16215830 > > 0030: sh_mru = 0x16209be0 > > 0034: sh_mutex = 0x15585328 > > } > > The tests show the influencing parameter is DS_POOLSIZE, and the reason the integer SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the distpool cache current value to half the pools size. > > I can't find any where in the code which suggests why we look to be keeping this cache within the 50-75 % threshold. The functions to manipulate these values look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem. > > There is one place where this counter is incremented and decremented: > > retrieve_cache() / cleanup() > > >From the customers output of onstat -g dsc, the difference in current value (999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, and it would soon settle back to 999. So the actual poolsize is indeed 2000. > > Something else that was noticed by Advanced Support is the output of onstat -g dsc in later IDS 9.40 versions is different, the 'Number of entries' is no longer shown: > > InformixDynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09-- 117120 Kbytes > > Distribution Cache: > > Number of lists : 31 > > DS_POOLSIZE : 127 > > Distribution Cache Entries: > > list# id ref_cnt dropped? heap_ptr distribution name > > ----------------------------------------------------------------- > > = = = 8< = = = > > Total number of distribution entries: 42. > > Number of entries in use : 0 > > Compare this to the onstat -g dsc output you provided: > > InformixDynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes > > Distribution Cache: > > Number of lists : 503 > > DS_POOLSIZE : 2000 > > Number of entries : 999 > > Number of entries in use : 0 > > Distribution Cache Entries: > > list# id ref_cnt dropped? heap_ptr distribution name > > ----------------------------------------------------------------- > > = = = 8< = = = > > Total number of distribution entries: 1004. > > I cannot find any known issues or bugs that suggest why this change was made, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000. > > Regards > > Colin > > There are 10 types of people in the world, those that understand binary and those that don't > > > > > Date: Tue, 16 Oct 2007 11:16:49 +0100 > > From: richard.harn...@lineone.net > > Subject: Re: Calculate optimal value for DS_POOLSIZE > > To:informix-l...@iiug.org > > > Art S. Kagel wrote: > >> On Oct 15, 6:49 pm, Mark Jamison wrote: > >>> Actually I believe the confusion is betwen the DS cache values with the DD cache variables. > > >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list. > > >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. > >>> DS_HASHSIZE is the number of buckets or lists for the Cache. > > >>>From the Administrator's Reference: > > >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in > >> each hash bucket in the data-distribution cache ...' > > >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the > >> data-distribution cache ...' > > >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute > >> total. > > >> If there's confusion it's in the manuals and it's been there since the > >> beginning. > > > The manual is wrong. It works as Mark described it. > > > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref: > > > The following formula determines the number of column distributions > > that can be stored in one bucket. > > > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE > > > To modify the number of distributions per bucket, change either the > > DS_POOLSIZE or DS_HASHSIZE configuration parameter. > > For example, with the default values of 127 for DS_POOLSIZE and 31 for > > DS_HASHSIZE, you can potentially store distributions for about 127 columns > > in the data-distribution cache. The cache has 31 hash buckets, and eachhash > > bucket can have an average of 4 entries. > > >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. > > >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. > > >>> DD_HASHSIZE 400 > >>> DD_HASHMAX 10 > > >>> However DS would be configured differently and could look like > > >>> DS_POOLSIZE 4000 > >>> DS_HASHSIZE 400 > > >> Nope: > > >> DS_HASHSIZE 400 > >> DS_POOLSIZE 10 > > > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then > > you'd need to have: > > > DS_HASHSIZE 400 > > DS_POOLSIZE 8000 > > > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime. > > >> Art S. Kagel > > >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. > > >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. > > >>> ----- Original Message ---- > >>> From: Art S. Kagel > >>> To:informix-l...@iiug.org > >>> Sent: Monday, October 15, 2007 8:49:03 AM > >>> Subject: Re: Calculate optimal value for DS_POOLSIZE > > >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: > >>>> Version: IDS 10 > >>>> I am trying to determine the optimal value for DS_POOLSIZE and > >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following > >>> question: > > >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look > >>>> at our update stats and see number of unique columns that we run > >>>> update statistics on and that would become the value of DS_POOLSIZE. > >>>> 2. How do I determine correct value of number of Hash Buckets ? How > >>>> many entries per bucket is good to have ? > >>>> 3. When I look at onstat -g dsc I see number of distribution entries, > >>>> should this be always lower than DS_POOLSIZE > >>> IB that Mark and David are misinterpreting the use and meaning of > >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly. > > >>> DS_HASHSIZE is the number of hash buckets for distributions and > >>> DS_POOLSIZE is the maximum number of slots in each bucket for when > >>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a > >>> prime number, DS_POOLSIZE should be relatively small (I've also been > >>> told that it's better if it is non-prime). > > >>> Columns are hashed on table and column name so if you have lots of > >>> databases with the same tables and columns in them then you will need > >>> a larger value for DS_POOLSIZE to match the level of hash clashes that > >>> you have. Ideally - for optimum performance of the hash - each > >>> distribution should hash to it's own bucket and the number of slots > >>> per bucket should be very low. Since we cannot adjust the hash > >>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping > >>> David's experience with Linux and large values in mind (David what is > >>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime > >>> that's larger than the number of unique table names to minimize > >>> clashes and set DS_POOLSIZE large enough that the product of the two > >>> is about 1.5 times the number of distributions in SYSDISTRIB for all > > ... > > read more »- Hide quoted text - > > - Show quoted text - What happens in multi user environment when distriution cache is not large and multiple session try to write to this cache. I know in general, distribution cache will have to read from disk if it doesn't find the distribution for that table in cache, but could it also be potentially be waiting on acquiring locks, mutexes etc if this value is less. I am trying to determine implications of having these value less other than the one I already know. |
| |||
| On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote: > For what it's worth, these are my settings > > # Dictionary Cache Size > DD_HASHSIZE 499 # Number of dictionary hash buckets > DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket > > # Distribution Cache Size > DS_HASHSIZE 503 # Number of distribution hash buckets > DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache > > # UDR Cache Size > PC_HASHSIZE 127 # Number of Procedure hash buckets > PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache > > I had some discussions with UK Tech Support about the settings and their respective values. This is their response: > > Reproduction: > > # Created 1500 tables, populated each with 500 rows > > # Ran update stats high on all tables > > # Queried each table to force the population of the distributions into the cache. > > # The structure of interest, hangs off of rhead_t ... > > onstat -g dmp 0xa000000 hlQA | grep rhead > > onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist > > onstat -g dmp 0x15588018 shcache_t > > # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ... > > DS_POOLSIZE 4000 > > DS_HASHSIZE 1003 > > 0000: sh_hashsize = 1003 > > 0004: sh_usage = 1 > > 0008: sh_uid = 1508 > > 000c: sh_cleaned = 0 > > 0010: sh_cachesize = 4000 > > 0014: sh_curcount = 1508 > > 0018: sh_inusecount = 0 > > 001c: sh_lowthresh = 50 > > 0020: sh_highthresh = 75 > > 0024: sh_pool = 0x6d172c > > 0028: sh_entry = 0x15588050 > > 002c: sh_lru = 0x15a2ec30 > > 0030: sh_mru = 0x1558af88 > > 0034: sh_mutex = 0x15585328 > > } > > DS_POOLSIZE 3000 > > DS_HASHSIZE 503 > > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ > > 0000: sh_hashsize = 503 > > 0004: sh_usage = 1 > > 0008: sh_uid = 1500 > > 000c: sh_cleaned = 1 > > 0010: sh_cachesize = 3000 > > 0014: sh_curcount = 1499 > > 0018: sh_inusecount = 0 > > 001c: sh_lowthresh = 50 > > 0020: sh_highthresh = 75 > > 0024: sh_pool = 0x6d172c > > 0028: sh_entry = 0x155862c0 > > 002c: sh_lru = 0x15802830 > > 0030: sh_mru = 0x175d4030 > > 0034: sh_mutex = 0x15585328 > > } > > DS_HASHSIZE 503 > > DS_POOLSIZE 2000 > > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ > > 0000: sh_hashsize = 503 > > 0004: sh_usage = 1 > > 0008: sh_uid = 1500 > > 000c: sh_cleaned = 501 > > 0010: sh_cachesize = 2000 > > 0014: sh_curcount = 999 > > 0018: sh_inusecount = 0 > > 001c: sh_lowthresh = 50 > > 0020: sh_highthresh = 75 > > 0024: sh_pool = 0x6d172c > > 0028: sh_entry = 0x155862c0 > > 002c: sh_lru = 0x16215830 > > 0030: sh_mru = 0x16209be0 > > 0034: sh_mutex = 0x15585328 > > } > > The tests show the influencing parameter is DS_POOLSIZE, and the reason the integer SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the distpool cache current value to half the pools size. > > I can't find any where in the code which suggests why we look to be keeping this cache within the 50-75 % threshold. The functions to manipulate these values look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem. > > There is one place where this counter is incremented and decremented: > > retrieve_cache() / cleanup() > > >From the customers output of onstat -g dsc, the difference in current value (999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, and it would soon settle back to 999. So the actual poolsize is indeed 2000. > > Something else that was noticed by Advanced Support is the output of onstat -g dsc in later IDS 9.40 versions is different, the 'Number of entries' is no longer shown: > > InformixDynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09-- 117120 Kbytes > > Distribution Cache: > > Number of lists : 31 > > DS_POOLSIZE : 127 > > Distribution Cache Entries: > > list# id ref_cnt dropped? heap_ptr distribution name > > ----------------------------------------------------------------- > > = = = 8< = = = > > Total number of distribution entries: 42. > > Number of entries in use : 0 > > Compare this to the onstat -g dsc output you provided: > > InformixDynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes > > Distribution Cache: > > Number of lists : 503 > > DS_POOLSIZE : 2000 > > Number of entries : 999 > > Number of entries in use : 0 > > Distribution Cache Entries: > > list# id ref_cnt dropped? heap_ptr distribution name > > ----------------------------------------------------------------- > > = = = 8< = = = > > Total number of distribution entries: 1004. > > I cannot find any known issues or bugs that suggest why this change was made, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000. > > Regards > > Colin > > There are 10 types of people in the world, those that understand binary and those that don't > > > > > Date: Tue, 16 Oct 2007 11:16:49 +0100 > > From: richard.harn...@lineone.net > > Subject: Re: Calculate optimal value for DS_POOLSIZE > > To:informix-l...@iiug.org > > > Art S. Kagel wrote: > >> On Oct 15, 6:49 pm, Mark Jamison wrote: > >>> Actually I believe the confusion is betwen the DS cache values with the DD cache variables. > > >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list. > > >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. > >>> DS_HASHSIZE is the number of buckets or lists for the Cache. > > >>>From the Administrator's Reference: > > >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in > >> each hash bucket in the data-distribution cache ...' > > >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the > >> data-distribution cache ...' > > >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute > >> total. > > >> If there's confusion it's in the manuals and it's been there since the > >> beginning. > > > The manual is wrong. It works as Mark described it. > > > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref: > > > The following formula determines the number of column distributions > > that can be stored in one bucket. > > > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE > > > To modify the number of distributions per bucket, change either the > > DS_POOLSIZE or DS_HASHSIZE configuration parameter. > > For example, with the default values of 127 for DS_POOLSIZE and 31 for > > DS_HASHSIZE, you can potentially store distributions for about 127 columns > > in the data-distribution cache. The cache has 31 hash buckets, and eachhash > > bucket can have an average of 4 entries. > > >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. > > >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. > > >>> DD_HASHSIZE 400 > >>> DD_HASHMAX 10 > > >>> However DS would be configured differently and could look like > > >>> DS_POOLSIZE 4000 > >>> DS_HASHSIZE 400 > > >> Nope: > > >> DS_HASHSIZE 400 > >> DS_POOLSIZE 10 > > > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then > > you'd need to have: > > > DS_HASHSIZE 400 > > DS_POOLSIZE 8000 > > > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime. > > >> Art S. Kagel > > >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. > > >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. > > >>> ----- Original Message ---- > >>> From: Art S. Kagel > >>> To:informix-l...@iiug.org > >>> Sent: Monday, October 15, 2007 8:49:03 AM > >>> Subject: Re: Calculate optimal value for DS_POOLSIZE > > >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: > >>>> Version: IDS 10 > >>>> I am trying to determine the optimal value for DS_POOLSIZE and > >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following > >>> question: > > >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look > >>>> at our update stats and see number of unique columns that we run > >>>> update statistics on and that would become the value of DS_POOLSIZE. > >>>> 2. How do I determine correct value of number of Hash Buckets ? How > >>>> many entries per bucket is good to have ? > >>>> 3. When I look at onstat -g dsc I see number of distribution entries, > >>>> should this be always lower than DS_POOLSIZE > >>> IB that Mark and David are misinterpreting the use and meaning of > >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly. > > >>> DS_HASHSIZE is the number of hash buckets for distributions and > >>> DS_POOLSIZE is the maximum number of slots in each bucket for when > >>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a > >>> prime number, DS_POOLSIZE should be relatively small (I've also been > >>> told that it's better if it is non-prime). > > >>> Columns are hashed on table and column name so if you have lots of > >>> databases with the same tables and columns in them then you will need > >>> a larger value for DS_POOLSIZE to match the level of hash clashes that > >>> you have. Ideally - for optimum performance of the hash - each > >>> distribution should hash to it's own bucket and the number of slots > >>> per bucket should be very low. Since we cannot adjust the hash > >>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping > >>> David's experience with Linux and large values in mind (David what is > >>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime > >>> that's larger than the number of unique table names to minimize > >>> clashes and set DS_POOLSIZE large enough that the product of the two > >>> is about 1.5 times the number of distributions in SYSDISTRIB for all > > ... > > read more »- Hide quoted text - > > - Show quoted text - What happens in multi user environment when distriution cache is not large and multiple session try to write to this cache. I know in general, distribution cache will have to read from disk if it doesn't find the distribution for that table in cache, but could it also be potentially be waiting on acquiring locks, mutexes etc if this value is less. I am trying to determine implications of having these value less other than the one I already know. |
| |||
| On Oct 17, 5:34 pm, mohitanch...@gmail.com wrote: > On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote: > > > > > For what it's worth, these are my settings > > > # Dictionary Cache Size > > DD_HASHSIZE 499 # Number of dictionary hash buckets > > DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket > > > # Distribution Cache Size > > DS_HASHSIZE 503 # Number of distribution hash buckets > > DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache > > > # UDR Cache Size > > PC_HASHSIZE 127 # Number of Procedure hash buckets > > PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache > > > I had some discussions with UK Tech Support about the settings and their respective values. This is their response: > > > Reproduction: > > > # Created 1500 tables, populated each with 500 rows > > > # Ran update stats high on all tables > > > # Queried each table to force the population of the distributions into the cache. > > > # The structure of interest, hangs off of rhead_t ... > > > onstat -g dmp 0xa000000 hlQA | grep rhead > > > onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist > > > onstat -g dmp 0x15588018 shcache_t > > > # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ... > > > DS_POOLSIZE 4000 > > > DS_HASHSIZE 1003 > > > 0000: sh_hashsize = 1003 > > > 0004: sh_usage = 1 > > > 0008: sh_uid = 1508 > > > 000c: sh_cleaned = 0 > > > 0010: sh_cachesize = 4000 > > > 0014: sh_curcount = 1508 > > > 0018: sh_inusecount = 0 > > > 001c: sh_lowthresh = 50 > > > 0020: sh_highthresh = 75 > > > 0024: sh_pool = 0x6d172c > > > 0028: sh_entry = 0x15588050 > > > 002c: sh_lru = 0x15a2ec30 > > > 0030: sh_mru = 0x1558af88 > > > 0034: sh_mutex = 0x15585328 > > > } > > > DS_POOLSIZE 3000 > > > DS_HASHSIZE 503 > > > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ > > > 0000: sh_hashsize = 503 > > > 0004: sh_usage = 1 > > > 0008: sh_uid = 1500 > > > 000c: sh_cleaned = 1 > > > 0010: sh_cachesize = 3000 > > > 0014: sh_curcount = 1499 > > > 0018: sh_inusecount = 0 > > > 001c: sh_lowthresh = 50 > > > 0020: sh_highthresh = 75 > > > 0024: sh_pool = 0x6d172c > > > 0028: sh_entry = 0x155862c0 > > > 002c: sh_lru = 0x15802830 > > > 0030: sh_mru = 0x175d4030 > > > 0034: sh_mutex = 0x15585328 > > > } > > > DS_HASHSIZE 503 > > > DS_POOLSIZE 2000 > > > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */ > > > 0000: sh_hashsize = 503 > > > 0004: sh_usage = 1 > > > 0008: sh_uid = 1500 > > > 000c: sh_cleaned = 501 > > > 0010: sh_cachesize = 2000 > > > 0014: sh_curcount = 999 > > > 0018: sh_inusecount = 0 > > > 001c: sh_lowthresh = 50 > > > 0020: sh_highthresh = 75 > > > 0024: sh_pool = 0x6d172c > > > 0028: sh_entry = 0x155862c0 > > > 002c: sh_lru = 0x16215830 > > > 0030: sh_mru = 0x16209be0 > > > 0034: sh_mutex = 0x15585328 > > > } > > > The tests show the influencing parameter is DS_POOLSIZE, and the reasonthe integer SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the dist pool cache current value to half the pools size. > > > I can't find any where in the code which suggests why we look to be keeping this cache within the 50-75 % threshold. The functions to manipulate these values look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem. > > > There is one place where this counter is incremented and decremented: > > > retrieve_cache() / cleanup() > > > >From the customers output of onstat -g dsc, the difference in current value (999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, and it would soon settle back to 999. So the actual poolsize is indeed 2000. > > > Something else that was noticed by Advanced Support is the output of onstat -g dsc in later IDS 9.40 versions is different, the 'Number of entries' is no longer shown: > > > InformixDynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09 -- 117120 Kbytes > > > Distribution Cache: > > > Number of lists : 31 > > > DS_POOLSIZE : 127 > > > Distribution Cache Entries: > > > list# id ref_cnt dropped? heap_ptr distribution name > > > ----------------------------------------------------------------- > > > = = = 8< = = = > > > Total number of distribution entries: 42. > > > Number of entries in use : 0 > > > Compare this to the onstat -g dsc output you provided: > > > InformixDynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes > > > Distribution Cache: > > > Number of lists : 503 > > > DS_POOLSIZE : 2000 > > > Number of entries : 999 > > > Number of entries in use : 0 > > > Distribution Cache Entries: > > > list# id ref_cnt dropped? heap_ptr distribution name > > > ----------------------------------------------------------------- > > > = = = 8< = = = > > > Total number of distribution entries: 1004. > > > I cannot find any known issues or bugs that suggest why this change wasmade, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000. > > > Regards > > > Colin > > > There are 10 types of people in the world, those that understand binaryand those that don't > > > > Date: Tue, 16 Oct 2007 11:16:49 +0100 > > > From: richard.harn...@lineone.net > > > Subject: Re: Calculate optimal value for DS_POOLSIZE > > > To:informix-l...@iiug.org > > > > Art S. Kagel wrote: > > >> On Oct 15, 6:49 pm, Mark Jamison wrote: > > >>> Actually I believe the confusion is betwen the DS cache values withthe DD cache variables. > > > >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list. > > > >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit. > > >>> DS_HASHSIZE is the number of buckets or lists for the Cache. > > > >>>From the Administrator's Reference: > > > >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in > > >> each hash bucket in the data-distribution cache ...' > > > >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the > > >> data-distribution cache ...' > > > >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute > > >> total. > > > >> If there's confusion it's in the manuals and it's been there since the > > >> beginning. > > > > The manual is wrong. It works as Mark described it. > > > > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref: > > > > The following formula determines the number of column distributions > > > that can be stored in one bucket. > > > > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE > > > > To modify the number of distributions per bucket, change either the > > > DS_POOLSIZE or DS_HASHSIZE configuration parameter. > > > For example, with the default values of 127 for DS_POOLSIZE and 31 for > > > DS_HASHSIZE, you can potentially store distributions for about 127 columns > > > in the data-distribution cache. The cache has 31 hash buckets, and each hash > > > bucket can have an average of 4 entries. > > > >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries. > > > >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this. > > > >>> DD_HASHSIZE 400 > > >>> DD_HASHMAX 10 > > > >>> However DS would be configured differently and could look like > > > >>> DS_POOLSIZE 4000 > > >>> DS_HASHSIZE 400 > > > >> Nope: > > > >> DS_HASHSIZE 400 > > >> DS_POOLSIZE 10 > > > > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then > > > you'd need to have: > > > > DS_HASHSIZE 400 > > > DS_POOLSIZE 8000 > > > > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime. > > > >> Art S. Kagel > > > >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended. > > > >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different. > > > >>> ----- Original Message ---- > > >>> From: Art S. Kagel > > >>> To:informix-l...@iiug.org > > >>> Sent: Monday, October 15, 2007 8:49:03 AM > > >>> Subject: Re: Calculate optimal value for DS_POOLSIZE > > > >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote: > > >>>> Version: IDS 10 > > >>>> I am trying to determine the optimal value for DS_POOLSIZE and > > >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following > > >>> question: > > > >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look > > >>>> at our update stats and see number of unique columns that we run > > >>>> update statistics on and that would become the value of DS_POOLSIZE. > > >>>> 2. How do I determine correct value of number of Hash Buckets ? How > > >>>> many entries per bucket is good to have ? > > >>>> 3. When I look at onstat -g dsc I see number of distribution entries, > > >>>> should this be always lower than DS_POOLSIZE > > >>> IB that Mark and David are misinterpreting the use and meaning of > > >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly. > > > >>> DS_HASHSIZE is the number of hash buckets for distributions and > > >>> DS_POOLSIZE is the maximum number of slots in each bucket for when > > >>> multiple distributions hash to the same bucket. DS_HASHSIZE must bea > > >>> prime number, DS_POOLSIZE should be relatively small (I've also been > > >>> told that it's better if it is non-prime). > > > >>> Columns are hashed on table and column name so if you have lots of > > >>> databases with the same tables and columns in them then you will need > > >>> a larger value for DS_POOLSIZE to match the level of hash clashes that > > >>> you have. Ideally - for optimum performance of the hash - each > > >>> distribution should hash to it's own bucket and the number of slots > > >>> per bucket should be very low. Since > > ... > > read more »- Hide quoted text - > > - Show quoted text - Sorry for multiple unwanted posting, it was because gmail not reporting back the status after hitting send. I was trying to ask: I know in general, if Informix doesn't find the distribution in the cache then it read from the disk and stores it in cache for future use. And by increasing the DS_POOLSIZE and DS_HASHSIZE we reduce the I/ O as columns are readily available in cache. I wanted to ask if there are any other downside of having these value low, something like, when there are increasing number of DB sessions user/session thread might have to wait to get the lock on the cache so that it could update the cache. Could similar problem also manifest when these values are low ? |
| ||||
| On Oct 18, 11:36 am, mohitanch...@gmail.com wrote: > On Oct 17, 5:34 pm, mohitanch...@gmail.com wrote: > > > On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote: > > > > For what it's worth, these are my settings > > > > # Dictionary Cache Size > > > DD_HASHSIZE 499 # Number of dictionary hash buckets > > > DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket > > > > # Distribution Cache Size > > > DS_HASHSIZE 503 # Number of distribution hash buckets > > > DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache > > > > # UDR Cache Size > > > PC_HASHSIZE 127 # Number of Procedure hash buckets > > > PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache > > > > I had some discussions with UK Tech Support about the settings and their respective values. This is their response: > > > > Reproduction: > > > > # Created 1500 tables, populated each with 500 rows > > > > # Ran update stats high on all tables > > > > # Queried each table to force the population of the distributions into the cache. > > > > # The structure of interest, hangs off of rhead_t ... > > > > onstat -g dmp 0xa000000 hlQA | grep rhead > > > > onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist > > > > onstat -g dmp 0x15588018 shcache_t > > > > # Same tests with differen |