Unix Technical Forum

Re: Calculate optimal value for DS_POOLSIZE

This is a discussion on Re: Calculate optimal value for DS_POOLSIZE within the Informix forums, part of the Database Server Software category; --> Actually I believe the confusion is betwen the DS cache values with the DDcache variables. DD_HASHSIZE and DD_HASHMAX mean ...


Go Back   Unix Technical Forum > Database Server Software > Informix

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-20-2008, 05:28 PM
Mark Jamison
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

Actually I believe the confusion is betwen the DS cache values with the DDcache variables.

DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number ofentries (soft limit) per buck or list.

DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
DS_HASHSIZE is the number of buckets or lists for the Cache.

Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.

Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.

DD_HASHSIZE 400
DD_HASHMAX 10

However DS would be configured differently and could look like

DS_POOLSIZE 4000
DS_HASHSIZE 400

When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.

Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.

----- Original Message ----
From: Art S. Kagel <art..kagel@gmail.com>
To: informix-list@iiug.org
Sent: Monday, October 15, 2007 8:49:03 AM
Subject: Re: Calculate optimal value for DS_POOLSIZE


On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
> Version: IDS 10
>
> I am trying to determine the optimal value for DS_POOLSIZE and
>DS_HASHSIZE. I looked at the onstat -g dsc. I have following

question:
>
> 1. How do I determine the correct size of DS_POOLSIZE. Should I look
> at our update stats and see number of unique columns that we run
>update statistics on and that would become the value of DS_POOLSIZE.
> 2.. How do I determine correct value of number of Hash Buckets ? How
> manyentries per bucket is good to have ?
> 3. When I look at onstat -g dsc Isee number of distribution entries,
> should this be always lower than DS_POOLSIZE


IB that Mark and David are misinterpreting the use and meaning of
DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly..

DS_HASHSIZE is the number of hash buckets for distributions and
DS_POOLSIZE is the maximum number of slots in each bucket for when
multipledistributions hash to the same bucket. DS_HASHSIZE must be a
prime number, DS_POOLSIZE should be relatively small (I've also been
told that it'sbetter if it is non-prime).

Columns are hashed on table and column name so if you have lots of
databases with the same tables and columns in them then you will need
a larger value for DS_POOLSIZE to match the level of hash clashes that
you have. Ideally - for optimum performance of the hash - each
distribution should hash to it's own bucket and the number ofslots
per bucket should be very low. Since we cannot adjust the hash
algorithm itself, the only control we have is DS_HASHSIZE. Keeping
David's experience with Linux and large values in mind (David what is
a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime
that's largerthan the number of unique table names to minimize
clashes and set DS_POOLSIZE large enough that the product of the two
is about 1.5 times the number of distributions in SYSDISTRIB for all
databases - allowing a bit more if you know the number of databases/
tables/columns will be growing over time to put off retuning for a
while.

Art S. Kagel


_______________________________________________
Informix-list mailing list
Informix-list@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-20-2008, 05:28 PM
Art S. Kagel
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

On Oct 15, 6:49 pm, Mark Jamison <maj...@yahoo.com> wrote:
> Actually I believe the confusion is betwen the DS cache values with the DD cache variables.
>
> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list.
>
> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
> DS_HASHSIZE is the number of buckets or lists for the Cache.


>From the Administrator's Reference:


'The DS_POOLSIZE parameter specifies the maximum number of entries in
each hash bucket in the data-distribution cache ...'

'The DS_HASHSIZE parameter specifies the number of hash buckets in the
data-distribution cache ...'

DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute
total.

If there's confusion it's in the manuals and it's been there since the
beginning.

>
> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.
>
> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.
>
> DD_HASHSIZE 400
> DD_HASHMAX 10
>
> However DS would be configured differently and could look like
>
> DS_POOLSIZE 4000
> DS_HASHSIZE 400


Nope:

DS_HASHSIZE 400
DS_POOLSIZE 10

Art S. Kagel

> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.
>
> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.
>
> ----- Original Message ----
> From: Art S. Kagel <art.ka...@gmail.com>
> To: informix-l...@iiug.org
> Sent: Monday, October 15, 2007 8:49:03 AM
> Subject: Re: Calculate optimal value for DS_POOLSIZE
>
> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
> > Version: IDS 10

>
> > I am trying to determine the optimal value for DS_POOLSIZE and
> > DS_HASHSIZE. I looked at the onstat -g dsc. I have following

> question:
>
> > 1. How do I determine the correct size of DS_POOLSIZE. Should I look
> > at our update stats and see number of unique columns that we run
> > update statistics on and that would become the value of DS_POOLSIZE.
> > 2. How do I determine correct value of number of Hash Buckets ? How
> > many entries per bucket is good to have ?
> > 3. When I look at onstat -g dsc I see number of distribution entries,
> > should this be always lower than DS_POOLSIZE

>
> IB that Mark and David are misinterpreting the use and meaning of
> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.
>
> DS_HASHSIZE is the number of hash buckets for distributions and
> DS_POOLSIZE is the maximum number of slots in each bucket for when
> multiple distributions hash to the same bucket. DS_HASHSIZE must be a
> prime number, DS_POOLSIZE should be relatively small (I've also been
> told that it's better if it is non-prime).
>
> Columns are hashed on table and column name so if you have lots of
> databases with the same tables and columns in them then you will need
> a larger value for DS_POOLSIZE to match the level of hash clashes that
> you have. Ideally - for optimum performance of the hash - each
> distribution should hash to it's own bucket and the number of slots
> per bucket should be very low. Since we cannot adjust the hash
> algorithm itself, the only control we have is DS_HASHSIZE. Keeping
> David's experience with Linux and large values in mind (David what is
> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime
> that's larger than the number of unique table names to minimize
> clashes and set DS_POOLSIZE large enough that the product of the two
> is about 1.5 times the number of distributions in SYSDISTRIB for all
> databases - allowing a bit more if you know the number of databases/
> tables/columns will be growing over time to put off retuning for a
> while.
>
> Art S. Kagel
>
> _______________________________________________
> Informix-list mailing list
> Informix-l...@iiug.orghttp://www.iiug.org/mailman/listinfo/informix-list



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-20-2008, 05:28 PM
Richard Harnden
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

Art S. Kagel wrote:
> On Oct 15, 6:49 pm, Mark Jamison <maj...@yahoo.com> wrote:
>> Actually I believe the confusion is betwen the DS cache values with the DD cache variables.
>>
>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list.
>>
>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
>> DS_HASHSIZE is the number of buckets or lists for the Cache.

>
>>From the Administrator's Reference:

>
> 'The DS_POOLSIZE parameter specifies the maximum number of entries in
> each hash bucket in the data-distribution cache ...'
>
> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the
> data-distribution cache ...'
>
> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute
> total.
>
> If there's confusion it's in the manuals and it's been there since the
> beginning.


The manual is wrong. It works as Mark described it.

The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref:
<quote>
The following formula determines the number of column distributions
that can be stored in one bucket.

Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE

To modify the number of distributions per bucket, change either the
DS_POOLSIZE or DS_HASHSIZE configuration parameter.
For example, with the default values of 127 for DS_POOLSIZE and 31 for
DS_HASHSIZE, you can potentially store distributions for about 127 columns
in the data-distribution cache. The cache has 31 hash buckets, and each hash
bucket can have an average of 4 entries.
</>

>
>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.
>>
>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.
>>
>> DD_HASHSIZE 400
>> DD_HASHMAX 10
>>
>> However DS would be configured differently and could look like
>>
>> DS_POOLSIZE 4000
>> DS_HASHSIZE 400

>
> Nope:
>
> DS_HASHSIZE 400
> DS_POOLSIZE 10


The cache seems to be cleaned to 50%. So, if you want 4000 entries, then
you'd need to have:

DS_HASHSIZE 400
DS_POOLSIZE 8000

Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime.

>
> Art S. Kagel
>
>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.
>>
>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.
>>
>> ----- Original Message ----
>> From: Art S. Kagel <art.ka...@gmail.com>
>> To: informix-l...@iiug.org
>> Sent: Monday, October 15, 2007 8:49:03 AM
>> Subject: Re: Calculate optimal value for DS_POOLSIZE
>>
>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
>>> Version: IDS 10
>>> I am trying to determine the optimal value for DS_POOLSIZE and
>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following

>> question:
>>
>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look
>>> at our update stats and see number of unique columns that we run
>>> update statistics on and that would become the value of DS_POOLSIZE.
>>> 2. How do I determine correct value of number of Hash Buckets ? How
>>> many entries per bucket is good to have ?
>>> 3. When I look at onstat -g dsc I see number of distribution entries,
>>> should this be always lower than DS_POOLSIZE

>> IB that Mark and David are misinterpreting the use and meaning of
>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.
>>
>> DS_HASHSIZE is the number of hash buckets for distributions and
>> DS_POOLSIZE is the maximum number of slots in each bucket for when
>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a
>> prime number, DS_POOLSIZE should be relatively small (I've also been
>> told that it's better if it is non-prime).
>>
>> Columns are hashed on table and column name so if you have lots of
>> databases with the same tables and columns in them then you will need
>> a larger value for DS_POOLSIZE to match the level of hash clashes that
>> you have. Ideally - for optimum performance of the hash - each
>> distribution should hash to it's own bucket and the number of slots
>> per bucket should be very low. Since we cannot adjust the hash
>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping
>> David's experience with Linux and large values in mind (David what is
>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime
>> that's larger than the number of unique table names to minimize
>> clashes and set DS_POOLSIZE large enough that the product of the two
>> is about 1.5 times the number of distributions in SYSDISTRIB for all
>> databases - allowing a bit more if you know the number of databases/
>> tables/columns will be growing over time to put off retuning for a
>> while.
>>
>> Art S. Kagel
>>
>> _______________________________________________
>> Informix-list mailing list
>> Informix-l...@iiug.orghttp://www.iiug.org/mailman/listinfo/informix-list

>
>

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-20-2008, 05:28 PM
Colin Dawson
 
Posts: n/a
Default RE: Calculate optimal value for DS_POOLSIZE


For what it's worth, these are my settings



# Dictionary Cache Size
DD_HASHSIZE 499 # Number of dictionary hash buckets
DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket

# Distribution Cache Size
DS_HASHSIZE 503 # Number of distribution hash buckets
DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache

# UDR Cache Size
PC_HASHSIZE 127 # Number of Procedure hash buckets
PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache

I had some discussions with UK Tech Support about the settings and their respective values. This is their response:



Reproduction:

# Created 1500 tables, populated each with 500 rows

# Ran update stats high on all tables

# Queried each table to force the population of the distributions into the cache.

# The structure of interest, hangs off of rhead_t ...

onstat -g dmp 0xa000000 hlQA | grep rhead

onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist

onstat -g dmp 0x15588018 shcache_t

# Same tests with different DS_POOLSIZE / DS_HASHSIZE values ...

DS_POOLSIZE 4000

DS_HASHSIZE 1003

0000: sh_hashsize = 1003

0004: sh_usage = 1

0008: sh_uid = 1508

000c: sh_cleaned = 0

0010: sh_cachesize = 4000

0014: sh_curcount = 1508

0018: sh_inusecount = 0

001c: sh_lowthresh = 50

0020: sh_highthresh = 75

0024: sh_pool = 0x6d172c

0028: sh_entry = 0x15588050

002c: sh_lru = 0x15a2ec30

0030: sh_mru = 0x1558af88

0034: sh_mutex = 0x15585328

}

DS_POOLSIZE 3000

DS_HASHSIZE 503

*(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */

0000: sh_hashsize = 503

0004: sh_usage = 1

0008: sh_uid = 1500

000c: sh_cleaned = 1

0010: sh_cachesize = 3000

0014: sh_curcount = 1499

0018: sh_inusecount = 0

001c: sh_lowthresh = 50

0020: sh_highthresh = 75

0024: sh_pool = 0x6d172c

0028: sh_entry = 0x155862c0

002c: sh_lru = 0x15802830

0030: sh_mru = 0x175d4030

0034: sh_mutex = 0x15585328

}

DS_HASHSIZE 503

DS_POOLSIZE 2000

*(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */

0000: sh_hashsize = 503

0004: sh_usage = 1

0008: sh_uid = 1500

000c: sh_cleaned = 501

0010: sh_cachesize = 2000

0014: sh_curcount = 999

0018: sh_inusecount = 0

001c: sh_lowthresh = 50

0020: sh_highthresh = 75

0024: sh_pool = 0x6d172c

0028: sh_entry = 0x155862c0

002c: sh_lru = 0x16215830

0030: sh_mru = 0x16209be0

0034: sh_mutex = 0x15585328

}

The tests show the influencing parameter is DS_POOLSIZE, and the reason theinteger SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the dist pool cache current value to half the pools size.

I can't find any where in the code which suggests why we look to be keepingthis cache within the 50-75 % threshold. The functions to manipulate thesevalues look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem.

There is one place where this counter is incremented and decremented:

retrieve_cache() / cleanup()

>From the customers output of onstat -g dsc, the difference in current value(999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, andit would soon settle back to 999. So the actual poolsize is indeed 2000.


Something else that was noticed by Advanced Support is the output of onstat-g dsc in later IDS 9.40 versions is different, the 'Number of entries' isno longer shown:

Informix Dynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09 -- 117120 Kbytes

Distribution Cache:

Number of lists : 31

DS_POOLSIZE : 127

Distribution Cache Entries:

list# id ref_cnt dropped? heap_ptr distribution name

-----------------------------------------------------------------

= = = 8< = = =

Total number of distribution entries: 42.

Number of entries in use : 0

Compare this to the onstat -g dsc output you provided:

Informix Dynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes

Distribution Cache:

Number of lists : 503

DS_POOLSIZE : 2000

Number of entries : 999

Number of entries in use : 0

Distribution Cache Entries:

list# id ref_cnt dropped? heap_ptr distribution name

-----------------------------------------------------------------

= = = 8< = = =

Total number of distribution entries: 1004.

I cannot find any known issues or bugs that suggest why this change was made, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000.






Regards

Colin



There are 10 types of people in the world, those that understand binary andthose that don't







> Date: Tue, 16 Oct 2007 11:16:49 +0100
> From: richard.harnden@lineone.net
> Subject: Re: Calculate optimal value for DS_POOLSIZE
> To: informix-list@iiug.org
>
> Art S. Kagel wrote:
>> On Oct 15, 6:49 pm, Mark Jamison wrote:
>>> Actually I believe the confusion is betwen the DS cache values with theDD cache variables.
>>>
>>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list.
>>>
>>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
>>> DS_HASHSIZE is the number of buckets or lists for the Cache.

>>
>>>From the Administrator's Reference:

>>
>> 'The DS_POOLSIZE parameter specifies the maximum number of entries in
>> each hash bucket in the data-distribution cache ...'
>>
>> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the
>> data-distribution cache ...'
>>
>> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute
>> total.
>>
>> If there's confusion it's in the manuals and it's been there since the
>> beginning.

>
> The manual is wrong. It works as Mark described it.
>
> The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref:
>
> The following formula determines the number of column distributions
> that can be stored in one bucket.
>
> Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE
>
> To modify the number of distributions per bucket, change either the
> DS_POOLSIZE or DS_HASHSIZE configuration parameter.
> For example, with the default values of 127 for DS_POOLSIZE and 31 for
> DS_HASHSIZE, you can potentially store distributions for about 127 columns
> in the data-distribution cache. The cache has 31 hash buckets, and each hash
> bucket can have an average of 4 entries.
>
>
>>
>>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.
>>>
>>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.
>>>
>>> DD_HASHSIZE 400
>>> DD_HASHMAX 10
>>>
>>> However DS would be configured differently and could look like
>>>
>>> DS_POOLSIZE 4000
>>> DS_HASHSIZE 400

>>
>> Nope:
>>
>> DS_HASHSIZE 400
>> DS_POOLSIZE 10

>
> The cache seems to be cleaned to 50%. So, if you want 4000 entries, then
> you'd need to have:
>
> DS_HASHSIZE 400
> DS_POOLSIZE 8000
>
> Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime.
>
>>
>> Art S. Kagel
>>
>>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.
>>>
>>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.
>>>
>>> ----- Original Message ----
>>> From: Art S. Kagel
>>> To: informix-l...@iiug.org
>>> Sent: Monday, October 15, 2007 8:49:03 AM
>>> Subject: Re: Calculate optimal value for DS_POOLSIZE
>>>
>>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
>>>> Version: IDS 10
>>>> I am trying to determine the optimal value for DS_POOLSIZE and
>>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following
>>> question:
>>>
>>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look
>>>> at our update stats and see number of unique columns that we run
>>>> update statistics on and that would become the value of DS_POOLSIZE.
>>>> 2. How do I determine correct value of number of Hash Buckets ? How
>>>> many entries per bucket is good to have ?
>>>> 3. When I look at onstat -g dsc I see number of distribution entries,
>>>> should this be always lower than DS_POOLSIZE
>>> IB that Mark and David are misinterpreting the use and meaning of
>>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.
>>>
>>> DS_HASHSIZE is the number of hash buckets for distributions and
>>> DS_POOLSIZE is the maximum number of slots in each bucket for when
>>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a
>>> prime number, DS_POOLSIZE should be relatively small (I've also been
>>> told that it's better if it is non-prime).
>>>
>>> Columns are hashed on table and column name so if you have lots of
>>> databases with the same tables and columns in them then you will need
>>> a larger value for DS_POOLSIZE to match the level of hash clashes that
>>> you have. Ideally - for optimum performance of the hash - each
>>> distribution should hash to it's own bucket and the number of slots
>>> per bucket should be very low. Since we cannot adjust the hash
>>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping
>>> David's experience with Linux and large values in mind (David what is
>>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime
>>> that's larger than the number of unique table names to minimize
>>> clashes and set DS_POOLSIZE large enough that the product of the two
>>> is about 1.5 times the number of distributions in SYSDISTRIB for all
>>> databases - allowing a bit more if you know the number of databases/
>>> tables/columns will be growing over time to put off retuning for a
>>> while.
>>>
>>> Art S. Kagel
>>>
>>> _______________________________________________
>>> Informix-list mailing list
>>> Informix-l...@iiug.orghttp://www.iiug.org/mailman/listinfo/informix-list

>>
>>

> _______________________________________________
> Informix-list mailing list
> Informix-list@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list


__________________________________________________ _______________
Get free emoticon packs and customisation from Windows Live.
http://www.pimpmylive.co.uk
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-20-2008, 05:29 PM
mohitanchlia@gmail.com
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote:
> For what it's worth, these are my settings
>
> # Dictionary Cache Size
> DD_HASHSIZE 499 # Number of dictionary hash buckets
> DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket
>
> # Distribution Cache Size
> DS_HASHSIZE 503 # Number of distribution hash buckets
> DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache
>
> # UDR Cache Size
> PC_HASHSIZE 127 # Number of Procedure hash buckets
> PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache
>
> I had some discussions with UK Tech Support about the settings and their respective values. This is their response:
>
> Reproduction:
>
> # Created 1500 tables, populated each with 500 rows
>
> # Ran update stats high on all tables
>
> # Queried each table to force the population of the distributions into the cache.
>
> # The structure of interest, hangs off of rhead_t ...
>
> onstat -g dmp 0xa000000 hlQA | grep rhead
>
> onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist
>
> onstat -g dmp 0x15588018 shcache_t
>
> # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ...
>
> DS_POOLSIZE 4000
>
> DS_HASHSIZE 1003
>
> 0000: sh_hashsize = 1003
>
> 0004: sh_usage = 1
>
> 0008: sh_uid = 1508
>
> 000c: sh_cleaned = 0
>
> 0010: sh_cachesize = 4000
>
> 0014: sh_curcount = 1508
>
> 0018: sh_inusecount = 0
>
> 001c: sh_lowthresh = 50
>
> 0020: sh_highthresh = 75
>
> 0024: sh_pool = 0x6d172c
>
> 0028: sh_entry = 0x15588050
>
> 002c: sh_lru = 0x15a2ec30
>
> 0030: sh_mru = 0x1558af88
>
> 0034: sh_mutex = 0x15585328
>
> }
>
> DS_POOLSIZE 3000
>
> DS_HASHSIZE 503
>
> *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */
>
> 0000: sh_hashsize = 503
>
> 0004: sh_usage = 1
>
> 0008: sh_uid = 1500
>
> 000c: sh_cleaned = 1
>
> 0010: sh_cachesize = 3000
>
> 0014: sh_curcount = 1499
>
> 0018: sh_inusecount = 0
>
> 001c: sh_lowthresh = 50
>
> 0020: sh_highthresh = 75
>
> 0024: sh_pool = 0x6d172c
>
> 0028: sh_entry = 0x155862c0
>
> 002c: sh_lru = 0x15802830
>
> 0030: sh_mru = 0x175d4030
>
> 0034: sh_mutex = 0x15585328
>
> }
>
> DS_HASHSIZE 503
>
> DS_POOLSIZE 2000
>
> *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */
>
> 0000: sh_hashsize = 503
>
> 0004: sh_usage = 1
>
> 0008: sh_uid = 1500
>
> 000c: sh_cleaned = 501
>
> 0010: sh_cachesize = 2000
>
> 0014: sh_curcount = 999
>
> 0018: sh_inusecount = 0
>
> 001c: sh_lowthresh = 50
>
> 0020: sh_highthresh = 75
>
> 0024: sh_pool = 0x6d172c
>
> 0028: sh_entry = 0x155862c0
>
> 002c: sh_lru = 0x16215830
>
> 0030: sh_mru = 0x16209be0
>
> 0034: sh_mutex = 0x15585328
>
> }
>
> The tests show the influencing parameter is DS_POOLSIZE, and the reason the integer SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the distpool cache current value to half the pools size.
>
> I can't find any where in the code which suggests why we look to be keeping this cache within the 50-75 % threshold. The functions to manipulate these values look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem.
>
> There is one place where this counter is incremented and decremented:
>
> retrieve_cache() / cleanup()
>
> >From the customers output of onstat -g dsc, the difference in current value (999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, and it would soon settle back to 999. So the actual poolsize is indeed 2000.

>
> Something else that was noticed by Advanced Support is the output of onstat -g dsc in later IDS 9.40 versions is different, the 'Number of entries' is no longer shown:
>
> InformixDynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09-- 117120 Kbytes
>
> Distribution Cache:
>
> Number of lists : 31
>
> DS_POOLSIZE : 127
>
> Distribution Cache Entries:
>
> list# id ref_cnt dropped? heap_ptr distribution name
>
> -----------------------------------------------------------------
>
> = = = 8< = = =
>
> Total number of distribution entries: 42.
>
> Number of entries in use : 0
>
> Compare this to the onstat -g dsc output you provided:
>
> InformixDynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes
>
> Distribution Cache:
>
> Number of lists : 503
>
> DS_POOLSIZE : 2000
>
> Number of entries : 999
>
> Number of entries in use : 0
>
> Distribution Cache Entries:
>
> list# id ref_cnt dropped? heap_ptr distribution name
>
> -----------------------------------------------------------------
>
> = = = 8< = = =
>
> Total number of distribution entries: 1004.
>
> I cannot find any known issues or bugs that suggest why this change was made, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000.
>
> Regards
>
> Colin
>
> There are 10 types of people in the world, those that understand binary and those that don't
>
>
>
> > Date: Tue, 16 Oct 2007 11:16:49 +0100
> > From: richard.harn...@lineone.net
> > Subject: Re: Calculate optimal value for DS_POOLSIZE
> > To:informix-l...@iiug.org

>
> > Art S. Kagel wrote:
> >> On Oct 15, 6:49 pm, Mark Jamison wrote:
> >>> Actually I believe the confusion is betwen the DS cache values with the DD cache variables.

>
> >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list.

>
> >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
> >>> DS_HASHSIZE is the number of buckets or lists for the Cache.

>
> >>>From the Administrator's Reference:

>
> >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in
> >> each hash bucket in the data-distribution cache ...'

>
> >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the
> >> data-distribution cache ...'

>
> >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute
> >> total.

>
> >> If there's confusion it's in the manuals and it's been there since the
> >> beginning.

>
> > The manual is wrong. It works as Mark described it.

>
> > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref:

>
> > The following formula determines the number of column distributions
> > that can be stored in one bucket.

>
> > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE

>
> > To modify the number of distributions per bucket, change either the
> > DS_POOLSIZE or DS_HASHSIZE configuration parameter.
> > For example, with the default values of 127 for DS_POOLSIZE and 31 for
> > DS_HASHSIZE, you can potentially store distributions for about 127 columns
> > in the data-distribution cache. The cache has 31 hash buckets, and eachhash
> > bucket can have an average of 4 entries.

>
> >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.

>
> >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.

>
> >>> DD_HASHSIZE 400
> >>> DD_HASHMAX 10

>
> >>> However DS would be configured differently and could look like

>
> >>> DS_POOLSIZE 4000
> >>> DS_HASHSIZE 400

>
> >> Nope:

>
> >> DS_HASHSIZE 400
> >> DS_POOLSIZE 10

>
> > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then
> > you'd need to have:

>
> > DS_HASHSIZE 400
> > DS_POOLSIZE 8000

>
> > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime.

>
> >> Art S. Kagel

>
> >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.

>
> >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.

>
> >>> ----- Original Message ----
> >>> From: Art S. Kagel
> >>> To:informix-l...@iiug.org
> >>> Sent: Monday, October 15, 2007 8:49:03 AM
> >>> Subject: Re: Calculate optimal value for DS_POOLSIZE

>
> >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
> >>>> Version: IDS 10
> >>>> I am trying to determine the optimal value for DS_POOLSIZE and
> >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following
> >>> question:

>
> >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look
> >>>> at our update stats and see number of unique columns that we run
> >>>> update statistics on and that would become the value of DS_POOLSIZE.
> >>>> 2. How do I determine correct value of number of Hash Buckets ? How
> >>>> many entries per bucket is good to have ?
> >>>> 3. When I look at onstat -g dsc I see number of distribution entries,
> >>>> should this be always lower than DS_POOLSIZE
> >>> IB that Mark and David are misinterpreting the use and meaning of
> >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.

>
> >>> DS_HASHSIZE is the number of hash buckets for distributions and
> >>> DS_POOLSIZE is the maximum number of slots in each bucket for when
> >>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a
> >>> prime number, DS_POOLSIZE should be relatively small (I've also been
> >>> told that it's better if it is non-prime).

>
> >>> Columns are hashed on table and column name so if you have lots of
> >>> databases with the same tables and columns in them then you will need
> >>> a larger value for DS_POOLSIZE to match the level of hash clashes that
> >>> you have. Ideally - for optimum performance of the hash - each
> >>> distribution should hash to it's own bucket and the number of slots
> >>> per bucket should be very low. Since we cannot adjust the hash
> >>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping
> >>> David's experience with Linux and large values in mind (David what is
> >>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime
> >>> that's larger than the number of unique table names to minimize
> >>> clashes and set DS_POOLSIZE large enough that the product of the two
> >>> is about 1.5 times the number of distributions in SYSDISTRIB for all

>
> ...
>
> read more »- Hide quoted text -
>
> - Show quoted text -


What happens in multi user environment when distriution cache is not
large and multiple session try to write to this cache. I know in
general, distribution cache will have to read from disk if it doesn't
find the distribution for that table in cache, but could it also be
potentially be waiting on acquiring locks, mutexes etc if this value
is less. I am trying to determine implications of having these value
less other than the one I already know.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-20-2008, 05:29 PM
mohitanchlia@gmail.com
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote:
> For what it's worth, these are my settings
>
> # Dictionary Cache Size
> DD_HASHSIZE 499 # Number of dictionary hash buckets
> DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket
>
> # Distribution Cache Size
> DS_HASHSIZE 503 # Number of distribution hash buckets
> DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache
>
> # UDR Cache Size
> PC_HASHSIZE 127 # Number of Procedure hash buckets
> PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache
>
> I had some discussions with UK Tech Support about the settings and their respective values. This is their response:
>
> Reproduction:
>
> # Created 1500 tables, populated each with 500 rows
>
> # Ran update stats high on all tables
>
> # Queried each table to force the population of the distributions into the cache.
>
> # The structure of interest, hangs off of rhead_t ...
>
> onstat -g dmp 0xa000000 hlQA | grep rhead
>
> onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist
>
> onstat -g dmp 0x15588018 shcache_t
>
> # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ...
>
> DS_POOLSIZE 4000
>
> DS_HASHSIZE 1003
>
> 0000: sh_hashsize = 1003
>
> 0004: sh_usage = 1
>
> 0008: sh_uid = 1508
>
> 000c: sh_cleaned = 0
>
> 0010: sh_cachesize = 4000
>
> 0014: sh_curcount = 1508
>
> 0018: sh_inusecount = 0
>
> 001c: sh_lowthresh = 50
>
> 0020: sh_highthresh = 75
>
> 0024: sh_pool = 0x6d172c
>
> 0028: sh_entry = 0x15588050
>
> 002c: sh_lru = 0x15a2ec30
>
> 0030: sh_mru = 0x1558af88
>
> 0034: sh_mutex = 0x15585328
>
> }
>
> DS_POOLSIZE 3000
>
> DS_HASHSIZE 503
>
> *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */
>
> 0000: sh_hashsize = 503
>
> 0004: sh_usage = 1
>
> 0008: sh_uid = 1500
>
> 000c: sh_cleaned = 1
>
> 0010: sh_cachesize = 3000
>
> 0014: sh_curcount = 1499
>
> 0018: sh_inusecount = 0
>
> 001c: sh_lowthresh = 50
>
> 0020: sh_highthresh = 75
>
> 0024: sh_pool = 0x6d172c
>
> 0028: sh_entry = 0x155862c0
>
> 002c: sh_lru = 0x15802830
>
> 0030: sh_mru = 0x175d4030
>
> 0034: sh_mutex = 0x15585328
>
> }
>
> DS_HASHSIZE 503
>
> DS_POOLSIZE 2000
>
> *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */
>
> 0000: sh_hashsize = 503
>
> 0004: sh_usage = 1
>
> 0008: sh_uid = 1500
>
> 000c: sh_cleaned = 501
>
> 0010: sh_cachesize = 2000
>
> 0014: sh_curcount = 999
>
> 0018: sh_inusecount = 0
>
> 001c: sh_lowthresh = 50
>
> 0020: sh_highthresh = 75
>
> 0024: sh_pool = 0x6d172c
>
> 0028: sh_entry = 0x155862c0
>
> 002c: sh_lru = 0x16215830
>
> 0030: sh_mru = 0x16209be0
>
> 0034: sh_mutex = 0x15585328
>
> }
>
> The tests show the influencing parameter is DS_POOLSIZE, and the reason the integer SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the distpool cache current value to half the pools size.
>
> I can't find any where in the code which suggests why we look to be keeping this cache within the 50-75 % threshold. The functions to manipulate these values look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem.
>
> There is one place where this counter is incremented and decremented:
>
> retrieve_cache() / cleanup()
>
> >From the customers output of onstat -g dsc, the difference in current value (999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, and it would soon settle back to 999. So the actual poolsize is indeed 2000.

>
> Something else that was noticed by Advanced Support is the output of onstat -g dsc in later IDS 9.40 versions is different, the 'Number of entries' is no longer shown:
>
> InformixDynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09-- 117120 Kbytes
>
> Distribution Cache:
>
> Number of lists : 31
>
> DS_POOLSIZE : 127
>
> Distribution Cache Entries:
>
> list# id ref_cnt dropped? heap_ptr distribution name
>
> -----------------------------------------------------------------
>
> = = = 8< = = =
>
> Total number of distribution entries: 42.
>
> Number of entries in use : 0
>
> Compare this to the onstat -g dsc output you provided:
>
> InformixDynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes
>
> Distribution Cache:
>
> Number of lists : 503
>
> DS_POOLSIZE : 2000
>
> Number of entries : 999
>
> Number of entries in use : 0
>
> Distribution Cache Entries:
>
> list# id ref_cnt dropped? heap_ptr distribution name
>
> -----------------------------------------------------------------
>
> = = = 8< = = =
>
> Total number of distribution entries: 1004.
>
> I cannot find any known issues or bugs that suggest why this change was made, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000.
>
> Regards
>
> Colin
>
> There are 10 types of people in the world, those that understand binary and those that don't
>
>
>
> > Date: Tue, 16 Oct 2007 11:16:49 +0100
> > From: richard.harn...@lineone.net
> > Subject: Re: Calculate optimal value for DS_POOLSIZE
> > To:informix-l...@iiug.org

>
> > Art S. Kagel wrote:
> >> On Oct 15, 6:49 pm, Mark Jamison wrote:
> >>> Actually I believe the confusion is betwen the DS cache values with the DD cache variables.

>
> >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list.

>
> >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
> >>> DS_HASHSIZE is the number of buckets or lists for the Cache.

>
> >>>From the Administrator's Reference:

>
> >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in
> >> each hash bucket in the data-distribution cache ...'

>
> >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the
> >> data-distribution cache ...'

>
> >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute
> >> total.

>
> >> If there's confusion it's in the manuals and it's been there since the
> >> beginning.

>
> > The manual is wrong. It works as Mark described it.

>
> > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref:

>
> > The following formula determines the number of column distributions
> > that can be stored in one bucket.

>
> > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE

>
> > To modify the number of distributions per bucket, change either the
> > DS_POOLSIZE or DS_HASHSIZE configuration parameter.
> > For example, with the default values of 127 for DS_POOLSIZE and 31 for
> > DS_HASHSIZE, you can potentially store distributions for about 127 columns
> > in the data-distribution cache. The cache has 31 hash buckets, and eachhash
> > bucket can have an average of 4 entries.

>
> >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.

>
> >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.

>
> >>> DD_HASHSIZE 400
> >>> DD_HASHMAX 10

>
> >>> However DS would be configured differently and could look like

>
> >>> DS_POOLSIZE 4000
> >>> DS_HASHSIZE 400

>
> >> Nope:

>
> >> DS_HASHSIZE 400
> >> DS_POOLSIZE 10

>
> > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then
> > you'd need to have:

>
> > DS_HASHSIZE 400
> > DS_POOLSIZE 8000

>
> > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime.

>
> >> Art S. Kagel

>
> >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.

>
> >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.

>
> >>> ----- Original Message ----
> >>> From: Art S. Kagel
> >>> To:informix-l...@iiug.org
> >>> Sent: Monday, October 15, 2007 8:49:03 AM
> >>> Subject: Re: Calculate optimal value for DS_POOLSIZE

>
> >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
> >>>> Version: IDS 10
> >>>> I am trying to determine the optimal value for DS_POOLSIZE and
> >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following
> >>> question:

>
> >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look
> >>>> at our update stats and see number of unique columns that we run
> >>>> update statistics on and that would become the value of DS_POOLSIZE.
> >>>> 2. How do I determine correct value of number of Hash Buckets ? How
> >>>> many entries per bucket is good to have ?
> >>>> 3. When I look at onstat -g dsc I see number of distribution entries,
> >>>> should this be always lower than DS_POOLSIZE
> >>> IB that Mark and David are misinterpreting the use and meaning of
> >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.

>
> >>> DS_HASHSIZE is the number of hash buckets for distributions and
> >>> DS_POOLSIZE is the maximum number of slots in each bucket for when
> >>> multiple distributions hash to the same bucket. DS_HASHSIZE must be a
> >>> prime number, DS_POOLSIZE should be relatively small (I've also been
> >>> told that it's better if it is non-prime).

>
> >>> Columns are hashed on table and column name so if you have lots of
> >>> databases with the same tables and columns in them then you will need
> >>> a larger value for DS_POOLSIZE to match the level of hash clashes that
> >>> you have. Ideally - for optimum performance of the hash - each
> >>> distribution should hash to it's own bucket and the number of slots
> >>> per bucket should be very low. Since we cannot adjust the hash
> >>> algorithm itself, the only control we have is DS_HASHSIZE. Keeping
> >>> David's experience with Linux and large values in mind (David what is
> >>> a 'large' value in your testing?) I'd set DS_HASHSIZE to at a prime
> >>> that's larger than the number of unique table names to minimize
> >>> clashes and set DS_POOLSIZE large enough that the product of the two
> >>> is about 1.5 times the number of distributions in SYSDISTRIB for all

>
> ...
>
> read more »- Hide quoted text -
>
> - Show quoted text -


What happens in multi user environment when distriution cache is not
large and multiple session try to write to this cache. I know in
general, distribution cache will have to read from disk if it doesn't
find the distribution for that table in cache, but could it also be
potentially be waiting on acquiring locks, mutexes etc if this value
is less. I am trying to determine implications of having these value
less other than the one I already know.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-20-2008, 05:29 PM
mohitanchlia@gmail.com
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

On Oct 17, 5:34 pm, mohitanch...@gmail.com wrote:
> On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote:
>
>
>
> > For what it's worth, these are my settings

>
> > # Dictionary Cache Size
> > DD_HASHSIZE 499 # Number of dictionary hash buckets
> > DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket

>
> > # Distribution Cache Size
> > DS_HASHSIZE 503 # Number of distribution hash buckets
> > DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache

>
> > # UDR Cache Size
> > PC_HASHSIZE 127 # Number of Procedure hash buckets
> > PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache

>
> > I had some discussions with UK Tech Support about the settings and their respective values. This is their response:

>
> > Reproduction:

>
> > # Created 1500 tables, populated each with 500 rows

>
> > # Ran update stats high on all tables

>
> > # Queried each table to force the population of the distributions into the cache.

>
> > # The structure of interest, hangs off of rhead_t ...

>
> > onstat -g dmp 0xa000000 hlQA | grep rhead

>
> > onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist

>
> > onstat -g dmp 0x15588018 shcache_t

>
> > # Same tests with different DS_POOLSIZE / DS_HASHSIZE values ...

>
> > DS_POOLSIZE 4000

>
> > DS_HASHSIZE 1003

>
> > 0000: sh_hashsize = 1003

>
> > 0004: sh_usage = 1

>
> > 0008: sh_uid = 1508

>
> > 000c: sh_cleaned = 0

>
> > 0010: sh_cachesize = 4000

>
> > 0014: sh_curcount = 1508

>
> > 0018: sh_inusecount = 0

>
> > 001c: sh_lowthresh = 50

>
> > 0020: sh_highthresh = 75

>
> > 0024: sh_pool = 0x6d172c

>
> > 0028: sh_entry = 0x15588050

>
> > 002c: sh_lru = 0x15a2ec30

>
> > 0030: sh_mru = 0x1558af88

>
> > 0034: sh_mutex = 0x15585328

>
> > }

>
> > DS_POOLSIZE 3000

>
> > DS_HASHSIZE 503

>
> > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */

>
> > 0000: sh_hashsize = 503

>
> > 0004: sh_usage = 1

>
> > 0008: sh_uid = 1500

>
> > 000c: sh_cleaned = 1

>
> > 0010: sh_cachesize = 3000

>
> > 0014: sh_curcount = 1499

>
> > 0018: sh_inusecount = 0

>
> > 001c: sh_lowthresh = 50

>
> > 0020: sh_highthresh = 75

>
> > 0024: sh_pool = 0x6d172c

>
> > 0028: sh_entry = 0x155862c0

>
> > 002c: sh_lru = 0x15802830

>
> > 0030: sh_mru = 0x175d4030

>
> > 0034: sh_mutex = 0x15585328

>
> > }

>
> > DS_HASHSIZE 503

>
> > DS_POOLSIZE 2000

>
> > *(struct shcache_t *)0x15586288 = { /* sizeof = 56 = 0x38 */

>
> > 0000: sh_hashsize = 503

>
> > 0004: sh_usage = 1

>
> > 0008: sh_uid = 1500

>
> > 000c: sh_cleaned = 501

>
> > 0010: sh_cachesize = 2000

>
> > 0014: sh_curcount = 999

>
> > 0018: sh_inusecount = 0

>
> > 001c: sh_lowthresh = 50

>
> > 0020: sh_highthresh = 75

>
> > 0024: sh_pool = 0x6d172c

>
> > 0028: sh_entry = 0x155862c0

>
> > 002c: sh_lru = 0x16215830

>
> > 0030: sh_mru = 0x16209be0

>
> > 0034: sh_mutex = 0x15585328

>
> > }

>
> > The tests show the influencing parameter is DS_POOLSIZE, and the reasonthe integer SH_CURCOUNT only shows half the value defined (when the pool is flooded) is down to the sh_lowtheshold value. This appears to keep the dist pool cache current value to half the pools size.

>
> > I can't find any where in the code which suggests why we look to be keeping this cache within the 50-75 % threshold. The functions to manipulate these values look to be generic for any cache (procedure, distribution or the dictionary) and may be part of the problem.

>
> > There is one place where this counter is incremented and decremented:

>
> > retrieve_cache() / cleanup()

>
> > >From the customers output of onstat -g dsc, the difference in current value (999) and that listed by an incremental counter while printing each entry (1004) I cannot replicate. I suspect this may be due to their system being in use at the time, and the values updated while this onstat was taken, and it would soon settle back to 999. So the actual poolsize is indeed 2000.

>
> > Something else that was noticed by Advanced Support is the output of onstat -g dsc in later IDS 9.40 versions is different, the 'Number of entries' is no longer shown:

>
> > InformixDynamic Server Version 9.40.UC1 -- On-Line -- Up 33 days 18:10:09 -- 117120 Kbytes

>
> > Distribution Cache:

>
> > Number of lists : 31

>
> > DS_POOLSIZE : 127

>
> > Distribution Cache Entries:

>
> > list# id ref_cnt dropped? heap_ptr distribution name

>
> > -----------------------------------------------------------------

>
> > = = = 8< = = =

>
> > Total number of distribution entries: 42.

>
> > Number of entries in use : 0

>
> > Compare this to the onstat -g dsc output you provided:

>
> > InformixDynamic Server Version 7.31.FD4 -- On-Line -- Up 02:41:14 -- 10290176 Kbytes

>
> > Distribution Cache:

>
> > Number of lists : 503

>
> > DS_POOLSIZE : 2000

>
> > Number of entries : 999

>
> > Number of entries in use : 0

>
> > Distribution Cache Entries:

>
> > list# id ref_cnt dropped? heap_ptr distribution name

>
> > -----------------------------------------------------------------

>
> > = = = 8< = = =

>
> > Total number of distribution entries: 1004.

>
> > I cannot find any known issues or bugs that suggest why this change wasmade, but as this behaviour in 9.40 is different I am doubtful that a bug fix will be provided. Although this is not an entirely conclusive answer, it does confirm that the actual poolsize is indeed 2000.

>
> > Regards

>
> > Colin

>
> > There are 10 types of people in the world, those that understand binaryand those that don't

>
> > > Date: Tue, 16 Oct 2007 11:16:49 +0100
> > > From: richard.harn...@lineone.net
> > > Subject: Re: Calculate optimal value for DS_POOLSIZE
> > > To:informix-l...@iiug.org

>
> > > Art S. Kagel wrote:
> > >> On Oct 15, 6:49 pm, Mark Jamison wrote:
> > >>> Actually I believe the confusion is betwen the DS cache values withthe DD cache variables.

>
> > >>> DD_HASHSIZE and DD_HASHMAX mean something else. DD_HASHSIZE is the number of buckets or lists, and DD_HASHMAX is the number of entries (soft limit) per buck or list.

>
> > >>> DS_POOLSIZE indicates the maximum number of entries for the Distribution Cache, this is a hard limit and not a soft limit.
> > >>> DS_HASHSIZE is the number of buckets or lists for the Cache.

>
> > >>>From the Administrator's Reference:

>
> > >> 'The DS_POOLSIZE parameter specifies the maximum number of entries in
> > >> each hash bucket in the data-distribution cache ...'

>
> > >> 'The DS_HASHSIZE parameter specifies the number of hash buckets in the
> > >> data-distribution cache ...'

>
> > >> DS_POOLSIZE is a max, but it's a max per hash bucket not an absolute
> > >> total.

>
> > >> If there's confusion it's in the manuals and it's been there since the
> > >> beginning.

>
> > > The manual is wrong. It works as Mark described it.

>
> > > The Performance Guide (ct1t9na.pdf, 4-33) contradicts the Admin-Ref:

>
> > > The following formula determines the number of column distributions
> > > that can be stored in one bucket.

>
> > > Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE

>
> > > To modify the number of distributions per bucket, change either the
> > > DS_POOLSIZE or DS_HASHSIZE configuration parameter.
> > > For example, with the default values of 127 for DS_POOLSIZE and 31 for
> > > DS_HASHSIZE, you can potentially store distributions for about 127 columns
> > > in the data-distribution cache. The cache has 31 hash buckets, and each hash
> > > bucket can have an average of 4 entries.

>
> > >>> Consider that you wanted to configure no more than 4000 DD entires and no more than 4000 DS entries.

>
> > >>> Ignoring any suggestions for prime numbers at the moment the DD variables could look like this.

>
> > >>> DD_HASHSIZE 400
> > >>> DD_HASHMAX 10

>
> > >>> However DS would be configured differently and could look like

>
> > >>> DS_POOLSIZE 4000
> > >>> DS_HASHSIZE 400

>
> > >> Nope:

>
> > >> DS_HASHSIZE 400
> > >> DS_POOLSIZE 10

>
> > > The cache seems to be cleaned to 50%. So, if you want 4000 entries, then
> > > you'd need to have:

>
> > > DS_HASHSIZE 400
> > > DS_POOLSIZE 8000

>
> > > Or, use a DS_HASHSIZE of 401 or 409, if you insist on a prime.

>
> > >> Art S. Kagel

>
> > >>> When making changes to any of the ??_HASHSIZE a prime number does give a slight performance improvement, and therefor is recommended.

>
> > >>> Don't ask me why we changed the way the configuration worked between the two, I'm not entirely sure. However, among the caches, DS and PC act the same, DD and STMT are completely different.

>
> > >>> ----- Original Message ----
> > >>> From: Art S. Kagel
> > >>> To:informix-l...@iiug.org
> > >>> Sent: Monday, October 15, 2007 8:49:03 AM
> > >>> Subject: Re: Calculate optimal value for DS_POOLSIZE

>
> > >>> On Oct 13, 3:58 pm, mohitanch...@gmail.com wrote:
> > >>>> Version: IDS 10
> > >>>> I am trying to determine the optimal value for DS_POOLSIZE and
> > >>>> DS_HASHSIZE. I looked at the onstat -g dsc. I have following
> > >>> question:

>
> > >>>> 1. How do I determine the correct size of DS_POOLSIZE. Should I look
> > >>>> at our update stats and see number of unique columns that we run
> > >>>> update statistics on and that would become the value of DS_POOLSIZE.
> > >>>> 2. How do I determine correct value of number of Hash Buckets ? How
> > >>>> many entries per bucket is good to have ?
> > >>>> 3. When I look at onstat -g dsc I see number of distribution entries,
> > >>>> should this be always lower than DS_POOLSIZE
> > >>> IB that Mark and David are misinterpreting the use and meaning of
> > >>> DS_HASHSIZE and DS_POOLSIZE, or at least not explaining it clearly.

>
> > >>> DS_HASHSIZE is the number of hash buckets for distributions and
> > >>> DS_POOLSIZE is the maximum number of slots in each bucket for when
> > >>> multiple distributions hash to the same bucket. DS_HASHSIZE must bea
> > >>> prime number, DS_POOLSIZE should be relatively small (I've also been
> > >>> told that it's better if it is non-prime).

>
> > >>> Columns are hashed on table and column name so if you have lots of
> > >>> databases with the same tables and columns in them then you will need
> > >>> a larger value for DS_POOLSIZE to match the level of hash clashes that
> > >>> you have. Ideally - for optimum performance of the hash - each
> > >>> distribution should hash to it's own bucket and the number of slots
> > >>> per bucket should be very low. Since

>
> ...
>
> read more »- Hide quoted text -
>
> - Show quoted text -


Sorry for multiple unwanted posting, it was because gmail not
reporting back the status after hitting send.

I was trying to ask:
I know in general, if Informix doesn't find the distribution in the
cache then it read from the disk and stores it in cache for future
use. And by increasing the DS_POOLSIZE and DS_HASHSIZE we reduce the I/
O as columns are readily available in cache. I wanted to ask if there
are any other downside of having these value low, something like, when
there are increasing number of DB sessions user/session thread might
have to wait to get the lock on the cache so that it could update the
cache. Could similar problem also manifest when these values are low ?

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-20-2008, 05:29 PM
Art S. Kagel
 
Posts: n/a
Default Re: Calculate optimal value for DS_POOLSIZE

On Oct 18, 11:36 am, mohitanch...@gmail.com wrote:
> On Oct 17, 5:34 pm, mohitanch...@gmail.com wrote:
>
> > On Oct 16, 4:10 am, Colin Dawson <cjd_1...@hotmail.com> wrote:

>
> > > For what it's worth, these are my settings

>
> > > # Dictionary Cache Size
> > > DD_HASHSIZE 499 # Number of dictionary hash buckets
> > > DD_HASHMAX 68 # Maximum number of dictionary entries per hash bucket

>
> > > # Distribution Cache Size
> > > DS_HASHSIZE 503 # Number of distribution hash buckets
> > > DS_POOLSIZE 4000 # Maximum number of distribution entries retained in cache

>
> > > # UDR Cache Size
> > > PC_HASHSIZE 127 # Number of Procedure hash buckets
> > > PC_POOLSIZE 2000 # Maximum number of stored procedure entries retained in cache

>
> > > I had some discussions with UK Tech Support about the settings and their respective values. This is their response:

>
> > > Reproduction:

>
> > > # Created 1500 tables, populated each with 500 rows

>
> > > # Ran update stats high on all tables

>
> > > # Queried each table to force the population of the distributions into the cache.

>
> > > # The structure of interest, hangs off of rhead_t ...

>
> > > onstat -g dmp 0xa000000 hlQA | grep rhead

>
> > > onstat -g dmp 0xa008800 rhead_t | grep sh_copy_of_shdist

>
> > > onstat -g dmp 0x15588018 shcache_t

>
> > > # Same tests with differen