Unix Technical Forum

DB2 physical layout - comments needed

This is a discussion on DB2 physical layout - comments needed within the DB2 forums, part of the Database Server Software category; --> DB2 500G database, Wintel, heavily loaded OLTP (5M+ transactions a day; all transactions are extremely small, all selects are ...


Go Back   Unix Technical Forum > Database Server Software > DB2

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-26-2008, 06:07 PM
Nobody
 
Posts: n/a
Default DB2 physical layout - comments needed

DB2 500G database, Wintel, heavily loaded OLTP (5M+ transactions a day;
all transactions are extremely small, all selects are controlled
(no ad-hoc), 99% of all selects are very small (no table
scans, index scans are very limited in size) ). Write performance
is generally more important than read performance, read performance
of heavy queries (the ones with table scans) is not important at all.

Question: how to spread data across physical disks to achieve
maximum performance?

Proposed layout: logs on separate mirror, system on separate
mirror, tempspace on separate mirror (?), all other disks
are 'data disks' organized as mirrors - see below. Tables are
divided into 'small' ones and 'big' ones. All small ones sit in
one tablespace that has associated bufferpool equal to tablespace
size (that should ensure 99.9999...% hit ratio). Big (historical)
ones have separate tablespaces for data and indexes, with
separate bufferpools. All tablespaces (both big and small)
are spread over all 'data disks' with each tablespace having
one container of equal size on each disk. Important: as
load is 99.9% random access opposed to sequential access,
to reduce contention at disk level it is proposed to have
PREFETCHSIZE exactly equal to EXTENTSIZE (!). As a result of this
configuration, each read operation is expected to affect exactly
one disk without touching neighbours; due to equal spreading
of tablespaces across all disks, load on every disk will
be statistically balanced.

Any comments (especially with explanations what seems to be wrong)
will be appreciated.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-26-2008, 06:07 PM
Mark A
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

"Nobody" <ipsign123@yahoo.com> wrote in message
news:e5eda561.0402131632.443f7751@posting.google.c om...
> DB2 500G database, Wintel, heavily loaded OLTP (5M+ transactions a day;
> all transactions are extremely small, all selects are controlled
> (no ad-hoc), 99% of all selects are very small (no table
> scans, index scans are very limited in size) ). Write performance
> is generally more important than read performance, read performance
> of heavy queries (the ones with table scans) is not important at all.
>
> Question: how to spread data across physical disks to achieve
> maximum performance?
>
> Proposed layout: logs on separate mirror, system on separate
> mirror, tempspace on separate mirror (?), all other disks
> are 'data disks' organized as mirrors - see below. Tables are
> divided into 'small' ones and 'big' ones. All small ones sit in
> one tablespace that has associated bufferpool equal to tablespace
> size (that should ensure 99.9999...% hit ratio). Big (historical)
> ones have separate tablespaces for data and indexes, with
> separate bufferpools. All tablespaces (both big and small)
> are spread over all 'data disks' with each tablespace having
> one container of equal size on each disk. Important: as
> load is 99.9% random access opposed to sequential access,
> to reduce contention at disk level it is proposed to have
> PREFETCHSIZE exactly equal to EXTENTSIZE (!). As a result of this
> configuration, each read operation is expected to affect exactly
> one disk without touching neighbours; due to equal spreading
> of tablespaces across all disks, load on every disk will
> be statistically balanced.
>
> Any comments (especially with explanations what seems to be wrong)
> will be appreciated.


Given your application description, I would not expect much (if any)
prefetch activity to be happening. So I am not sure that I would worry about
making PREFETCHSIZE exactly equal to EXTENTSIZE. Prefetch is usually limited
to tablespace or index space scans.

Given the amount of buffer pool space for the "small" tables, you don't need
to worry much about data placement on the disk. But I would try to put
indexes on different disks than tables.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-26-2008, 06:07 PM
Nobody
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

"Mark A" <ma@switchboard.net> wrote in message news:<GxeXb.491$dq2.98728@news.uswest.net>...
Thanks for the answer, Mark.

> Given your application description, I would not expect much (if any)
> prefetch activity to be happening. So I am not sure that I would worry about
> making PREFETCHSIZE exactly equal to EXTENTSIZE. Prefetch is usually limited
> to tablespace or index space scans.

That was exactly the reasoning behind this decision.

> Given the amount of buffer pool space for the "small" tables, you don't need
> to worry much about data placement on the disk. But I would try to put
> indexes on different disks than tables.

That was another design decision we needed to make: either to
allocate disks for specific tables/indexes, or to spread all
the tables/indexes over all disks. We decided in favor of the
latter because it provides much better balancing. Or you are
proposing to have 2 sets of disks - one for all indexes (with
all indexes spread over all disks of this set), and one
for data (with all data spread over all disks of this set)?
If yes - why do you this it is better then our model with
only one set of disks for both indexes/data?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-26-2008, 06:07 PM
Mark A
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

> > Given the amount of buffer pool space for the "small" tables, you don't
need
> > to worry much about data placement on the disk. But I would try to put
> > indexes on different disks than tables.

> That was another design decision we needed to make: either to
> allocate disks for specific tables/indexes, or to spread all
> the tables/indexes over all disks. We decided in favor of the
> latter because it provides much better balancing. Or you are
> proposing to have 2 sets of disks - one for all indexes (with
> all indexes spread over all disks of this set), and one
> for data (with all data spread over all disks of this set)?
> If yes - why do you this it is better then our model with
> only one set of disks for both indexes/data?


You don't need to have all indexes on one set of disks and tables on another
set. But if you could ensure that for a given table, the table data was on a
separate disk from the indexes for that table, that would be fine. But I
would not overkill this in situations were the bufferpool is as large as the
data.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-26-2008, 06:07 PM
AK
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

> mirror, tempspace on separate mirror (?), all other disks
I was wondering if there is any need for mirroring temporary data.
Maybe I'm missing something
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-26-2008, 06:08 PM
Nobody
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

ak_tiredofspam@yahoo.com (AK) wrote in message news:<46e627da.0402140915.173680b6@posting.google. com>...
> > mirror, tempspace on separate mirror (?), all other disks

> I was wondering if there is any need for mirroring temporary data.

To prevent downtime in case of single HDD failure.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-26-2008, 06:08 PM
Nobody
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

"Mark A" <ma@switchboard.net> wrote in message news:<5JnXb.2$n53.4608@news.uswest.net>...
> > > Given the amount of buffer pool space for the "small" tables, you don't

> need
> > > to worry much about data placement on the disk. But I would try to put
> > > indexes on different disks than tables.

> > That was another design decision we needed to make: either to
> > allocate disks for specific tables/indexes, or to spread all
> > the tables/indexes over all disks. We decided in favor of the
> > latter because it provides much better balancing. Or you are
> > proposing to have 2 sets of disks - one for all indexes (with
> > all indexes spread over all disks of this set), and one
> > for data (with all data spread over all disks of this set)?
> > If yes - why do you this it is better then our model with
> > only one set of disks for both indexes/data?

>
> You don't need to have all indexes on one set of disks and tables on another
> set. But if you could ensure that for a given table, the table data was on a
> separate disk from the indexes for that table, that would be fine. But I
> would not overkill this in situations were the bufferpool is as large as the
> data.

You're right - this shouldn't matter for 'small' tables. But what about
'large' tables - do you think it's better to separate indexes and data
(given that access in 99.9% random anyway)? If yes, how would you organize
disks - 2 separate disks for each 'large' table (one for data, one for indexes),
or just two sets of disks (one with data for all the tables, another for
indexes for all the tables)? And the most important question - why?

Thx in advance
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-26-2008, 06:08 PM
Mark A
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

> You're right - this shouldn't matter for 'small' tables. But what about
> 'large' tables - do you think it's better to separate indexes and data
> (given that access in 99.9% random anyway)? If yes, how would you organize
> disks - 2 separate disks for each 'large' table (one for data, one for

indexes),
> or just two sets of disks (one with data for all the tables, another for
> indexes for all the tables)? And the most important question - why?
>
> Thx in advance


I am not sure if understand the options above, but here is what I originally
meant to say:

For a given table, the tablespace for the table and the tablespace for the
indexes should be on different disks (or different arrays). Please not that
the above applies to a PARTICULAR table. That does not mean that all tables
should be on one disk, and all indexes on another disk.

For example, 2 tables A and B, each with 1 index AX and BX:

Disk 1 - tablespace for A, tablespace for BX
Disk 2 - tablespace for B, tablespace for AX

The reason for this is that for an OLTP system, the index data and
corresponding table data are accessed almost simultaneously, so it could
theoretically help to have them on separate disks for a given table. Whether
or the difference is enough to worry about depends on many factors.

But for your large tables, I would have multiple containers on different
disks for each tablespace . Even if you don't have table space scans in your
application, this will help speed up utilities that access the entire table.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 02-26-2008, 06:08 PM
Mark Yudkin
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

First off, and most importantly: Use hardware RAID-1 (based on Ultra-320 LVD
SCSI technology) rather than software mirroring, and make sure your RAID
hardware has a battery backed write cache (we got 7x DB2 performance
improvement merely by upgradinging the RAID card to one with a battery).
Also follow the advice in the DB2 documentation under "Optimizing table
space performance when data is on RAID devices" (this also addresses your
EXTENTSIZE and PREFETCHSIZE questions).

Next: ensure you have adequate RAM. Measure bufferpool hit cache rate, and
add RAM (whilst increasing buffer pool sizes) until you've reduced total I/O
to the minimum. BTW, be careful of increasing buffer pools too high for the
available RAM ==> measure system I/O and swapping.

Finally: spread the disk load over all disks. Enable disk performance
counters if you haven't already done so (enter the diskperf command at a
command prompt to see the current state; enable with diskperf -Y). Then
measure disk I/O, and move tablespaces around until you you have a nicely
balanced system. "Statistically balanced" (as you put it) is a meaningless
concept. The only way to know if you have balanced access is to measure it.
I have yet to see an a-priori analysis that bore any resemblance to measured
reality.

I'm still a believer in the old "1 table to a tablespace" rule, and have
found that increasing the number of bufferpools normally proportionally
decreases total throughput, hence I generally keep the number of bufferpools
to a minimum. Given your transaction load, I would probably not enable
parallel query. Spreading tablespaces over multiple volumes is important for
large tablespaces, but unless you database design consists of less large
tables than physical disks, generally can increase overhead rather than
reducing it. At 500GB data, and 73GB as a standard SCSI disk size (meaning
you have 7 RAID-0 pairs), you probably don't have such a configuration.

You claim that access is 99.9% random rather than "sequential". This is
highly unusual: normally access is localized to specific "hot spots" and is
far from uniformly random. I strongly recommend your measuring your system
to verify that access is truly random. If, as I suspect, it isn't, you can
tune for the effective access pattern.

"Nobody" <ipsign123@yahoo.com> wrote in message
news:e5eda561.0402131632.443f7751@posting.google.c om...
> DB2 500G database, Wintel, heavily loaded OLTP (5M+ transactions a day;
> all transactions are extremely small, all selects are controlled
> (no ad-hoc), 99% of all selects are very small (no table
> scans, index scans are very limited in size) ). Write performance
> is generally more important than read performance, read performance
> of heavy queries (the ones with table scans) is not important at all.
>
> Question: how to spread data across physical disks to achieve
> maximum performance?
>
> Proposed layout: logs on separate mirror, system on separate
> mirror, tempspace on separate mirror (?), all other disks
> are 'data disks' organized as mirrors - see below. Tables are
> divided into 'small' ones and 'big' ones. All small ones sit in
> one tablespace that has associated bufferpool equal to tablespace
> size (that should ensure 99.9999...% hit ratio). Big (historical)
> ones have separate tablespaces for data and indexes, with
> separate bufferpools. All tablespaces (both big and small)
> are spread over all 'data disks' with each tablespace having
> one container of equal size on each disk. Important: as
> load is 99.9% random access opposed to sequential access,
> to reduce contention at disk level it is proposed to have
> PREFETCHSIZE exactly equal to EXTENTSIZE (!). As a result of this
> configuration, each read operation is expected to affect exactly
> one disk without touching neighbours; due to equal spreading
> of tablespaces across all disks, load on every disk will
> be statistically balanced.
>
> Any comments (especially with explanations what seems to be wrong)
> will be appreciated.



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 02-26-2008, 06:08 PM
Nobody
 
Posts: n/a
Default Re: DB2 physical layout - comments needed

> For a given table, the tablespace for the table and the tablespace for the
> indexes should be on different disks (or different arrays). Please not that
> the above applies to a PARTICULAR table. That does not mean that all tables
> should be on one disk, and all indexes on another disk.
>
> For example, 2 tables A and B, each with 1 index AX and BX:
>
> Disk 1 - tablespace for A, tablespace for BX
> Disk 2 - tablespace for B, tablespace for AX

I see. Thanks.

> The reason for this is that for an OLTP system, the index data and
> corresponding table data are accessed almost simultaneously, so it could
> theoretically help to have them on separate disks for a given table.

Because otherwise normal request for A will require accessing
A and AX, which would mean moving HDD head back and forth, which
is not a good thing, right? But on the other hand, IO is supposed
to be random, so head re-positioning will likely happen anyway.
Plus if we will have the following configuration ('our' model):
Disk 1 - A-1 (1st container for A data), AX-1, B-1, BX-1, C-1, CX-1
Disk 2 - A-2, AX-2, B-2, BX-2, C-2, CX-2
Disk 3 - A-3, AX-3, B-3, BX-3, C-3, CX-3
Disk 4 - A-4, AX-4, B-4, BX-4, C-4, CX-4
Disk 5 - A-5, AX-5, B-5, BX-5, C-5, CX-5,
then random IO for index and data will likely (80% chance) go
to the different disks. And A-B join (and any other
join) it is going to have the same 80% chance of any 2 requests
going to different disks.

I don't want to say that your model is not going to work -
it will, but our model seems (to us) to be easier manageable and
expandable, plus it is balanced 'by design' (all the disks has
statistically equal load, so no overload of one of the disks is
going to happen), and we do like it, but it is not usually
recommended for high-performance systems, so we are trying to find
any potential problems with it. And the question is - do you see any
problems with 'our' model?

Thanks in advance

> Whether
> or the difference is enough to worry about depends on many factors.
>
> But for your large tables, I would have multiple containers on different
> disks for each tablespace . Even if you don't have table space scans in your
> application, this will help speed up utilities that access the entire table.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:31 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com