Unix Technical Forum

Multiple join performance

This is a discussion on Multiple join performance within the DB2 forums, part of the Database Server Software category; --> Hi, to everybody, let's consider this scenario: you have 1 data-table and 10 dictionary-tables; the data-table has 5 million ...


Go Back   Unix Technical Forum > Database Server Software > DB2

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-26-2008, 07:23 PM
Pino
 
Posts: n/a
Default Multiple join performance

Hi, to everybody,

let's consider this scenario: you have 1 data-table and 10
dictionary-tables; the data-table has 5 million records and 30 columns, 10
of these columns have a foreign-key to the dictionary-tables: the
dictionary-tables have (almsot all) only two columns (code and description)
and a low number of records (less than 100 in most cases, but one table has
1000 records and another one has 8000 records). When you perform a query on
the data-table you must show the descriptions taken from the
dictionary-tables; you have two options to do this:

1) A multiple join (11 tables) to get both main data and descriptions.
2) Load permanently the dictionary-tables in memory (using hashmaps) and
query only the data-table, then the application looks-up the descriptions
from the hashmaps. Consider that you are writing a web application, so the
hashmaps can be hold in application scope to be used by all the users (more
than 1000 users).

Which solution performs better? I think this scenario is rather common.
I'm a little afraid of doing a multiple join on a table that has 5 millions
of records.

Tell your opinion.


Pino


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-26-2008, 07:23 PM
Serge Rielau
 
Posts: n/a
Default Re: Multiple join performance

Pino,

you can do the same within DB2.
Keep the tables in a dedicated bufferpool.
Let DB2 figure out whether it wants to use hashjoins or something else.

The rule of thumb is: If it's relational let the DBMS deal with it.

Cheers
Serge

--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-26-2008, 07:24 PM
Mark Barinstein
 
Posts: n/a
Default Re: Multiple join performance

I have another experience with w2k, v7, fp10a.
Some times ago I wrote a select statement on 6-7 tables with 200 000 - 300
000 rows and 9-10 dictionary tables with a few number of rows. This query
always returned 1 row (it was a contract information) and lasted about 1
minute. Most of this time was taken for a compilation (we used query
optimization 3). And I had to write SQL stored procedure where I splited
this query on 2 parts: first I populate global temporary table with select
only over 6-7 large tables and second I join this temporary table with
dictionary tables. This SQL SP executed about 5 seconds...
Conclusion: DB2 optimizer - great thing in common, but sometimes...

Best regards,
Mark.

> Keep the tables in a dedicated bufferpool.
> Let DB2 figure out whether it wants to use hashjoins or something else.
>
> The rule of thumb is: If it's relational let the DBMS deal with it.




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-26-2008, 07:24 PM
Serge Rielau
 
Posts: n/a
Default Re: Multiple join performance

Mark,

Why did you use dynamic SQL?
If you could use a stored proc for the piecemeal you could equally have
written a proc with the whole join in it if you don't want to deal with
packages and the query doesn't execute often enough to stay in the cache.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-26-2008, 07:25 PM
Mark Barinstein
 
Posts: n/a
Default Re: Multiple join performance

> Why did you use dynamic SQL?

Sure, first of all I wrote this SP with static select, but it worked
considerable slower (as I remember ~ 30 sec opposite 3-5 sec), than this
splitted dynamic... (I didn't understand why and decided to use dynamic).


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-26-2008, 07:25 PM
Tomas Hallin
 
Posts: n/a
Default Re: Multiple join performance

You may want to look at the DB2_REDUCED_OPTIMIZATION registry
variable. Set it to an integer value, and DB2 drops the optimization
level down for dynamic queries joining more tables than specified in
that integer. Very useful.

/T

"Mark Barinstein" <mark@crk.vsi.ru> wrote in message news:<cam5ds$2sep$1@serv3.vsi.ru>...
> I have another experience with w2k, v7, fp10a.
> Some times ago I wrote a select statement on 6-7 tables with 200 000 - 300
> 000 rows and 9-10 dictionary tables with a few number of rows. This query
> always returned 1 row (it was a contract information) and lasted about 1
> minute. Most of this time was taken for a compilation (we used query
> optimization 3). And I had to write SQL stored procedure where I splited
> this query on 2 parts: first I populate global temporary table with select
> only over 6-7 large tables and second I join this temporary table with
> dictionary tables. This SQL SP executed about 5 seconds...
> Conclusion: DB2 optimizer - great thing in common, but sometimes...
>
> Best regards,
> Mark.
>
> > Keep the tables in a dedicated bufferpool.
> > Let DB2 figure out whether it wants to use hashjoins or something else.
> >
> > The rule of thumb is: If it's relational let the DBMS deal with it.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-26-2008, 07:28 PM
Mark Yudkin
 
Posts: n/a
Default Re: Multiple join performance

I would use 1. I strongly suspect that 2 would be slower. It definitely
takes longer to code, and it fails to handle "concurrent" inserts and
updates to the dictionary tables (your hashmap can become out-of-date), so
that it would also be the source of various problems.

If join performance were to become a problem - which is not likely in your
simple join scenario - then you can resolve this by using a materialized
query table (see CREATE TABLE in the SQL reference).

"Pino" <nospam@novirus.invalid> wrote in message
news:gnlzc.57319$Wc.1769197@twister2.libero.it...
> Hi, to everybody,
>
> let's consider this scenario: you have 1 data-table and 10
> dictionary-tables; the data-table has 5 million records and 30 columns, 10
> of these columns have a foreign-key to the dictionary-tables: the
> dictionary-tables have (almsot all) only two columns (code and

description)
> and a low number of records (less than 100 in most cases, but one table

has
> 1000 records and another one has 8000 records). When you perform a query

on
> the data-table you must show the descriptions taken from the
> dictionary-tables; you have two options to do this:
>
> 1) A multiple join (11 tables) to get both main data and descriptions.
> 2) Load permanently the dictionary-tables in memory (using hashmaps) and
> query only the data-table, then the application looks-up the descriptions
> from the hashmaps. Consider that you are writing a web application, so

the
> hashmaps can be hold in application scope to be used by all the users

(more
> than 1000 users).
>
> Which solution performs better? I think this scenario is rather common.
> I'm a little afraid of doing a multiple join on a table that has 5

millions
> of records.
>
> Tell your opinion.
>
>
> Pino
>
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-26-2008, 07:29 PM
Mark Barinstein
 
Posts: n/a
Default Re: Multiple join performance

Tomas,

I don't like optimization levels less than 3. It often leads to stupid query
plans for queries with 5-6 quite large tables... Besides our system is both
OLTP and DSS, and I think we cant't use this variable...

Best regards,
Mark.

"Tomas Hallin" <javvlar@hotmail.com> ???????/???????? ? ???????? ?????????:
news:2cceadd.0406151559.65b3e85@posting.google.com ...
> You may want to look at the DB2_REDUCED_OPTIMIZATION registry
> variable. Set it to an integer value, and DB2 drops the optimization
> level down for dynamic queries joining more tables than specified in
> that integer. Very useful.




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:34 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com