This is a discussion on Multiple join performance within the DB2 forums, part of the Database Server Software category; --> Hi, to everybody, let's consider this scenario: you have 1 data-table and 10 dictionary-tables; the data-table has 5 million ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi, to everybody, let's consider this scenario: you have 1 data-table and 10 dictionary-tables; the data-table has 5 million records and 30 columns, 10 of these columns have a foreign-key to the dictionary-tables: the dictionary-tables have (almsot all) only two columns (code and description) and a low number of records (less than 100 in most cases, but one table has 1000 records and another one has 8000 records). When you perform a query on the data-table you must show the descriptions taken from the dictionary-tables; you have two options to do this: 1) A multiple join (11 tables) to get both main data and descriptions. 2) Load permanently the dictionary-tables in memory (using hashmaps) and query only the data-table, then the application looks-up the descriptions from the hashmaps. Consider that you are writing a web application, so the hashmaps can be hold in application scope to be used by all the users (more than 1000 users). Which solution performs better? I think this scenario is rather common. I'm a little afraid of doing a multiple join on a table that has 5 millions of records. Tell your opinion. Pino |
| |||
| Pino, you can do the same within DB2. Keep the tables in a dedicated bufferpool. Let DB2 figure out whether it wants to use hashjoins or something else. The rule of thumb is: If it's relational let the DBMS deal with it. Cheers Serge -- Serge Rielau DB2 SQL Compiler Development IBM Toronto Lab |
| |||
| I have another experience with w2k, v7, fp10a. Some times ago I wrote a select statement on 6-7 tables with 200 000 - 300 000 rows and 9-10 dictionary tables with a few number of rows. This query always returned 1 row (it was a contract information) and lasted about 1 minute. Most of this time was taken for a compilation (we used query optimization 3). And I had to write SQL stored procedure where I splited this query on 2 parts: first I populate global temporary table with select only over 6-7 large tables and second I join this temporary table with dictionary tables. This SQL SP executed about 5 seconds... Conclusion: DB2 optimizer - great thing in common, but sometimes... Best regards, Mark. > Keep the tables in a dedicated bufferpool. > Let DB2 figure out whether it wants to use hashjoins or something else. > > The rule of thumb is: If it's relational let the DBMS deal with it. |
| |||
| Mark, Why did you use dynamic SQL? If you could use a stored proc for the piecemeal you could equally have written a proc with the whole join in it if you don't want to deal with packages and the query doesn't execute often enough to stay in the cache. Cheers Serge -- Serge Rielau DB2 SQL Compiler Development IBM Toronto Lab |
| |||
| > Why did you use dynamic SQL? Sure, first of all I wrote this SP with static select, but it worked considerable slower (as I remember ~ 30 sec opposite 3-5 sec), than this splitted dynamic... (I didn't understand why and decided to use dynamic). |
| |||
| You may want to look at the DB2_REDUCED_OPTIMIZATION registry variable. Set it to an integer value, and DB2 drops the optimization level down for dynamic queries joining more tables than specified in that integer. Very useful. /T "Mark Barinstein" <mark@crk.vsi.ru> wrote in message news:<cam5ds$2sep$1@serv3.vsi.ru>... > I have another experience with w2k, v7, fp10a. > Some times ago I wrote a select statement on 6-7 tables with 200 000 - 300 > 000 rows and 9-10 dictionary tables with a few number of rows. This query > always returned 1 row (it was a contract information) and lasted about 1 > minute. Most of this time was taken for a compilation (we used query > optimization 3). And I had to write SQL stored procedure where I splited > this query on 2 parts: first I populate global temporary table with select > only over 6-7 large tables and second I join this temporary table with > dictionary tables. This SQL SP executed about 5 seconds... > Conclusion: DB2 optimizer - great thing in common, but sometimes... > > Best regards, > Mark. > > > Keep the tables in a dedicated bufferpool. > > Let DB2 figure out whether it wants to use hashjoins or something else. > > > > The rule of thumb is: If it's relational let the DBMS deal with it. |
| |||
| I would use 1. I strongly suspect that 2 would be slower. It definitely takes longer to code, and it fails to handle "concurrent" inserts and updates to the dictionary tables (your hashmap can become out-of-date), so that it would also be the source of various problems. If join performance were to become a problem - which is not likely in your simple join scenario - then you can resolve this by using a materialized query table (see CREATE TABLE in the SQL reference). "Pino" <nospam@novirus.invalid> wrote in message news:gnlzc.57319$Wc.1769197@twister2.libero.it... > Hi, to everybody, > > let's consider this scenario: you have 1 data-table and 10 > dictionary-tables; the data-table has 5 million records and 30 columns, 10 > of these columns have a foreign-key to the dictionary-tables: the > dictionary-tables have (almsot all) only two columns (code and description) > and a low number of records (less than 100 in most cases, but one table has > 1000 records and another one has 8000 records). When you perform a query on > the data-table you must show the descriptions taken from the > dictionary-tables; you have two options to do this: > > 1) A multiple join (11 tables) to get both main data and descriptions. > 2) Load permanently the dictionary-tables in memory (using hashmaps) and > query only the data-table, then the application looks-up the descriptions > from the hashmaps. Consider that you are writing a web application, so the > hashmaps can be hold in application scope to be used by all the users (more > than 1000 users). > > Which solution performs better? I think this scenario is rather common. > I'm a little afraid of doing a multiple join on a table that has 5 millions > of records. > > Tell your opinion. > > > Pino > > |
| ||||
| Tomas, I don't like optimization levels less than 3. It often leads to stupid query plans for queries with 5-6 quite large tables... Besides our system is both OLTP and DSS, and I think we cant't use this variable... Best regards, Mark. "Tomas Hallin" <javvlar@hotmail.com> ???????/???????? ? ???????? ?????????: news:2cceadd.0406151559.65b3e85@posting.google.com ... > You may want to look at the DB2_REDUCED_OPTIMIZATION registry > variable. Set it to an integer value, and DB2 drops the optimization > level down for dynamic queries joining more tables than specified in > that integer. Very useful. |