This is a discussion on ruminating on 390 performance vs. VSAM within the DB2 forums, part of the Database Server Software category; --> consider the following hypothetical (it isn't but....) - start with a COBOL/VSAM codebase that's REALLY old - end with ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| consider the following hypothetical (it isn't but....) - start with a COBOL/VSAM codebase that's REALLY old - end with both a COBOL/VSAM and a COBOL/DB2 system which differs in only where the data lives. code, including copybooks, remains the same. mostly the system does periodic (daily/weekly/<etc>) batch runs. the VSAM order file has 99 line items. this gets converted to an Order table with 99 line_items. beyond the need to fit the copybook definition, it is asserted that a two table (Order and Order_line) implementation will be too slow. this rumination was motivated by reading an article linked to from an earlier thread, which discussed join implementation on 390. i suspect what it had to say applies generally. what caught me was, as i understood it, that nested table reads is most often used. if this is true, and it seems that hash joins are only more efficient on equality constraints, then is there a known analysis which at least mitigates the reading? what (hypothetically) we tend to do is put each table in a tablespace. for the Order/Order_line implementation, it seems logical to put them into one tablespace, cluster Order on Order_num and Order_line on Order_num, line_num; and buffer the hell out of it. similarly for indexes. does this sound remotely on the right track? |
| |||
| gnuoytr@rcn.com (robert) wrote in message news:<da3c2186.0410021233.6c4c9c23@posting.google. com>... > consider the following hypothetical (it isn't but....) > > - start with a COBOL/VSAM codebase that's REALLY old > - end with both a COBOL/VSAM and a COBOL/DB2 system which differs in only > where the data lives. code, including copybooks, remains the same. > > mostly the system does periodic (daily/weekly/<etc>) batch runs. > > the VSAM order file has 99 line items. this gets converted to an > Order table with 99 line_items. > > beyond the need to fit the copybook definition, it is asserted that a > two table (Order and Order_line) implementation will be too slow. > > this rumination was motivated by reading an article linked to from an > earlier thread, which discussed join implementation on 390. i suspect what > it had to say applies generally. what caught me was, as i understood it, > that nested table reads is most often used. if this is true, and it seems > that hash joins are only more efficient on equality constraints, then is > there a known analysis which at least mitigates the reading? > > what (hypothetically) we tend to do is put each table in a tablespace. > for the Order/Order_line implementation, it seems logical to put them into > one tablespace, cluster Order on Order_num and Order_line on Order_num, > line_num; and buffer the hell out of it. similarly for indexes. > > does this sound remotely on the right track? Assuming that you never have an SQL statement result in a tablespace scan, it might be OK to put the Order/Order_line in one simple tablespace. But if a tablespace scan does occur, DB2 will (unnecessarily) scan both tables, when it could have just scanned the table needed if they were in separate tablespaces. If you use a segmented tablespace, this will not help you since the data for two tables in a single segmented tablespace will not be on the same page (the lines items and the associated order), they will reside in different segments. I think you are over designing just a bit. By using DB2 buffer pools effectively (this is the biggest difference between how DB2 and plain VSAM works), I don't think you need to put the two tables in the same tablespace. Have one bufferpool for the catalog, indexes, and small tables that frequently accessed. Second bufferpool for medium and large tables. Third bufferpool for large decision support tables (if you have any in your application). The speed of a join has nothing to do with whether the tables are using the same tablespace. The speed is related to whether the required data page is already in the bufferpool, or if it needs to be fetched from disk. |
| |||
| m0002a@yahoo.com (Mark) wrote in message news:<a5ebcc65.0410041533.364b429c@posting.google. com>... > gnuoytr@rcn.com (robert) wrote in message news:<da3c2186.0410021233.6c4c9c23@posting.google. com>... > > consider the following hypothetical (it isn't but....) > > > > - start with a COBOL/VSAM codebase that's REALLY old > > - end with both a COBOL/VSAM and a COBOL/DB2 system which differs in only > > where the data lives. code, including copybooks, remains the same. > > > > mostly the system does periodic (daily/weekly/<etc>) batch runs. > > > > the VSAM order file has 99 line items. this gets converted to an > > Order table with 99 line_items. > > > > beyond the need to fit the copybook definition, it is asserted that a > > two table (Order and Order_line) implementation will be too slow. > > > > this rumination was motivated by reading an article linked to from an > > earlier thread, which discussed join implementation on 390. i suspect what > > it had to say applies generally. what caught me was, as i understood it, > > that nested table reads is most often used. if this is true, and it seems > > that hash joins are only more efficient on equality constraints, then is > > there a known analysis which at least mitigates the reading? > > > > what (hypothetically) we tend to do is put each table in a tablespace. > > for the Order/Order_line implementation, it seems logical to put them into > > one tablespace, cluster Order on Order_num and Order_line on Order_num, > > line_num; and buffer the hell out of it. similarly for indexes. > > > > does this sound remotely on the right track? > > Assuming that you never have an SQL statement result in a tablespace > scan, it might be OK to put the Order/Order_line in one simple > tablespace. But if a tablespace scan does occur, DB2 will > (unnecessarily) scan both tables, when it could have just scanned the > table needed if they were in separate tablespaces. > > If you use a segmented tablespace, this will not help you since the > data for two tables in a single segmented tablespace will not be on > the same page (the lines items and the associated order), they will > reside in different segments. > > I think you are over designing just a bit. By using DB2 buffer pools > effectively (this is the biggest difference between how DB2 and plain > VSAM works), I don't think you need to put the two tables in the same > tablespace. > > Have one bufferpool for the catalog, indexes, and small tables that > frequently accessed. Second bufferpool for medium and large tables. > Third bufferpool for large decision support tables (if you have any in > your application). > > The speed of a join has nothing to do with whether the tables are > using the same tablespace. The speed is related to whether the > required data page is already in the bufferpool, or if it needs to be > fetched from disk. my working assumption was that, given the amount of data, that buffers would exhaust during this (essentially) sequential batch update process, and that emulating, to the extent possible, contiguous data storage would help. may be not. but i do agree, that bufferpools of adequate size are more important than most anything. |
| ||||
| In article <da3c2186.0410051248.52ae3f3a@posting.google.com >, robert <gnuoytr@rcn.com> wrote: > m0002a@yahoo.com (Mark) wrote in message news:<a5ebcc65.0410041533.364b429c@posting.google. com>... > > gnuoytr@rcn.com (robert) wrote in message news:<da3c2186.0410021233.6c4c9c23@posting.google. com>... > > > consider the following hypothetical (it isn't but....) > > > > > > - start with a COBOL/VSAM codebase that's REALLY old - end with > > > both a COBOL/VSAM and a COBOL/DB2 system which differs in only > > > where the data lives. code, including copybooks, remains the > > > same. > > > > > > mostly the system does periodic (daily/weekly/<etc>) batch runs. > > > > > > the VSAM order file has 99 line items. this gets converted to an > > > Order table with 99 line_items. > > > > > > beyond the need to fit the copybook definition, it is asserted that > > > a two table (Order and Order_line) implementation will be too slow. > > > > > > this rumination was motivated by reading an article linked to from > > > an earlier thread, which discussed join implementation on 390. i > > > suspect what it had to say applies generally. what caught me was, > > > as i understood it, that nested table reads is most often used. if > > > this is true, and it seems that hash joins are only more efficient > > > on equality constraints, then is there a known analysis which at > > > least mitigates the reading? > > > > > > what (hypothetically) we tend to do is put each table in a > > > tablespace. for the Order/Order_line implementation, it seems > > > logical to put them into one tablespace, cluster Order on Order_num > > > and Order_line on Order_num, line_num; and buffer the hell out of > > > it. similarly for indexes. > > > > > > does this sound remotely on the right track? > > > > Assuming that you never have an SQL statement result in a tablespace > > scan, it might be OK to put the Order/Order_line in one simple > > tablespace. But if a tablespace scan does occur, DB2 will > > (unnecessarily) scan both tables, when it could have just scanned the > > table needed if they were in separate tablespaces. > > > > If you use a segmented tablespace, this will not help you since the > > data for two tables in a single segmented tablespace will not be on > > the same page (the lines items and the associated order), they will > > reside in different segments. > > > > I think you are over designing just a bit. By using DB2 buffer pools > > effectively (this is the biggest difference between how DB2 and plain > > VSAM works), I don't think you need to put the two tables in the same > > tablespace. > > > > Have one bufferpool for the catalog, indexes, and small tables that > > frequently accessed. Second bufferpool for medium and large tables. > > Third bufferpool for large decision support tables (if you have any > > in your application). > > > > The speed of a join has nothing to do with whether the tables are > > using the same tablespace. The speed is related to whether the > > required data page is already in the bufferpool, or if it needs to be > > fetched from disk. > my working assumption was that, given the amount of data, that buffers > would exhaust during this (essentially) sequential batch update > process, and that emulating, to the extent possible, contiguous data > storage would help. may be not. but i do agree, that bufferpools of > adequate size are more important than most anything. IF you are saying that the batch process will read each of two tables sequentially, in the order that they will be physically held within DB2 (bearing in mind clustering sequence, free space, insert pattern, rows out of sequence, SQL code, runstats, program bind, etc) THEN I would expect that DB2 will automatically invoke sequential pre-fetch. This means that DB2 will try to read the next pages into the bufffer pool asynchronously BEFORE the program gets there, and updated pages will be written out to physical disc, again asynchronous with the program processing. This effectively (ie simplified) means there are 3 separate CPU tasks - one read, one update, one write, and the update task is only reading and writing data in the buffer pools. This can be very efficient. .... but if the access is random, then all performance bets are off, and you have to be much more careful! Hence why it is critical to have an understanding how data will be used before the database is designed, at least for high-performance systems. Martin -- Martin Avison Note that emails to News@ will be junked. Use Martin instead of News |