Art S. Kagel wrote:
> shsandeep wrote:
>> The ETL application loaded around 3000 rows in 14 seconds in a
>> Development
>> database while it took 2 hours to load in a UAT database.
>>
>> UAT db is partitioned.
>> Dev db is not partitioned.
>>
>> the application looks for existing rows in the table...if they already
>> exist then it updates otherwise inserts them.
>>
>> The table is pretty large, around 6.5 million rows.
>> Due to RI, it is looking up into parent tables which possibly reside of
>> different nodes.
>>
>> This issue has increased the batch window from 2 hrs to 4 hrs and this is
>> certainly not acceptable.
>> How can I get rid of this performance issue?
>> Cheers,
>> San.
>
> One side note. In my experience, an occassional failed update is much
> cheaper than checking to see if the record exists first before trying to
> update. Since the update will typically key on the primary key
> column(s) and use an index the update will either succeed or instantly
> fail with a very cheap index lookup. If the majority of the records
> being processed will result in an update then it is very expensive to do
> the SELECT to check for existence first.
>
> Similarly, a failed insert is very expensive as the typically the data
> record is written then the indexes are updated. Only when the primary
> key index key add is attempted will the duplication be detected and then
> the record add and any other index key updates have to be rolled back.
> Still, if the vast majority of records processed will result in an
> insert this may still be cheaper than the thousands of failed updates.
>
> My rule of thumb is if 75% of the records processed will result in an
> insert try the insert first and update if the insert fails. Otherwise
> try the update first and insert if that fails. In neither case should
> there be a SELECT to verify the prior existence of the row. If the
> weighting between inserts and updates tends to vary over time the task
> can be dynamic about whether to try inserts of updates first using some
> threshhold. So say if the last two operations were inserts try insert
> first next time, etc.
>
> This is not directly related to your question, however, following these
> suggestions will tend to improve performance of these types of tasks
> significantly and I have found that, at least in Informix servers and I
> have not reason to see why DB2 would behave differently, the
> improvements are noticably greater for partitioned (fragmented in IDS)
> tables unless that partitioning is based directly on the primary key.
> It's worth testing anyway.
Art's logic applies.
In addition DB2 supports the MERGE statement.
But let's climb that hill when we get there....
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
WAIUG Conference
http://www.iiug.org/waiug/present/Fo...Forum2006.html