Re: Performance issue in a partitioned database shsandeep wrote:
> The ETL application loaded around 3000 rows in 14 seconds in a Development
> database while it took 2 hours to load in a UAT database.
>
> UAT db is partitioned.
> Dev db is not partitioned.
>
> the application looks for existing rows in the table...if they already
> exist then it updates otherwise inserts them.
>
> The table is pretty large, around 6.5 million rows.
> Due to RI, it is looking up into parent tables which possibly reside of
> different nodes.
>
> This issue has increased the batch window from 2 hrs to 4 hrs and this is
> certainly not acceptable.
> How can I get rid of this performance issue?
>
> Cheers,
> San.
One side note. In my experience, an occassional failed update is much
cheaper than checking to see if the record exists first before trying to
update. Since the update will typically key on the primary key column(s)
and use an index the update will either succeed or instantly fail with a
very cheap index lookup. If the majority of the records being processed
will result in an update then it is very expensive to do the SELECT to check
for existence first.
Similarly, a failed insert is very expensive as the typically the data
record is written then the indexes are updated. Only when the primary key
index key add is attempted will the duplication be detected and then the
record add and any other index key updates have to be rolled back. Still,
if the vast majority of records processed will result in an insert this may
still be cheaper than the thousands of failed updates.
My rule of thumb is if 75% of the records processed will result in an insert
try the insert first and update if the insert fails. Otherwise try the
update first and insert if that fails. In neither case should there be a
SELECT to verify the prior existence of the row. If the weighting between
inserts and updates tends to vary over time the task can be dynamic about
whether to try inserts of updates first using some threshhold. So say if
the last two operations were inserts try insert first next time, etc.
This is not directly related to your question, however, following these
suggestions will tend to improve performance of these types of tasks
significantly and I have found that, at least in Informix servers and I have
not reason to see why DB2 would behave differently, the improvements are
noticably greater for partitioned (fragmented in IDS) tables unless that
partitioning is based directly on the primary key. It's worth testing anyway.
Art S. Kagel |