Unix Technical Forum

statistics

This is a discussion on statistics within the Informix forums, part of the Database Server Software category; --> 9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics data from the database catalog, ...


Go Back   Unix Technical Forum > Database Server Software > Informix

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-20-2008, 12:30 PM
Campbell, John \(GE Cons Fin\)
 
Posts: n/a
Default statistics

9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics data from the database catalog, run the batch data load, then put the old statistics back. They did this years ago on a Tandy - I don't know what database. They claim performance was not negatively impacted but in fact was good. They seem to feel I am wasting a lot of time running update statistics. I don't see how this could possibly work on Informix. Has anyone tried this or heard of it?

thanks in advance
John
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-20-2008, 12:30 PM
John Carlson
 
Posts: n/a
Default Re: statistics

On Thu, 6 Jul 2006 18:05:05 -0400, "Campbell, John \(GE Cons Fin\)"
<John.Campbell2@ge.com> wrote:

>9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics data from the database catalog, run the batch data load, then put the old statistics back. They did this years ago on a Tandy - I don't know what database. They claim performance was not negatively impacted but in fact was good. They seem to feel I am wasting a lot of time running update statistics. I don't see how this could possibly work on Informix. Has anyone tried this or heard of it?
>
>thanks in advance
>John


You *might* be thinking of playing around with the sysdistrib table .
.. but that means messing around with the system tables and I don't
believe that IBM would support that . . . .

If all you're talking about is loading some data, why are you
concerned about update statistics? The statistics are updated only
when you tell Informix to update them. if you're concerned about the
time taken to update statistics, then consider only updating stats
once a week . . . . or when there is a percentage growth difference
between sysmaster:sysptnhdr.nrows and the <datebase>:systables.nrows
(I think). I used a shell script to only generate the needed update
stats for the needed tables; that was before Art K's 'dostats' would
do it for you . .. . .

Relevant statistics are needed by the optimizer to make smart
decisions.

BTW, will "they" (whoever "they" are) support your instance if there's
a problem . . . performance or stability (due to messing around with
system tables)? Just checking . . . .

JWC
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-20-2008, 12:31 PM
Art S. Kagel
 
Posts: n/a
Default Re: statistics

Campbell, John (GE Cons Fin) wrote:
> 9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics
>data from the database catalog, run the batch data load, then put the old

statistics
>back. They did this years ago on a Tandy - I don't know what database. They claim
>performance was not negatively impacted but in fact was good. They seem to feel I
>am wasting a lot of time running update statistics. I don't see how this could
>possibly work on Informix. Has anyone tried this or heard of it?


The sysdistrib table is one of the few system catalog tables that it is
actually safe to muck with. You can't do any real harm. HOWEVER, it still
not a good idea. You CAN really screw up your query performance if you
don't know what you are doing. I'm not convinced that 'someone' knows what
he/she is doing.

I assume 'someone' is recommending that you export the data distributions
prior to the load, clear the stats with "UPDATE STATISTICS LOW ... DROP
DISTRIBUTIONS;" then load and restore the exported distribution data into
the sysdistrib table. This is wrong on several levels:

- Exporting and dropping the stats will NOT affect the batch data load run
time in any manner unless the entire load is updating existing records (in
which case you likely don't need a full dostats run afterwards anyway) or
you are doing what Jonathan has dubbed an UPSERT operation, ie update if the
record exists, insert otherwise. The effect of dropping the distributions
will only be felt if a large percentage of the operations are updates, or if
your job always tries the update first and most operations turn out to be
inserts. HOWEVER, the effect will be to slow things down and the per
operation slowdown will increase over the lifetime of the load. This is
because with no distributions the optimizer is likely to assume the table is
almost empty and use table scans to check for existing rows to update. Even
if the table IS empty at the start of the run, it will not be by the end.

- The DROP DISTRIBUTIONS, since it must be done as part of an UPDATE
STATISTICS LOW will not be fast which reduces the savings gained by not
doing a full stats run after the load.

- When you are done, you will be restoring the stats as of the last full
stats update so they will be out-of-date and inaccurate which will slow down
all subsequent queries. And since I assume this is not a one-shot load, but
a periodic operation, each time you take this shortcut the stats will be
more out-of-date.

- Finally, I've actually found that during long upsert jobs it is beneficial
to run an UPDATE STATISTICS MEDIUM on the table as a whole about 1/3 to 1/2
way through the load (and again at 2/3 if we've done it at the 1st 3rd).
This is again because it improves the update operations.

So, my recommendation, run dostats before, during, and after the load if it
is a large one. If the load is small relative to the table size and the key
distribution in the input is compatible with the distribution of keys
already in the table, then you can get away without the before and during
runs, but still run dostats after the load.

If your load ONLY updates existing rows, you may be able to get away with
only running update statistics on those secondary key columns (and their
indexes) that may have been modified by the load. Primary and foreign key
columns are unlikely to have been modified by such a run, so the existing
stats will still be good.

Art S. Kagel
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-20-2008, 12:32 PM
John Carlson
 
Posts: n/a
Default Re: statistics

On Fri, 7 Jul 2006 13:19:29 -0400, "Campbell, John \(GE Cons Fin\)"
<John.Campbell2@ge.com> wrote:

>Yes, I am very reluctant to mash on any system tables. This load routine runs a stored procedure every day that does an update/insert. If row found update it - if not insert it. After 3 or 4 days performance degrades significantly and deliverables are delayed. We didn't see this at first but the volume has been consistently increasing for two years. Running update statistics after the load process resolves the performance problem. It was suggested that I could keep the old statistics and put them back after the load process to avoid this problem and save time. I am skeptical and definitely wanted a second opinion. Responses so far have not been in favor of the idea...
>


How often does this load take place? is it a batch update / insert or
a trickle-type job? Could you update stats on the table at a given
time of day using cron?

JWC
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:21 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com