This is a discussion on statistics within the Informix forums, part of the Database Server Software category; --> 9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics data from the database catalog, ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| 9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics data from the database catalog, run the batch data load, then put the old statistics back. They did this years ago on a Tandy - I don't know what database. They claim performance was not negatively impacted but in fact was good. They seem to feel I am wasting a lot of time running update statistics. I don't see how this could possibly work on Informix. Has anyone tried this or heard of it? thanks in advance John |
| |||
| On Thu, 6 Jul 2006 18:05:05 -0400, "Campbell, John \(GE Cons Fin\)" <John.Campbell2@ge.com> wrote: >9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics data from the database catalog, run the batch data load, then put the old statistics back. They did this years ago on a Tandy - I don't know what database. They claim performance was not negatively impacted but in fact was good. They seem to feel I am wasting a lot of time running update statistics. I don't see how this could possibly work on Informix. Has anyone tried this or heard of it? > >thanks in advance >John You *might* be thinking of playing around with the sysdistrib table . .. but that means messing around with the system tables and I don't believe that IBM would support that . . . . If all you're talking about is loading some data, why are you concerned about update statistics? The statistics are updated only when you tell Informix to update them. if you're concerned about the time taken to update statistics, then consider only updating stats once a week . . . . or when there is a percentage growth difference between sysmaster:sysptnhdr.nrows and the <datebase>:systables.nrows (I think). I used a shell script to only generate the needed update stats for the needed tables; that was before Art K's 'dostats' would do it for you . .. . . Relevant statistics are needed by the optimizer to make smart decisions. BTW, will "they" (whoever "they" are) support your instance if there's a problem . . . performance or stability (due to messing around with system tables)? Just checking . . . . JWC |
| |||
| Campbell, John (GE Cons Fin) wrote: > 9.40.FC5 Solaris 9. Someone has suggested to me that I could unload the statistics >data from the database catalog, run the batch data load, then put the old statistics >back. They did this years ago on a Tandy - I don't know what database. They claim >performance was not negatively impacted but in fact was good. They seem to feel I >am wasting a lot of time running update statistics. I don't see how this could >possibly work on Informix. Has anyone tried this or heard of it? The sysdistrib table is one of the few system catalog tables that it is actually safe to muck with. You can't do any real harm. HOWEVER, it still not a good idea. You CAN really screw up your query performance if you don't know what you are doing. I'm not convinced that 'someone' knows what he/she is doing. I assume 'someone' is recommending that you export the data distributions prior to the load, clear the stats with "UPDATE STATISTICS LOW ... DROP DISTRIBUTIONS;" then load and restore the exported distribution data into the sysdistrib table. This is wrong on several levels: - Exporting and dropping the stats will NOT affect the batch data load run time in any manner unless the entire load is updating existing records (in which case you likely don't need a full dostats run afterwards anyway) or you are doing what Jonathan has dubbed an UPSERT operation, ie update if the record exists, insert otherwise. The effect of dropping the distributions will only be felt if a large percentage of the operations are updates, or if your job always tries the update first and most operations turn out to be inserts. HOWEVER, the effect will be to slow things down and the per operation slowdown will increase over the lifetime of the load. This is because with no distributions the optimizer is likely to assume the table is almost empty and use table scans to check for existing rows to update. Even if the table IS empty at the start of the run, it will not be by the end. - The DROP DISTRIBUTIONS, since it must be done as part of an UPDATE STATISTICS LOW will not be fast which reduces the savings gained by not doing a full stats run after the load. - When you are done, you will be restoring the stats as of the last full stats update so they will be out-of-date and inaccurate which will slow down all subsequent queries. And since I assume this is not a one-shot load, but a periodic operation, each time you take this shortcut the stats will be more out-of-date. - Finally, I've actually found that during long upsert jobs it is beneficial to run an UPDATE STATISTICS MEDIUM on the table as a whole about 1/3 to 1/2 way through the load (and again at 2/3 if we've done it at the 1st 3rd). This is again because it improves the update operations. So, my recommendation, run dostats before, during, and after the load if it is a large one. If the load is small relative to the table size and the key distribution in the input is compatible with the distribution of keys already in the table, then you can get away without the before and during runs, but still run dostats after the load. If your load ONLY updates existing rows, you may be able to get away with only running update statistics on those secondary key columns (and their indexes) that may have been modified by the load. Primary and foreign key columns are unlikely to have been modified by such a run, so the existing stats will still be good. Art S. Kagel |
| ||||
| On Fri, 7 Jul 2006 13:19:29 -0400, "Campbell, John \(GE Cons Fin\)" <John.Campbell2@ge.com> wrote: >Yes, I am very reluctant to mash on any system tables. This load routine runs a stored procedure every day that does an update/insert. If row found update it - if not insert it. After 3 or 4 days performance degrades significantly and deliverables are delayed. We didn't see this at first but the volume has been consistently increasing for two years. Running update statistics after the load process resolves the performance problem. It was suggested that I could keep the old statistics and put them back after the load process to avoid this problem and save time. I am skeptical and definitely wanted a second opinion. Responses so far have not been in favor of the idea... > How often does this load take place? is it a batch update / insert or a trickle-type job? Could you update stats on the table at a given time of day using cron? JWC |
| Thread Tools | |
| Display Modes | |
|
|