This is a discussion on Re: Need for speed 2 within the Pgsql Performance forums, part of the PostgreSQL category; --> > Putting pg_xlog on the IDE drives gave about 10% performance > improvement. Would faster disks give more performance? ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > Putting pg_xlog on the IDE drives gave about 10% performance > improvement. Would faster disks give more performance? > > What my application does: > > Every five minutes a new logfile will be imported. Depending on the > source of the request it will be imported in one of three "raw click" > tables. (data from two months back, to be able to verify customer > complains) > For reporting I have a set of tables. These contain data from the last > two years. My app deletes all entries from today and reinserts updated > data calculated from the raw data tables. > > The queries contain no joins only aggregates. I have several indexes to > speed different kinds of queries. > > My problems occur when one users does a report that contains to much old > data. In that case all cache mechanisms will fail and disc io is the > limiting factor. It seems like you are pushing limit of what server can handle. This means: 1. expensive server upgrade. or 2. make software more efficient. Since you sound I/O bound, you can tackle 1. by a. adding more memory or b. increasing i/o throughput. Unfortunately, you already have a pretty decent server (for x86) so 1. means 64 bit platform and 2. means more expensive hard drives. The archives is full of information about this... Is your data well normalized? You can do tricks like: if table has fields a,b,c,d,e,f with a is primary key, and d,e,f not frequently queried or missing, move d,e,f to seprate table. well normalized structures are always more cache efficient. Do you have lots of repeating and/or empty data values in your tables? Make your indexes and data as small as possible to reduce pressure on the cache, here are just a few tricks: 1. use int2/int4 instead of numeric 2. know when to use char and varchar 3. use functional indexes to reduce index expression complexity. This can give extreme benefits if you can, for example, reduce double field index to Boolean. Merlin ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Should I temporarily increase sort_mem, vacuum_mem, neither, or both when doing a CLUSTER on a large (100 million row) table where as many as half of the tuples are deadwood from UPDATEs or DELETEs? I have large batch (10 million row) inserts, updates, and deletes so I'm not sure frequent vacuuming would help. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Andrew, On Thu, 2005-08-25 at 12:24 -0700, Andrew Lazarus wrote: > Should I temporarily increase sort_mem, vacuum_mem, neither, or both > when doing a CLUSTER on a large (100 million row) table where as many as > half of the tuples are deadwood from UPDATEs or DELETEs? I have large > batch (10 million row) inserts, updates, and deletes so I'm not sure > frequent vacuuming would help. You may need to experiment with both. What version of Postgres? What is the size of your database? How many concurrent users? If you're seeing half of the tuples are dead, I look at checking your max_fsm_pages and max_fsm_relations after a full vacuum analyze before doing too much with sort mem. Your mileage may vary. Best of luck. Steve Poe > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Andrew Lazarus <andrew@pillette.com> writes: > Should I temporarily increase sort_mem, vacuum_mem, neither, or both > when doing a CLUSTER on a large (100 million row) table The only part of that job that can use lots of memory is the index rebuilds. In recent PG versions maintenance_work_mem is the thing to increase for an index build; previously sort_mem controlled it. I forget when the changeover was; maybe 8.0. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |