vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi, I'm having a pretty serious problem with postgresql's performance. Currently, I have a cron task that is set to restart and vacuumdb -faz every six hours. If that doesn't happen, the disk goes from 10% full to 95% full within 2 days (and it's a 90GB disk...with the database being a 2MB download after dump), and the CPU goes from running at around a 2% load to a 99+% load right away (the stats look like a square wave). So it's problem-hunting time, I guess. The problem has something to do with the following errors (there are a lot; I'm posting a short sample) NOTICE: relation "pg_depend" TID 43/27: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_depend" TID 43/28: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_depend" TID 43/29: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_depend" TID 43/30: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_depend" TID 43/31: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_depend" TID 43/32: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_type" TID 17/44: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_type" TID 17/45: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/11: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/12: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/13: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/14: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/15: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/16: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_attribute" TID 133/17: InsertTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_class" TID 41/18: DeleteTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_class" TID 41/19: DeleteTransactionInProgress 209545 --- can't shrink relation NOTICE: relation "pg_class" TID 41/20: DeleteTransactionInProgress 209545 --- can't shrink relation When I vacuum full, I can't get rid of these errors unless I restart the database (and then I restart, vacuum full, and everything's fine). And once I do a successful vacuum full, CPU usage returns to normal, and the disk is no longer almost full (back to 10% full). I'm at a loss to figure out where the problem is coming from and how to fix it. My machine: XServe G5 Dual 2GHz running Mac OS X Server 10.3.9. Postgresql 8.0.1 Thanks for any responses/ideas/solutions (best of all!), Richard ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| > I'm having a pretty serious problem with postgresql's performance. > Currently, I have a cron task that is set to restart and vacuumdb -faz > every six hours. If that doesn't happen, the disk goes from 10% full > to 95% full within 2 days (and it's a 90GB disk...with the database > being a 2MB download after dump), and the CPU goes from running at > around a 2% load to a 99+% load right away (the stats look like a > square wave). Are you running frequent queries which use temporary tables? -- ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| No, I don't think so. I don't think there are any temp table queries (and I'll check), but even if there are, site traffic is very low, and queries would be very infrequent. On Apr 20, 2005, at 12:36 PM, Rod Taylor wrote: >> I'm having a pretty serious problem with postgresql's performance. >> Currently, I have a cron task that is set to restart and vacuumdb -faz >> every six hours. If that doesn't happen, the disk goes from 10% full >> to 95% full within 2 days (and it's a 90GB disk...with the database >> being a 2MB download after dump), and the CPU goes from running at >> around a 2% load to a 99+% load right away (the stats look like a >> square wave). > > Are you running frequent queries which use temporary tables? > > > -- > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| As a follow-up, I've found a function that used the following code: CREATE TEMPORARY TABLE results (nOrder integer, page_id integer, name text) WITHOUT OIDS ON COMMIT DROP; I would assume that the "WITHOUT OIDS" would be part of the source of the problem, so I've commented it out. On Apr 20, 2005, at 12:36 PM, Rod Taylor wrote: >> I'm having a pretty serious problem with postgresql's performance. >> Currently, I have a cron task that is set to restart and vacuumdb -faz >> every six hours. If that doesn't happen, the disk goes from 10% full >> to 95% full within 2 days (and it's a 90GB disk...with the database >> being a 2MB download after dump), and the CPU goes from running at >> around a 2% load to a 99+% load right away (the stats look like a >> square wave). > > Are you running frequent queries which use temporary tables? > > > -- > ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Richard Plotkin <richard@richardplotkin.com> writes: > I'm having a pretty serious problem with postgresql's performance. > Currently, I have a cron task that is set to restart and vacuumdb -faz > every six hours. If that doesn't happen, the disk goes from 10% full > to 95% full within 2 days (and it's a 90GB disk...with the database > being a 2MB download after dump), and the CPU goes from running at > around a 2% load to a 99+% load right away (the stats look like a > square wave). Q: what have you got the FSM parameters set to? Q: what exactly is bloating? Without knowing which tables or indexes are growing, it's hard to speculate about the exact causes. Use du and oid2name, or look at pg_class.relpages after a plain VACUUM. It's likely that the real answer is "you need to vacuum more often than every six hours", but I'm trying not to jump to conclusions. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Hi Tom, > Q: what have you got the FSM parameters set to? Here's from postgresql.conf -- FSM at default settings. # - Memory - shared_buffers = 30400 # min 16, at least max_connections*2, 8KB each work_mem = 32168 # min 64, size in KB #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB # - Free Space Map - #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~50 bytes each # - Kernel Resource Usage - max_files_per_process = 750 #1000 # min 25 #preload_libraries = '' > Q: what exactly is bloating? Without knowing which tables or indexes > are growing, it's hard to speculate about the exact causes. Use du and > oid2name, or look at pg_class.relpages after a plain VACUUM. This I do not know. I've disabled the cron jobs and will let the system bloat, then I will gather statistics (I'll give it 12-24 hours). > It's likely that the real answer is "you need to vacuum more often > than every six hours", but I'm trying not to jump to conclusions. That could be it, except that I would expect the problem to then look more like a gradual increase in CPU usage and a gradual increase in use of disk space. Mine could be an invalid assumption, but the system here looks like it goes from no problem to 100% problem within a minute. Thanks again! Richard ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| More info on what is bloating: It's only in one database (the one that's most used), and after running oid2name on the bloated files, the result is (mysteriously) empty. Here's the run on the three enormous files: $ /usr/local/bin/oid2name -d smt -o 160779 From database "smt": Filenode Table Name ---------------------- $ /usr/local/bin/oid2name -d smt -o 65782869 From database "smt": Filenode Table Name ---------------------- $ /usr/local/bin/oid2name -d smt -o 83345634 From database "smt": Filenode Table Name ---------------------- The file list looks like this (with normal sized files mostly removed): 1.0G ./106779 1.0G ./106779.1 1.0G ./106779.2 1.0G ./106779.3 978M ./106779.4 1.0G ./65782869 248M ./65782869.1 0B ./65782871 8.0K ./65782873 780M ./83345634 0B ./83345636 8.0K ./83345638 So does the empty result mean it's a temporary table? There is one temporary table (in the function previously mentioned) that does get created and dropped with some regularity. Thanks again, Richard On Apr 20, 2005, at 2:06 PM, Richard Plotkin wrote: > Hi Tom, > >> Q: what have you got the FSM parameters set to? > > Here's from postgresql.conf -- FSM at default settings. > # - Memory - > > shared_buffers = 30400 # min 16, at least max_connections*2, > 8KB each > work_mem = 32168 # min 64, size in KB > #maintenance_work_mem = 16384 # min 1024, size in KB > #max_stack_depth = 2048 # min 100, size in KB > > # - Free Space Map - > > #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes > each > #max_fsm_relations = 1000 # min 100, ~50 bytes each > > # - Kernel Resource Usage - > > max_files_per_process = 750 #1000 # min 25 > #preload_libraries = '' > > >> Q: what exactly is bloating? Without knowing which tables or indexes >> are growing, it's hard to speculate about the exact causes. Use du >> and >> oid2name, or look at pg_class.relpages after a plain VACUUM. > > This I do not know. I've disabled the cron jobs and will let the > system bloat, then I will gather statistics (I'll give it 12-24 > hours). > >> It's likely that the real answer is "you need to vacuum more often >> than every six hours", but I'm trying not to jump to conclusions. > > That could be it, except that I would expect the problem to then look > more like a gradual increase in CPU usage and a gradual increase in > use of disk space. Mine could be an invalid assumption, but the > system here looks like it goes from no problem to 100% problem within > a minute. > > Thanks again! > Richard > > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match > ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Thu, Apr 21, 2005 at 11:38:22AM -0700, Richard Plotkin wrote: > More info on what is bloating: > > It's only in one database (the one that's most used), and after running > oid2name on the bloated files, the result is (mysteriously) empty. > Here's the run on the three enormous files: > > $ /usr/local/bin/oid2name -d smt -o 160779 > From database "smt": > Filenode Table Name > ---------------------- Try -f instead of -o ... -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "World domination is proceeding according to plan" (Andrew Morton) ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| That returned the same result. I also tried oid2name -d smt -x -i -S and, separately -s, and also separately, -d with all other databases, and none of the databases turned up any listing, in either oid or filenode, for any of these three bloated files. One thing I've noticed is that these oids are all extremely large numbers, whereas the rest of the oids in /data/base/* are no higher than 40000 or 50000. On Apr 21, 2005, at 1:46 PM, Alvaro Herrera wrote: > On Thu, Apr 21, 2005 at 11:38:22AM -0700, Richard Plotkin wrote: >> More info on what is bloating: >> >> It's only in one database (the one that's most used), and after >> running >> oid2name on the bloated files, the result is (mysteriously) empty. >> Here's the run on the three enormous files: >> >> $ /usr/local/bin/oid2name -d smt -o 160779 >> From database "smt": >> Filenode Table Name >> ---------------------- > > Try -f instead of -o ... > > -- > Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) > "World domination is proceeding according to plan" (Andrew > Morton) > > ---------------------------(end of > broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| ||||
| I've also now tried looking at pg_class.relpages. I compared the results before and after vacuum. The results stayed the same, except for five rows that increased after the vacuum. Here is the select on those rows after the vacuum: relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl ---------------------------------+--------------+---------+---------- +-------+-------------+---------------+----------+----------- +---------------+---------------+-------------+-------------+--------- +----------+-----------+-------------+----------+----------+--------- +------------+------------+-------------+---------------- +--------------- pg_attribute_relid_attnam_index | 11 | 0 | 1 | 403 | 16686 | 0 | 292 | 10250 | 0 | 0 | f | f | i | 2 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | pg_class_oid_index | 11 | 0 | 1 | 403 | 16690 | 0 | 18 | 2640 | 0 | 0 | f | f | i | 1 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | pg_depend_depender_index | 11 | 0 | 1 | 403 | 16701 | 0 | 52 | 6442 | 0 | 0 | f | f | i | 3 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | pg_type_oid_index | 11 | 0 | 1 | 403 | 16731 | 0 | 8 | 1061 | 0 | 0 | f | f | i | 1 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | pg_depend | 11 | 16677 | 1 | 0 | 16676 | 0 | 32 | 4200 | 0 | 0 | t | f | r | 7 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | {=r/postgres} On Apr 20, 2005, at 1:51 PM, Tom Lane wrote: > Richard Plotkin <richard@richardplotkin.com> writes: >> I'm having a pretty serious problem with postgresql's performance. >> Currently, I have a cron task that is set to restart and vacuumdb -faz >> every six hours. If that doesn't happen, the disk goes from 10% full >> to 95% full within 2 days (and it's a 90GB disk...with the database >> being a 2MB download after dump), and the CPU goes from running at >> around a 2% load to a 99+% load right away (the stats look like a >> square wave). > > Q: what have you got the FSM parameters set to? > > Q: what exactly is bloating? Without knowing which tables or indexes > are growing, it's hard to speculate about the exact causes. Use du and > oid2name, or look at pg_class.relpages after a plain VACUUM. > > It's likely that the real answer is "you need to vacuum more often > than every six hours", but I'm trying not to jump to conclusions. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| Thread Tools | |
| Display Modes | |
|
|