Unix Technical Forum

Re: Vacuum Full Analyze Stalled

This is a discussion on Re: Vacuum Full Analyze Stalled within the pgsql Admins forums, part of the PostgreSQL category; --> The previous report was from the same databases -- I'm a consultant tasked with evaluating the various open source ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Admins

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 06:19 AM
Kevin Grittner
 
Posts: n/a
Default Re: Vacuum Full Analyze Stalled

The previous report was from the same databases -- I'm a consultant
tasked with evaluating the various open source options and making
one of them work with our existing framework. Jeff developed a new
app (which is in beta testing) which is our first use of PostgreSQL
with real production load for prolonged periods of time.

Jeff was monitoring the app over the weekend, and posted here when
he had problems. Unfortunately, he didn't capture enough information
to diagnose much, but I'm passing along the sequence of events
since they lead up to a failure this morning for which we have more
detail.

Saturday, 9:00 a.m. -- Noticed performace beginning to degrade.
Saturday, 1:00 p.m. -- After continued degradation, stops client app,
and runs vacuum full analyze on both Windows and Linux boxes.
Saturday, 3:00 p.m. -- Windows box completes the vacuum.
Sunday, 1:00 p.m. -- Vacuum on Linux still not done. CPU on Linux
box is completely idle. Killed the vacuum process. Ran vacuum
full analyze on each table individually. All fine except for the 72 row
table with very frequent updates. Left this one running.
Sunday, 9:00 p.m. -- Vacuum of 72 row table still running. Killed all
processes and restarted postgres. Started a vacuum full analyze
of the 72 row table again.
Sunday, 11:00 p.m. -- Vacuum of 72 row table finishes. Started
applications, which had been stopped since Saturday, 1:00 p.m.
Monday, 9:20 a.m. -- Performance is beginning to degrade again.
Detailed information from this point below.

dtr=# select * from pg_locks order by pid;
locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
transactionid | | | | | 36430565 | | | | 36430565 | 2771 | ExclusiveLock | t
relation | 30793 | 30976 | | | | | | | 36430565 | 2771 | AccessShareLock | t
relation | 30793 | 30976 | | | | | | | 36430565 | 2771 | RowExclusiveLock | t
relation | 30793 | 30871 | | | | | | | 36430565 | 2771 | AccessShareLock | t
relation | 30793 | 30871 | | | | | | | 36430565 | 2771 | RowExclusiveLock | t
relation | 30793 | 30976 | | | | | | | 36430561 | 2777 | AccessShareLock | t
relation | 30793 | 30976 | | | | | | | 36430561 | 2777 | RowExclusiveLock | t
transactionid | | | | | 36430561 | | | | 36430561 | 2777 | ExclusiveLock | t
relation | 30793 | 30871 | | | | | | | 36430561 | 2777 | AccessShareLock | t
relation | 30793 | 30871 | | | | | | | 36430561 | 2777 | RowExclusiveLock | t
transactionid | | | | | 36430563 | | | | 36430563 | 2783 | ExclusiveLock | t
relation | 30793 | 30871 | | | | | | | 36430563 | 2783 | AccessShareLock | t
relation | 30793 | 30871 | | | | | | | 36430563 | 2783 | RowExclusiveLock | t
relation | 30793 | 30976 | | | | | | | 36430563 | 2783 | AccessShareLock | t
relation | 30793 | 30976 | | | | | | | 36430563 | 2783 | RowExclusiveLock | t
relation | 30793 | 30976 | | | | | | | 36430559 | 2790 | AccessShareLock | t
relation | 30793 | 30976 | | | | | | | 36430559 | 2790 | RowExclusiveLock | t
relation | 30793 | 30871 | | | | | | | 36430559 | 2790 | AccessShareLock | t
relation | 30793 | 30871 | | | | | | | 36430559 | 2790 | RowExclusiveLock | t
transactionid | | | | | 36430559 | | | | 36430559 | 2790 | ExclusiveLock | t
relation | 30793 | 30976 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
transactionid | | | | | 35633343 | | | | 35633343 | 3158 | ExclusiveLock | t
relation | 30793 | 30871 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
relation | 30793 | 10340 | | | | | | | 36430567 | 4238 | AccessShareLock | t
transactionid | | | | | 36430567 | | | | 36430567 | 4238 | ExclusiveLock | t
(25 rows)

dtr=# \q
postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
postgres 28937 0.0 0.0 2436 992 pts/1 S Sep30 0:00 su postgres
postgres 28938 0.0 0.0 2944 1620 pts/1 S Sep30 0:00 bash
postgres 2612 0.0 0.0 170968 4952 ? S Oct02 0:00 /usr/local/pgsql/bin/postmaster
postgres 2614 0.0 1.9 171200 165528 ? S Oct02 0:08 postgres: writer process
postgres 2615 0.0 0.0 7240 2788 ? S Oct02 0:27 postgres: stats buffer process
postgres 2617 0.0 0.0 6556 1988 ? S Oct02 0:22 postgres: stats collector process
postgres 2724 0.0 0.0 171828 7156 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32986) idle
postgres 2771 6.4 2.0 171996 167176 ? S Oct02 44:33 postgres: dtr dtr 127.0.0.1(32990) idle
postgres 2773 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32991) idle
postgres 2774 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32992) idle
postgres 2775 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32993) idle
postgres 2776 0.0 0.0 171720 6848 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32994) idle
postgres 2777 6.4 2.0 171996 167228 ? R Oct02 44:29 postgres: dtr dtr 127.0.0.1(32995) UPDATE
postgres 2778 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32996) idle
postgres 2779 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32997) idle
postgres 2780 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32998) idle
postgres 2781 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32999) idle
postgres 2782 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33000) idle
postgres 2783 6.4 2.0 172016 167292 ? S Oct02 44:38 postgres: dtr dtr 127.0.0.1(33001) idle
postgres 2784 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33002) idle
postgres 2785 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33003) idle
postgres 2786 6.4 2.0 172000 167216 ? S Oct02 44:38 postgres: dtr dtr 127.0.0.1(33004) idle
postgres 2787 6.4 2.0 172000 167236 ? R Oct02 44:33 postgres: dtr dtr 127.0.0.1(33005) UPDATE
postgres 2788 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33006) idle
postgres 2789 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33007) idle
postgres 2790 6.4 2.0 171992 167280 ? R Oct02 44:27 postgres: dtr dtr 127.0.0.1(33008) UPDATE
postgres 2791 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33009) idle
postgres 3158 0.0 0.0 188324 8044 ? S Oct02 0:00 postgres: autovacuum process dtr
postgres 4289 0.0 0.0 2148 672 pts/1 R+ 09:21 0:00 ps aux
postgres 4290 0.0 0.0 1688 688 pts/1 R+ 09:21 0:00 grep ^postgres

oid 30871 is the 72 row table.
oid 30976 is the index for its primary key.

We stopped the applications, and the autovacuum process didn't
wake up.

dtr=# select * from pg_locks order by pid;
locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
relation | 30793 | 30976 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
transactionid | | | | | 35633343 | | | | 35633343 | 3158 | ExclusiveLock | t
relation | 30793 | 30871 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
relation | 30793 | 10340 | | | | | | | 36600517 | 4498 | AccessShareLock | t
transactionid | | | | | 36600517 | | | | 36600517 | 4498 | ExclusiveLock | t
(5 rows)

dtr=# \q
postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
postgres 2612 0.0 0.0 170968 4952 ? S Oct02 0:00 /usr/local/pgsql/bin/postmaster
postgres 2614 0.0 1.9 171200 165528 ? S Oct02 0:10 postgres: writer process
postgres 2615 0.0 0.0 7240 2788 ? S Oct02 0:32 postgres: stats buffer process
postgres 2617 0.0 0.0 6556 1988 ? S Oct02 0:26 postgres: stats collector process
postgres 3158 0.0 0.0 188324 8044 ? S Oct02 0:00 postgres: autovacuum process dtr
postgres 4481 0.0 0.0 2432 988 pts/1 S 10:48 0:00 su postgres
postgres 4482 0.0 0.0 2944 1604 pts/1 S 10:48 0:00 bash
postgres 4499 0.0 0.0 2148 672 pts/1 R+ 10:49 0:00 ps aux
postgres 4500 0.0 0.0 1680 636 pts/1 S+ 10:49 0:00 grep ^postgres
postgres@linpost:/var/pgsql/data> uptime
10:48am up 6 days 19:20, 2 users, load average: 0.00, 0.28, 0.73
postgres@linpost:/var/pgsql/data> vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 8 278376 62676 7770684 0 0 15 7 17 18 11 1 87 1
0 0 8 278376 62700 7770660 0 0 0 11 1035 0 1 0 99 0
0 0 8 278440 62700 7770660 0 0 0 0 1013 0 0 0 100 0
0 0 8 278440 62700 7770660 0 0 0 0 1016 0 1 0 99 0
0 0 8 278440 62700 7770660 0 0 0 4 1018 0 1 0 99 0

We found the 72 row table somewhat bloated, so we cleaned it up.

dtr=# vacuum analyze verbose "DbTranImageStatus";
INFO: vacuuming "public.DbTranImageStatus"
INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6665 pages
DETAIL: 6555 index pages have been deleted, 6555 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 479137 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.01u sec elapsed 0.02 sec.
INFO: vacuuming "pg_toast.pg_toast_272793"
INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.DbTranImageStatus"
INFO: "DbTranImageStatus": scanned 3000 of 4244 pages, containing 68 live rows and 0 dead rows; 68 rows in sample, 96 estimated total rows
VACUUM
dtr=# vacuum full analyze verbose "DbTranImageStatus";
INFO: vacuuming "public.DbTranImageStatus"
INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 68 to 68 bytes long.
There were 479137 unused item pointers.
Total free space (including removable row versions) is 32760236 bytes.
4227 pages are or will become empty, including 0 at the end of the table.
4244 pages containing 32760236 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
DETAIL: 0 index row versions were removed.
6544 index pages have been deleted, 6544 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "DbTranImageStatus": moved 22 row versions, truncated 4244 to 1 pages
DETAIL: CPU 0.00s/0.02u sec elapsed 0.03 sec.
INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
DETAIL: 22 index row versions were removed.
6544 index pages have been deleted, 6544 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_272793"
INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.DbTranImageStatus"
INFO: "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72 estimated total rows
VACUUM
dtr=# reindex table "DbTranImageStatus";
REINDEX
dtr=# vacuum analyze verbose "DbTranImageStatus";
INFO: vacuuming "public.DbTranImageStatus"
INFO: index "DbTranImageStatusPK" now contains 72 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 48 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_272793"
INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.DbTranImageStatus"
INFO: "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72 estimated total rows
VACUUM

These all ran sub-second. We updated postgresql.conf for more
aggressive autovacuum, and restarted postgres, then restarted the
application.

This failure seems rather different from the one I previously posted,
since it was blocking on the application table, rather than
pg_constraint_contypid_index, and it did not wake up when all
other processes where stopped.

As before, both boxes are running 8.1beta2. Windows is not showing
these problems with the autovacuum blocking; just Linux.

-Kevin


>>> Tom Lane <tgl@sss.pgh.pa.us> 10/02/05 8:53 PM >>>

"Jeff Kirby" <Jeff.Kirby@wicourts.gov> writes:
> the Linux box however is still chugging away this morning... and
> appears to be stuck on vacuuming "pg_constraint_contypid_index". How
> do I know... well I don't really... I'm inferring based on the order
> of the log output on the Windows box.


Looking in pg_locks would give you a more reliable indicator of what the
VACUUM is currently working on.

Is the Linux box otherwise idle? There was another report recently of a
vacuum hung up for a long time on pg_constraint_contypid_index, but it
seemed to be due to extremely heavy usage of domain types ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 06:19 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Vacuum Full Analyze Stalled

I haven't looked in detail at autovacuum in 8.1 yet, but I know that in
previous versions it was a bad idea to depend on it to vacuum a small
table that has a lot of update activity frequently enough. The issue is
that since it's single-threaded if it starts a vacuum on a large table
it could be hours before it gets around to vacuuming the small table
again.

In a case like this, your best bet is to either have the application
periodically vacuum the small table or just do it once a minute via
cron. It should take next to no time to vacuum a 72 row table, so
running vacuum once a minute shouldn't really present much overhead when
the vacuum isn't needed.

On Mon, Oct 03, 2005 at 02:09:37PM -0500, Kevin Grittner wrote:
> The previous report was from the same databases -- I'm a consultant
> tasked with evaluating the various open source options and making
> one of them work with our existing framework. Jeff developed a new
> app (which is in beta testing) which is our first use of PostgreSQL
> with real production load for prolonged periods of time.
>
> Jeff was monitoring the app over the weekend, and posted here when
> he had problems. Unfortunately, he didn't capture enough information
> to diagnose much, but I'm passing along the sequence of events
> since they lead up to a failure this morning for which we have more
> detail.
>
> Saturday, 9:00 a.m. -- Noticed performace beginning to degrade.
> Saturday, 1:00 p.m. -- After continued degradation, stops client app,
> and runs vacuum full analyze on both Windows and Linux boxes.
> Saturday, 3:00 p.m. -- Windows box completes the vacuum.
> Sunday, 1:00 p.m. -- Vacuum on Linux still not done. CPU on Linux
> box is completely idle. Killed the vacuum process. Ran vacuum
> full analyze on each table individually. All fine except for the 72 row
> table with very frequent updates. Left this one running.
> Sunday, 9:00 p.m. -- Vacuum of 72 row table still running. Killed all
> processes and restarted postgres. Started a vacuum full analyze
> of the 72 row table again.
> Sunday, 11:00 p.m. -- Vacuum of 72 row table finishes. Started
> applications, which had been stopped since Saturday, 1:00 p.m.
> Monday, 9:20 a.m. -- Performance is beginning to degrade again.
> Detailed information from this point below.
>
> dtr=# select * from pg_locks order by pid;
> locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted
> ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
> transactionid | | | | | 36430565 | | | | 36430565 | 2771 | ExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430565 | 2771 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430565 | 2771 | RowExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430565 | 2771 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430565 | 2771 | RowExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430561 | 2777 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430561 | 2777 | RowExclusiveLock | t
> transactionid | | | | | 36430561 | | | | 36430561 | 2777 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430561 | 2777 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430561 | 2777 | RowExclusiveLock | t
> transactionid | | | | | 36430563 | | | | 36430563 | 2783 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430563 | 2783 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430563 | 2783 | RowExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430563 | 2783 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430563 | 2783 | RowExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 36430559 | 2790 | AccessShareLock | t
> relation | 30793 | 30976 | | | | | | | 36430559 | 2790 | RowExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 36430559 | 2790 | AccessShareLock | t
> relation | 30793 | 30871 | | | | | | | 36430559 | 2790 | RowExclusiveLock | t
> transactionid | | | | | 36430559 | | | | 36430559 | 2790 | ExclusiveLock | t
> relation | 30793 | 30976 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> transactionid | | | | | 35633343 | | | | 35633343 | 3158 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> relation | 30793 | 10340 | | | | | | | 36430567 | 4238 | AccessShareLock | t
> transactionid | | | | | 36430567 | | | | 36430567 | 4238 | ExclusiveLock | t
> (25 rows)
>
> dtr=# \q
> postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
> postgres 28937 0.0 0.0 2436 992 pts/1 S Sep30 0:00 su postgres
> postgres 28938 0.0 0.0 2944 1620 pts/1 S Sep30 0:00 bash
> postgres 2612 0.0 0.0 170968 4952 ? S Oct02 0:00 /usr/local/pgsql/bin/postmaster
> postgres 2614 0.0 1.9 171200 165528 ? S Oct02 0:08 postgres: writer process
> postgres 2615 0.0 0.0 7240 2788 ? S Oct02 0:27 postgres: stats buffer process
> postgres 2617 0.0 0.0 6556 1988 ? S Oct02 0:22 postgres: stats collector process
> postgres 2724 0.0 0.0 171828 7156 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32986) idle
> postgres 2771 6.4 2.0 171996 167176 ? S Oct02 44:33 postgres: dtr dtr 127.0.0.1(32990) idle
> postgres 2773 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32991) idle
> postgres 2774 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32992) idle
> postgres 2775 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32993) idle
> postgres 2776 0.0 0.0 171720 6848 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32994) idle
> postgres 2777 6.4 2.0 171996 167228 ? R Oct02 44:29 postgres: dtr dtr 127.0.0.1(32995) UPDATE
> postgres 2778 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32996) idle
> postgres 2779 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32997) idle
> postgres 2780 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32998) idle
> postgres 2781 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(32999) idle
> postgres 2782 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33000) idle
> postgres 2783 6.4 2.0 172016 167292 ? S Oct02 44:38 postgres: dtr dtr 127.0.0.1(33001) idle
> postgres 2784 0.0 0.0 171460 5824 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33002) idle
> postgres 2785 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33003) idle
> postgres 2786 6.4 2.0 172000 167216 ? S Oct02 44:38 postgres: dtr dtr 127.0.0.1(33004) idle
> postgres 2787 6.4 2.0 172000 167236 ? R Oct02 44:33 postgres: dtr dtr 127.0.0.1(33005) UPDATE
> postgres 2788 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33006) idle
> postgres 2789 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33007) idle
> postgres 2790 6.4 2.0 171992 167280 ? R Oct02 44:27 postgres: dtr dtr 127.0.0.1(33008) UPDATE
> postgres 2791 0.0 0.0 171460 5828 ? S Oct02 0:00 postgres: dtr dtr 127.0.0.1(33009) idle
> postgres 3158 0.0 0.0 188324 8044 ? S Oct02 0:00 postgres: autovacuum process dtr
> postgres 4289 0.0 0.0 2148 672 pts/1 R+ 09:21 0:00 ps aux
> postgres 4290 0.0 0.0 1688 688 pts/1 R+ 09:21 0:00 grep ^postgres
>
> oid 30871 is the 72 row table.
> oid 30976 is the index for its primary key.
>
> We stopped the applications, and the autovacuum process didn't
> wake up.
>
> dtr=# select * from pg_locks order by pid;
> locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted
> ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+--------------------------+---------
> relation | 30793 | 30976 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> transactionid | | | | | 35633343 | | | | 35633343 | 3158 | ExclusiveLock | t
> relation | 30793 | 30871 | | | | | | | 35633343 | 3158 | ShareUpdateExclusiveLock | t
> relation | 30793 | 10340 | | | | | | | 36600517 | 4498 | AccessShareLock | t
> transactionid | | | | | 36600517 | | | | 36600517 | 4498 | ExclusiveLock | t
> (5 rows)
>
> dtr=# \q
> postgres@linpost:/var/pgsql/data> ps aux|grep ^postgres
> postgres 2612 0.0 0.0 170968 4952 ? S Oct02 0:00 /usr/local/pgsql/bin/postmaster
> postgres 2614 0.0 1.9 171200 165528 ? S Oct02 0:10 postgres: writer process
> postgres 2615 0.0 0.0 7240 2788 ? S Oct02 0:32 postgres: stats buffer process
> postgres 2617 0.0 0.0 6556 1988 ? S Oct02 0:26 postgres: stats collector process
> postgres 3158 0.0 0.0 188324 8044 ? S Oct02 0:00 postgres: autovacuum process dtr
> postgres 4481 0.0 0.0 2432 988 pts/1 S 10:48 0:00 su postgres
> postgres 4482 0.0 0.0 2944 1604 pts/1 S 10:48 0:00 bash
> postgres 4499 0.0 0.0 2148 672 pts/1 R+ 10:49 0:00 ps aux
> postgres 4500 0.0 0.0 1680 636 pts/1 S+ 10:49 0:00 grep ^postgres
> postgres@linpost:/var/pgsql/data> uptime
> 10:48am up 6 days 19:20, 2 users, load average: 0.00, 0.28, 0.73
> postgres@linpost:/var/pgsql/data> vmstat 10
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 8 278376 62676 7770684 0 0 15 7 17 18 11 1 87 1
> 0 0 8 278376 62700 7770660 0 0 0 11 1035 0 1 0 99 0
> 0 0 8 278440 62700 7770660 0 0 0 0 1013 0 0 0 100 0
> 0 0 8 278440 62700 7770660 0 0 0 0 1016 0 1 0 99 0
> 0 0 8 278440 62700 7770660 0 0 0 4 1018 0 1 0 99 0
>
> We found the 72 row table somewhat bloated, so we cleaned it up.
>
> dtr=# vacuum analyze verbose "DbTranImageStatus";
> INFO: vacuuming "public.DbTranImageStatus"
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6665 pages
> DETAIL: 6555 index pages have been deleted, 6555 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.01 sec.
> INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 479137 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.01s/0.01u sec elapsed 0.02 sec.
> INFO: vacuuming "pg_toast.pg_toast_272793"
> INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: analyzing "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": scanned 3000 of 4244 pages, containing 68 live rows and 0 dead rows; 68 rows in sample, 96 estimated total rows
> VACUUM
> dtr=# vacuum full analyze verbose "DbTranImageStatus";
> INFO: vacuuming "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 4244 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> Nonremovable row versions range from 68 to 68 bytes long.
> There were 479137 unused item pointers.
> Total free space (including removable row versions) is 32760236 bytes.
> 4227 pages are or will become empty, including 0 at the end of the table.
> 4244 pages containing 32760236 free bytes are potential move destinations.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
> DETAIL: 0 index row versions were removed.
> 6544 index pages have been deleted, 6544 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "DbTranImageStatus": moved 22 row versions, truncated 4244 to 1 pages
> DETAIL: CPU 0.00s/0.02u sec elapsed 0.03 sec.
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 6654 pages
> DETAIL: 22 index row versions were removed.
> 6544 index pages have been deleted, 6544 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: vacuuming "pg_toast.pg_toast_272793"
> INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> Nonremovable row versions range from 0 to 0 bytes long.
> There were 0 unused item pointers.
> Total free space (including removable row versions) is 0 bytes.
> 0 pages are or will become empty, including 0 at the end of the table.
> 0 pages containing 0 free bytes are potential move destinations.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: analyzing "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72 estimated total rows
> VACUUM
> dtr=# reindex table "DbTranImageStatus";
> REINDEX
> dtr=# vacuum analyze verbose "DbTranImageStatus";
> INFO: vacuuming "public.DbTranImageStatus"
> INFO: index "DbTranImageStatusPK" now contains 72 row versions in 2 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "DbTranImageStatus": found 0 removable, 72 nonremovable row versions in 1 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 48 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: vacuuming "pg_toast.pg_toast_272793"
> INFO: index "pg_toast_272793_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "pg_toast_272793": found 0 removable, 0 nonremovable row versions in 0 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: analyzing "public.DbTranImageStatus"
> INFO: "DbTranImageStatus": scanned 1 of 1 pages, containing 72 live rows and 0 dead rows; 72 rows in sample, 72 estimated total rows
> VACUUM
>
> These all ran sub-second. We updated postgresql.conf for more
> aggressive autovacuum, and restarted postgres, then restarted the
> application.
>
> This failure seems rather different from the one I previously posted,
> since it was blocking on the application table, rather than
> pg_constraint_contypid_index, and it did not wake up when all
> other processes where stopped.
>
> As before, both boxes are running 8.1beta2. Windows is not showing
> these problems with the autovacuum blocking; just Linux.
>
> -Kevin
>
>
> >>> Tom Lane <tgl@sss.pgh.pa.us> 10/02/05 8:53 PM >>>

> "Jeff Kirby" <Jeff.Kirby@wicourts.gov> writes:
> > the Linux box however is still chugging away this morning... and
> > appears to be stuck on vacuuming "pg_constraint_contypid_index". How
> > do I know... well I don't really... I'm inferring based on the order
> > of the log output on the Windows box.

>
> Looking in pg_locks would give you a more reliable indicator of what the
> VACUUM is currently working on.
>
> Is the Linux box otherwise idle? There was another report recently of a
> vacuum hung up for a long time on pg_constraint_contypid_index, but it
> seemed to be due to extremely heavy usage of domain types ...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 06:19 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Vacuum Full Analyze Stalled

On Mon, Oct 03, 2005 at 02:41:14PM -0500, Jim C. Nasby wrote:
> I haven't looked in detail at autovacuum in 8.1 yet, but I know that in
> previous versions it was a bad idea to depend on it to vacuum a small
> table that has a lot of update activity frequently enough. The issue is
> that since it's single-threaded if it starts a vacuum on a large table
> it could be hours before it gets around to vacuuming the small table
> again.


Does it really make a lot of difference? While the big vacuum is
running, old tuples accumulate on the small table, because it shows as a
running transaction and the small vacuum won't delete them. The new
autovacuum is no different than the old one in this regard. (There's a
patch by Hannu Krossing IIRC that, if included in 8.2, will make things
better.)

However, one thing that is different from old autovacuum is that you can
set better thresholds for small tables. You don't have to wait until a
thousand tuples have accumulated -- you can change the threshold
inserting a tuple in pg_autovacuum. In the 72-tuples case, I'd try
setting the vacuum threshold to 100 and see if it improves matters.


However, I'm looking at the autovacuum code to see why it's sitting
holding locks on the small table and not vacuuming it. I see on the
pg_locks output that process 3158 (autovacuum) has got locks on the
table and index, but it apparently isn't vacuuming the table. If this
is correct, it's a bug. However I can't seem to find out why this
happens.

Kevin, Jeff, next time this happens please attach gdb to the autovacuum
process and get a stack trace ("bt" to gdb), if at all possible, and/or
strace it to see what it's doing.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org
"I personally became interested in Linux while I was dating an English major
who wouldn't know an operating system if it walked up and bit him."
(Val Henson)

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 06:19 AM
Tom Lane
 
Posts: n/a
Default Re: Vacuum Full Analyze Stalled

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> However, I'm looking at the autovacuum code to see why it's sitting
> holding locks on the small table and not vacuuming it. I see on the
> pg_locks output that process 3158 (autovacuum) has got locks on the
> table and index, but it apparently isn't vacuuming the table. If this
> is correct, it's a bug. However I can't seem to find out why this
> happens.


We can see clearly from the pg_locks output that VACUUM isn't waiting
for an lmgr lock, so the problem must be at a lower level. The
hypothesis I'm thinking about is that VACUUM is trying to do
LockBufferForCleanup() and for some reason it never finishes. There are
a number of possible scenarios that could explain this: leaked buffer
pin, dropped signal, etc.

> Kevin, Jeff, next time this happens please attach gdb to the autovacuum
> process and get a stack trace ("bt" to gdb), if at all possible, and/or
> strace it to see what it's doing.


Please!

Also, we need to keep a little clarity about what we are dealing with.
This thread has mentioned hangups in both plain vacuum (autovacuum) and
VACUUM FULL. It seems very likely to me that there are different
mechanisms involved --- since VACUUM FULL takes an exclusive lock on the
whole table, that eliminates an entire class of possible explanations
for the plain-VACUUM case, while introducing a whole new set of
explanations having to do with the VACUUM being queued up behind
ordinary table locks. Please be perfectly clear about which scenario
each report is about.

Finally, I'm wondering whether this bug is new in 8.1 or is
pre-existing. Has this same application been running successfully
in 8.0?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-10-2008, 06:19 AM
Simon Riggs
 
Posts: n/a
Default Re: Vacuum Full Analyze Stalled

On Mon, 2005-10-03 at 16:48 -0400, Tom Lane wrote:
> Finally, I'm wondering whether this bug is new in 8.1 or is
> pre-existing. Has this same application been running successfully
> in 8.0?


My money would be on a pre-existing bug. I recall two cases of
unexplained VACUUM hang in the last year.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-10-2008, 06:19 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Vacuum Full Analyze Stalled

On Mon, Oct 03, 2005 at 04:12:25PM -0400, Alvaro Herrera wrote:
> On Mon, Oct 03, 2005 at 02:41:14PM -0500, Jim C. Nasby wrote:
> > I haven't looked in detail at autovacuum in 8.1 yet, but I know that in
> > previous versions it was a bad idea to depend on it to vacuum a small
> > table that has a lot of update activity frequently enough. The issue is
> > that since it's single-threaded if it starts a vacuum on a large table
> > it could be hours before it gets around to vacuuming the small table
> > again.

>
> Does it really make a lot of difference? While the big vacuum is
> running, old tuples accumulate on the small table, because it shows as a
> running transaction and the small vacuum won't delete them. The new
> autovacuum is no different than the old one in this regard. (There's a
> patch by Hannu Krossing IIRC that, if included in 8.2, will make things
> better.)


Hrm, http://archives.postgresql.org/pgsql...7/msg00086.php
indicates that multiple transactions are used during vacuum (see item
1). If that's the case, it is still better to run a stand-alone vacuum
if there are tables being autovacuumed that are large and have indexes.

In any case, Tom's got some serious objections to that patch, so it
might never make it in. I'd like to see an item on the TODO so this
doesn't slip through the cracks. It's also possible that a simpler,
cleaner version of this would be to have vacuum do it's work in batches
rather than messing around with visibility code (which I agree is
dangerous).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:27 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com