Unix Technical Forum

Maintenance question / DB size anomaly...

This is a discussion on Maintenance question / DB size anomaly... within the Pgsql Performance forums, part of the PostgreSQL category; --> Drat! I'm wrong again. I thought for sure there wouldn't be a wraparound problem. So does this affect the ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #21 (permalink)  
Old 04-19-2008, 10:04 AM
Kurt Overberg
 
Posts: n/a
Default Re: Maintenance question / DB size anomaly...

Drat! I'm wrong again. I thought for sure there wouldn't be a
wraparound problem.
So does this affect the entire database server, or just this table?
Is best way to
proceed to immediately ditch this db and promote one of my slaves to
a master? I'm just
concerned about the data integrity. Note that I don't use OID for
anything really, so I'm
hoping I'll be safe.

Thanks again, Tom.

/kurt


pg_controldata output:

-bash-3.00$ pg_controldata
pg_control version number: 74
Catalog version number: 200411041
Database system identifier: 4903924957417782767
Database cluster state: in production
pg_control last modified: Wed 20 Jun 2007 03:19:52 PM CDT
Current log file ID: 952
Next log file segment: 154
Latest checkpoint location: 3B8/920F0D78
Prior checkpoint location: 3B8/8328E4A4
Latest checkpoint's REDO location: 3B8/9200BBF0
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 1490547335
Latest checkpoint's NextOID: 3714961319
Time of latest checkpoint: Wed 20 Jun 2007 03:17:50 PM CDT
Database block size: 8192
Blocks per segment of large relation: 131072
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum number of function arguments: 32
Date/time type storage: floating-point numbers
Maximum length of locale name: 128
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
-bash-3.00$ echo $PGDATA


Here's the list from pg_clog for June:

-rw------- 1 postgres postgres 262144 Jun 1 03:36 054D
-rw------- 1 postgres postgres 262144 Jun 1 08:16 054E
-rw------- 1 postgres postgres 262144 Jun 1 10:24 054F
-rw------- 1 postgres postgres 262144 Jun 1 17:03 0550
-rw------- 1 postgres postgres 262144 Jun 2 03:32 0551
-rw------- 1 postgres postgres 262144 Jun 2 10:04 0552
-rw------- 1 postgres postgres 262144 Jun 2 19:24 0553
-rw------- 1 postgres postgres 262144 Jun 3 03:38 0554
-rw------- 1 postgres postgres 262144 Jun 3 13:19 0555
-rw------- 1 postgres postgres 262144 Jun 4 00:02 0556
-rw------- 1 postgres postgres 262144 Jun 4 07:12 0557
-rw------- 1 postgres postgres 262144 Jun 4 12:37 0558
-rw------- 1 postgres postgres 262144 Jun 4 19:46 0559
-rw------- 1 postgres postgres 262144 Jun 5 03:36 055A
-rw------- 1 postgres postgres 262144 Jun 5 10:54 055B
-rw------- 1 postgres postgres 262144 Jun 5 18:11 055C
-rw------- 1 postgres postgres 262144 Jun 6 03:38 055D
-rw------- 1 postgres postgres 262144 Jun 6 10:15 055E
-rw------- 1 postgres postgres 262144 Jun 6 15:10 055F
-rw------- 1 postgres postgres 262144 Jun 6 23:21 0560
-rw------- 1 postgres postgres 262144 Jun 7 07:15 0561
-rw------- 1 postgres postgres 262144 Jun 7 13:43 0562
-rw------- 1 postgres postgres 262144 Jun 7 22:53 0563
-rw------- 1 postgres postgres 262144 Jun 8 07:12 0564
-rw------- 1 postgres postgres 262144 Jun 8 14:42 0565
-rw------- 1 postgres postgres 262144 Jun 9 01:30 0566
-rw------- 1 postgres postgres 262144 Jun 9 09:19 0567
-rw------- 1 postgres postgres 262144 Jun 9 20:19 0568
-rw------- 1 postgres postgres 262144 Jun 10 03:39 0569
-rw------- 1 postgres postgres 262144 Jun 10 15:38 056A
-rw------- 1 postgres postgres 262144 Jun 11 03:34 056B
-rw------- 1 postgres postgres 262144 Jun 11 09:14 056C
-rw------- 1 postgres postgres 262144 Jun 11 13:59 056D
-rw------- 1 postgres postgres 262144 Jun 11 19:41 056E
-rw------- 1 postgres postgres 262144 Jun 12 03:37 056F
-rw------- 1 postgres postgres 262144 Jun 12 09:59 0570
-rw------- 1 postgres postgres 262144 Jun 12 17:23 0571
-rw------- 1 postgres postgres 262144 Jun 13 03:32 0572
-rw------- 1 postgres postgres 262144 Jun 13 09:16 0573
-rw------- 1 postgres postgres 262144 Jun 13 16:25 0574
-rw------- 1 postgres postgres 262144 Jun 14 01:28 0575
-rw------- 1 postgres postgres 262144 Jun 14 08:40 0576
-rw------- 1 postgres postgres 262144 Jun 14 15:07 0577
-rw------- 1 postgres postgres 262144 Jun 14 22:00 0578
-rw------- 1 postgres postgres 262144 Jun 15 03:36 0579
-rw------- 1 postgres postgres 262144 Jun 15 12:21 057A
-rw------- 1 postgres postgres 262144 Jun 15 18:10 057B
-rw------- 1 postgres postgres 262144 Jun 16 03:32 057C
-rw------- 1 postgres postgres 262144 Jun 16 09:17 057D
-rw------- 1 postgres postgres 262144 Jun 16 19:32 057E
-rw------- 1 postgres postgres 262144 Jun 17 03:39 057F
-rw------- 1 postgres postgres 262144 Jun 17 13:26 0580
-rw------- 1 postgres postgres 262144 Jun 17 23:11 0581
-rw------- 1 postgres postgres 262144 Jun 18 04:40 0582
-rw------- 1 postgres postgres 262144 Jun 18 12:23 0583
-rw------- 1 postgres postgres 262144 Jun 18 17:22 0584
-rw------- 1 postgres postgres 262144 Jun 18 19:40 0585
-rw------- 1 postgres postgres 262144 Jun 19 03:38 0586
-rw------- 1 postgres postgres 262144 Jun 19 09:30 0587
-rw------- 1 postgres postgres 262144 Jun 19 10:23 0588
-rw------- 1 postgres postgres 262144 Jun 19 16:10 0589
-rw------- 1 postgres postgres 262144 Jun 19 21:45 058A
-rw------- 1 postgres postgres 262144 Jun 20 03:38 058B
-rw------- 1 postgres postgres 262144 Jun 20 12:17 058C
-rw------- 1 postgres postgres 131072 Jun 20 15:13 058D


On Jun 20, 2007, at 2:37 PM, Tom Lane wrote:
>
> so I have to conclude that you've got a wraparound problem. What
> is the
> current XID counter? (pg_controldata will give you that, along with a
> lot of other junk.) It might also be interesting to take a look at
> "ls -l $PGDATA/pg_clog"; the mod times on the files in there would
> give
> us an idea how fast XIDs are being consumed.
>
> regards, tom lane



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #22 (permalink)  
Old 04-19-2008, 10:04 AM
Tom Lane
 
Posts: n/a
Default Re: Maintenance question / DB size anomaly...

Kurt Overberg <kurt@hotdogrecords.com> writes:
> Drat! I'm wrong again. I thought for sure there wouldn't be a
> wraparound problem.


Well, I'm not sure what it is now. You showed some invisible tuples
with XMINs of
XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0
XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
but the nextXID is
1490547335
which is not that far ahead of those --- about 1.2 million transactions,
or less than a day's time according to the clog timestamps, which
suggest that you're burning several million XIDs a day. Perhaps you've
wrapped past them since your earlier check --- if you try the same
"select where ctid = " queries now, do they show rows?

The other thing that's strange here is that an 8.0 installation should
be pretty aggressive about recycling pg_clog segments, and yet you've
got a bunch there. How far back do the files in pg_clog go --- what's
the numeric range of the filenames, and the date range of their mod
times? Have you checked the postmaster log to see if you're getting any
complaints about checkpoint failures or anything like that? It would
also be useful to look at the output of
select datname, age(datfrozenxid) from pg_database;

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #23 (permalink)  
Old 04-19-2008, 10:04 AM
Kurt Overberg
 
Posts: n/a
Default Re: Maintenance question / DB size anomaly...

Okay,


select * from _my_cluster.sl_log_1 where ctid = '(1,1)';
select * from _my_cluster.sl_log_1 where ctid = '(1,2)';
select * from _my_cluster.sl_log_1 where ctid = '(1,3)';
select * from _my_cluster.sl_log_1 where ctid = '(1,4)';

all returns zero rows. When I do a dump of that file, I get:

Block 1 ************************************************** ******
<Header> -----
Block Offset: 0x00002000 Offsets: Lower 408 (0x0198)
Block: Size 8192 Version 2 Upper 7680 (0x1e00)
LSN: logid 955 recoff 0x0daed68c Special 8192 (0x2000)
Items: 97 Free Space: 7272
Length (including item array): 412

<Data> ------
Item 1 -- Length: 121 Offset: 8068 (0x1f84) Flags: USED
XMIN: 1491480520 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
Block Id: 1 linp Index: 1 Attributes: 6 Size: 32
infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

....the fact that they weren't in the table, but in the file (I did
the filedump first,
then the query), then redid the filedump, the results are the same,
the rows are still
in the file. I have no idea how frequently these files are getting
written to, I assume
frequently. I also looked at the last block listed in the file,
6445, and also looked for
items 1-4, and also did not find them in the table using a similar
select as above. That seems
kinda strange, since there's right this second 11,000 items in that
table, but I'll roll with it for awhile.

Intrigued, I wanted to see what a filedump looked like of a row that
WAS in the table:

ctid | log_origin | log_xid | log_tableid | log_actionseq |
log_cmdtype |
(7,1) | 10 | 1491481037 | 8 | 473490934 |
I | (memberid,answerid,taskinstanceid) values
('144854','148707','0')



Block 7 ************************************************** ******
<Header> -----
Block Offset: 0x0000e000 Offsets: Lower 424 (0x01a8)
Block: Size 8192 Version 2 Upper 508 (0x01fc)
LSN: logid 955 recoff 0x0dc4bcc0 Special 8192 (0x2000)
Items: 101 Free Space: 84
Length (including item array): 428

<Data> ------
Item 1 -- Length: 129 Offset: 8060 (0x1f7c) Flags: USED
XMIN: 1491481037 CMIN: 7 XMAX: 0 CMAX|XVAC: 0
Block Id: 7 linp Index: 1 Attributes: 6 Size: 32
infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)


....the NextID was (taken about 5 mins after the previous filedump):
Latest checkpoint's NextXID: 1491498183


I don't see any real differences in the file entry for a row that is
in the table, and one that I
don't see in the table. I hope I'm getting this right, its totally
fascinating seeing how
all this works.

About your other questions:

1. I have pg_clog segments all the way back to the start of the
database, all the way back
to March 14th, 2006 (most likely when the database was first brought
up on this machine).
The numeric names start at 0000 and go to 058E. I checked the recent
(within last 8 days)
and saw no errors containing the word 'checkpoint'. In fact, very
few errors at all.
The dang thing seems to be running pretty well, just a little slow.

mydb=# select datname, age(datfrozenxid) from pg_database;
datname | age
-----------+------------
template1 | 1491520270
template0 | 1491520270
postgres | 1491520270
mydb | 1076194904


Oooooooo..... thats not good, is it? Thanks for taking an interest,
Tom. I'm most likely going to
promote one of my subscribers to be master, then nuke this database,
but I have no problems keeping it
around if you think I may have found some obscure bug that could help
someone debug. Again, this
DB gets vacuumed every day, and in the beginning, I think I remember
doing a vacuum full every
day.

Thanks,

/kurt


On Jun 20, 2007, at 5:08 PM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> Drat! I'm wrong again. I thought for sure there wouldn't be a
>> wraparound problem.

>
> Well, I'm not sure what it is now. You showed some invisible tuples
> with XMINs of
> XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
> XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0
> XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0
> but the nextXID is
> 1490547335
> which is not that far ahead of those --- about 1.2 million
> transactions,
> or less than a day's time according to the clog timestamps, which
> suggest that you're burning several million XIDs a day. Perhaps
> you've
> wrapped past them since your earlier check --- if you try the same
> "select where ctid = " queries now, do they show rows?
>
> The other thing that's strange here is that an 8.0 installation should
> be pretty aggressive about recycling pg_clog segments, and yet you've
> got a bunch there. How far back do the files in pg_clog go --- what's
> the numeric range of the filenames, and the date range of their mod
> times? Have you checked the postmaster log to see if you're
> getting any
> complaints about checkpoint failures or anything like that? It would
> also be useful to look at the output of
> select datname, age(datfrozenxid) from pg_database;
>
> regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:13 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com