| Hi, I'm currently doctoring a situation wherein we've got a table inheritance scheme that over the years has ballooned like only in your nightmares (think well over 100K tables + indexes on those). The obvious solution is to re-design the schema with a better partitioning scheme in mind (see another msg from me later today on that), but that's a big project that's just getting underway, and an immediate concern is the I/O on our data partition, due in large part to the stats file(s) getting hammered. We can verify this by looking at our write volume of 45+ Mbits/s and watching it drop to well below 10 on average when we disable stats_row_level, as well as by watching the insane amounts of writes to pgstat.tmp when running the rwsnoop dtrace script. So, for the interim we're looking to move where the stats files are written to. I've made the changes to the file paths for pgstat.stat and pgstat.tmp in src/backend/postmaster/pgstat.c, recompiled and verified that everything seems to be working ok on our test machine. However, seeing as how I'm not all that familiar with the code base, I'm asking here: is that all I need to do? Is there anything I've missed? Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Erik Jones <erik@myemma.com> writes: > Hi, I'm currently doctoring a situation wherein we've got a table > inheritance scheme that over the years has ballooned like only > in your nightmares (think well over 100K tables + indexes on those). > The obvious solution is to re-design the schema with a better > partitioning scheme in mind (see another msg from me later today on > that) but that's a big project that's just getting underway and an > immediate concern is the I/O on our data partition due in large part > to the stats file(s) getting hammered. Which PG version? Early 8.2.x releases had a nasty bug that caused excessive stats file writes. regards, tom lane |
| |||
| On Dec 3, 2007, at 4:16 PM, Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: >> Hi, I'm currently doctoring a situation wherein we've got a table >> inheritance scheme that over the years has ballooned like only >> in your nightmares (think well over 100K tables + indexes on those). >> The obvious solution is to re-design the schema with a better >> partitioning scheme in mind (see another msg from me later today on >> that) but that's a big project that's just getting underway and an >> immediate concern is the I/O on our data partition due in large part >> to the stats file(s) getting hammered. > > Which PG version? Early 8.2.x releases had a nasty bug that caused > excessive stats file writes. 8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about 65 Mbs/sec. Interestingly, a while back we were running with the data directory mounted with forcedirectio and saw none of this; I'm guessing that fsync calls would have something to do with that? Erik Jones |
| |||
| Erik Jones <erik@myemma.com> writes: > 8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about > 65 Mbs/sec. Interestingly, a while back we were running with the > data directory mounted with forcedirectio and saw none of this; I'm > guessing that fsync calls would have something to do with that? Hmm ... no, because the stats file never gets fsync'd. I should think that forcedirectio would have made things worse. regards, tom lane |
| |||
| On Dec 3, 2007, at 6:10 PM, Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: >> 8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about >> 65 Mbs/sec. Interestingly, a while back we were running with the >> data directory mounted with forcedirectio and saw none of this; I'm >> guessing that fsync calls would have something to do with that? > > Hmm ... no, because the stats file never gets fsync'd. I should think > that forcedirectio would have made things worse. Interesting. If this is anything you'd like to look into, I can provide whatever diagnostic output you need (iostat, vmstat, dtrace script outputs, etc.), but I do have to reiterate that we are an extreme corner case due to our schema size. For now, is renaming the #define'd paths for the stats file and temp file sufficient for moving them? Basically, we'd like to move them onto a RAM disk to give our disks a break. Erik Jones |
| |||
| Erik Jones <erik@myemma.com> writes: > For now, is renaming the > #define'd paths for the stats file and temp file sufficient for > moving them? I would think so, but haven't tried it. There definitely shouldn't be anything outside pgstat.c that's touching them. regards, tom lane |
| |||
| On Monday 03 December 2007 20:22, Erik Jones wrote: > On Dec 3, 2007, at 6:10 PM, Tom Lane wrote: > > Erik Jones <erik@myemma.com> writes: > >> 8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about > >> 65 Mbs/sec. Interestingly, a while back we were running with the > >> data directory mounted with forcedirectio and saw none of this; I'm > >> guessing that fsync calls would have something to do with that? > > > > Hmm ... no, because the stats file never gets fsync'd. I should think > > that forcedirectio would have made things worse. > > Interesting. If this is anything you'd like to look into, I can > provide whatever diagnostic output you need (iostat, vmstat, dtrace > script outputs, etc.), but I do have to reiterate that we are an > extreme corner case due to our schema size. For now, is renaming the > #define'd paths for the stats file and temp file sufficient for > moving them? Basically, we'd like to move them onto a RAM disk to > give our disks a break. > Yeah, we've noticed the same problem (pgstat is the most active file on the system... uncovered in much the same way... go Solaris). Actually, I was wondering if it could be done with symlinks, a la moving xlogs. Since we do custom builds, that's not a real issue, but I was curious. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL |
| |||
| Robert Treat wrote: > On Monday 03 December 2007 20:22, Erik Jones wrote: > > Interesting. If this is anything you'd like to look into, I can > > provide whatever diagnostic output you need (iostat, vmstat, dtrace > > script outputs, etc.), but I do have to reiterate that we are an > > extreme corner case due to our schema size. For now, is renaming the > > #define'd paths for the stats file and temp file sufficient for > > moving them? Basically, we'd like to move them onto a RAM disk to > > give our disks a break. > > Yeah, we've noticed the same problem (pgstat is the most active file on the > system... uncovered in much the same way... go Solaris). Actually, I was > wondering if it could be done with symlinks, a la moving xlogs. Not really, because a new file is created and renamed into place each time the stats file is rewritten. So the symlink would be lost on the first rewrite. The first idea that comes to mind is to make the path configurable via GUC, so the user could set it to be written to an in-memory filesystem (/tmp in Solaris?). But then I thought, why do we need it to be a file at all? Why not use a mmap'ed memory area or something like that, and only write it to a file on postmaster shutdown? (Losing the file on unclean shutdown is not a problem, because the file is removed anyway.) -- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4" "Prefiero omelette con amigos que caviar con tontos" (Alain Nonnet) |
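Alvaro's point about the symlink can be reproduced outside PostgreSQL. The following is a minimal sketch, not the pgstat.c code: it only mimics the write-to-temp-then-rename pattern he describes, on a POSIX system. The directory names, `rewrite_stats`, and `demo` are hypothetical stand-ins; only the file names pgstat.stat and pgstat.tmp come from the thread.

```python
import os
import tempfile


def rewrite_stats(stat_path: str, tmp_path: str, payload: bytes) -> None:
    """Write a fresh snapshot to tmp_path, then rename it over stat_path."""
    with open(tmp_path, "wb") as f:
        f.write(payload)
    # rename() replaces the directory entry itself: if stat_path is a
    # symlink, the link is overwritten, not the file it points to.
    os.replace(tmp_path, stat_path)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "global")    # stand-in for the data directory
        ram_dir = os.path.join(root, "ramdisk")    # stand-in for an in-memory filesystem
        os.mkdir(data_dir)
        os.mkdir(ram_dir)

        # The file we would like the stats to land in.
        target = os.path.join(ram_dir, "pgstat.stat")
        with open(target, "wb") as f:
            f.write(b"old")

        # Point the data-directory name at it with a symlink.
        stat_path = os.path.join(data_dir, "pgstat.stat")
        tmp_path = os.path.join(data_dir, "pgstat.tmp")
        os.symlink(target, stat_path)
        link_before = os.path.islink(stat_path)

        rewrite_stats(stat_path, tmp_path, b"new")

        with open(target, "rb") as f:
            ramdisk_copy = f.read()
        with open(stat_path, "rb") as f:
            data_dir_copy = f.read()
        return {
            "link_before": link_before,
            "link_after": os.path.islink(stat_path),
            "ramdisk_copy": ramdisk_copy,
            "data_dir_copy": data_dir_copy,
        }
```

After one rewrite the name in the data directory is a regular file again and the copy on the "RAM disk" is never updated, which is why changing the compiled-in paths (or a configurable path) is needed rather than a symlink.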
| |||
| On Wednesday 05 December 2007 07:22, Alvaro Herrera wrote: > Robert Treat wrote: > > On Monday 03 December 2007 20:22, Erik Jones wrote: > > > Interesting. If this is anything you'd like to look into, I can > > > provide whatever diagnostic output you need (iostat, vmstat, dtrace > > > script outputs, etc.), but I do have to reiterate that we are an > > > extreme corner case due to our schema size. For now, is renaming the > > > #define'd paths for the stats file and temp file sufficient for > > > moving them? Basically, we'd like to move them onto a RAM disk to > > > give our disks a break. > > > > Yeah, we've noticed the same problem (pgstat is the most active file on > > the system... uncovered in much the same way... go Solaris). Actually, I > > was wondering if it could be done with symlinks, a la moving xlogs. > > Not really, because a new file is created and renamed in place each time > it's going to be rewritten. So the symlink would be lost in the first > file rewrite. > Ah yeah, that's what I concluded back then. > The first idea that comes to mind is to make the path configurable via > GUC, so the user could set it to be written to an in-memory filesystem > (/tmp in Solaris?). Yep, thought of that too, though it was after feature freeze so I didn't propose it. Course if someone wants to sneak that in it would be cool :-) > But then I thought, why do we need it to be a file > at all? Why not use a mmap'ed memory area or something like that, and > only write it to a file on postmaster shutdown? (Losing the file on > unclean shutdown is not a problem, because the file is removed anyway.) I suppose you need some facility to spill to disk, so maybe being in a file is better? Seems it might not be in most cases... I wonder how big a memory space we (or Erik) need. -- Robert Treat |
| ||||
| Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > But then I thought, why do we need it to be a file > at all? Why not use a mmap'ed memory area or something like that, and > only write it to a file on postmaster shutdown? Yeah, we definitely need some other technology for this. The difficulty is in dealing with a highly variably sized chunk of data --- our existing shmem approach won't work well, and once you get away from that the old portability question raises its head. There's also a synchronization issue: how can the stats collector make updates appear atomic? mmap by itself doesn't solve that AFAIK. regards, tom lane |
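For context on the synchronization question, here is a small sketch of the property a replacement would have to preserve. It assumes POSIX rename semantics and is not taken from PostgreSQL; `publish`, `demo_reader_view`, and the file name `stats.snapshot` are hypothetical. With write-then-rename, a reader that already has the old file open keeps reading the complete old snapshot, while anyone opening the path afterwards gets the complete new one, so no reader sees a half-written mix. A plain shared mapping updated in place offers no such guarantee without extra locking or versioning.

```python
import os
import tempfile


def publish(path: str, payload: bytes) -> None:
    """Publish a new snapshot by writing a temp file and renaming it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(payload)
    os.replace(tmp, path)   # readers see either the old or the new file, never a mix


def demo_reader_view() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "stats.snapshot")   # hypothetical file name
        publish(path, b"A" * 1024)

        reader = open(path, "rb")        # a reader opens the current snapshot
        first_half = reader.read(512)

        publish(path, b"B" * 2048)       # the writer publishes mid-read

        second_half = reader.read()      # still served from the old file
        reader.close()

        with open(path, "rb") as f:      # a new open sees only the new snapshot
            fresh = f.read()
        return first_half + second_half, fresh
```

The cost of this simplicity is the one Erik measured: every update rewrites the whole file, which grows with the number of tables and indexes being tracked.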