vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I wrote: > It's usually going to be the case that the oldest datvacuumxid is > template0's, meaning that it will never be possible to truncate clog > until autovacuum decides that template0 is at risk of wraparound and > goes and vacuums it. Shortening the freeze horizon will reduce the size > that pg_clog occupies just *after* that happens, but we're still going > to see pg_clog bloating up to something close to 256MB before autovacuum > kicks in. After further thought I see that there are actually two parameters involved in this process: 1. the critical age (currentXID - relfrozenxid) beyond which autovacuum will force a vacuum of a particular table to forestall XID wraparound. (Note: as the 8.2 code stands, the system will launch autovacuums even when autovac is nominally disabled in order to fix tables that have exceeded a hard-wired critical age.) 2. the freeze distance vacuum (whether auto or normal) uses to determine the new cutoff point, ie, the new relfrozenxid for the table. We can make a few observations: * For a table that otherwise goes unvacuumed, the interval between forced anti-wraparound vacuums will be critical_age - freeze_distance. Therefore, for large static tables there is value in being able to adjust this difference to be as large as possible. * The size of pg_clog is determined by the system-wide maximum of critical_age + number-of-transactions-needed-to-finish-vacuuming. Therefore, critical_age is the knob we must expose if we want to provide user control of pg_clog growth. * It might seem that there's no point in per-table adjustment of critical_age, since only the system-wide maximum means anything for resource consumption. I'm not so sure though --- for a really large table, the time needed to finish vacuuming it could be significant, meaning it would need a lower critical age than other tables. With the current one-process-at-a-time autovac infrastructure, this probably isn't very important, but we've been talking about allowing multiple parallel autovacuums specifically to deal with the problem of some tables being much larger than others. So it seems to me that an argument can be made for creating two new GUC variables and adding two columns to pg_autovacuum: vacuum_freeze_distance: number of transactions back from current that a VACUUM will use as the freeze cutoff point, ie, XIDs older than that will be replaced by FrozenXID, and the cutoff point will become the table's new relfrozenxid value. Valid range zero to perhaps 1 billion. VACUUM FREEZE is a shorthand for doing a vacuum with vacuum_freeze_distance = 0. autovacuum_freeze_limit: maximum age of relfrozenxid before autovacuum will force a vacuum for anti-wraparound purposes. Valid range perhaps 100 million to (2 billion - 100 million). pg_autovacuum.freeze_distance: per-table vacuum_freeze_distance setting for autovacuum to use. pg_autovacuum.freeze_limit: per-table autovacuum_freeze_limit for autovacuum to use. (I'm not wedded to these names, anyone have better ideas?) I'd propose default values of 200 million for autovacuum_freeze_limit and half that for vacuum_freeze_distance, resulting in a maximum pg_clog size of 50MB and forced autovacs about every 100 million transactions. One minor point is that while the values of these variables have to have sane relationships to each other, the GUC infrastructure doesn't really allow us to enforce such a constraint directly (the behavior would be too dependent on which variable got set first). I'd suggest making vacuum just silently limit the effective freeze_distance to not more than half of the system's autovacuum_freeze_limit, rather than trying to enforce any relationship within GUC. This is kind of a lot to be inventing in late beta, but if we want to have a really credible solution to the WAL-versus-freezing problem I think we need to do all of this. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Sat, 2006-11-04 at 12:35 -0500, Tom Lane wrote: > I wrote: > > It's usually going to be the case that the oldest datvacuumxid is > > template0's, meaning that it will never be possible to truncate clog > > until autovacuum decides that template0 is at risk of wraparound and > > goes and vacuums it. Shortening the freeze horizon will reduce the size > > that pg_clog occupies just *after* that happens, but we're still going > > to see pg_clog bloating up to something close to 256MB before autovacuum > > kicks in. > > After further thought I see that there are actually two parameters > involved in this process: > > 1. the critical age (currentXID - relfrozenxid) beyond which autovacuum > will force a vacuum of a particular table to forestall XID wraparound. > (Note: as the 8.2 code stands, the system will launch autovacuums even > when autovac is nominally disabled in order to fix tables that have > exceeded a hard-wired critical age.) > > 2. the freeze distance vacuum (whether auto or normal) uses to determine > the new cutoff point, ie, the new relfrozenxid for the table. I guess I don't fully understand the way you've described this, so I'd like to put my own understanding to see if they both agree. Getting that right is the key to understanding the proposal more fully (for me) - I get the bit about 2 parameters... If we take the current Xid as zero, we can go back in time to various Xids using a timeline. That timeline can be divided into various Eras, as with Geologic time (Jurassic, Triassic, Ice Ages etc). With PostgreSQL, we used to have 5 eras: 1. In Progress Transactions 0 - OldestXmin <---- limit is OldestXmin 2. Completed, normal Xids, status in clog <---- limit is oldest Xid in clog (no specific name in code) 3. Completed, normal Xids, status marked on tuple <---- limit is Freeze distance? 4. Frozen Xids, tuples frozen early to ensure nothing passes Wrap <---- critical age? - practical last point to forestall Wrap <---- limit is Wraparound 5. Frozen Xids (PreHistory) now we have agreed to have only 4 eras, IIRC: 1. In Progress Transactions 0 - OldestXmin <---- limit is OldestXmin 2. Completed, normal Xids, status in clog <---- limit is Freeze distance? 3. Frozen Xids, tuples frozen early to ensure nothing passes Wrap <---- critical age? - practical last point to forestall Wrap <---- limit is Wraparound 4. Frozen Xids (PreHistory) Perhaps you could edit the above if needed? I'm not making a separate proposal, just trying to get it very clear. If we can explain this simply now, then we stand a chance of other people understanding it as well and setting these parameters correctly. Otherwise we'll be doing the explanation hundreds of times on list/IRC. AFICS freeze_limit and freeze_distance are both expressed in number of Xids before current, so the "units" are the same for both. In that case, maybe slightly more differentiated names would be appropriate. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > * It might seem that there's no point in per-table adjustment of > critical_age, since only the system-wide maximum means anything for > resource consumption. I'm not so sure though --- for a really large > table, the time needed to finish vacuuming it could be significant, > meaning it would need a lower critical age than other tables. With the > current one-process-at-a-time autovac infrastructure, this probably > isn't very important, but we've been talking about allowing multiple > parallel autovacuums specifically to deal with the problem of some > tables being much larger than others. I think a global critical_age parameter is just fine. If you have one huge table that takes a long time to vacuum, just adjust critical_age so that there's enough time for the huge table vacuum to finish before wrap-around. That means that other smaller tables are vacuumed more frequently than would otherwise be necessary, but that's not a big deal if the other tables really are much smaller. > pg_autovacuum.freeze_distance: per-table vacuum_freeze_distance setting > for autovacuum to use. Shouldn't this be used for manual vacuums as well? > I'd propose default values of 200 million for autovacuum_freeze_limit > and half that for vacuum_freeze_distance, resulting in a maximum pg_clog > size of 50MB and forced autovacs about every 100 million transactions. Sounds fine to me. > One minor point is that while the values of these variables have to have > sane relationships to each other, the GUC infrastructure doesn't really > allow us to enforce such a constraint directly (the behavior would be > too dependent on which variable got set first). I'd suggest making > vacuum just silently limit the effective freeze_distance to not more > than half of the system's autovacuum_freeze_limit, rather than trying > to enforce any relationship within GUC. Makes sense. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > now we have agreed to have only 4 eras, IIRC: > 1. In Progress Transactions 0 - OldestXmin > <---- limit is OldestXmin > 2. Completed, normal Xids, status in clog > <---- limit is Freeze distance? > 3. Frozen Xids, tuples frozen early to ensure nothing passes Wrap > <---- critical age? - practical last point to forestall Wrap > <---- limit is Wraparound > 4. Frozen Xids (PreHistory) You've got the eras right but not the boundaries. The limit on era 2 is the current system-wide minimum relfrozenxid (or equivalently, the current system-wide minimum datfrozenxid), which is where we have truncated clog. That will normally be a little more than autovacuum_freeze_limit --- the difference corresponding to the "reaction time" needed to fire up autovac and get through all the tables that have exceeded autovacuum_freeze_limit. The freeze_distance has to be considerably *less* than this, else we'll constantly be firing new autovac cycles each of which will freeze just a few more tuples. > Perhaps you could edit the above if needed? 1. In Progress Transactions 0 - OldestXmin <---- limit is OldestXmin 2. Completed, normal Xids, status in clog <---- limit is length of clog (a bit more than freeze_limit) <---- Xids older than freeze_distance will be frozen whenever next visited by VACUUM, but there is no forcing function until they exceed freeze_limit 3. Frozen Xids, tuples frozen early to ensure nothing passes Wrap <---- limit is Wraparound 4. Frozen Xids (PreHistory) We also have that the time between forced autovacuums of unchanging tuples is approximately freeze_limit - freeze_distance. > AFICS freeze_limit and freeze_distance are both expressed in number of > Xids before current, so the "units" are the same for both. Check. Actually, as coded there's a bit of difference: freeze_distance is subtracted from OldestXmin whereas freeze_limit is subtracted from ReadNextTransactionId(). Normally OldestXmin should be so much smaller than these parameters that it won't matter, but vacuum.c does contain logic to do something reasonable if not. > In that case, > maybe slightly more differentiated names would be appropriate. Got a suggestion? I think the names *should* be clearly related, but as I said, I'm by no means wedded to these particular ones. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| "Heikki Linnakangas" <heikki@enterprisedb.com> writes: > I think a global critical_age parameter is just fine. If you have one > huge table that takes a long time to vacuum, just adjust critical_age so > that there's enough time for the huge table vacuum to finish before > wrap-around. That means that other smaller tables are vacuumed more > frequently than would otherwise be necessary, but that's not a big deal > if the other tables really are much smaller. Well, that's what you have to do right now, but as soon as we support multiple autovac processes it'll be useful to do the other; so I figured we might as well add the pg_autovacuum column while we're at it. >> pg_autovacuum.freeze_distance: per-table vacuum_freeze_distance setting >> for autovacuum to use. > Shouldn't this be used for manual vacuums as well? I thought about that but it seemed pretty dubious --- manual vacuums don't look at pg_autovacuum for anything else, eg not for the cost variables. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| I wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: >> Perhaps you could edit the above if needed? > <---- Xids older than freeze_distance will be frozen whenever > next visited by VACUUM, but there is no forcing function > until they exceed freeze_limit >> In that case, >> maybe slightly more differentiated names would be appropriate. > Got a suggestion? I think the names *should* be clearly related, but > as I said, I'm by no means wedded to these particular ones. After re-reading the above, it strikes me that maybe names based around "freeze_min" and "freeze_max" would be useful? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Sun, 2006-11-05 at 12:01 -0500, Tom Lane wrote: > After re-reading the above, it strikes me that maybe names based around > "freeze_min" and "freeze_max" would be useful? Works for me. They are clearly related, yet different and allow a straightforward explanation of their need and use. e.g. vacuum_freeze_min The latest TransactionId that will be "frozen" during a VACUUM is calculated by CurrentTransactionId - vacuum_freeze_min. vacuum_freeze_max The maximum age, calculated as distance from CurrentTransactionId, that will be allowed before a autovacuum will be forced for that database object. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > On Sun, 2006-11-05 at 12:01 -0500, Tom Lane wrote: >> After re-reading the above, it strikes me that maybe names based around >> "freeze_min" and "freeze_max" would be useful? > Works for me. They are clearly related, yet different and allow a > straightforward explanation of their need and use. Sold, I'll rename them before committing the patch. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > vacuum_freeze_min The latest TransactionId that will be "frozen" during > a VACUUM is calculated by CurrentTransactionId - vacuum_freeze_min. > > vacuum_freeze_max > The maximum age, calculated as distance from CurrentTransactionId, that > will be allowed before a autovacuum will be forced for that database > object. I think it's clearer if "min" and "max" are considered adjectives and always have a subject they modify. Otherwise it's unclear what they refer to. So "vacuum_freeze_min_age" and "vacuum_freeze_max_age" instead. That way it's unambiguous which is which. Ie, that it's minimum and maximum age and not minimum and maximum transaction id which would be the other way around. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| On Sun, 2006-11-05 at 14:47 -0500, Gregory Stark wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > > vacuum_freeze_min The latest TransactionId that will be "frozen" during > > a VACUUM is calculated by CurrentTransactionId - vacuum_freeze_min. > > > > vacuum_freeze_max > > The maximum age, calculated as distance from CurrentTransactionId, that > > will be allowed before a autovacuum will be forced for that database > > object. > > I think it's clearer if "min" and "max" are considered adjectives and always > have a subject they modify. Otherwise it's unclear what they refer to. > > So "vacuum_freeze_min_age" and "vacuum_freeze_max_age" instead. > > That way it's unambiguous which is which. Ie, that it's minimum and maximum > age and not minimum and maximum transaction id which would be the other way > around. Sounds logical. Hadn't seen that you could take it both ways. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |