This is a discussion on "Re: Remove xmin and cmin from frozen tuples" within the pgsql Hackers forums, part of the PostgreSQL category.
ITAGAKI Takahiro <itagaki.takahiro@lab.ntt.co.jp> writes:
> I think it would be a waste to retain xmin and cmin for frozen tuples
> because their values represent only 'visible for all transactions'.

True, but the hard part is getting rid of the storage for them.

> I wrote a makeshift patch to compress xmin and cmin (8 bytes) to
> a 1-bit flag, using tuple overlapping.
> Is this idea worth trying?

I think this is incredibly ugly :-(. It eliminates a fairly basic assumption which is that items on a page don't overlap. The space savings cannot be worth the loss in testability and reliability. To take just one problem, it is no longer possible to check an item offset for validity against pd_upper. If we're going to do this, we need a more invasive patch that changes the structure of heaptuple headers in a more fundamental way, and avoids breaking the page layout representation. (Something like the way Oids are now handled might work, although there are alignment issues to worry about, and it'd take more work on VACUUM's part to convert a tuple to frozen state.)

I'm also less than enthused about using up our last infomask bit for a relatively unimportant purpose. We might need that for something bigger someday... though I can't presently guess what.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
| |||
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think this is incredibly ugly :-(.

Yes, I think so, too :-( My patch is the product of not wanting to modify the code widely. So if we want to do it in a cleaner way, lots of changes are needed, as you said.

> I'm also less than enthused about using up our last infomask bit for
> a relatively unimportant purpose. We might need that for something
> bigger someday... though I can't presently guess what.

I think it is not a problem, because the header still has room for several bits. I assume that the combination of HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID currently has no meaning, right? If so, HEAP_FROZEN can be assigned here. Also, t_natts is currently 16 bits, but it can be cut to 11 bits because MaxTupleAttributeNumber is 1664 < 2^11.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories
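The two suggestions about spare header bits can be checked with a little bit arithmetic. The following is only a sketch: the flag values are assumed for illustration, and `HEAP_FROZEN`, `is_frozen`, `pack_natts` and the unpack helpers are hypothetical names, not code from the patch under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative infomask bit values; assumed for this sketch. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200

/* Hypothetical: give the otherwise meaningless combination a meaning. */
#define HEAP_FROZEN (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

/* MaxTupleAttributeNumber as quoted in the thread. */
#define MAX_TUPLE_ATTRIBUTE_NUMBER 1664

/* 11 bits are enough because 1664 < 2^11 = 2048. */
#define NATTS_BITS 11
#define NATTS_MASK ((1u << NATTS_BITS) - 1)
#define FLAG_MASK  ((1u << (16 - NATTS_BITS)) - 1)

/* A tuple counts as frozen only when BOTH xmin bits are set. */
static bool is_frozen(uint16_t infomask)
{
    return (infomask & HEAP_FROZEN) == HEAP_FROZEN;
}

/* Keep the attribute count in the low 11 bits and free the top 5 for flags. */
static uint16_t pack_natts(uint16_t natts, uint16_t flags)
{
    return (uint16_t) ((natts & NATTS_MASK) | ((flags & FLAG_MASK) << NATTS_BITS));
}

static uint16_t unpack_natts(uint16_t packed)
{
    return (uint16_t) (packed & NATTS_MASK);
}

static uint16_t unpack_flags(uint16_t packed)
{
    return (uint16_t) (packed >> NATTS_BITS);
}
```

Under these assumptions a 16-bit t_natts would leave five bits free, at the cost of a mask on every read of the attribute count.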
| |||
Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro@lab.ntt.co.jp> writes:
> > I think it would be a waste to retain xmin and cmin for frozen tuples
> > because their values represent only 'visible for all transactions'.
>
> True, but the hard part is getting rid of the storage for them.
>
> > I wrote a makeshift patch to compress xmin and cmin (8 bytes) to
> > a 1-bit flag, using tuple overlapping.
> > Is this idea worth trying?
>
> I think this is incredibly ugly :-(. It eliminates a fairly basic
> assumption which is that items on a page don't overlap. The space
> savings cannot be worth the loss in testability and reliability.
> To take just one problem, it is no longer possible to check an item
> offset for validity against pd_upper. If we're going to do this,
> we need a more invasive patch that changes the structure of heaptuple
> headers in a more fundamental way, and avoids breaking the page layout
> representation. (Something like the way Oids are now handled might
> work, although there are alignment issues to worry about, and it'd
> take more work on VACUUM's part to convert a tuple to frozen state.)
>
> I'm also less than enthused about using up our last infomask bit for
> a relatively unimportant purpose. We might need that for something
> bigger someday... though I can't presently guess what.

Considering the cost/benefits, rather than doing some optimization for long-lived tuples, I would like to see us merge the existing xmin/xmax/cmin/cmax values back into three storage fields like we had in 7.4 and had to expand to a full four in 8.0 to support subtransactions. The benefit is that every row would be reduced in size by 4 bytes, or 14%, for all rows:

    * Merge xmin/xmax/cmin/cmax back into three header fields

      Before subtransactions, there used to be only three fields needed to store these four values. This was possible because only the current transaction looks at the cmin/cmax values. If the current transaction created and expired the row, the fields stored were xmin (same as xmax), cmin, cmax, and if the transaction was expiring a row from another transaction, the fields stored were xmin (cmin was not needed), xmax, and cmax. Such a system worked because a transaction could only see rows from another completed transaction. However, subtransactions can see rows from outer transactions, and once the subtransaction completes, the outer transaction continues, requiring the storage of all four fields. With subtransactions, an outer transaction can create a row, a subtransaction expire it, and when the subtransaction completes, the outer transaction still has to have proper visibility of the row's cmin, for example, for cursors. One possible solution is to create a phantom cid which represents a cmin/cmax pair and is stored in local memory.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
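The "phantom cid" idea at the end of the TODO item can be sketched as a backend-local table that maps one small integer to a (cmin, cmax) pair, so that a single header field can stand in for both values. This is a minimal sketch under assumptions: all names are hypothetical, the lookup is a linear scan where a real implementation would want a hash, and the tuple header would still need a flag saying that the field holds a phantom cid rather than a plain command id.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t CommandId;

/* One (cmin, cmax) pair represented by a single phantom cid. */
typedef struct
{
    CommandId cmin;
    CommandId cmax;
} CidPair;

/* Backend-local table; the phantom cid is simply the array index. */
static CidPair *cid_pairs = NULL;
static size_t   n_cid_pairs = 0;
static size_t   cap_cid_pairs = 0;

/*
 * Store in *phantom a phantom cid for (cmin, cmax), reusing an existing
 * entry when the same pair was seen before.  Returns 0 on success and
 * -1 if memory could not be allocated.
 */
static int get_phantom_cid(CommandId cmin, CommandId cmax, CommandId *phantom)
{
    for (size_t i = 0; i < n_cid_pairs; i++)
    {
        if (cid_pairs[i].cmin == cmin && cid_pairs[i].cmax == cmax)
        {
            *phantom = (CommandId) i;
            return 0;
        }
    }

    if (n_cid_pairs == cap_cid_pairs)
    {
        size_t   new_cap = cap_cid_pairs ? cap_cid_pairs * 2 : 16;
        CidPair *grown = realloc(cid_pairs, new_cap * sizeof(CidPair));

        if (grown == NULL)
            return -1;
        cid_pairs = grown;
        cap_cid_pairs = new_cap;
    }

    cid_pairs[n_cid_pairs].cmin = cmin;
    cid_pairs[n_cid_pairs].cmax = cmax;
    *phantom = (CommandId) n_cid_pairs++;
    return 0;
}

/* Recover the original values; the caller must pass a valid phantom cid. */
static CommandId phantom_cmin(CommandId phantom)
{
    return cid_pairs[phantom].cmin;
}

static CommandId phantom_cmax(CommandId phantom)
{
    return cid_pairs[phantom].cmax;
}
```

Because only the current transaction looks at cmin/cmax, such a table could be thrown away at transaction end, which is what makes local memory a plausible home for it.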
| |||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Considering the cost/benefits, rather than doing some optimization for
> long-lived tuples, I would like to see us merge the existing
> xmin/xmax/cmin/cmax values back into three storage fields like we had in
> 7.4 and had to expand to a full four in 8.0 to support subtransactions.

There is another reason for trying to do that rather than the frozen-row optimization, which is that to get it down to two visibility-related fields, we'd have to find another representation for tuples that are Datums in memory. The current Datum representation overlays three int32 fields where the visibility fields are for a tuple on-disk. This works fine now, and would still work fine if we could revert to the 7.4 approach, but it doesn't play nicely with a scheme to remove 2 of the 4 fields.

regards, tom lane
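The overlay argument can be made concrete with a union. The sketch below uses simplified, hypothetical struct names and omits xvac; it only shows that a union of the visibility fields and a three-int32 Datum view cannot become smaller than 12 bytes, however many visibility fields are removed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;
typedef uint32_t Oid;

/* In-memory Datum view: three int32-sized fields (simplified). */
typedef struct
{
    int32_t datum_len;
    int32_t datum_typmod;
    Oid     datum_typeid;
} DatumFields;

/* Four visibility fields, as after the 8.0 change (xvac omitted). */
typedef struct
{
    TransactionId xmin;
    CommandId     cmin;
    TransactionId xmax;
    CommandId     cmax;
} VisFields4;

/* Three storage fields, in the style of the 7.4 layout. */
typedef struct
{
    uint32_t field1;
    uint32_t field2;
    uint32_t field3;
} VisFields3;

/* Hypothetical frozen-tuple layout with two of the four fields removed. */
typedef struct
{
    TransactionId xmax;
    CommandId     cmax;
} VisFields2;

/* The header overlays the Datum view on the visibility fields. */
typedef union { VisFields4 vis; DatumFields datum; } Choice4;
typedef union { VisFields3 vis; DatumFields datum; } Choice3;
typedef union { VisFields2 vis; DatumFields datum; } Choice2;

/* The union is as large as its largest member, so the Datum view sets a floor. */
static size_t choice_size(int n_visibility_fields)
{
    switch (n_visibility_fields)
    {
        case 4:  return sizeof(Choice4);
        case 3:  return sizeof(Choice3);
        case 2:  return sizeof(Choice2);
        default: return 0;
    }
}
```

With 4-byte fields, going from four to three visibility fields shrinks the overlay from 16 to 12 bytes, while going to two saves nothing more unless the Datum representation changes as well.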
| |||
People:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Considering the cost/benefits, rather than doing some optimization for
> > long-lived tuples, I would like to see us merge the existing
> > xmin/xmax/cmin/cmax values back into three storage fields like we had
> > in 7.4 and had to expand to a full four in 8.0 to support
> > subtransactions.

Hmmm. I personally don't see a whole lot of value in trimming 4 bytes per row off an archive table, particularly if the table would need to go through some kind of I/O intensive operation to do it.

Where I do see value is in enabling index-only access for "frozen" tables. That would be a *huge* gain, especially with bitmaps. I think we've discussed this before, though.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
| |||
On Fri, Sep 02, 2005 at 01:35:42PM -0700, Josh Berkus wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Considering the cost/benefits, rather than doing some optimization for
> > > long-lived tuples, I would like to see us merge the existing
> > > xmin/xmax/cmin/cmax values back into three storage fields like we had
> > > in 7.4 and had to expand to a full four in 8.0 to support
> > > subtransactions.
>
> Hmmm. I personally don't see a whole lot of value in trimming 4 bytes per
> row off an archive table, particularly if the table would need to go
> through some kind of I/O intensive operation to do it.

I think you are missing something. These 4 bytes are not trimmed by an I/O-intensive operation, they are not written in the first place.

Now, I agree for a very wide table those 4 bytes per tuple may not be a lot. But the optimization could be significant for not-wide (uh, sorry, I don't remember the word) tables.

> Where I do see value is in enabling index-only access for "frozen" tables.
> That would be a *huge* gain, especially with bitmaps. I think we've
> discussed this before, though.

That's a completely different discussion. Btree-organized heaps may help you there.

--
Alvaro Herrera -- Valdivia, Chile
Architect, www.EnterpriseDB.com
"Having your biases confirmed independently is how scientific progress is made, and hence made our great society what it is today" (Mary Gardiner)
| |||
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Fri, Sep 02, 2005 at 01:35:42PM -0700, Josh Berkus wrote:
>> Where I do see value is in enabling index-only access for "frozen" tables.
>> That would be a *huge* gain, especially with bitmaps. I think we've
>> discussed this before, though.

> That's a completely different discussion. Btree-organized heaps may
> help you there.

There was some talk of using a spare bit in index entries to mark "known good" index entries (xmin committed and less than GlobalXmin, and xmax invalid) but the cost of maintaining such bits seems nontrivial. In any case I agree that's an independent issue.

regards, tom lane
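The "known good" condition quoted above is a three-part predicate, which might be sketched as follows. The struct and function names are hypothetical, and the wraparound-aware comparison is a simplification that ignores any special reserved transaction ids.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define INVALID_XID ((TransactionId) 0)

/* Minimal visibility information for one heap tuple (hypothetical). */
typedef struct
{
    TransactionId xmin;
    TransactionId xmax;
    bool          xmin_committed;
} TupleVisibility;

/* Wraparound-aware "a is older than b", comparing modulo 2^32. */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * An index entry could carry the "known good" bit when the inserting
 * transaction committed, is older than every running transaction
 * (global_xmin), and nothing has deleted or locked the tuple.
 */
static bool index_entry_known_good(const TupleVisibility *tup,
                                   TransactionId global_xmin)
{
    return tup->xmin_committed
        && xid_precedes(tup->xmin, global_xmin)
        && tup->xmax == INVALID_XID;
}
```

The maintenance cost mentioned in the post comes from the last condition: any later delete or update of the heap tuple would have to find and clear the bit in every index entry pointing at it.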
| |||
On Fri, 2 Sep 2005 15:51:15 -0400 (EDT), Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> * Merge xmin/xmax/cmin/cmax back into three header fields

And don't forget xvac, please.

> Before subtransactions, there used to be only three fields needed to
> store these four values.

... five values.

> This was possible because only the current
> transaction looks at the cmin/cmax values.

Which is a reason to get rid of cmin/cmax in tuple headers entirely. Once I had a patch based on 7.4 that stored cmin and cmax in backend-local memory. It passed make check and some volume tests, but I felt it was not ready to be applied without any spill-to-disk mechanism. Development stalled when I tried to eliminate xvac as well, which would have required deep cuts into VACUUM code :-(

Servus
Manfred
| |||
Manfred Koizar wrote:
> On Fri, 2 Sep 2005 15:51:15 -0400 (EDT), Bruce Momjian
> <pgman@candle.pha.pa.us> wrote:
> > * Merge xmin/xmax/cmin/cmax back into three header fields
>
> And don't forget xvac, please.
>
> > Before subtransactions, there used to be only three fields needed to
> > store these four values.
>
> ... five values.
>
> > This was possible because only the current
> > transaction looks at the cmin/cmax values.
>
> Which is a reason to get rid of cmin/cmax in tuple headers entirely.
> Once I had a patch based on 7.4 that stored cmin and cmax in
> backend-local memory. It passed make check and some volume tests, but
> I felt it was not ready to be applied without any spill-to-disk
> mechanism. Development stalled when I tried to eliminate xvac as
> well, which would have required deep cuts into VACUUM code :-(

Interesting idea, but how would you record the cmin/xmin values without requiring unlimited memory?

--
  Bruce Momjian
| ||||
On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>> Once I had a patch based on 7.4 that stored cmin and cmax in
>> backend-local memory.

>Interesting idea, but how would you record the cmin/xmin values without
>requiring unlimited memory?

That's exactly the reason for not sending it to -patches. Without spilling to disk this is just not ready for real life. The problem is that -- unlike other data structures that build up during a transaction, e.g. trigger queues -- cmin/cmax lookup requires random access, so we'd need some form of tree or hash. Unfortunately I never got beyond brainstorming :-(

BTW, is there anything else that'd need spilling to disk during long transactions?

Servus
Manfred
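To illustrate why random access calls for a hash or tree, here is a small sketch of a backend-local lookup keyed by tuple position. Everything in it is hypothetical: the names, the fixed table size and the hash function are chosen for the example, and the fixed capacity is exactly the missing spill-to-disk problem described in the post, since `cid_store` simply fails once the table is full.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Physical position of a tuple: block number plus line pointer offset. */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} TupleId;

typedef struct
{
    bool      used;
    TupleId   tid;
    CommandId cmin;
    CommandId cmax;
} CidEntry;

/* Fixed capacity (a power of two): no growth and no spill-to-disk. */
#define CID_TABLE_SIZE 1024

static CidEntry cid_table[CID_TABLE_SIZE];

static uint32_t tid_hash(TupleId tid)
{
    return (tid.block * 2654435761u) ^ tid.offset;
}

/* Linear probing: return the entry for tid, else the first free slot. */
static CidEntry *cid_slot(TupleId tid)
{
    uint32_t start = tid_hash(tid) & (CID_TABLE_SIZE - 1);

    for (uint32_t i = 0; i < CID_TABLE_SIZE; i++)
    {
        CidEntry *e = &cid_table[(start + i) & (CID_TABLE_SIZE - 1)];

        if (!e->used ||
            (e->tid.block == tid.block && e->tid.offset == tid.offset))
            return e;
    }
    return NULL;                /* table is full */
}

/* Record cmin/cmax for a tuple; returns false when memory is exhausted. */
static bool cid_store(TupleId tid, CommandId cmin, CommandId cmax)
{
    CidEntry *e = cid_slot(tid);

    if (e == NULL)
        return false;
    e->used = true;
    e->tid = tid;
    e->cmin = cmin;
    e->cmax = cmax;
    return true;
}

/* Random-access lookup; NULL means this transaction never touched tid. */
static const CidEntry *cid_lookup(TupleId tid)
{
    const CidEntry *e = cid_slot(tid);

    return (e != NULL && e->used) ? e : NULL;
}
```

A trigger queue can be appended to and read back in order, so it spills to a file easily; a structure like this one is probed in arbitrary order, which is why spilling it is harder.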