Unix Technical Forum

Re: HOT latest patch - version 8

This is a discussion on Re: HOT latest patch - version 8 within the Pgsql Patches forums, part of the PostgreSQL category; --> Pavan Deolasee wrote: > Please see updated version of the patch. This includes further code > refactoring and bug ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 10:22 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: HOT latest patch - version 8

Pavan Deolasee wrote:
> Please see updated version of the patch. This includes further code
> refactoring and bug fixes.


Thanks for the update, Pavan!

I've been looking at this patch in the last couple of weeks in detail. I
wrote a short summary of how it works (attached) to help reviewing it.
Especially the glossary is helpful, since the patch introduces a lot of
new concepts.

I have some suggestions which I'll post separately, this just describes
the status quo of the patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Use case
--------
The best use case for HOT is a table that's frequently UPDATEd, and is large
enough that VACUUM is painful. On small tables that fit in cache, running VACUUM every few minutes isn't a problem.

Heap-only tuples
----------------
When a HOT update is performed, the new tuple is placed on the same page as the old one, marked with the HEAP_ONLY_TUPLE flag. HEAP_ONLY_TUPLE means that there's no index pointers to the tuple, which allows pruning the chain in the future. The old tuple is marked with HEAP_HOT_UPDATE-flag, which means that the tuple pointed to by t_ctid is a heap-only tuple. That needs to be taken into account when vacuuming, so that we don't remove the root tuple in the update chain, when there's no index pointers to the later tuples.

When doing an index scan, whenever we reach a non-visible tuple, we need to check if the tuple has been HOT-updated (== HEAP_HOT_UPDATE flag is set). If so, we need to follow the ctid pointer until we reach a visible one, or one that hasn't been HOT-updated.

Sequential scans (and bitmap heap scans with a lossy bitmap) don't need to pay attention to the flags.

Pre-requirements for HOT updates:
1. None of the indexed columns are changed
2. There is no functional indexes on the table
3. There is no partial indexes on the table

2. and 3. could be relaxed, the code just hasn't been written yet.

These requirements are checked at execution time, comparing the binary representation of the old and new values. That means that dummy updates, like "UPDATE foo SET col1 = ?", where ? is the same as the old value can be HOT.

In addition to the above, there needs to be room on the page for the new tuple. If the page is full, we try to make room by pruning the page.

Pruning
-------
When we're doing a HOT update, and there isn't enough space on the page, and there's no suitably sized LP_DELETEd tuples to reuse, all HOT update chains on the page are pruned to make room. Pruning can be thought of as a lightweight retail vacuum, that marks all dead heap-only tuples with LP_DELETE flag, allowing them to be reused. We can't just outright remove the tuples like we do in vacuum, because we'd need a vacuum-strength lock for that.

To reclaim the index-visible (i.e. first) tuple in a HOT chain, the line pointer is turned into a redirecting line pointer that points to the line pointer of the next tuple in the chain. To keep track of the space occupied by the dead tuple, so that we can reuse the space, a new line pointer is allocated and marked with LP_DELETE to point to the dead tuple. That means its tid changes, but that's ok since it's dead.

When the last live tuple in an update chain becomes dead (after a DELETE or a cold update), all tuples in the chain can be marked with LP_DELETE, and the redirecting line pointer is marked as redirected dead.

We've effectively resurrected the "truncate dead tuples to just line pointer" idea that has been proposed and rejected before because of fear of line pointer bloat. To limit the damage in worst case, and to keep numerous arrays as well as the bitmaps in bitmap scans reasonably sized, the maximum number of line pointers (MaxHeapTuplesPerPage) is somewhat arbitrarily capped at 2 * what it was before.

In addition to pruning when a page gets full, pruning of a single HOT chain is done when doing an index fetch. That avoids doing the same chain-following work on future fetches of the same row.

VACUUM FULL
-----------
To make vacuum full work, any DEAD tuples in the middle of an update chain needs to be removed (see comments at the top of heap_prune_hotchain_hard for details). Vacuum full performs a more aggressive pruning that not only removes dead tuples at the beginning of an update chain, it scans the whole chain and removes any intermediate dead tuples as well.

Reusing LP_DELETEd heap tuples
------------------------------
When doing an update, HOT or not, we check if there's a tuple on the page marked with LP_DELETE that's big enough to accommodate the new tuple. If there is, that slot is reused, overwriting the deleted tuple.

We could reuse the slots for inserts as well, but as the patch stands, we don't.

Row-level fragmentation
-----------------------
If the new tuple is smaller than the old LP_DELETEd tuple that's reused, the new tuple is marked as fragmented, which means that there is some unused space between the end of this tuple and the beginning of the next tuple.

If there's no LP_DELETEd tuples large enough to fit the new tuple in, the row-level fragmentation is repaired in the hope that some of the slots were actually big enough, but were just fragmented. That's done by mapping the offsets in the page, and enlarging all LP_DELETEd line pointers up to the beginning of the next tuple.

Vacuum
------
Vacuum prunes all HOT chains, and removes any LP_DELETEd tuples, making the space available for any use.

In lazy vacuum, we must not freeze a tuple that's in the middle of an update chain. That can happen when a tuple has xmin > xmax; it's the same scenario that requires "hard pruning" in VACUUM FULL. Freezing such tuples will break the check that xmin and xmax matches when following the chain. It's not a problem without HOT, because the preceding tuple in the chain must be dead as well so no-one will try to follow the chain, but with HOT the preceding tuple would be DEAD_CHAIN, and someone might still need to follow the chain to find the live tuple. We avoid that by just not freezing such tuples. They can be frozen eventually, when the xmax of the preceding tuple is < OldestXmin as well.

Statistics
----------
XXX: How do HOT-updates affect statistics? How often do we need to run autovacuum?

CREATE INDEX
CREATE INDEX CONCURRENTLY
-------------------------
I'm not very familiar with how these, so I'll just shut up..

Glossary
--------

Fragmented tuple slot
A line pointer with lp_len smaller than the actual space available before the next tuple on the page.

Heap-only tuple
A heap tuple with no index pointers. Marked with HEAP_ONLY_TUPLE flag.

HOT-updated tuple
An updated tuple, so that the next tuple in the chain is a heap-only tuple. Marked with HEAP_HOT_UPDATE flag.

Redirecting line pointer
A line pointer that points to another line pointer. lp_len is set to a magic value (ITEMID_REDIRECTED), and lp_off is the OffsetNumber of the line pointer it points to.

Redirected dead line pointer
A stub line pointer, that doesn't point to anything, but can't be removed or reused yet because there is index pointers to it. Semantically same as a dead tuple.

Root tuple
The first tuple in a HOT update chain, that indexes point to.

Update chain
A chain of updated tuples, so that each tuple's ctid points to the next tuple in the chain. A HOT update chain is an update chain that consists of a root tuple and one or more heap-only tuples. An update chain can contain both HOT and non-HOT (cold) updated tuples.

Cold update
A normal, non-HOT update.

HOT update
An UPDATE, where the new tuple becomes a heap-only-tuple, and no index entries are made.

DEAD_CHAIN (HEAPTUPLE_DEAD_CHAIN)
New return value for HeapTupleSatisfiesVacuum, which means that the tuple is not visible to anyone, but it's been HOT updated so we can't remove it yet because the following tuples in the chain would become inaccessible from indexes.



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 10:22 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: HOT latest patch - version 8

Heikki Linnakangas wrote:
> I have some suggestions which I'll post separately,


A significant chunk of the complexity and new code in the patch comes
from pruning hot chains and reusing the space for new updates. Because
we can't reclaim dead space in the page like a VACUUM does, without
holding the vacuum lock, we have to deal with pages that contain deleted
tuples, and be able to reuse them, and keep track of the changes in
tuple length etc.

A much simpler approach would be to try to acquire the vacuum lock, and
compact the page the usual way, and fall back to a cold update if we
can't get the lock immediately.

The obvious downside of that is that if a page is continuously pinned,
we can't HOT update tuples on it. Keeping in mind that the primary use
case for HOT is largeish tables, small tables are handled pretty well by
autovacuum, chances are pretty good that you can get a vacuum lock when
you need it.

Thoughts?

I'm looking for ways to make the patch simpler and less invasive. We may
want to put back some of this stuff, or come up with a more clever
solution, in future releases, but right let's keep it simple.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 10:22 AM
Tom Lane
 
Posts: n/a
Default Re: HOT latest patch - version 8

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> A much simpler approach would be to try to acquire the vacuum lock, and
> compact the page the usual way, and fall back to a cold update if we
> can't get the lock immediately.


Seems like that could work.

> I'm looking for ways to make the patch simpler and less invasive.


+1 on that, for sure.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 10:22 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: HOT latest patch - version 8

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> A much simpler approach would be to try to acquire the vacuum lock, and
>> compact the page the usual way, and fall back to a cold update if we
>> can't get the lock immediately.

>
> Seems like that could work.


I just realized that there's a big problem with that. The process doing
the update surely holds a pin on the buffer itself. Needs more thought..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 10:22 AM
Tom Lane
 
Posts: n/a
Default Re: HOT latest patch - version 8

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> A much simpler approach would be to try to acquire the vacuum lock, and
>>> compact the page the usual way, and fall back to a cold update if we
>>> can't get the lock immediately.

>>
>> Seems like that could work.


> I just realized that there's a big problem with that. The process doing
> the update surely holds a pin on the buffer itself. Needs more thought..


So does VACUUM when it's trying to lock a page, no? In any case we
could surely make an extra parameter to LockBufferForCleanup if it
really needs to distinguish the cases.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 10:22 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: HOT latest patch - version 8

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>>> A much simpler approach would be to try to acquire the vacuum lock, and
>>>> compact the page the usual way, and fall back to a cold update if we
>>>> can't get the lock immediately.
>>> Seems like that could work.

>
>> I just realized that there's a big problem with that. The process doing
>> the update surely holds a pin on the buffer itself. Needs more thought..

>
> So does VACUUM when it's trying to lock a page, no? In any case we
> could surely make an extra parameter to LockBufferForCleanup if it
> really needs to distinguish the cases.


The problem is that if we trigger the page cleanup from heap_update, as
I was planning to do, the executor has already pinned the page and holds
a reference to the old tuple on the page. We can't shuffle the data on
the page, because the pointer to the old tuple would become invalid.

We could do still it from somewhere else, though. For example, in
heap_fetch, the first time a page is touched. Or in bgwriter.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 10:22 AM
Michael Glaesemann
 
Posts: n/a
Default Re: HOT latest patch - version 8

Heikki,

Thanks for providing this summary. As someone unfamiliar with the
domain (both HOT development specifically and tuple management in
general), I found it easy to follow.

On Jul 13, 2007, at 8:31 , Heikki Linnakangas wrote:

> Pruning
> -------


> To reclaim the index-visible (i.e. first) tuple in a HOT chain, the
> line pointer is turned into a redirecting line pointer that points
> to the line pointer of the next tuple in the chain. To keep track
> of the space occupied by the dead tuple, so that we can reuse the
> space, a new line pointer is allocated and marked with LP_DELETE to
> point to the dead tuple. That means its tid changes, but that's ok
> since it's dead.


> Row-level fragmentation
> -----------------------


> If there's no LP_DELETEd tuples large enough to fit the new tuple
> in, the row-level fragmentation is repaired in the hope that some
> of the slots were actually big enough, but were just fragmented.
> That's done by mapping the offsets in the page, and enlarging all
> LP_DELETEd line pointers up to the beginning of the next tuple.


Would it make sense to enlarge the LP_DELETEd line pointers up to the
beginning of the next tuple at the time the tuple is marked LP_DELETE?

> Vacuum
> ------
> Vacuum prunes all HOT chains, and removes any LP_DELETEd tuples,
> making the space available for any use.


In the case of a fragmented row, am I right to assume vacuum reclaims
all space up to the next (live) tuple?

Michael Glaesemann
grzm seespotcode net



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 10:22 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: HOT latest patch - version 8

Michael Glaesemann wrote:
> On Jul 13, 2007, at 8:31 , Heikki Linnakangas wrote:
>> Row-level fragmentation
>> -----------------------

>
>> If there's no LP_DELETEd tuples large enough to fit the new tuple in,
>> the row-level fragmentation is repaired in the hope that some of the
>> slots were actually big enough, but were just fragmented. That's done
>> by mapping the offsets in the page, and enlarging all LP_DELETEd line
>> pointers up to the beginning of the next tuple.

>
> Would it make sense to enlarge the LP_DELETEd line pointers up to the
> beginning of the next tuple at the time the tuple is marked LP_DELETE?


The thing is, it's relatively expensive to figure out where the next
tuple starts. We need to scan all line pointers to do that. Though given
that pruning all tuples on a page is a relatively expensive operation
anyway, maybe it wouldn't be so bad.

>> Vacuum
>> ------
>> Vacuum prunes all HOT chains, and removes any LP_DELETEd tuples,
>> making the space available for any use.

>
> In the case of a fragmented row, am I right to assume vacuum reclaims
> all space up to the next (live) tuple?


Yes, Vacuum will 'defrag' the page.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 10:23 AM
Simon Riggs
 
Posts: n/a
Default Re: HOT latest patch - version 8

On Fri, 2007-07-13 at 16:22 +0100, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > I have some suggestions which I'll post separately,


> I'm looking for ways to make the patch simpler and less invasive. We may
> want to put back some of this stuff, or come up with a more clever
> solution, in future releases, but right let's keep it simple.


I believe we're all trying to do that, but I would like to see some
analysis of which techniques are truly effective and which are not.
Simpler code may not have desirable behaviour and then the whole lot of
code is pointless. Let's make it effective by making it complex enough.
I'm not clear where the optimum lies. (c.f. Flying Buttresses).

> A significant chunk of the complexity and new code in the patch comes
> from pruning hot chains and reusing the space for new updates. Because
> we can't reclaim dead space in the page like a VACUUM does, without
> holding the vacuum lock, we have to deal with pages that contain deleted
> tuples, and be able to reuse them, and keep track of the changes in
> tuple length etc.
>
> A much simpler approach would be to try to acquire the vacuum lock, and
> compact the page the usual way, and fall back to a cold update if we
> can't get the lock immediately.
>
> The obvious downside of that is that if a page is continuously pinned,
> we can't HOT update tuples on it. Keeping in mind that the primary use
> case for HOT is largeish tables, small tables are handled pretty well by
> autovacuum, chances are pretty good that you can get a vacuum lock when
> you need it.


The main problem HOT seeks to avoid is wasted inserts into indexes, and
the subsequent VACUUMing that requires. Small tables have smaller
indexes, so that the VACUUMing is less of a problem. If we have hot
spots in larger tables, DSM would allow us to avoid the I/O on the main
table, but we would still need to scan the indexes. So HOT *can* be
better than DSM. I'm worried that requiring the vacuum lock in all cases
will mean that HOT will be ineffective where it is needed most - in the
hot spots - i.e. the blocks that contain frequently updated rows. [As an
aside, in OLTP it is frequently the index blocks that become hot spots,
so reducing index inserts because of UPDATEs will also reduce block
contention]

Our main test case for OLTP is DBT-2 which follows TPC-C in being
perfectly scalable with no hot spots in the heap and limited hot spots
in the indexes. As such it's a poor test of real world applications,
where Benfold's Law holds true. Requiring the vacuum lock in all cases
would allow good benchmark performance but would probably fail in the
real world at providing good long term performance.

I'm interested in some numbers that show which we need. I'm thinking of
some pg_stats output that shows how many vac locks were taken and how
many prunes were made. Something general that allows some beta testers
to provide feedback on the efficacy of the patch.

That leads to the suggestion that we should make the HOT pruning logic
into an add-on patch, commit it, but evaluate its performance during
beta. If we have no clear evidence of additional benefit, we remove it
again.

I'm not in favour of background retail vacuuming by the bgwriter. The
timeliness of that is (similarly) in question and I think bgwriter has
enough work to do already.

[Just as a note to all performance testers: HOT is designed to show
long-term steady performance. Short performance tests frequently show no
benefit if sufficient RAM is available to avoid the table bloat and we
avoid hitting the point where autovacuums kick in. I know Heikki knows
this, just not sure we actually said it.]

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 10:23 AM
Greg Smith
 
Posts: n/a
Default Re: HOT latest patch - version 8

On Sun, 15 Jul 2007, Simon Riggs wrote:

> Our main test case for OLTP is DBT-2 which follows TPC-C in being
> perfectly scalable with no hot spots in the heap and limited hot spots
> in the indexes. As such it's a poor test of real world applications,
> where Benfold's Law holds true.


I assume this is a typo on Benford's Law:
http://en.wikipedia.org/wiki/Benford's_law which notes there are far more
ones in real-world data sets.

If there were a Benfold's Law, it would surely involve the number 5
instead.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:20 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com