Unix Technical Forum

Single Index Tuple Chain (SITC) method

This is a discussion on Single Index Tuple Chain (SITC) method within the pgsql Hackers forums, part of the PostgreSQL category; --> bruce wrote: > Greg Stark wrote: > > > > Bruce Momjian <bruce@momjian.us> writes: > > > > > ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 03:15 AM
Bruce Momjian
 
Posts: n/a
Default Single Index Tuple Chain (SITC) method

bruce wrote:
> Greg Stark wrote:
> >
> > Bruce Momjian <bruce@momjian.us> writes:
> >
> > > PFC wrote:
> > > >
> > > > > My idea is that if an UPDATE places the new tuple on the same page as
> > > > > the old tuple, it will not create new index entries for any indexes
> > > > > where the key doesn't change.
> > > >
> > > > Basically the idea behind preventing index bloat by updates is to have
> > > > one index tuple point to several actual tuples having the same value.
> > > >
> > >
> > > The idea is not to avoid index bloat, but to allow heap reuse, and having
> > > one index entry for multiple versions of an UPDATEd row is merely an
> > > implementation detail.

> >
> > It sort of sounds like you're describing a whole new index type that stores
> > only the page, not the precise record of any tuple it indexes. If your table

>
> Background, indexes point to page item pointers, not to actual offsets
> in the page. This is how vacuum can move around tuples without modifying the
> indexes. The index points to a page item pointer that is a chain of
> tuples with the same indexed columns.


Here is an overview of the SITC method:

http://momjian.us/cgi-bin/pgsitc

Anyone want to start coding?

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 03:15 AM
Tom Lane
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method

Bruce Momjian <bruce@momjian.us> writes:
> Here is an overview of the SITC method:
> http://momjian.us/cgi-bin/pgsitc


A pretty fundamental problem is that the method assumes it's OK to
change the CTID of a live tuple (by swapping its item pointer with some
expired version). It is not --- this will break:
* active UPDATEs and DELETEs that may have fetched the CTID
but not yet completed processing to decide whether to change
the tuple;
* pending AFTER ROW triggers, such as foreign key checks;
* ODBC as well as other applications that assume CTID is a
usable unique row identifier within transactions.
VACUUM FULL can get away with moving tuples to new CTIDs because it takes
AccessExclusiveLock, so there can be no open transactions with knowledge
of current CTIDs in the table. This is not OK for something that's
supposed to happen in plain UPDATEs, though.

Another problem is you can't recycle tuples, nor item ids, without
taking a VACUUM-style lock on the page (LockBufferForCleanup). If
anyone else is holding a pin on the page they risk getting totally
confused --- for instance, a seqscan will either miss a tuple or scan it
twice depending on which direction you're juggling item ids around it.
The concurrency loss involved in LockBufferForCleanup is OK for
background-maintenance operations like VACUUM, but I seriously doubt
anyone will find it acceptable for UPDATE. It could easily create
application-level deadlocks, too. (VACUUM is safe against that because
it only holds one lock.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 03:15 AM
Zeugswetter Andreas DCP SD
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method


> > Here is an overview of the SITC method:
> > http://momjian.us/cgi-bin/pgsitc

>
> A pretty fundamental problem is that the method assumes it's
> OK to change the CTID of a live tuple (by swapping its item
> pointer with some expired version). It is not --- this will break:


I am having difficulty visualizing that. The plan is not to change
CTID's
(only the CTID's offset into the page is to be changed).
The CTID of the new version is one that is up to now invisible to all
backends,
so noone can actually have remembered that CTID.

Also you would first insert the slot content and then change the CTID
offset
(this offset change might need to be made atomic).

Andreas

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 03:15 AM
Hannu Krosing
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method

Ühel kenal päeval, K, 2006-06-28 kell 18:19, kirjutas Tom Lane:
> Bruce Momjian <bruce@momjian.us> writes:
> > Here is an overview of the SITC method:
> > http://momjian.us/cgi-bin/pgsitc

>
> A pretty fundamental problem is that the method assumes it's OK to
> change the CTID of a live tuple (by swapping its item pointer with some
> expired version). It is not --- this will break:
> * active UPDATEs and DELETEs that may have fetched the CTID
> but not yet completed processing to decide whether to change
> the tuple;
> * pending AFTER ROW triggers, such as foreign key checks;
> * ODBC as well as other applications that assume CTID is a
> usable unique row identifier within transactions.


We should *always* return the ctid of CITC head, as this is the one that
does not change.

And anyway, ctid is a usable unique row identifier only within read-only
transactions, or not ?

> VACUUM FULL can get away with moving tuples to new CTIDs because it takes
> AccessExclusiveLock, so there can be no open transactions with knowledge
> of current CTIDs in the table. This is not OK for something that's
> supposed to happen in plain UPDATEs, though.


Would it still be a problem, if we *always* refer to the whole CITC
chain by its externally visible ctid, an look up the real tuple inside
tuple fetch op at every access.

(1) If we had some special bits for tuples at CITC chain head and inside
CITC but not at head, then even seqscan can ignore non-head CITC chain
members at its find next tuple op and do the real tuple lookup in some
inner function when it hits CITC head.

Is it correct to assume, that only one row version can be in process of
being modified at any one time?

> Another problem is you can't recycle tuples, nor item ids, without
> taking a VACUUM-style lock on the page (LockBufferForCleanup). If
> anyone else is holding a pin on the page they risk getting totally
> confused --- for instance, a seqscan will either miss a tuple or scan it
> twice depending on which direction you're juggling item ids around it.


I think (1) above solves this, at cost of looking twice at CITC internal
tuple headers.

> The concurrency loss involved in LockBufferForCleanup is OK for
> background-maintenance operations like VACUUM, but I seriously doubt
> anyone will find it acceptable for UPDATE. It could easily create
> application-level deadlocks, too. (VACUUM is safe against that because
> it only holds one lock.)



Tom - what do you think of the other related idea, that of reusing dead
index entries ?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 03:16 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method

On Thu, Jun 29, 2006 at 01:39:51AM +0300, Hannu Krosing wrote:
> And anyway, ctid is a usable unique row identifier only within read-only
> transactions, or not ?


Err, no. The ctid is the only identifer of a tuple in any case. When
you do a delete, the tuple to be deleted is indicated by the ctid field
which has been passed up from the base table through the rest of the
query. When you reach the top the ctid better refer to the same tuple
or you'll delete the wrong one. UPDATE is the same.

In READ COMMITTED mode, the tuple is rechecked for visibility and if
it's invisible, the ctid chain is followed to find the visible one
(which may not necessarily be the last one).

For all intents and purposes, the CTID of tuple can't change unless
you're 100% certain no-one is using it in any way. That's what the
vacuum lock is for.

> Is it correct to assume, that only one row version can be in process of
> being modified at any one time?


No, different transactions may be updating differing versions, depending
on what was visible at the time. In serialisable transactions you'll
get a serialisation failure though, and for read committed, the query
will be rerun for the latest version of the tuple.

One thing I am confused about, currently the ctid chain follows tuple
history so that transactions can find the latest version of any tuple,
even if the key fields have changed. This proposal breaks that, I'm not
sure how important that is though.

> Tom - what do you think of the other related idea, that of reusing dead
> index entries ?


I'd like to know about this too, including ideas about truncating
tuples to just the header.

Have a nice day.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEo708IB7bNG8LQkwRAkgZAJ4unpOiJw0pXjuPpfW8CA Daq3FaHQCeLMbj
ODdpyfdanwbM2JL2Bm4Ih8o=
=Ymp8
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 03:16 AM
Zeugswetter Andreas DCP SD
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method


> > And anyway, ctid is a usable unique row identifier only within
> > read-only transactions, or not ?


actually for as long as no vacuum comes along. This would change
with SITC. (Maybe it would help to only reuse old versions of the same
row,
then anybody holding a ctid would at least be still looking at a version
of
the same row, and should thus be able to follow the update chain)

> Err, no. The ctid is the only identifer of a tuple in any
> case. When you do a delete, the tuple to be deleted is
> indicated by the ctid field which has been passed up from the
> base table through the rest of the query. When you reach the
> top the ctid better refer to the same tuple or you'll delete
> the wrong one. UPDATE is the same.


For all these purposes you will be holding the ctid of a visible
(to someone) tuple. Those don't qualify for a new SITC tuple anyway.

> For all intents and purposes, the CTID of tuple can't change
> unless you're 100% certain no-one is using it in any way.


For all I know, noone is using dead tuples except for visibility
lookup. We would need to make sure that other backends see the new
tuple eighter as dead or txopen as long as the contents are not valid.
I think we could do that without a vacuum lock on platforms that support
4 byte atomic operations.

Andreas

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 03:16 AM
Tom Lane
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method

Martijn van Oosterhout <kleptog@svana.org> writes:
>> Tom - what do you think of the other related idea, that of reusing dead
>> index entries ?


Possibly workable for btree now that we do page-at-a-time index scans;
however I'm pretty hesitant to build any large infrastructure atop that
change until we've got more performance results. We might yet end up
reverting it.

Another issue is that this would replace a simple hint-bit setting with
an index change that requires a WAL entry. There'll be more WAL traffic
altogether from backends retail-deleting index tuples than there would
be from VACUUM cleaning the whole page at once --- and it won't cut the
I/O demand from VACUUM any, either, since VACUUM still has to scan the
index. AFAICS this wouldn't make VACUUM either cheaper or less
necessary, so I'm not sure I see the point.

> I'd like to know about this too, including ideas about truncating
> tuples to just the header.


You can't truncate a tuple to just the header, or at least it's not
going to be very useful to do it, unless you can also move other tuples
to coalesce the free space on the page. Which means you need a
VACUUM-strength page lock. If you're trying to do this in foreground
queries, you get into the same performance and deadlock issues I already
mentioned. And I think the net-increase-in-WAL-traffic point would
apply too, since VACUUM will still need to clean the page when it
removes the header.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 03:16 AM
Hannu Krosing
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method

Ühel kenal päeval, N, 2006-06-29 kell 12:35, kirjutas Tom Lane:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> >> Tom - what do you think of the other related idea, that of reusing dead
> >> index entries ?

>
> Possibly workable for btree now that we do page-at-a-time index scans;
> however I'm pretty hesitant to build any large infrastructure atop that
> change until we've got more performance results. We might yet end up
> reverting it.
>
> Another issue is that this would replace a simple hint-bit setting with
> an index change that requires a WAL entry. There'll be more WAL traffic
> altogether from backends retail-deleting index tuples than there would
> be from VACUUM cleaning the whole page at once --- and it won't cut the
> I/O demand from VACUUM any, either, since VACUUM still has to scan the
> index. AFAICS this wouldn't make VACUUM either cheaper or less
> necessary, so I'm not sure I see the point.


How can it generate more traffic ?

When you replace a dead index entry with a live one, you just reuse
space - you would have to WAL log the index in both cases (adding a new
entry or replacing dead entry)

Espacially in the case, where you replace an index entryu with the same
value.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 03:16 AM
Alvaro Herrera
 
Posts: n/a
Default Longer startup delay (was Re: Single Index Tuple Chain (SITC) method)

Tom Lane wrote:

> Another issue is that this would replace a simple hint-bit setting with
> an index change that requires a WAL entry. There'll be more WAL traffic
> altogether from backends retail-deleting index tuples than there would
> be from VACUUM cleaning the whole page at once


Speaking of which, I think I've noticed a longer delay in server start
after initdb. I haven't measured nor profiled it, but I think it may be
because of the heap_inplace_update xlogging that we weren't doing
previously.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 03:16 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: Single Index Tuple Chain (SITC) method

On Thu, Jun 29, 2006 at 12:35:12PM -0400, Tom Lane wrote:
> Another issue is that this would replace a simple hint-bit setting with
> an index change that requires a WAL entry. There'll be more WAL traffic
> altogether from backends retail-deleting index tuples than there would
> be from VACUUM cleaning the whole page at once --- and it won't cut the
> I/O demand from VACUUM any, either, since VACUUM still has to scan the
> index. AFAICS this wouldn't make VACUUM either cheaper or less
> necessary, so I'm not sure I see the point.


Ok, I'm going to suggest something that's either radical, or very dumb:
does this truncation really need to be xlogged? After all, just like
hint bits, nothing is being done that won't be recovered from later if
it doesn't happen. The only possible thing I can imagine is that during
xlog replay you end up trying to add more tuples than appear to fit.
But if you have a vacuum page procedure you could call it again to try
to compress it down.

Then again, perhaps visibility checks are not safe within xlog replay
state.

I'm hoping that overall disk traffic reduces because total disk space
used by tables/indexes reduces.

> You can't truncate a tuple to just the header, or at least it's not
> going to be very useful to do it, unless you can also move other tuples
> to coalesce the free space on the page. Which means you need a
> VACUUM-strength page lock. If you're trying to do this in foreground
> queries, you get into the same performance and deadlock issues I already
> mentioned. And I think the net-increase-in-WAL-traffic point would
> apply too, since VACUUM will still need to clean the page when it
> removes the header.


Well, I was only thinking of having the bgwriter do it in the
background, just bfore writing the block to disk. I'm hoping that it
only tries to write out pages not recently used, so hopefully there
would be very little contention there.

And perhaps you can avoid the xlogging for the same reason as I
suggested above.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEpA79IB7bNG8LQkwRAjOGAJ4/Enui9SPzbSPB/Pn9BqD5W2M2SQCfWW7o
WVZZrmV2Jc+B+Xmc6OAghYw=
=Vx+T
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:24 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com