Unix Technical Forum

Including Snapshot Info with Indexes

This is a discussion on Including Snapshot Info with Indexes within the pgsql Hackers forums, part of the PostgreSQL category; --> Hi, Currently The index implementation in Postgresql does not store the Snapshot information in the Index. If we add ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-15-2008, 10:22 PM
Gokulakannan Somasundaram
 
Posts: n/a
Default Including Snapshot Info with Indexes

Hi,
Currently The index implementation in Postgresql does not store the
Snapshot information in the Index. If we add the snapshot information into
the indexing structure, we will have the following advantages.

a) There can be index only scans like Oracle
b) Unique indexes will become less costly, as older index tuples can be
found out.
c) Even the index scans will get faster, since some of the index tuples
won't translate into HeapScans.
d) Deletes and Updates will become slightly costly, as they have to update
these indexes.

I propose we add a DDL like "Create Index .. With Snapshot", which will have
different relkind.

The design in my mind is to add the Snapshot info together with the values
as first variables. We need to change the Index_getattr slightly to have the
relkind as input parameter, based on which we can have an offset.

Please reply back with your comments.

Thanks,
Gokul.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-15-2008, 10:22 PM
Heikki Linnakangas
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

Gokulakannan Somasundaram wrote:
> Currently The index implementation in Postgresql does not store the
> Snapshot information in the Index. If we add the snapshot information into
> the indexing structure, we will have the following advantages.


This idea has been discussed to death many times before. Please search
the archives.

> a) There can be index only scans like Oracle


IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

> b) Unique indexes will become less costly, as older index tuples can be
> found out.


Doesn't seem like a big benefit, considering that in most cases there
won't be any tuples in the index with a duplicate key. A common
exception to that is (non-HOT) updating a row. But in that case, the
page containing the old tuple is already in cache, so the lookup of the
visibility from the heap is cheap.

> c) Even the index scans will get faster, since some of the index tuples
> won't translate into HeapScans.


That's the same as doing an index-only-scan, right?

> d) Deletes and Updates will become slightly costly, as they have to update
> these indexes.


I think you're grossly underestimating the cost of that. For example, on
a table with 3 indexes. a delete currently requires one index lookup +
one heap lookup. With visibility in the indexes, that would require 3
index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
taking into account the non-leaf b-tree pages. The real impact will
depend on what's in cache, but the cost can be very high.

Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.

There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-15-2008, 10:23 PM
Csaba Nagy
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
> This idea has been discussed to death many times before. Please search
> the archives.


Somewhat related to the "visibility in index" thing: would it be
possible to have a kind of index-table ? We do have here some tables
which have 2-4 fields, and the combination of them forms the primary key
of the table. There are usually no other indexes on the table, and the
net result is that the PK index duplicates the heap. So it would be cool
if the index would be THE heap somehow...

The most prominent example of this in our DBs is this table:

db> \d table_a
Table "public.table_a"
Column | Type | Modifiers
-----------+--------+-----------
id_1 | bigint | not null
id_2 | bigint | not null
Indexes:
"pk_table_a" PRIMARY KEY, btree (id_1, id_2)

db> select reltuples::bigint, relpages from pg_class where
relname='table_a';
reltuples | relpages
-----------+----------
445286464 | 710090
(1 row)

db> select reltuples::bigint, relpages from pg_class where
relname='pk_table_a';
reltuples | relpages
-----------+----------
445291072 | 599848
(1 row)

This postgres install is compiled with 32K page size (for the ones who
wonder about the page counts). In any case, it is clear that the index
basically duplicates the heap...

Cheers,
Csaba.



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-15-2008, 10:23 PM
Heikki Linnakangas
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

Csaba Nagy wrote:
> On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
>> This idea has been discussed to death many times before. Please search
>> the archives.

>
> Somewhat related to the "visibility in index" thing: would it be
> possible to have a kind of index-table ? We do have here some tables
> which have 2-4 fields, and the combination of them forms the primary key
> of the table. There are usually no other indexes on the table, and the
> net result is that the PK index duplicates the heap. So it would be cool
> if the index would be THE heap somehow...


The clustered index patch I worked on for 8.3, but didn't make it in,
would have helped that use case a lot.

A column-store kind of mechanism would be nice. Some columns could be
stored in index-like structures, while other would be in the heap. I
haven't seen any practical proposal on how to do it though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-15-2008, 10:23 PM
Gokulakannan Somasundaram
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>
> Gokulakannan Somasundaram wrote:
> > Currently The index implementation in Postgresql does not store the
> > Snapshot information in the Index. If we add the snapshot information

> into
> > the indexing structure, we will have the following advantages.

>
> This idea has been discussed to death many times before. Please search
> the archives.
>
> > a) There can be index only scans like Oracle

>
> IMO, the most promising approach to achieving index-only-scans at the
> moment is the Dead Space Map, as discussed in the 8.3 dev cycle.



Index only scans means that in order to get certain results, we may not
goto the table at all. For example, if you have an index on columns a and b,
and if there is a query like "select b from table where a between a1 and
a2", then the explain plan need not goto the table. I can't understand how
dead space map will provide such a functionality. In short each index will
act like an Index Organized Table, if the all the columns of the query are
present in the index.

> b) Unique indexes will become less costly, as older index tuples can be
> > found out.

>
> Doesn't seem like a big benefit, considering that in most cases there
> won't be any tuples in the index with a duplicate key. A common
> exception to that is (non-HOT) updating a row. But in that case, the
> page containing the old tuple is already in cache, so the lookup of the
> visibility from the heap is cheap.



Its not a big benefit. agreed.

> c) Even the index scans will get faster, since some of the index tuples
> > won't translate into HeapScans.

>
> That's the same as doing an index-only-scan, right?



No here if you have an index on a(say). If there is a query like select *
form table where a between a1 and a2, currently the scan goes to the table
to verify the visibility. Of course if the tuple satisfies vacuum, then it
is marked in the index, which is an optimization. This is not index-only
scan. This is a normal index scan, which can skip certain random I/Os.

> d) Deletes and Updates will become slightly costly, as they have to update
> > these indexes.

>
> I think you're grossly underestimating the cost of that. For example, on
> a table with 3 indexes. a delete currently requires one index lookup +
> one heap lookup. With visibility in the indexes, that would require 3
> index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
> taking into account the non-leaf b-tree pages. The real impact will
> depend on what's in cache, but the cost can be very high.



That's true. But i am not asking to replace the current index
implementation, but to provide an extra option while indexing. Say if a
particular database setup doesn't do much deletes and updates(imagine tables
with partitioning, where the partitions/tables are dropped instead of
deletes. They can have an option to "create index .. with snapshot"

Imagine the Index Vacuum also will do lesser Random I/Os

Also, the full visibility information would need 12 bytes of space per
> tuple. An index tuple on an int4 key currently takes 12 bytes, so that
> would double the index size. Storage size has a big impact on
> performance. More bytes means more I/O, less data fits in cache, and
> more WAL traffic.



I am thinking of certain optimizations here. we have a bit unused in
indextuple structure. If a particular tuple is not deleted, then we can
signify that using that bit and save 6 bytes of saving the xmax and cmax. We
are trading of this space efficiency in place of Random I/Os, which is not a
bad trade-off , i suppose. Again this is going to optional for the user. If
users have an option to create Bitmap index/ Binary index, why can't they
have this option as well?

There's non-trivial implementation issues involved as well. You'd need a
> way to reliably find all the index pointers for a given heap tuple
> (search the archives for "retail vacuum" for the issues involved in
> that. Broken user defined functions are a problem for example). And
> you'd need to keep them all locked at the same time to modify them all
> atomically, which is prone to deadlocks.




I think Vacuum need not goto the table, as the visibility information is
present in the index itself. I don't know whether i have given the correct
answer here.

Expecting your reply..


Thanks,
Gokul.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-15-2008, 10:23 PM
Gokulakannan Somasundaram
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>
> Csaba Nagy wrote:
> > On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
> >> This idea has been discussed to death many times before. Please search
> >> the archives.

> >
> > Somewhat related to the "visibility in index" thing: would it be
> > possible to have a kind of index-table ? We do have here some tables
> > which have 2-4 fields, and the combination of them forms the primary key
> > of the table. There are usually no other indexes on the table, and the
> > net result is that the PK index duplicates the heap. So it would be cool
> > if the index would be THE heap somehow...

>
> The clustered index patch I worked on for 8.3, but didn't make it in,
> would have helped that use case a lot.
>
> A column-store kind of mechanism would be nice. Some columns could be
> stored in index-like structures, while other would be in the heap. I
> haven't seen any practical proposal on how to do it though.



I think it more resembles the Oracle's IOT with overflow. I think my
proposal, once implemented can be easily extended to incorporate
IOT/Clustered indexes. One thing is for sure. Without storing Visibility
info, Unique Secondary indexes(Indexes on IOT/Clustered indexed tables) is
not possible, as it is not possible to re-locate the same entry in a b-tree,
if we are storing the Primary key in place of tuple-id.


--
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-15-2008, 10:23 PM
Heikki Linnakangas
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

Gokulakannan Somasundaram wrote:
> On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> IMO, the most promising approach to achieving index-only-scans at the
>> moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

>
> Index only scans means that in order to get certain results, we may not
> goto the table at all. For example, if you have an index on columns a and b,
> and if there is a query like "select b from table where a between a1 and
> a2", then the explain plan need not goto the table. I can't understand how
> dead space map will provide such a functionality.


Please read the discussions in the archives. The dead space map holds
visibility information in a condensed form. For index-only-scans, we
need to know if all tuples on page are are visible to us. If the dead
space map is designed with index-only-scans in mind, we can store a bit
there indicating "all tuples on this page are visible to everyone".
Pages that have that bit set don't need to be visited to check visibility.

What information exactly is going to be stored in the dead space map is
still debated. For vacuuming, we need to know which pages contain dead
tuples that are worth vacuuming, which isn't the same thing as "all
tuples are visible to everyone", but it's quite close.

Heap pages that do have dead or recently modified rows do need to be
visited, so the executor needs to always be prepared to check visibility
from the heap. However, on a table that's not very frequently updated,
most bits will be set and the effect will be the same as genuine
index-only-scans. On a table that is frequently updated, you would
suffer a big hit in update performance with the "duplicate visibility
information in all indexes" scheme as well, as the updates would need to
update the indexes as well, so the performance characteristics are
roughly the same.

> That's true. But i am not asking to replace the current index
> implementation, but to provide an extra option while indexing. Say if a
> particular database setup doesn't do much deletes and updates(imagine tables
> with partitioning, where the partitions/tables are dropped instead of
> deletes. They can have an option to "create index .. with snapshot"


A nice property of utilizing the dead space map for index-only-scans is
that it doesn't require any action from the DBA. It will "just work". It
also adapts well to tables that have parts that are frequently updated,
and other parts that are not. The frequently updated parts will behave
like what we have now, index-only-scans are not possible because, but
deletes/updates are cheap. But the less frequently updated parts will
eventually have the bits set in the map, and we can do index-only-scans
for those parts.

> Imagine the Index Vacuum also will do lesser Random I/Os


Index vacuum doesn't do random I/Os as it is.

> Also, the full visibility information would need 12 bytes of space per
>> tuple. An index tuple on an int4 key currently takes 12 bytes, so that
>> would double the index size. Storage size has a big impact on
>> performance. More bytes means more I/O, less data fits in cache, and
>> more WAL traffic.

>
> I am thinking of certain optimizations here. we have a bit unused in
> indextuple structure. If a particular tuple is not deleted, then we can
> signify that using that bit and save 6 bytes of saving the xmax and cmax. We
> are trading of this space efficiency in place of Random I/Os, which is not a
> bad trade-off , i suppose. Again this is going to optional for the user. If
> users have an option to create Bitmap index/ Binary index, why can't they
> have this option as well?


Because we have to maintain it?

Speaking of bitmap indexes, that would be nice to have. It looks like
Gavin dropped the ball on the patch, so if you want to continue that
work, that would be great.

> There's non-trivial implementation issues involved as well. You'd need a
>> way to reliably find all the index pointers for a given heap tuple
>> (search the archives for "retail vacuum" for the issues involved in
>> that. Broken user defined functions are a problem for example). And
>> you'd need to keep them all locked at the same time to modify them all
>> atomically, which is prone to deadlocks.

>
> I think Vacuum need not goto the table, as the visibility information is
> present in the index itself.


Vacuum isn't the problem here. The problem is: given heap tuple X, how
do you find the the corresponding index tuples? The obvious solution is
to calculate the index keys from the heap tuple, and use them to look
up. But what if you have an expression index on a user-defined function,
and that user-defined function is broken so that it returns a different
value than when the index tuple was inserted? You won't find the index
tuples in that case, so you won't be able to update the visibility
information. Granted, you've got a broken user-defined-function in that
case, and you're going to get bogus query results anyway. But not
finding the index tuple when needed would lead to more serious
corruption. This is the same problem many proposed retail vacuum schemes
have stumbled into, which is why I suggested searching for that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-15-2008, 10:23 PM
Hannu Krosing
 
Posts: n/a
Default Re: Including Snapshot Info with Indexes

Ühel kenal päeval, E, 2007-10-08 kell 11:41, kirjutas Heikki
Linnakangas:
> The dead space map holds
> visibility information in a condensed form. For index-only-scans, we
> need to know if all tuples on page are are visible to us. If the dead
> space map is designed with index-only-scans in mind, we can store a bit
> there indicating "all tuples on this page are visible to everyone".
> Pages that have that bit set don't need to be visited to check visibility.
>
> What information exactly is going to be stored in the dead space map is
> still debated. For vacuuming, we need to know which pages contain dead
> tuples that are worth vacuuming, which isn't the same thing as "all
> tuples are visible to everyone", but it's quite close.


I would prefer a separate MVC visibility heap (aka. extended "dead space
map") which would duplicate whole visibility info from heap pages, just
in compressed form. After a few releases with duplicated visibility
info, we could remove it from the data heap.

If the whole visibility info (cmin, cmax, tmin, tmax, flags, (+ size for
DSM uses)) for tuples, were in a separate heap, it would allow for a lot
of on-the-fly compression. for example we could:

* get rid of both tmin and tmax for all completed transactions
* reduce any deleted tuple to just flags
* reduce any tuple produced by aborted transaction to just flags
* reduce any tuple visible to all backends to just flags
* RRL compress (runs of) pages produced by same transaction
* RRL compress (runs of) pages with all tuples visible
* RRL compress (runs of) pages with all tuples deleted

depending on distribution of Inserts/Updates/Deletes it will make
visibility info a little or a lot smaller than it is currently, greatly
enchancing chances that it stays in cache (even for OLAP loads, because
data for these are usually produced by bulk inserts and thus their
visibility info is highly compressable)

It also makes VACUUM more efficient, as it's initial scan (find
vacuumable tuples) will need to do lot less IO.

And it allows for more intelligent choices for new tuple placement ,
especially if we want to preserve clustering.

And of course it gives you kind of index-only scans (mostly read index + check in vis.heap)

-------------
Hannu



---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-15-2008, 10:23 PM
Hannu Krosing
 
Posts: n/a
Default Another Idea: Try Including snapshot with TOAS (was: IncludingSnapshot Info with Indexes)

Ühel kenal päeval, E, 2007-10-08 kell 12:12, kirjutas Gokulakannan
Somasundaram:
> Hi,
> Currently The index implementation in Postgresql does not store
> the Snapshot information in the Index.

...
> Please reply back with your comments.


I think you got enoght "search for previous discussion" answers on this
aone already

So I propose a few another ideas to investigate

1. get rid of indexes for TOAST tables

instead of TOAST tuple id store CTID of first TOAST block directly, and
use something like skip lists inside the TOAST block headers to access
next TOAST tuples.

2. store visibility info in TOAST tables

if you store visibility info in TOAST tuples, it becomes possible to
update just the small part of the tuple in the main heap and keep the
bulk of toasted data in place.


both of these ideas are much more complicated to implement than it
appears from my simple description, but should have big benefits for a
sizable number of scenarios which use TOAST.

-------------
Hannu



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-15-2008, 10:23 PM
Simon Riggs
 
Posts: n/a
Default Re: Another Idea: Try Including snapshot with TOAS (was:Including Snapshot Info with Indexes)

On Mon, 2007-10-08 at 14:58 +0300, Hannu Krosing wrote:

> 1. get rid of indexes for TOAST tables
>
> instead of TOAST tuple id store CTID of first TOAST block directly, and
> use something like skip lists inside the TOAST block headers to access
> next TOAST tuples.


It should be possible to optimise TOAST for when there is just a single
chunk that is toasted. Since we often (and by default) compress data
stored in TOAST tables this would be a frequently used optimisation.

Instead of storing the TOAST OID, which is then looked-up in the index
to find the TOAST tid, we can just store the tid directly in the toast
pointer on the main heap. That way we'll get faster read and write
access for a good proportion of rows by avoiding the toast index and
going straight to the toast heap.

We'd need a different kind of toast pointer which would be 2 bytes
longer than the normal pointer. I think we have a spare flag bit to
indicate this.

That's just a rough sketch, I've not checked the details on this.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:01 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com