This is a discussion on Including Snapshot Info with Indexes within the pgsql Hackers forums, part of the PostgreSQL category; --> Hi, Currently The index implementation in Postgresql does not store the Snapshot information in the Index. If we add ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi, Currently The index implementation in Postgresql does not store the Snapshot information in the Index. If we add the snapshot information into the indexing structure, we will have the following advantages. a) There can be index only scans like Oracle b) Unique indexes will become less costly, as older index tuples can be found out. c) Even the index scans will get faster, since some of the index tuples won't translate into HeapScans. d) Deletes and Updates will become slightly costly, as they have to update these indexes. I propose we add a DDL like "Create Index .. With Snapshot", which will have different relkind. The design in my mind is to add the Snapshot info together with the values as first variables. We need to change the Index_getattr slightly to have the relkind as input parameter, based on which we can have an offset. Please reply back with your comments. Thanks, Gokul. |
| |||
| Gokulakannan Somasundaram wrote: > Currently The index implementation in Postgresql does not store the > Snapshot information in the Index. If we add the snapshot information into > the indexing structure, we will have the following advantages. This idea has been discussed to death many times before. Please search the archives. > a) There can be index only scans like Oracle IMO, the most promising approach to achieving index-only-scans at the moment is the Dead Space Map, as discussed in the 8.3 dev cycle. > b) Unique indexes will become less costly, as older index tuples can be > found out. Doesn't seem like a big benefit, considering that in most cases there won't be any tuples in the index with a duplicate key. A common exception to that is (non-HOT) updating a row. But in that case, the page containing the old tuple is already in cache, so the lookup of the visibility from the heap is cheap. > c) Even the index scans will get faster, since some of the index tuples > won't translate into HeapScans. That's the same as doing an index-only-scan, right? > d) Deletes and Updates will become slightly costly, as they have to update > these indexes. I think you're grossly underestimating the cost of that. For example, on a table with 3 indexes. a delete currently requires one index lookup + one heap lookup. With visibility in the indexes, that would require 3 index lookups + one heap lookup. That's 4 vs. 2 page accesses, not taking into account the non-leaf b-tree pages. The real impact will depend on what's in cache, but the cost can be very high. Also, the full visibility information would need 12 bytes of space per tuple. An index tuple on an int4 key currently takes 12 bytes, so that would double the index size. Storage size has a big impact on performance. More bytes means more I/O, less data fits in cache, and more WAL traffic. There's non-trivial implementation issues involved as well. You'd need a way to reliably find all the index pointers for a given heap tuple (search the archives for "retail vacuum" for the issues involved in that. Broken user defined functions are a problem for example). And you'd need to keep them all locked at the same time to modify them all atomically, which is prone to deadlocks. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote: > This idea has been discussed to death many times before. Please search > the archives. Somewhat related to the "visibility in index" thing: would it be possible to have a kind of index-table ? We do have here some tables which have 2-4 fields, and the combination of them forms the primary key of the table. There are usually no other indexes on the table, and the net result is that the PK index duplicates the heap. So it would be cool if the index would be THE heap somehow... The most prominent example of this in our DBs is this table: db> \d table_a Table "public.table_a" Column | Type | Modifiers -----------+--------+----------- id_1 | bigint | not null id_2 | bigint | not null Indexes: "pk_table_a" PRIMARY KEY, btree (id_1, id_2) db> select reltuples::bigint, relpages from pg_class where relname='table_a'; reltuples | relpages -----------+---------- 445286464 | 710090 (1 row) db> select reltuples::bigint, relpages from pg_class where relname='pk_table_a'; reltuples | relpages -----------+---------- 445291072 | 599848 (1 row) This postgres install is compiled with 32K page size (for the ones who wonder about the page counts). In any case, it is clear that the index basically duplicates the heap... Cheers, Csaba. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Csaba Nagy wrote: > On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote: >> This idea has been discussed to death many times before. Please search >> the archives. > > Somewhat related to the "visibility in index" thing: would it be > possible to have a kind of index-table ? We do have here some tables > which have 2-4 fields, and the combination of them forms the primary key > of the table. There are usually no other indexes on the table, and the > net result is that the PK index duplicates the heap. So it would be cool > if the index would be THE heap somehow... The clustered index patch I worked on for 8.3, but didn't make it in, would have helped that use case a lot. A column-store kind of mechanism would be nice. Some columns could be stored in index-like structures, while other would be in the heap. I haven't seen any practical proposal on how to do it though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > > Gokulakannan Somasundaram wrote: > > Currently The index implementation in Postgresql does not store the > > Snapshot information in the Index. If we add the snapshot information > into > > the indexing structure, we will have the following advantages. > > This idea has been discussed to death many times before. Please search > the archives. > > > a) There can be index only scans like Oracle > > IMO, the most promising approach to achieving index-only-scans at the > moment is the Dead Space Map, as discussed in the 8.3 dev cycle. Index only scans means that in order to get certain results, we may not goto the table at all. For example, if you have an index on columns a and b, and if there is a query like "select b from table where a between a1 and a2", then the explain plan need not goto the table. I can't understand how dead space map will provide such a functionality. In short each index will act like an Index Organized Table, if the all the columns of the query are present in the index. > b) Unique indexes will become less costly, as older index tuples can be > > found out. > > Doesn't seem like a big benefit, considering that in most cases there > won't be any tuples in the index with a duplicate key. A common > exception to that is (non-HOT) updating a row. But in that case, the > page containing the old tuple is already in cache, so the lookup of the > visibility from the heap is cheap. Its not a big benefit. agreed. > c) Even the index scans will get faster, since some of the index tuples > > won't translate into HeapScans. > > That's the same as doing an index-only-scan, right? No here if you have an index on a(say). If there is a query like select * form table where a between a1 and a2, currently the scan goes to the table to verify the visibility. Of course if the tuple satisfies vacuum, then it is marked in the index, which is an optimization. This is not index-only scan. This is a normal index scan, which can skip certain random I/Os. > d) Deletes and Updates will become slightly costly, as they have to update > > these indexes. > > I think you're grossly underestimating the cost of that. For example, on > a table with 3 indexes. a delete currently requires one index lookup + > one heap lookup. With visibility in the indexes, that would require 3 > index lookups + one heap lookup. That's 4 vs. 2 page accesses, not > taking into account the non-leaf b-tree pages. The real impact will > depend on what's in cache, but the cost can be very high. That's true. But i am not asking to replace the current index implementation, but to provide an extra option while indexing. Say if a particular database setup doesn't do much deletes and updates(imagine tables with partitioning, where the partitions/tables are dropped instead of deletes. They can have an option to "create index .. with snapshot" Imagine the Index Vacuum also will do lesser Random I/Os Also, the full visibility information would need 12 bytes of space per > tuple. An index tuple on an int4 key currently takes 12 bytes, so that > would double the index size. Storage size has a big impact on > performance. More bytes means more I/O, less data fits in cache, and > more WAL traffic. I am thinking of certain optimizations here. we have a bit unused in indextuple structure. If a particular tuple is not deleted, then we can signify that using that bit and save 6 bytes of saving the xmax and cmax. We are trading of this space efficiency in place of Random I/Os, which is not a bad trade-off , i suppose. Again this is going to optional for the user. If users have an option to create Bitmap index/ Binary index, why can't they have this option as well? There's non-trivial implementation issues involved as well. You'd need a > way to reliably find all the index pointers for a given heap tuple > (search the archives for "retail vacuum" for the issues involved in > that. Broken user defined functions are a problem for example). And > you'd need to keep them all locked at the same time to modify them all > atomically, which is prone to deadlocks. I think Vacuum need not goto the table, as the visibility information is present in the index itself. I don't know whether i have given the correct answer here. Expecting your reply.. Thanks, Gokul. |
| |||
| On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > > Csaba Nagy wrote: > > On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote: > >> This idea has been discussed to death many times before. Please search > >> the archives. > > > > Somewhat related to the "visibility in index" thing: would it be > > possible to have a kind of index-table ? We do have here some tables > > which have 2-4 fields, and the combination of them forms the primary key > > of the table. There are usually no other indexes on the table, and the > > net result is that the PK index duplicates the heap. So it would be cool > > if the index would be THE heap somehow... > > The clustered index patch I worked on for 8.3, but didn't make it in, > would have helped that use case a lot. > > A column-store kind of mechanism would be nice. Some columns could be > stored in index-like structures, while other would be in the heap. I > haven't seen any practical proposal on how to do it though. I think it more resembles the Oracle's IOT with overflow. I think my proposal, once implemented can be easily extended to incorporate IOT/Clustered indexes. One thing is for sure. Without storing Visibility info, Unique Secondary indexes(Indexes on IOT/Clustered indexed tables) is not possible, as it is not possible to re-locate the same entry in a b-tree, if we are storing the Primary key in place of tuple-id. -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > |
| |||
| Gokulakannan Somasundaram wrote: > On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: >> IMO, the most promising approach to achieving index-only-scans at the >> moment is the Dead Space Map, as discussed in the 8.3 dev cycle. > > Index only scans means that in order to get certain results, we may not > goto the table at all. For example, if you have an index on columns a and b, > and if there is a query like "select b from table where a between a1 and > a2", then the explain plan need not goto the table. I can't understand how > dead space map will provide such a functionality. Please read the discussions in the archives. The dead space map holds visibility information in a condensed form. For index-only-scans, we need to know if all tuples on page are are visible to us. If the dead space map is designed with index-only-scans in mind, we can store a bit there indicating "all tuples on this page are visible to everyone". Pages that have that bit set don't need to be visited to check visibility. What information exactly is going to be stored in the dead space map is still debated. For vacuuming, we need to know which pages contain dead tuples that are worth vacuuming, which isn't the same thing as "all tuples are visible to everyone", but it's quite close. Heap pages that do have dead or recently modified rows do need to be visited, so the executor needs to always be prepared to check visibility from the heap. However, on a table that's not very frequently updated, most bits will be set and the effect will be the same as genuine index-only-scans. On a table that is frequently updated, you would suffer a big hit in update performance with the "duplicate visibility information in all indexes" scheme as well, as the updates would need to update the indexes as well, so the performance characteristics are roughly the same. > That's true. But i am not asking to replace the current index > implementation, but to provide an extra option while indexing. Say if a > particular database setup doesn't do much deletes and updates(imagine tables > with partitioning, where the partitions/tables are dropped instead of > deletes. They can have an option to "create index .. with snapshot" A nice property of utilizing the dead space map for index-only-scans is that it doesn't require any action from the DBA. It will "just work". It also adapts well to tables that have parts that are frequently updated, and other parts that are not. The frequently updated parts will behave like what we have now, index-only-scans are not possible because, but deletes/updates are cheap. But the less frequently updated parts will eventually have the bits set in the map, and we can do index-only-scans for those parts. > Imagine the Index Vacuum also will do lesser Random I/Os Index vacuum doesn't do random I/Os as it is. > Also, the full visibility information would need 12 bytes of space per >> tuple. An index tuple on an int4 key currently takes 12 bytes, so that >> would double the index size. Storage size has a big impact on >> performance. More bytes means more I/O, less data fits in cache, and >> more WAL traffic. > > I am thinking of certain optimizations here. we have a bit unused in > indextuple structure. If a particular tuple is not deleted, then we can > signify that using that bit and save 6 bytes of saving the xmax and cmax. We > are trading of this space efficiency in place of Random I/Os, which is not a > bad trade-off , i suppose. Again this is going to optional for the user. If > users have an option to create Bitmap index/ Binary index, why can't they > have this option as well? Because we have to maintain it? Speaking of bitmap indexes, that would be nice to have. It looks like Gavin dropped the ball on the patch, so if you want to continue that work, that would be great. > There's non-trivial implementation issues involved as well. You'd need a >> way to reliably find all the index pointers for a given heap tuple >> (search the archives for "retail vacuum" for the issues involved in >> that. Broken user defined functions are a problem for example). And >> you'd need to keep them all locked at the same time to modify them all >> atomically, which is prone to deadlocks. > > I think Vacuum need not goto the table, as the visibility information is > present in the index itself. Vacuum isn't the problem here. The problem is: given heap tuple X, how do you find the the corresponding index tuples? The obvious solution is to calculate the index keys from the heap tuple, and use them to look up. But what if you have an expression index on a user-defined function, and that user-defined function is broken so that it returns a different value than when the index tuple was inserted? You won't find the index tuples in that case, so you won't be able to update the visibility information. Granted, you've got a broken user-defined-function in that case, and you're going to get bogus query results anyway. But not finding the index tuple when needed would lead to more serious corruption. This is the same problem many proposed retail vacuum schemes have stumbled into, which is why I suggested searching for that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Ühel kenal päeval, E, 2007-10-08 kell 11:41, kirjutas Heikki Linnakangas: > The dead space map holds > visibility information in a condensed form. For index-only-scans, we > need to know if all tuples on page are are visible to us. If the dead > space map is designed with index-only-scans in mind, we can store a bit > there indicating "all tuples on this page are visible to everyone". > Pages that have that bit set don't need to be visited to check visibility. > > What information exactly is going to be stored in the dead space map is > still debated. For vacuuming, we need to know which pages contain dead > tuples that are worth vacuuming, which isn't the same thing as "all > tuples are visible to everyone", but it's quite close. I would prefer a separate MVC visibility heap (aka. extended "dead space map") which would duplicate whole visibility info from heap pages, just in compressed form. After a few releases with duplicated visibility info, we could remove it from the data heap. If the whole visibility info (cmin, cmax, tmin, tmax, flags, (+ size for DSM uses)) for tuples, were in a separate heap, it would allow for a lot of on-the-fly compression. for example we could: * get rid of both tmin and tmax for all completed transactions * reduce any deleted tuple to just flags * reduce any tuple produced by aborted transaction to just flags * reduce any tuple visible to all backends to just flags * RRL compress (runs of) pages produced by same transaction * RRL compress (runs of) pages with all tuples visible * RRL compress (runs of) pages with all tuples deleted depending on distribution of Inserts/Updates/Deletes it will make visibility info a little or a lot smaller than it is currently, greatly enchancing chances that it stays in cache (even for OLAP loads, because data for these are usually produced by bulk inserts and thus their visibility info is highly compressable) It also makes VACUUM more efficient, as it's initial scan (find vacuumable tuples) will need to do lot less IO. And it allows for more intelligent choices for new tuple placement , especially if we want to preserve clustering. And of course it gives you kind of index-only scans (mostly read index + check in vis.heap) ------------- Hannu ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Ühel kenal päeval, E, 2007-10-08 kell 12:12, kirjutas Gokulakannan Somasundaram: > Hi, > Currently The index implementation in Postgresql does not store > the Snapshot information in the Index. ... > Please reply back with your comments. I think you got enoght "search for previous discussion" answers on this aone already So I propose a few another ideas to investigate 1. get rid of indexes for TOAST tables instead of TOAST tuple id store CTID of first TOAST block directly, and use something like skip lists inside the TOAST block headers to access next TOAST tuples. 2. store visibility info in TOAST tables if you store visibility info in TOAST tuples, it becomes possible to update just the small part of the tuple in the main heap and keep the bulk of toasted data in place. both of these ideas are much more complicated to implement than it appears from my simple description, but should have big benefits for a sizable number of scenarios which use TOAST. ------------- Hannu ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| On Mon, 2007-10-08 at 14:58 +0300, Hannu Krosing wrote: > 1. get rid of indexes for TOAST tables > > instead of TOAST tuple id store CTID of first TOAST block directly, and > use something like skip lists inside the TOAST block headers to access > next TOAST tuples. It should be possible to optimise TOAST for when there is just a single chunk that is toasted. Since we often (and by default) compress data stored in TOAST tables this would be a frequently used optimisation. Instead of storing the TOAST OID, which is then looked-up in the index to find the TOAST tid, we can just store the tid directly in the toast pointer on the main heap. That way we'll get faster read and write access for a good proportion of rows by avoiding the toast index and going straight to the toast heap. We'd need a different kind of toast pointer which would be 2 bytes longer than the normal pointer. I think we have a spare flag bit to indicate this. That's just a rough sketch, I've not checked the details on this. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |