vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Here's a patch that implements page at a time index scans discussed at pgsql-hackers earlier. See proposal 1 at: http://archives.postgresql.org/pgsql...3/msg01237.php It passes regression tests, and there's no known bugs. There's some minor issues I'd like to point out, though: 1. An index scan now needs to allocate enough memory to hold potentially a whole page worth of items. And if you use markpos/restrpos, twice that much. I don't know if that's an issue, but I thought I'd bring that up. 2. Vacuum is now done in one phase, scanning the index in physical order. That significantly speeds up index vacuums of large indexes that don't fit into memory. However, btbulkdelete doesn't know if the vacuum is a full or lazy one. The patch just assumes it's a lazy vacuum, but the API really needs to be changed to pass that information earlier than at vacuum_cleanup. 3. Before the patch, a scan would keep the current page pinned to keep vacuum from deleting the current item. The patch doesn't change that behaviour, but it now seems to me that even a pin is no longer needed. The patch needs testing and review, to ensure it doesn't brake anything, and to see the effect on performance. It doesn't change disk layout or catalogs, so you can run it using the same data directory as with the unpatched version. - Heikki ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Mon, 2006-05-01 at 22:00 +0300, Heikki Linnakangas wrote: > Here's a patch that implements page at a time index scans discussed at > pgsql-hackers earlier. See proposal 1 at: > http://archives.postgresql.org/pgsql...3/msg01237.php > > It passes regression tests, and there's no known bugs. There's > some minor issues I'd like to point out, though: > > 1. An index scan now needs to allocate enough memory to hold potentially a > whole page worth of items. And if you use markpos/restrpos, twice that > much. I don't know if that's an issue, but I thought I'd bring that up. AFAICS the code: - allocates memory for the markpos whether or not its ever needed? Most index scans never call markpos and not all merge joins either, so that seems wasteful. We could allocate when btmarkpos() is called for the first time, if ever. - allocates 1024 offsets in every case. If this were just a unique index retrieval, that would be too much. When scan->is_multiscan == true, go straight for 1024, otherwise start with just 32 offsets and double that when/if required. > 2. Vacuum is now done in one phase, scanning the index in physical order. > That significantly speeds up index vacuums of large indexes that don't fit > into memory. Also for those that *aren't in* memory. Should speed up medium-sized VACUUMs also because of sequential disk access. > However, btbulkdelete doesn't know if the vacuum is a full or > lazy one. The patch just assumes it's a lazy vacuum, but the API really > needs to be changed to pass that information earlier than at vacuum_cleanup. Looks like it needs work. Do you have suggestions while you're there? > 3. Before the patch, a scan would keep the current page pinned to keep > vacuum from deleting the current item. The patch doesn't change that > behaviour, but it now seems to me that even a pin is no longer needed. Agreed. The pin has two functions: - keep the page from being moved out of the bufmgr - no need anymore - stop a vacuum from removing the page - no need anymore. We'll not stop on a removable row anymore, so no need. Some of the code doesn't use standard spacing e.g. "if(" should be "if (", but other than that it looks very neat and well implemented. Overall, I'm optimistic that this patch will help in a number of ways. Speeding up a VACUUM index scan is a primary objective and it looks like that will work well. Also, this looks like it will reduce LWLocking overhead and encourage sequential memory scans of blocks, both of which will improve index scan performance. It should also reduce buffer residency time making shared_buffers more fluid. So, subject to performance tests of this I'm very interested in this. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Tue, 2 May 2006, Simon Riggs wrote: > On Mon, 2006-05-01 at 22:00 +0300, Heikki Linnakangas wrote: >> >> 1. An index scan now needs to allocate enough memory to hold potentially a >> whole page worth of items. And if you use markpos/restrpos, twice that >> much. I don't know if that's an issue, but I thought I'd bring that up. > > AFAICS the code: > > - allocates memory for the markpos whether or not its ever needed? Most > index scans never call markpos and not all merge joins either, so that > seems wasteful. We could allocate when btmarkpos() is called for the > first time, if ever. Right. I'll do that. > - allocates 1024 offsets in every case. If this were just a unique index > retrieval, that would be too much. When scan->is_multiscan == true, go > straight for 1024, otherwise start with just 32 offsets and double that > when/if required. I wonder if that gets a bit too complicated for saving a small amount of memory? I'll do it if people think it's worth it. Also, could we calculate a better estimate of the maximum number of offsets an index page can hold? There's no way a page can really hold 1024 items. Page headers and special space, line pointers, and the actual keys need some space. >> However, btbulkdelete doesn't know if the vacuum is a full or >> lazy one. The patch just assumes it's a lazy vacuum, but the API really >> needs to be changed to pass that information earlier than at vacuum_cleanup. > > Looks like it needs work. Do you have suggestions while you're there? Now that I look at it: Why do we have a separate vacuum_cleanup function at all? Calls to index_vacuum_cleanup go hand in hand with calls to index_bulk_delete. I thought that index_vacuum_cleanup would only be called after the last cycle of a multi-cycle vacuum, but that doesn't seem to be the case. >> 3. Before the patch, a scan would keep the current page pinned to keep >> vacuum from deleting the current item. The patch doesn't change that >> behaviour, but it now seems to me that even a pin is no longer needed. > > Agreed. The pin has two functions: > - keep the page from being moved out of the bufmgr - no need anymore > - stop a vacuum from removing the page - no need anymore. We'll not stop > on a removable row anymore, so no need. At the moment, backward scan returns to the page to walk left from there. It could be changed to take a copy of the left sibling pointer like forward scans do, eliminating the need to return to the original page in most cases. Also, if there's dead tuples on the page, the scan will need to return to the page to kill them. But you can always re-pin the page when needed. > Some of the code doesn't use standard spacing e.g. "if(" should be "if > (", but other than that it looks very neat and well implemented. Thanks, I'll fix the spacings. - Heikki ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Heikki Linnakangas <hlinnaka@iki.fi> writes: > Also, could we calculate a better estimate of the maximum number of > offsets an index page can hold? We could make something analogous to MaxHeapTuplesPerPage --- the correct number ought to be approximately BLCKSZ/16 I should think. (It's not possible for an entry to be *just* the header, there has to be either a datum or a null bitmap. Hence, with maxalign padding, at least 12 bytes for item, plus 4 for item pointer.) > Now that I look at it: Why do we have a separate vacuum_cleanup function > at all? Calls to index_vacuum_cleanup go hand in hand with calls to > index_bulk_delete. Yeah, I was just thinking we ought to revisit that. The original idea was that vacuumcleanup would be called just once at the end of vacuuming, not once per bulkdelete. I don't recall why I changed it but it was probably a bad idea to do so. >> Agreed. The pin has two functions: >> - keep the page from being moved out of the bufmgr - no need anymore >> - stop a vacuum from removing the page - no need anymore. We'll not stop >> on a removable row anymore, so no need. > At the moment, backward scan returns to the page to walk left from there. Backwards scan may break this whole concept; are you sure you've thought it through? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Heikki Linnakangas <hlinnaka@iki.fi> writes: >> Here's a patch that implements page at a time index scans discussed at >> pgsql-hackers earlier. See proposal 1 at: >> http://archives.postgresql.org/pgsql...3/msg01237.php One potential performance lossage from this is that it partially defeats the keys_are_unique optimization: bt_checkkeys will be run across all the matching tuples on the index page even if the waiting caller is going to stop after the first live one. (I don't see any way to avoid that without breaking the entire concept, since we can't know which of the index entries the caller will think is live.) I suspect this is not a deal-breaker, but we have to test to make sure that case isn't getting markedly worse. The thing to look at would be unique indexes with expensive comparison functions (eg, text in a non-C locale). regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| I wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> Now that I look at it: Why do we have a separate vacuum_cleanup function >> at all? Calls to index_vacuum_cleanup go hand in hand with calls to >> index_bulk_delete. > Yeah, I was just thinking we ought to revisit that. The original idea > was that vacuumcleanup would be called just once at the end of vacuuming, > not once per bulkdelete. I don't recall why I changed it but it was > probably a bad idea to do so. I remember why: the design involves passing a palloc'd struct from bulkdelete to vacuumcleanup and there needed to be matching calls to make that behave sanely. We could fix this if the API is something like "the first bulkdelete call palloc's the struct, and *it's passed in to* each subsequent bulkdelete, as well as being passed to vacuumcleanup". So we're short one argument to bulkdelete. This would provide a saner way of dealing with the case of nothing to delete, too: if vacuumcleanup gets a NULL stats pointer, that means bulkdelete wasn't ever called (or was, but chose never to return a non-null pointer). Also, as noted in other contexts, it'd be a good idea if vacuumcleanup was told the total number of heap tuples (GIN needs this), and both steps really ought to be able to find out if it's a full or lazy vacuum. I'll work on making these API changes; the recent GIN patch has left some other detritus that needs to be cleaned up in the same area. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Tue, 2 May 2006, Tom Lane wrote: > Also, as noted in other contexts, it'd be a good idea if vacuumcleanup > was told the total number of heap tuples (GIN needs this), and both > steps really ought to be able to find out if it's a full or lazy vacuum. It's already in IndexVacuumCleanupInfo, isn't it? /* Struct for additional arguments passed to vacuum-cleanup operation */ typedef struct IndexVacuumCleanupInfo { bool vacuum_full; /* VACUUM FULL (we have exclusive lock) */ int message_level; /* ereport level for progress messages */ --> double num_heap_tuples; /* tuples remaining in heap */ } IndexVacuumCleanupInfo; gistvacuumcleanup uses num_heap_tuples to set num_index_tuples when it doesn't need to scan the index otherwise. BTW: Is it possible to have a partial gist index? If it is, num_index_tuples = num_heap_tuples isn't right. - Heikki ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Tue, 2 May 2006, Tom Lane wrote: >>> Agreed. The pin has two functions: >>> - keep the page from being moved out of the bufmgr - no need anymore >>> - stop a vacuum from removing the page - no need anymore. We'll not stop >>> on a removable row anymore, so no need. > >> At the moment, backward scan returns to the page to walk left from there. > > Backwards scan may break this whole concept; are you sure you've thought > it through? I think so. The patch doesn't change the walk-left code. Do you have something specific in mind? - Heikki ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Heikki Linnakangas <hlinnaka@iki.fi> writes: > On Tue, 2 May 2006, Tom Lane wrote: >> Also, as noted in other contexts, it'd be a good idea if vacuumcleanup >> was told the total number of heap tuples (GIN needs this), and both >> steps really ought to be able to find out if it's a full or lazy vacuum. > It's already in IndexVacuumCleanupInfo, isn't it? Yeah, I had forgotten that, but just noticed it again now. The patch I'm working on at the moment defines /* * Struct for input arguments passed to ambulkdelete and amvacuumcleanup * * Note that num_heap_tuples will not be valid during ambulkdelete, * only amvacuumcleanup. */ typedef struct IndexVacuumInfo { Relation index; /* the index being vacuumed */ bool vacuum_full; /* VACUUM FULL (we have exclusive lock) */ int message_level; /* ereport level for progress messages */ double num_heap_tuples; /* tuples remaining in heap */ } IndexVacuumInfo; with IndexBulkDeleteResult * ambulkdelete (IndexVacuumInfo *info, IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback, void *callback_state); Because of limited <varname>maintenance_work_mem</>, <function>ambulkdelete</> may need to be called more than once when many tuples are to be deleted. The <literal>stats</> argument is the result of the previous call for this index (it is NULL for the first call within a <command>VACUUM</> operation). This allows the AM to accumulate statistics across the whole operation. Typically, <function>ambulkdelete</> will modify and return the same struct if the passed <literal>stats</> is not null. IndexBulkDeleteResult * amvacuumcleanup (IndexVacuumInfo *info, IndexBulkDeleteResult *stats); Clean up after a <command>VACUUM</command> operation (zero or more <function>ambulkdelete</> calls). This does not have to do anything beyond returning index statistics, but it may perform bulk cleanup such as reclaiming empty index pages. <literal>stats</> is whatever the last <function>ambulkdelete</> call returned, or NULL if <function>ambulkdelete</> was not called because no tuples needed to be deleted. If the result is not NULL it must be a palloc'd struct. The statistics it contains will be reported by <command>VACUUM</> if <literal>VERBOSE</> is given. > BTW: Is it possible to have a partial gist index? If it is, > num_index_tuples = num_heap_tuples isn't right. It is, and it isn't ;-). We'll need to see about fixing that. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| Heikki Linnakangas <hlinnaka@iki.fi> writes: > On Tue, 2 May 2006, Tom Lane wrote: >> Backwards scan may break this whole concept; are you sure you've thought >> it through? > I think so. The patch doesn't change the walk-left code. Do you have > something specific in mind? I'm worried about synchronization, particularly what happens if the page gets deleted from under you while you don't have it pinned. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |