vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| One complaint we've heard from clients trying out EDB or PostgreSQL is that loading data is slower than on other DBMSs. I ran oprofile on a COPY FROM to get an overview of where the CPU time is spent. To my amazement, the function at the top of the list was PageAddItem with 16% of samples. On every row, PageAddItem will scan all the line pointers on the target page, just to see that they're all in use, and create a new line pointer. That adds up, especially with narrow tuples like what I used in the test. Attached is a fix for that. It adds a flag to each heap page that indicates that "there isn't any free line pointers on this page, so don't bother trying". Heap pages haven't had any heap-specific per-page data before, so this patch adds a HeapPageOpaqueData-struct that's stored in the special space. My simple test case of a COPY FROM of 10000000 tuples took 19.6 s without the patch, and 17.7 s with the patch applied. Your mileage may vary. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Heikki Linnakangas wrote: > > Attached is a fix for that. It adds a flag to each heap page that > indicates that "there isn't any free line pointers on this page, so > don't bother trying". Heap pages haven't had any heap-specific > per-page data before, so this patch adds a HeapPageOpaqueData-struct > that's stored in the special space. > I would really like this change. I was thinking on similar lines to optimize some of the HOT code paths Thanks, Pavan ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Heikki Linnakangas wrote: > One complaint we've heard from clients trying out EDB or PostgreSQL is > that loading data is slower than on other DBMSs. > > I ran oprofile on a COPY FROM to get an overview of where the CPU time > is spent. To my amazement, the function at the top of the list was > PageAddItem with 16% of samples. > > On every row, PageAddItem will scan all the line pointers on the > target page, just to see that they're all in use, and create a new > line pointer. That adds up, especially with narrow tuples like what I > used in the test. > > Attached is a fix for that. It adds a flag to each heap page that > indicates that "there isn't any free line pointers on this page, so > don't bother trying". Heap pages haven't had any heap-specific > per-page data before, so this patch adds a HeapPageOpaqueData-struct > that's stored in the special space. > > My simple test case of a COPY FROM of 10000000 tuples took 19.6 s > without the patch, and 17.7 s with the patch applied. Your mileage may > vary. What is the speedup with less narrow tuples? 10% improvement is good but not stellar. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > On every row, PageAddItem will scan all the line pointers on the target > page, just to see that they're all in use, and create a new line > pointer. That adds up, especially with narrow tuples like what I used in > the test. > Attached is a fix for that. This has been proposed before, and rejected before. IIRC the previous patch was quite a lot less invasive than this one (it didn't require making special space on heap pages). I don't recall why it wasn't accepted. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> On every row, PageAddItem will scan all the line pointers on the target >> page, just to see that they're all in use, and create a new line >> pointer. That adds up, especially with narrow tuples like what I used in >> the test. >> Attached is a fix for that. > > This has been proposed before, and rejected before. IIRC the previous > patch was quite a lot less invasive than this one (it didn't require > making special space on heap pages). I don't recall why it wasn't > accepted. Ahh, found that thread: http://archives.postgresql.org/pgsql...7/msg00609.php The main differences between that patch and mine is that - the previous patch used an offset to the first free line pointer, and I used just a flag. - the previous patch stored the offset in the page header, and I used the special space I think using the special space is a cleaner approach; the field is only meaningful in heap pages. However, now that I think of it, if we could squeeze the flag into one of the existing fields in the page header, we could put it there without decreasing the amount of space available for tuples. We could use the unused pd_tli field, as you suggested later in that thread. At the end of the thread, Bruce added the patch to his hold-queue, but I couldn't find a trace of it after that so I'm not clear why it was rejected in the end. This comment (by you) seems most relevant: > I tried making a million-row table with just two int4 columns and then > duplicating it with CREATE TABLE AS SELECT. In this context gprof > shows PageAddItem as taking 7% of the runtime, which your patch knocks > down to 1.5%. This seems to be about the best possible real-world case, > though (the wider the rows, the fewer times PageAddItem can loop), and > so I'm still unconvinced that there's a generic gain here. Adding an > additional word to page headers has a very definite cost --- we can > assume about a .05% increase in net I/O demands across *every* > application, whether they do a lot of inserts or not --- and so a > patch that provides a noticeable improvement in only a very small set > of circumstances is going to have to be rejected. I believe the PageAddItem overhead has become more noticeable since then because of other improvements to COPY. In 8.3, we're also going to reduce the tuple length (combocids and the varvarlen thing), so we can fit more tuples per page, again making it slightly more significant. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > At the end of the thread, Bruce added the patch to his hold-queue, but I > couldn't find a trace of it after that so I'm not clear why it was > rejected in the end. This comment (by you) seems most relevant: I believe we concluded that the distributed cost of enlarging the page headers would probably outweigh a gain that seemed to have fairly narrow application. I wouldn't be surprised if the tradeoffs have changed since then, but I'm still loath to increase the overhead space. If we can do it without that, the argument to reject gets much weaker. As you say, pd_tli is not really pulling its weight, but I'm also loath to remove it, as in a multi-timeline situation the page LSN is really not well defined if you don't know which timeline it refers to. Now we'd only need 16 bits to store the last-used offset, or a flags field if you'd prefer that, so one possible compromise is to store only the 16 least significant bits of TLI (which ought to be enough to disambiguate in any real-world situation), and insert the new field where the MSBs had been. I'm not sure whether I like your flag approach better than the last-used-offset one. The previous patch probably buys some teeny amount more performance, but the flag seems more robust (noting in passing that neither patch attempts to WAL-log its changes, so we really need to treat the values as hints). And a change as sketched here would leave us 15 free bits for future expansion, which might be nice. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Tom Lane wrote: > As you say, pd_tli is not really pulling its weight, but I'm also loath > to remove it, as in a multi-timeline situation the page LSN is really > not well defined if you don't know which timeline it refers to. > > Now we'd only need 16 bits to store the last-used offset, or a flags > field if you'd prefer that, so one possible compromise is to store only > the 16 least significant bits of TLI (which ought to be enough to > disambiguate in any real-world situation), and insert the new field > where the MSBs had been. Sounds good to me. It's nice to keep the TLI for debugging/forensics purposes even if it's not used at the moment, but 16 bits is enough for that. Another possibility would be to use the unused bits in pd_upper/lower/special, but that requires more bit-trickery. > I'm not sure whether I like your flag approach better than the > last-used-offset one. The previous patch probably buys some teeny > amount more performance, but the flag seems more robust (noting in > passing that neither patch attempts to WAL-log its changes, so we really > need to treat the values as hints). And a change as sketched here would > leave us 15 free bits for future expansion, which might be nice. I'll post a patch along those lines. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Heikki Linnakangas wrote: > Tom Lane wrote: >> I'm not sure whether I like your flag approach better than the >> last-used-offset one. The previous patch probably buys some teeny >> amount more performance, but the flag seems more robust (noting in >> passing that neither patch attempts to WAL-log its changes, so we really >> need to treat the values as hints). And a change as sketched here would >> leave us 15 free bits for future expansion, which might be nice. > > I'll post a patch along those lines. Here it is. I'm not fond of the macro names for the flag, but couldn't think of anything shorter yet descriptive. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: >> I'll post a patch along those lines. > Here it is. > I'm not fond of the macro names for the flag, but couldn't think of > anything shorter yet descriptive. Let's reverse the sense of the flag bit; this seems a good idea since the initial state should be 0 = there are no free pointers. Also I'd go with PD_ as the prefix, for consistency with the field names. This brings us to PD_HAS_FREE_LINE_POINTERS or some abbreviation thereof (maybe PD_HAS_FREE_LINES is sufficient). I'd also go with "pd_flags" as the field name; "amprivate" seems to imply way too much about what we might later use the flags for. Barring objections, I'll tweak this as above and apply. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| I wrote: > Barring objections, I'll tweak this as above and apply. I've applied the attached modified version of this patch. It seemed better to me to centralize the handling of this flag bit in PageAddItem and PageRepairFragmentation, instead of having it in the callers as you did. This means that the bit applies to all pages not only heap pages, but at least for the moment that has no downside. I note that GIN indexes do use PageAddItem with offsetNumber = InvalidOffsetNumber, so they will get some performance benefit too. regards, tom lane Index: doc/src/sgml/storage.sgml ================================================== ================= RCS file: /cvsroot/pgsql/doc/src/sgml/storage.sgml,v retrieving revision 1.14 diff -c -r1.14 storage.sgml *** doc/src/sgml/storage.sgml 31 Jan 2007 20:56:19 -0000 1.14 --- doc/src/sgml/storage.sgml 2 Mar 2007 00:48:25 -0000 *************** *** 427,434 **** The first 20 bytes of each page consists of a page header (PageHeaderData). Its format is detailed in <xref linkend="pageheaderdata-table">. The first two fields track the most ! recent WAL entry related to this page. They are followed by three 2-byte ! integer fields (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>, and <structfield>pd_special</structfield>). These contain byte offsets from the page start to the start --- 427,434 ---- The first 20 bytes of each page consists of a page header (PageHeaderData). Its format is detailed in <xref linkend="pageheaderdata-table">. The first two fields track the most ! recent WAL entry related to this page. Next is a 2-byte field ! containing flag bits. This is followed by three 2-byte integer fields (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>, and <structfield>pd_special</structfield>). These contain byte offsets from the page start to the start *************** *** 437,448 **** The last 2 bytes of the page header, <structfield>pd_pagesize_version</structfield>, store both the page size and a version indicator. Beginning with ! <productname>PostgreSQL</productname> 8.1 the version number is 3; <productname>PostgreSQL</productname> 8.0 used version number 2; <productname>PostgreSQL</productname> 7.3 and 7.4 used version number 1; prior releases used version number 0. ! (The basic page layout and header format has not changed in these versions, ! but the layout of heap row headers has.) The page size is basically only present as a cross-check; there is no support for having more than one page size in an installation. --- 437,449 ---- The last 2 bytes of the page header, <structfield>pd_pagesize_version</structfield>, store both the page size and a version indicator. Beginning with ! <productname>PostgreSQL</productname> 8.3 the version number is 4; ! <productname>PostgreSQL</productname> 8.1 and 8.2 used version number 3; <productname>PostgreSQL</productname> 8.0 used version number 2; <productname>PostgreSQL</productname> 7.3 and 7.4 used version number 1; prior releases used version number 0. ! (The basic page layout and header format has not changed in most of these ! versions, but the layout of heap row headers has.) The page size is basically only present as a cross-check; there is no support for having more than one page size in an installation. *************** *** 470,478 **** </row> <row> <entry>pd_tli</entry> ! <entry>TimeLineID</entry> ! <entry>4 bytes</entry> ! <entry>TLI of last change</entry> </row> <row> <entry>pd_lower</entry> --- 471,485 ---- </row> <row> <entry>pd_tli</entry> ! <entry>uint16</entry> ! <entry>2 bytes</entry> ! <entry>TimeLineID of last change (only its lowest 16 bits)</entry> ! </row> ! <row> ! <entry>pd_flags</entry> ! <entry>uint16</entry> ! <entry>2 bytes</entry> ! <entry>Flag bits</entry> </row> <row> <entry>pd_lower</entry> Index: src/backend/storage/page/bufpage.c ================================================== ================= RCS file: /cvsroot/pgsql/src/backend/storage/page/bufpage.c,v retrieving revision 1.71 diff -c -r1.71 bufpage.c *** src/backend/storage/page/bufpage.c 21 Feb 2007 20:02:17 -0000 1.71 --- src/backend/storage/page/bufpage.c 2 Mar 2007 00:48:26 -0000 *************** *** 39,44 **** --- 39,45 ---- /* Make sure all fields of page are zero, as well as unused space */ MemSet(p, 0, pageSize); + /* p->pd_flags = 0; done by above MemSet */ p->pd_lower = SizeOfPageHeaderData; p->pd_upper = pageSize - specialSize; p->pd_special = pageSize - specialSize; *************** *** 73,78 **** --- 74,80 ---- /* Check normal case */ if (PageGetPageSize(page) == BLCKSZ && PageGetPageLayoutVersion(page) == PG_PAGE_LAYOUT_VERSION && + (page->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && page->pd_lower >= SizeOfPageHeaderData && page->pd_lower <= page->pd_upper && page->pd_upper <= page->pd_special && *************** *** 165,178 **** else { /* offsetNumber was not passed in, so find a free slot */ ! /* look for "recyclable" (unused & deallocated) ItemId */ ! for (offsetNumber = 1; offsetNumber < limit; offsetNumber++) { ! itemId = PageGetItemId(phdr, offsetNumber); ! if (!ItemIdIsUsed(itemId) && ItemIdGetLength(itemId) == 0) ! break; } - /* if no free slot, we'll put it at limit (1st open slot) */ } if (offsetNumber > limit) --- 167,193 ---- else { /* offsetNumber was not passed in, so find a free slot */ ! /* if no free slot, we'll put it at limit (1st open slot) */ ! if (PageHasFreeLinePointers(phdr)) ! { ! /* look for "recyclable" (unused & deallocated) ItemId */ ! for (offsetNumber = 1; offsetNumber < limit; offsetNumber++) ! { ! itemId = PageGetItemId(phdr, offsetNumber); ! if (!ItemIdIsUsed(itemId) && ItemIdGetLength(itemId) == 0) ! break; ! } ! if (offsetNumber >= limit) ! { ! /* the hint is wrong, so reset it */ ! PageClearHasFreeLinePointers(phdr); ! } ! } ! else { ! /* don't bother searching if hint says there's no free slot */ ! offsetNumber = limit; } } if (offsetNumber > limit) *************** *** 413,425 **** pfree(itemidbase); } return (nline - nused); } /* * PageGetFreeSpace * Returns the size of the free (allocatable) space on a page, ! * deducted by the space needed for a new line pointer. */ Size PageGetFreeSpace(Page page) --- 428,446 ---- pfree(itemidbase); } + /* Set hint bit for PageAddItem */ + if (nused < nline) + PageSetHasFreeLinePointers(page); + else + PageClearHasFreeLinePointers(page); + return (nline - nused); } /* * PageGetFreeSpace * Returns the size of the free (allocatable) space on a page, ! * reduced by the space needed for a new line pointer. */ Size PageGetFreeSpace(Page page) Index: src/include/catalog/catversion.h ================================================== ================= RCS file: /cvsroot/pgsql/src/include/catalog/catversion.h,v retrieving revision 1.388 diff -c -r1.388 catversion.h *** src/include/catalog/catversion.h 20 Feb 2007 17:32:17 -0000 1.388 --- src/include/catalog/catversion.h 2 Mar 2007 00:48:26 -0000 *************** *** 53,58 **** */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 200702202 #endif --- 53,58 ---- */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 200703011 #endif Index: src/include/storage/bufpage.h ================================================== ================= RCS file: /cvsroot/pgsql/src/include/storage/bufpage.h,v retrieving revision 1.71 diff -c -r1.71 bufpage.h *** src/include/storage/bufpage.h 21 Feb 2007 20:02:17 -0000 1.71 --- src/include/storage/bufpage.h 2 Mar 2007 00:48:26 -0000 *************** *** 90,95 **** --- 90,96 ---- * * pd_lsn - identifies xlog record for last change to this page. * pd_tli - ditto. + * pd_flags - flag bits. * pd_lower - offset to start of free space. * pd_upper - offset to end of free space. * pd_special - offset to start of special space. *************** *** 98,105 **** * The LSN is used by the buffer manager to enforce the basic rule of WAL: * "thou shalt write xlog before data". A dirty buffer cannot be dumped * to disk until xlog has been flushed at least as far as the page's LSN. ! * We also store the TLI for identification purposes (it is not clear that ! * this is actually necessary, but it seems like a good idea). * * The page version number and page size are packed together into a single * uint16 field. This is for historical reasons: before PostgreSQL 7.3, --- 99,107 ---- * The LSN is used by the buffer manager to enforce the basic rule of WAL: * "thou shalt write xlog before data". A dirty buffer cannot be dumped * to disk until xlog has been flushed at least as far as the page's LSN. ! * We also store the 16 least significant bits of the TLI for identification ! * purposes (it is not clear that this is actually necessary, but it seems ! * like a good idea). * * The page version number and page size are packed together into a single * uint16 field. This is for historical reasons: before PostgreSQL 7.3, *************** *** 119,125 **** /* XXX LSN is member of *any* block, not only page-organized ones */ XLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog * record for last change to this page */ ! TimeLineID pd_tli; /* TLI of last change */ LocationIndex pd_lower; /* offset to start of free space */ LocationIndex pd_upper; /* offset to end of free space */ LocationIndex pd_special; /* offset to start of special space */ --- 121,129 ---- /* XXX LSN is member of *any* block, not only page-organized ones */ XLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog * record for last change to this page */ ! uint16 pd_tli; /* least significant bits of the TimeLineID ! * containing the LSN */ ! uint16 pd_flags; /* flag bits, see below */ LocationIndex pd_lower; /* offset to start of free space */ LocationIndex pd_upper; /* offset to end of free space */ LocationIndex pd_special; /* offset to start of special space */ *************** *** 130,140 **** typedef PageHeaderData *PageHeader; /* * Page layout version number 0 is for pre-7.3 Postgres releases. * Releases 7.3 and 7.4 use 1, denoting a new HeapTupleHeader layout. * Release 8.0 uses 2; it changed the HeapTupleHeader layout again. * Release 8.1 uses 3; it redefined HeapTupleHeader infomask bits. ! * Release 8.3 uses 4; it changed the HeapTupleHeader layout again. */ #define PG_PAGE_LAYOUT_VERSION 4 --- 134,157 ---- typedef PageHeaderData *PageHeader; /* + * pd_flags contains the following flag bits. Undefined bits are initialized + * to zero and may be used in the future. + * + * PD_HAS_FREE_LINES is set if there are any not-LP_USED line pointers before + * pd_lower. This should be considered a hint rather than the truth, since + * changes to it are not WAL-logged. + */ + #define PD_HAS_FREE_LINES 0x0001 /* are there any unused line pointers? */ + + #define PD_VALID_FLAG_BITS 0x0001 /* OR of all valid pd_flags bits */ + + /* * Page layout version number 0 is for pre-7.3 Postgres releases. * Releases 7.3 and 7.4 use 1, denoting a new HeapTupleHeader layout. * Release 8.0 uses 2; it changed the HeapTupleHeader layout again. * Release 8.1 uses 3; it redefined HeapTupleHeader infomask bits. ! * Release 8.3 uses 4; it changed the HeapTupleHeader layout again, and ! * added the pd_flags field (by stealing some bits from pd_tli). */ #define PG_PAGE_LAYOUT_VERSION 4 *************** *** 299,313 **** ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \ / sizeof(ItemIdData))) #define PageGetLSN(page) \ (((PageHeader) (page))->pd_lsn) #define PageSetLSN(page, lsn) \ (((PageHeader) (page))->pd_lsn = (lsn)) #define PageGetTLI(page) \ (((PageHeader) (page))->pd_tli) #define PageSetTLI(page, tli) \ ! (((PageHeader) (page))->pd_tli = (tli)) /* ---------------------------------------------------------------- * extern declarations --- 316,342 ---- ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \ / sizeof(ItemIdData))) + /* + * Additional macros for access to page headers + */ #define PageGetLSN(page) \ (((PageHeader) (page))->pd_lsn) #define PageSetLSN(page, lsn) \ (((PageHeader) (page))->pd_lsn = (lsn)) + /* NOTE: only the 16 least significant bits are stored */ #define PageGetTLI(page) \ (((PageHeader) (page))->pd_tli) #define PageSetTLI(page, tli) \ ! (((PageHeader) (page))->pd_tli = (uint16) (tli)) ! ! #define PageHasFreeLinePointers(page) \ ! (((PageHeader) (page))->pd_flags & PD_HAS_FREE_LINES) ! #define PageSetHasFreeLinePointers(page) \ ! (((PageHeader) (page))->pd_flags |= PD_HAS_FREE_LINES) ! #define PageClearHasFreeLinePointers(page) \ ! (((PageHeader) (page))->pd_flags &= ~PD_HAS_FREE_LINES) ! /* ---------------------------------------------------------------- * extern declarations ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |