This is a discussion on A little COPY speedup within the Pgsql Patches forums, part of the PostgreSQL category; --> On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote: > I ran oprofile on a COPY FROM to get ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote: > I ran oprofile on a COPY FROM to get an overview of where the CPU time > is spent. To my amazement, the function at the top of the list was > PageAddItem with 16% of samples. Excellent. I'm slightly worried though since that seems to have changed from 8.2, which I oprofiled over Christmas. I can't recall what's changed though. Was this just on one system? Is it possible there is an effect on one system and not another? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Simon Riggs wrote: > On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote: > >> I ran oprofile on a COPY FROM to get an overview of where the CPU time >> is spent. To my amazement, the function at the top of the list was >> PageAddItem with 16% of samples. > > Excellent. > > I'm slightly worried though since that seems to have changed from 8.2, > which I oprofiled over Christmas. I can't recall what's changed though. > Was this just on one system? Is it possible there is an effect on one > system and not another? Well, there's one big change: your patch to suppress WAL logging on tables created in the same transaction. I ran the test again, this time creating the table in a separate transaction: samples % image name app name symbol name 5480 17.0366 postgres postgres XLogInsert 3684 11.4531 postgres postgres PageAddItem 3580 11.1298 libc-2.3.6.so postgres memcpy 2498 7.7660 postgres postgres DoCopy 1265 3.9327 postgres postgres LWLockAcquire 1210 3.7617 postgres postgres CopyReadLine 1042 3.2394 postgres postgres LWLockRelease 1038 3.2270 postgres postgres heap_formtuple 1033 3.2115 libc-2.3.6.so postgres ____strtol_l_internal 875 2.7203 postgres postgres hash_any The profile will probably look somewhat different depending on your data, encoding etc. All the page locking related functions account for ~10% in total, including the LWLockAcquire/Release, Pin/UnBuffer, hash_any and so on. And then there's all the memcpying... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| On Fri, 2007-03-02 at 10:09 +0000, Heikki Linnakangas wrote: > Well, there's one big change: your patch to suppress WAL logging on > tables created in the same transaction. OK, just checking thats what you meant. > All the page locking related functions account for ~10% in total, > including the LWLockAcquire/Release, Pin/UnBuffer, hash_any and so on. > And then there's all the memcpying... I think its a great spot that PageAddItem() was so bad. I realise I didn't actually look at what it was doing, just looked at ways to avoid doing it on each individual call to the block for each row. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > I'm slightly worried though since that seems to have changed from 8.2, > which I oprofiled over Christmas. If you were testing a case with wider rows than Heikki tested, you'd see less impact --- the cost of the old way was O(N^2) in the number of tuples that fit on a page, so the behavior gets rapidly worse as you get down to smaller tuple sizes. (Come to think of it, the cmin/cmax collapse would be a factor here too.) regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes: > "Simon Riggs" <simon@2ndquadrant.com> writes: >> I'm slightly worried though since that seems to have changed from 8.2, >> which I oprofiled over Christmas. > > If you were testing a case with wider rows than Heikki tested, you'd see > less impact --- the cost of the old way was O(N^2) in the number of > tuples that fit on a page, so the behavior gets rapidly worse as you get > down to smaller tuple sizes. (Come to think of it, the cmin/cmax > collapse would be a factor here too.) Or larger block sizes of course. A 32kb block would be 16x as bad which starts to be pretty serious. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Fri, 2007-03-02 at 16:25 +0000, Gregory Stark wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > > > "Simon Riggs" <simon@2ndquadrant.com> writes: > >> I'm slightly worried though since that seems to have changed from 8.2, > >> which I oprofiled over Christmas. > > > > If you were testing a case with wider rows than Heikki tested, you'd see > > less impact --- the cost of the old way was O(N^2) in the number of > > tuples that fit on a page, so the behavior gets rapidly worse as you get > > down to smaller tuple sizes. (Come to think of it, the cmin/cmax > > collapse would be a factor here too.) > > Or larger block sizes of course. A 32kb block would be 16x as bad which starts > to be pretty serious. Well, I was only using 8kb blocks. But I think the message is clear: we need to profile lots of different combinations. I was using a 2 col table with integer, char(100). IIRC there are issues with delimiter handling when we have lots of columns in the input on COPY FROM, and num of cols on COPY TO. I've not looked at those recently though. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Simon Riggs wrote: > > IIRC there are issues with delimiter handling when we have lots of > columns in the input on COPY FROM, and num of cols on COPY TO. I've not > looked at those recently though. > > What sort of issues? Anything that breaks on this has catastrophic consequences. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote: > Simon Riggs wrote: > > > > IIRC there are issues with delimiter handling when we have lots of > > columns in the input on COPY FROM, and num of cols on COPY TO. I've not > > looked at those recently though. > > > > > > What sort of issues? Anything that breaks on this has catastrophic > consequences. We were talking about performance, not data integrity.... -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Simon Riggs wrote: > On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote: > >> Simon Riggs wrote: >> >>> IIRC there are issues with delimiter handling when we have lots of >>> columns in the input on COPY FROM, and num of cols on COPY TO. I've not >>> looked at those recently though. >>> >>> >>> >> What sort of issues? Anything that breaks on this has catastrophic >> consequences. >> > > We were talking about performance, not data integrity.... > > OK. I'm still curious to know what the issues are with delimiter handling. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| On Fri, 2007-03-02 at 12:09 -0500, Andrew Dunstan wrote: > OK. I'm still curious to know what the issues are with delimiter handling. Rumours only. Feedback from someone else looking to the problem last year. IIRC there was a feeling that if we didn't have to search for delimiters the COPY FROM input parsing could be easier. Vague recollection that the COPY TO uses the slower API for getting heap attributes, but didn't seem to show up when I profiled few months back. I'm not personally proposing to take those thoughts any further. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |