
   
04-18-2008, 09:38 AM
Heikki Linnakangas
 
A little COPY speedup

One complaint we've heard from clients trying out EDB or PostgreSQL is
that loading data is slower than on other DBMSs.

I ran oprofile on a COPY FROM to get an overview of where the CPU time
is spent. To my amazement, the function at the top of the list was
PageAddItem with 16% of samples.

On every row, PageAddItem scans all the line pointers on the target
page, only to find that they're all in use, and then creates a new line
pointer. That adds up, especially with narrow tuples like the ones I used
in the test.

Attached is a fix for that. It adds a flag to each heap page that
indicates that "there aren't any free line pointers on this page, so
don't bother trying". Heap pages haven't had any heap-specific per-page
data before, so this patch adds a HeapPageOpaqueData struct that's
stored in the special space.
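
To make the idea concrete, here is a toy model of the scan and of the
flag that short-circuits it. This is only a sketch, not the attached
patch and not PostgreSQL's real page layout: ToyPage, MAX_SLOTS,
all_used_hint, probes and the toy_* functions are all names made up for
illustration. The probes counter just records how many line pointers get
inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 256   /* hypothetical capacity of the toy page */

/* Toy stand-in for a heap page; NOT PostgreSQL's real structures. */
typedef struct ToyPage {
    bool   used[MAX_SLOTS]; /* used[i]: is line pointer i in use? */
    size_t nslots;          /* line pointers allocated so far */
    bool   all_used_hint;   /* hypothetical flag: no free line pointers */
    size_t probes;          /* line pointers inspected (illustration only) */
} ToyPage;

/* Add an item; returns the slot index used, or -1 if the page is full. */
static int toy_add_item(ToyPage *page, bool use_hint)
{
    assert(page != NULL);

    /* Look for a reusable line pointer, unless the hint says there is none. */
    if (!(use_hint && page->all_used_hint)) {
        for (size_t i = 0; i < page->nslots; i++) {
            page->probes++;
            if (!page->used[i]) {
                page->used[i] = true;
                return (int) i;
            }
        }
        /* The scan found nothing free: remember that for the next call. */
        page->all_used_hint = true;
    }

    /* No reusable slot, so append a new line pointer at the end. */
    if (page->nslots >= MAX_SLOTS)
        return -1;
    page->used[page->nslots] = true;
    return (int) page->nslots++;
}

/* Free a slot; the hint is no longer valid, so clear it. */
static void toy_free_item(ToyPage *page, size_t slot)
{
    assert(page != NULL && slot < page->nslots);
    page->used[slot] = false;
    page->all_used_hint = false;
}

/* Fill a fresh page with n items and report how many slots were inspected. */
static size_t toy_fill_probes(size_t n, bool use_hint)
{
    ToyPage page = {0};

    for (size_t i = 0; i < n; i++)
        toy_add_item(&page, use_hint);
    return page.probes;
}
```

In this model, filling an empty page with n items inspects n*(n-1)/2
line pointers without the hint and none with it, which is the kind of
per-row overhead the profile pointed at. Anything that frees a line
pointer has to clear the hint again, otherwise the free slot would never
be reused.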

My simple test case of a COPY FROM of 10,000,000 tuples took 19.6 s
without the patch, and 17.7 s with the patch applied, or roughly 10%
less time. Your mileage may vary.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


