Unix Technical Forum

Bulk load performance

This is a discussion on Bulk load performance within the MySQL forums, part of the Database Server Software category; --> Hi all, I'm in the process of selecting a DBMS for a project. We've narrowed down our choices to ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 08:47 AM
Xiaolei Li
 
Posts: n/a
Default Bulk load performance

Hi all,

I'm in the process of selecting a DBMS for a project. We've narrowed
down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
sure is the speed of bulk loading. Some of the tables could contain
tens or hundreds of millions of tuples. So, does anyone know of any
benchmarks about this stuff? I tried searching online but couldn't find
anything. Thank you.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 08:48 AM
Robert Klemme
 
Posts: n/a
Default Re: Bulk load performance

On 18.12.2006 17:50, Xiaolei Li wrote:
> I'm in the process of selecting a DBMS for a project. We've narrowed
> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
> sure is the speed of bulk loading. Some of the tables could contain
> tens or hundreds of millions of tuples. So, does anyone know of any
> benchmarks about this stuff? I tried searching online but couldn't find
> anything. Thank you.


There are tons of benchmarks out there but I am not sure whether you
will find something specialized for bulk loading. Why not set up a test
database of each brand and do your own tests?

Kind regards

robert
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 08:48 AM
Walter Vaughan
 
Posts: n/a
Default Re: Bulk load performance

Xiaolei Li wrote:

> I'm in the process of selecting a DBMS for a project. We've narrowed
> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
> sure is the speed of bulk loading.


Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD
server. PostgreSQL is can be about twice as fast in our testing (YMMV) when
mySQL is using InnoDB tables.

One thing that is either good/bad about mySQL vs pgSQL loading is that pgSQL is
extremely picky about datatypes, and the number of fields must match exactly. So
if you are missing fields or try to cram invalid data using MySQL you might not
get warned. At least I haven't been.

I've used mySQL for years, so don't take this message as FUD/bashing. We run a
large data warehouse on MySQL. It just that things that need a ton of data
integrity and data import verification work much better for *us* on postgresql.

I'm sure someone else has horror stories quite different from our experience.

You may want to post at pgsql-general@postgresql.org for postgresql input on
this topic. No idea about Oracle.

Good luck with your search...
--
Walter
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 08:48 AM
Axel Schwenke
 
Posts: n/a
Default Re: Bulk load performance

Walter Vaughan <wvaughan@steelerubber.com> wrote:
> Xiaolei Li wrote:
>
>> I'm in the process of selecting a DBMS for a project. We've narrowed
>> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
>> sure is the speed of bulk loading.

>
> Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD
> server. PostgreSQL is can be about twice as fast in our testing (YMMV) when
> mySQL is using InnoDB tables.


Bulk loading performance varies very much depending on how *exactly*
it is done. The MySQL manual contains a whole chapter on that topic.

In a nutshell:

- turn off AUTO-COMMIT !
- you can COMMIT in batches or just once after loading all data. I
suggest to COMMIT every 10.000 (or so) rows.
- if you import from SQL commands (SQL dump) - using MySQLs proprietary
multi-value INSERT will speedup the load
- prepared statements are good too, but can insert just one row per
execute. Multi-value INSERT and PS are approx. equally fast.
- LOAD DATA INFILE will be even faster, because parsing the raw file is
faster than parsing SQL
- keys should be deactivated for bulk loading


A MySQL issue could be the fact that MySQL uses only one thread - that
is: only one cpu core - per connection. So if you have multiple cpu
cores you should split your data and load in multiple connections.
I did so on a 16-core box and it scales quite nicely.


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:25 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com