Rilson, please keep the discussion here. I'm answering your email
in public here. Hope that's OK.
On Sun, Mar 16, 2008 at 07:20:06PM -0700, Rilson wrote:
>
> > Nothing. Creating indexes will not use a temporary table. With one
> > exception: if this is an InnoDB table and you re-create the PRIMARY
> > KEY, then a copy of the table is created inside the InnoDB table space.
> > If you want to relieve the I/O system of your server, configure a huge
> > innodb_buffer_pool_size.
>
> I create the PK after loading the data (which is pre-sorted on PK's
> keys). So I'm not re-creating PKs, right?
You are exactly doing that. If your (InnoDB) table definition does not
contain a PRIMARY KEY then InnoDB will create a synthetic one (much
like BIGINT UNSIGNED AUTO_INCREMENT). If you later define a natural
PRIMARY KEY then InnoDB will recreate the whole table.
This behavior is InnoDB-specific. InnoDB clusters data by PK and uses
the PK to reference rows in secondary indexes. So you cannot have no PK
and a modification of the PK will require to rebuild not only all
indexes but also the tree holding the records itself.
> Allow me to explain the context:
> - I have 5 InnoDB tables loaded and without indexes (total 10GB)
> - I need to create PK and secondary indexes for all tables (the PK is
> created first, then 3 or 4 indexes for each table)
> - I have 32GB RAM dedicated to MySQL
> - I'm using InnoDB
>
> Questions:
> - What's the strategy to have the indexes created with the least
> amount of time possible?
> - It would be faster to load the data with PK and secondary indexes
> pre-defined?
The best strategy would be:
1. create table, including PK
2. load data in batches. Either use extended INSERT syntax with
*long* statements (worth some hundreds of rows) or use explicite
transactions, committing every 1000 rows [1]
alternatively: use LOAD DATA INFILE
If possible, load data in PK order
3. create secondary indexes with a single ALTER TABLE statement
With 32GB memory and only 10GB of data, this should be possible without
touching the disk. Just configure a huge innodb_buffer_pool_size.
Lets say 24GB. InnoDB funnels *everything* through the page buffer.
The above will use only little more than a single CPU core at a time.
You could use 5 independent database connections to create/load the
5 tables simultaneously.
> - sort_buffer_size affects index creation?
I'm not 100% sure but AFAIK: No.
> - tmp_table_size applies to tmp tables created by a 'create index'
> operation?
No. Normally creating indexes does not copy a table.
> I learn/understand when I "play". Reading is not learning/
> understanding ==> "What we have to learn to do, we learn by doing."
> -- Aristotle
You won't learn anything by doing if you don't understand which effect
was caused by which of your actions. With systematic "playing" you
might be able to find the strategy from above, but it would take a
long time. With some insight [2] in the inner workings of InnoDB, you
could deduce the above strategy in minutes
[1] by default MySQL runs in auto-commit mode, implicitly committing
each INSERT. This leads to very poor performance. Grouping INSERTs
to bigger transactions dramatically improves this. See the figure
on page 6 here:
http://shinguz.ch/MySQL/transaction_performance.pdf
[2] i.e.
http://dev.mysql.com/doc/refman/5.0/...db-tuning.html
and many more pages from the MySQL manual
XL
--
Axel Schwenke, Support Engineer, MySQL AB
Online User Manual:
http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums:
http://forums.mysql.com/