On Mar 17, 10:17 am, Axel Schwenke <axel.schwe...@gmx.de> wrote:
> Rilson, please keep the discussion here. I'm answering your email
> in public here. Hope that's OK.
>
> On Sun, Mar 16, 2008 at 07:20:06PM -0700, Rilson wrote:
>
> > > Nothing. Creating indexes will not use a temporary table. With one
> > > exception: if this is an InnoDB table and you re-create the PRIMARY
> > > KEY, then a copy of the table is created inside the InnoDB table space.
> > > If you want to relieve the I/O system of your server, configure a huge
> > > innodb_buffer_pool_size.
>
> > I create the PK after loading the data (which is pre-sorted on PK's
> > keys). So I'm not re-creating PKs, right?
>
> You are exactly doing that. If your (InnoDB) table definition does not
> contain a PRIMARY KEY then InnoDB will create a synthetic one (much
> like BIGINT UNSIGNED AUTO_INCREMENT). If you later define a natural
> PRIMARY KEY then InnoDB will recreate the whole table.
>
> This behavior is InnoDB-specific. InnoDB clusters data by PK and uses
> the PK to reference rows in secondary indexes. So you cannot have no PK
> and a modification of the PK will require to rebuild not only all
> indexes but also the tree holding the records itself.
>
> > Allow me to explain the context:
> > - I have 5 InnoDB tables loaded and without indexes (total 10GB)
> > - I need to create PK and secondary indexes for all tables (the PK is
> > created first, then 3 or 4 indexes for each table)
> > - I have 32GB RAM dedicated to MySQL
> > - I'm using InnoDB
>
> > Questions:
> > - What's the strategy to have the indexes created with the least
> > amount of time possible?
> > - It would be faster to load the data with PK and secondary indexes
> > pre-defined?
>
> The best strategy would be:
>
> 1. create table, including PK
>
> 2. load data in batches. Either use extended INSERT syntax with
> *long* statements (worth some hundreds of rows) or use explicite
> transactions, committing every 1000 rows [1]
> alternatively: use LOAD DATA INFILE
> If possible, load data in PK order
>
> 3. create secondary indexes with a single ALTER TABLE statement
>
> With 32GB memory and only 10GB of data, this should be possible without
> touching the disk. Just configure a huge innodb_buffer_pool_size.
> Lets say 24GB. InnoDB funnels *everything* through the page buffer.
>
> The above will use only little more than a single CPU core at a time.
> You could use 5 independent database connections to create/load the
> 5 tables simultaneously.
>
> > - sort_buffer_size affects index creation?
>
> I'm not 100% sure but AFAIK: No.
>
> > - tmp_table_size applies to tmp tables created by a 'create index'
> > operation?
>
> No. Normally creating indexes does not copy a table.
>
> > I learn/understand when I "play". Reading is not learning/
> > understanding ==> "What we have to learn to do, we learn by doing."
> > -- Aristotle
>
> You won't learn anything by doing if you don't understand which effect
> was caused by which of your actions. With systematic "playing" you
> might be able to find the strategy from above, but it would take a
> long time. With some insight [2] in the inner workings of InnoDB, you
> could deduce the above strategy in minutes 
>
> [1] by default MySQL runs in auto-commit mode, implicitly committing
> each INSERT. This leads to very poor performance. Grouping INSERTs
> to bigger transactions dramatically improves this. See the figure
> on page 6 here:http://shinguz.ch/MySQL/transaction_performance.pdf
>
> [2] i.e.http://dev.mysql.com/doc/refman/5.0/...db-tuning.html
> and many more pages from the MySQL manual
>
> XL
> --
> Axel Schwenke, Support Engineer, MySQL AB
>
> Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/
> MySQL User Forums: http://forums.mysql.com/
I got an error in the middle of the index creation:
Creating indexes. Mon Mar 17 23:44:18 PDT 2008
ERROR 1114 (HY000) at line 7: The table '#sql-1ade_16' is full
I dedicated two disks of 16G for innodb data (fixed size). As per my
experience the set of indexes defined would double the size of the
database. Since my database has 10GB, data + indexes would take ca.
20GB on disk, so 32G (16GB + 16GB) would be more than enough. In fact,
data_free in SHOW TABLE STATUS is showing zero.
I was assuming temporary tables (like #sql-1ade_16) was being
maintained fully-cached (20GB RAM is being grabbed for mysqld, it
should be more than enough). It seems clear to me that this temp table
is being created as an auxiliary entity of a 'create index' operation
and that's being maintained on disk. Please correct me if I'm wrong.
So, which parameters should I alter to make these temp tables be
created/maintained in memory? I'm wondering how tmp_table_size and
sort_buffer_size affect index creation...
Hmm if the temp tables are being created where the tmpdir variable
points to then I think I found out my problem. /var/tmp is almost
swamped. I could redirect tmpdir to a separate disk or even to tmpfs
to speed up things.
You said that 'create index' normally does not copy a table. However,
'show processlist' always shows 'copy to tmp table' during index
creation.
Thanks,
-Rilson