vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I used tmpfs to mount a dir on my system (a 10GB RAM dir) and passed this to tmpdir in my.cnf. My idea was to try to speed up index creation, because it is taking hours. However, it did not work. The tmpfs dir is completely idle during the index creation even though mysql states "copy to tmp table". What is it wrong? I want MySQL create temporary tables in memory rather than create them on disk. Thanks, |
| |||
| Rilson <rilson.nascimento@gmail.com> wrote: > I used tmpfs to mount a dir on my system (a 10GB RAM dir) and passed > this to tmpdir in my.cnf. My idea was to try to speed up index > creation, because it is taking hours. However, it did not work. Bad idea anyway. tmpfs will overflow to swap if no RAM is available. Also it will interfere with MySQL in-process caching and kernel-level page cache. MySQL can and will create temporary tables in memory. See the tmp_table_size variable here: http://dev.mysql.com/doc/refman/5.0/...variables.html > The tmpfs dir is completely idle during the index creation even though > mysql states "copy to tmp table". > > What is it wrong? Nothing. Creating indexes will not use a temporary table. With one exception: if this is an InnoDB table and you re-create the PRIMARY KEY, then a copy of the table is created inside the InnoDB table space. If you want to relieve the I/O system of your server, configure a huge innodb_buffer_pool_size. > I want MySQL create temporary tables in memory rather than create them > on disk. I suggest you first learn how MySQL uses memory before you play with settings you don't understand. Putting tmpdir into a RAM disk is the wrong solution in 99% of the cases. XL -- Axel Schwenke, Support Engineer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
| |||
| Rilson, please keep the discussion here. I'm answering your email in public here. Hope that's OK. On Sun, Mar 16, 2008 at 07:20:06PM -0700, Rilson wrote: > > > Nothing. Creating indexes will not use a temporary table. With one > > exception: if this is an InnoDB table and you re-create the PRIMARY > > KEY, then a copy of the table is created inside the InnoDB table space. > > If you want to relieve the I/O system of your server, configure a huge > > innodb_buffer_pool_size. > > I create the PK after loading the data (which is pre-sorted on PK's > keys). So I'm not re-creating PKs, right? You are exactly doing that. If your (InnoDB) table definition does not contain a PRIMARY KEY then InnoDB will create a synthetic one (much like BIGINT UNSIGNED AUTO_INCREMENT). If you later define a natural PRIMARY KEY then InnoDB will recreate the whole table. This behavior is InnoDB-specific. InnoDB clusters data by PK and uses the PK to reference rows in secondary indexes. So you cannot have no PK and a modification of the PK will require to rebuild not only all indexes but also the tree holding the records itself. > Allow me to explain the context: > - I have 5 InnoDB tables loaded and without indexes (total 10GB) > - I need to create PK and secondary indexes for all tables (the PK is > created first, then 3 or 4 indexes for each table) > - I have 32GB RAM dedicated to MySQL > - I'm using InnoDB > > Questions: > - What's the strategy to have the indexes created with the least > amount of time possible? > - It would be faster to load the data with PK and secondary indexes > pre-defined? The best strategy would be: 1. create table, including PK 2. load data in batches. Either use extended INSERT syntax with *long* statements (worth some hundreds of rows) or use explicite transactions, committing every 1000 rows [1] alternatively: use LOAD DATA INFILE If possible, load data in PK order 3. create secondary indexes with a single ALTER TABLE statement With 32GB memory and only 10GB of data, this should be possible without touching the disk. Just configure a huge innodb_buffer_pool_size. Lets say 24GB. InnoDB funnels *everything* through the page buffer. The above will use only little more than a single CPU core at a time. You could use 5 independent database connections to create/load the 5 tables simultaneously. > - sort_buffer_size affects index creation? I'm not 100% sure but AFAIK: No. > - tmp_table_size applies to tmp tables created by a 'create index' > operation? No. Normally creating indexes does not copy a table. > I learn/understand when I "play". Reading is not learning/ > understanding ==> "What we have to learn to do, we learn by doing." > -- Aristotle You won't learn anything by doing if you don't understand which effect was caused by which of your actions. With systematic "playing" you might be able to find the strategy from above, but it would take a long time. With some insight [2] in the inner workings of InnoDB, you could deduce the above strategy in minutes [1] by default MySQL runs in auto-commit mode, implicitly committing each INSERT. This leads to very poor performance. Grouping INSERTs to bigger transactions dramatically improves this. See the figure on page 6 here: http://shinguz.ch/MySQL/transaction_performance.pdf [2] i.e. http://dev.mysql.com/doc/refman/5.0/...db-tuning.html and many more pages from the MySQL manual XL -- Axel Schwenke, Support Engineer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
| |||
| On Mar 17, 10:17 am, Axel Schwenke <axel.schwe...@gmx.de> wrote: > Rilson, please keep the discussion here. I'm answering your email > in public here. Hope that's OK. > > On Sun, Mar 16, 2008 at 07:20:06PM -0700, Rilson wrote: > > > > Nothing. Creating indexes will not use a temporary table. With one > > > exception: if this is an InnoDB table and you re-create the PRIMARY > > > KEY, then a copy of the table is created inside the InnoDB table space. > > > If you want to relieve the I/O system of your server, configure a huge > > > innodb_buffer_pool_size. > > > I create the PK after loading the data (which is pre-sorted on PK's > > keys). So I'm not re-creating PKs, right? > > You are exactly doing that. If your (InnoDB) table definition does not > contain a PRIMARY KEY then InnoDB will create a synthetic one (much > like BIGINT UNSIGNED AUTO_INCREMENT). If you later define a natural > PRIMARY KEY then InnoDB will recreate the whole table. > > This behavior is InnoDB-specific. InnoDB clusters data by PK and uses > the PK to reference rows in secondary indexes. So you cannot have no PK > and a modification of the PK will require to rebuild not only all > indexes but also the tree holding the records itself. > > > Allow me to explain the context: > > - I have 5 InnoDB tables loaded and without indexes (total 10GB) > > - I need to create PK and secondary indexes for all tables (the PK is > > created first, then 3 or 4 indexes for each table) > > - I have 32GB RAM dedicated to MySQL > > - I'm using InnoDB > > > Questions: > > - What's the strategy to have the indexes created with the least > > amount of time possible? > > - It would be faster to load the data with PK and secondary indexes > > pre-defined? > > The best strategy would be: > > 1. create table, including PK > > 2. load data in batches. Either use extended INSERT syntax with > *long* statements (worth some hundreds of rows) or use explicite > transactions, committing every 1000 rows [1] > alternatively: use LOAD DATA INFILE > If possible, load data in PK order > > 3. create secondary indexes with a single ALTER TABLE statement > > With 32GB memory and only 10GB of data, this should be possible without > touching the disk. Just configure a huge innodb_buffer_pool_size. > Lets say 24GB. InnoDB funnels *everything* through the page buffer. > > The above will use only little more than a single CPU core at a time. > You could use 5 independent database connections to create/load the > 5 tables simultaneously. > > > - sort_buffer_size affects index creation? > > I'm not 100% sure but AFAIK: No. > > > - tmp_table_size applies to tmp tables created by a 'create index' > > operation? > > No. Normally creating indexes does not copy a table. > > > I learn/understand when I "play". Reading is not learning/ > > understanding ==> "What we have to learn to do, we learn by doing." > > -- Aristotle > > You won't learn anything by doing if you don't understand which effect > was caused by which of your actions. With systematic "playing" you > might be able to find the strategy from above, but it would take a > long time. With some insight [2] in the inner workings of InnoDB, you > could deduce the above strategy in minutes > > [1] by default MySQL runs in auto-commit mode, implicitly committing > each INSERT. This leads to very poor performance. Grouping INSERTs > to bigger transactions dramatically improves this. See the figure > on page 6 here:http://shinguz.ch/MySQL/transaction_performance.pdf > > [2] i.e.http://dev.mysql.com/doc/refman/5.0/...db-tuning.html > and many more pages from the MySQL manual > > XL > -- > Axel Schwenke, Support Engineer, MySQL AB > > Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ Thanks Axel! I'm gonna test the strategy and will let you know how it went. -Rilson |
| ||||
| On Mar 17, 10:17 am, Axel Schwenke <axel.schwe...@gmx.de> wrote: > Rilson, please keep the discussion here. I'm answering your email > in public here. Hope that's OK. > > On Sun, Mar 16, 2008 at 07:20:06PM -0700, Rilson wrote: > > > > Nothing. Creating indexes will not use a temporary table. With one > > > exception: if this is an InnoDB table and you re-create the PRIMARY > > > KEY, then a copy of the table is created inside the InnoDB table space. > > > If you want to relieve the I/O system of your server, configure a huge > > > innodb_buffer_pool_size. > > > I create the PK after loading the data (which is pre-sorted on PK's > > keys). So I'm not re-creating PKs, right? > > You are exactly doing that. If your (InnoDB) table definition does not > contain a PRIMARY KEY then InnoDB will create a synthetic one (much > like BIGINT UNSIGNED AUTO_INCREMENT). If you later define a natural > PRIMARY KEY then InnoDB will recreate the whole table. > > This behavior is InnoDB-specific. InnoDB clusters data by PK and uses > the PK to reference rows in secondary indexes. So you cannot have no PK > and a modification of the PK will require to rebuild not only all > indexes but also the tree holding the records itself. > > > Allow me to explain the context: > > - I have 5 InnoDB tables loaded and without indexes (total 10GB) > > - I need to create PK and secondary indexes for all tables (the PK is > > created first, then 3 or 4 indexes for each table) > > - I have 32GB RAM dedicated to MySQL > > - I'm using InnoDB > > > Questions: > > - What's the strategy to have the indexes created with the least > > amount of time possible? > > - It would be faster to load the data with PK and secondary indexes > > pre-defined? > > The best strategy would be: > > 1. create table, including PK > > 2. load data in batches. Either use extended INSERT syntax with > *long* statements (worth some hundreds of rows) or use explicite > transactions, committing every 1000 rows [1] > alternatively: use LOAD DATA INFILE > If possible, load data in PK order > > 3. create secondary indexes with a single ALTER TABLE statement > > With 32GB memory and only 10GB of data, this should be possible without > touching the disk. Just configure a huge innodb_buffer_pool_size. > Lets say 24GB. InnoDB funnels *everything* through the page buffer. > > The above will use only little more than a single CPU core at a time. > You could use 5 independent database connections to create/load the > 5 tables simultaneously. > > > - sort_buffer_size affects index creation? > > I'm not 100% sure but AFAIK: No. > > > - tmp_table_size applies to tmp tables created by a 'create index' > > operation? > > No. Normally creating indexes does not copy a table. > > > I learn/understand when I "play". Reading is not learning/ > > understanding ==> "What we have to learn to do, we learn by doing." > > -- Aristotle > > You won't learn anything by doing if you don't understand which effect > was caused by which of your actions. With systematic "playing" you > might be able to find the strategy from above, but it would take a > long time. With some insight [2] in the inner workings of InnoDB, you > could deduce the above strategy in minutes > > [1] by default MySQL runs in auto-commit mode, implicitly committing > each INSERT. This leads to very poor performance. Grouping INSERTs > to bigger transactions dramatically improves this. See the figure > on page 6 here:http://shinguz.ch/MySQL/transaction_performance.pdf > > [2] i.e.http://dev.mysql.com/doc/refman/5.0/...db-tuning.html > and many more pages from the MySQL manual > > XL > -- > Axel Schwenke, Support Engineer, MySQL AB > > Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ I got an error in the middle of the index creation: Creating indexes. Mon Mar 17 23:44:18 PDT 2008 ERROR 1114 (HY000) at line 7: The table '#sql-1ade_16' is full I dedicated two disks of 16G for innodb data (fixed size). As per my experience the set of indexes defined would double the size of the database. Since my database has 10GB, data + indexes would take ca. 20GB on disk, so 32G (16GB + 16GB) would be more than enough. In fact, data_free in SHOW TABLE STATUS is showing zero. I was assuming temporary tables (like #sql-1ade_16) was being maintained fully-cached (20GB RAM is being grabbed for mysqld, it should be more than enough). It seems clear to me that this temp table is being created as an auxiliary entity of a 'create index' operation and that's being maintained on disk. Please correct me if I'm wrong. So, which parameters should I alter to make these temp tables be created/maintained in memory? I'm wondering how tmp_table_size and sort_buffer_size affect index creation... Hmm if the temp tables are being created where the tmpdir variable points to then I think I found out my problem. /var/tmp is almost swamped. I could redirect tmpdir to a separate disk or even to tmpfs to speed up things. You said that 'create index' normally does not copy a table. However, 'show processlist' always shows 'copy to tmp table' during index creation. Thanks, -Rilson |