vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I need some advice on the best way to load data to a table while maintaining an index. I have a table that is very small at the moment but will have more than 70 million rows by the end of the year. Data is added to it several times a day, parsed from a text file. I'd like to create an index on the table now, when it is small, and maintain it as the table grows. What is the fastest way to both load the data and maintain the index? Should I drop the index, load the data and then reindex? Is there any benefit to loading the data to a temporary file, indexing that file and then inserting the data into the master table which is indexed on the same field? Sorry if this is a really newbie question but I'd appreciate any advice you can give. |
| |||
| On 12 Jan 2005 17:59:04 -0800, kaelin358@gmail.com wrote: >I need some advice on the best way to load data to a table while >maintaining an index. I have a table that is very small at the moment >but will have more than 70 million rows by the end of the year. Data >is added to it several times a day, parsed from a text file. I'd like >to create an index on the table now, when it is small, and maintain it >as the table grows. What is the fastest way to both load the data and >maintain the index? Should I drop the index, load the data and then >reindex? Hi kaelin358, It depends. If large numbers of rows are inserted in the table, it is often useful to drop all indexes before the insert and recreate them afterwards. This goes especially for nonclustered indexes; the gain for a clustered index is less, because the cost of rebuilding the clustered index is higher than the cost of rebuilding a nonclustered index. However, if the amount of data added is only a fraction of what's already there (e.g. 60 million rows in the table and a mere 100,000 rows to be added), then the cost of rebuilding indexes for 60 million rows might well exceed the cost of updating the index with 100,000 new rows. In that case, it's cheaper to just keep the index during the insert. Another consideration is whether the insert is during down time or on a live system. If you drop, then recreate the indexes, other processes reading from or writing to the table will slow down. On the other hand, if you keep the index, other processes writing to the table have a higher chance of being blocked. Since performance depends on lots of things (hardware, row size, size of indexed columns, etc), the only way to know for sure what method is the quickest is to test both and compare the results. > Is there any benefit to loading the data to a temporary file, >indexing that file and then inserting the data into the master table >which is indexed on the same field? I would definitely recommend this if you have to do the insert on a live system. The actual import process will probably be relatively slow; if you use a staging table to import the data, clean it up (if needed) and only then import if to the actual table, you'll keep the time that the actual table is blocked by the import process as low as possible. You'll have to experiment to find out what (if any) indexes on the staging table result in the quickest copying to the actual table. If you only do the insert during down time AND you're sure that you'll never have to cleanup bad data, then I wouldn't bother and simply insert straight into the actual table. Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) |