This is a discussion on Constant char length is a good practice? within the MySQL forums, part of the Database Server Software category; --> Tony Marston wrote: > "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message > news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2d@comcast.com. .. > >>Axel Schwenke wrote: >> >>>Jerry ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Tony Marston wrote: > "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message > news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2d@comcast.com. .. > >>Axel Schwenke wrote: >> >>>Jerry Stuckle <jstucklex@attglobal.net> wrote: >>> >>> >>>>Axel Schwenke wrote: >>> >>> >>>[MyISAM fragmentation] >>> > > <snip> > >>>>More advanced databases have REORG commands. MySQL doesn't have one, but >>>>you >>>>can do a backup/delete/restore to get the same effect. > > > MySQL has the OPTIMIZE TABLE command for MyISAM tables which will reclaim > the unused space and to defragment the data file. Refer to > http://dev.mysql.com/doc/refman/5.0/...ize-table.html > Tony, Yes, I'm familiar with the OPTIMIZE TABLE command. But it only does part of what a REORG command does. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| |||
| Jerry Stuckle 寫道: > Tony Marston wrote: > > "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message > > news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2d@comcast.com. .. > > > >>Axel Schwenke wrote: > >> > >>>Jerry Stuckle <jstucklex@attglobal.net> wrote: > >>> > >>> > >>>>Axel Schwenke wrote: > >>> > >>> > >>>[MyISAM fragmentation] > >>> > > > > <snip> > > > >>>>More advanced databases have REORG commands. MySQL doesn't have one,but > >>>>you > >>>>can do a backup/delete/restore to get the same effect. > > > > > > MySQL has the OPTIMIZE TABLE command for MyISAM tables which will reclaim > > the unused space and to defragment the data file. Refer to > > http://dev.mysql.com/doc/refman/5.0/...ize-table.html > > > > Tony, > > Yes, I'm familiar with the OPTIMIZE TABLE command. But it only does partof > what a REORG command does. > > -- > ================== > Remove the "x" from my email address > Jerry Stuckle > JDS Computer Training Corp. > jstucklex@attglobal.net > ================== so what normally "REORG" do ? |
| |||
| Jerry Stuckle <jstucklex@attglobal.net> wrote: > Axel Schwenke wrote: >>>More advanced databases have REORG commands. MySQL doesn't have one, but you >>>can do a backup/delete/restore to get the same effect. >> >> >> For InnoDB you can use ALTER TABLE ... ENGINE=InnoDB. Of course this >> is documented in the manual: >> >> http://dev.mysql.com/doc/refman/5.0/...agmenting.html > > Yes, you can change the table to be managed by InnoDB. But that's not at all > the same as reorganizing the table. Jerry, maybe you should follow my pointers to the manual - at least sometimes. If you ALTER TABLE an InnoDB table to ENGINE=InnoDB again, it will get reorganized. > All of the more advanced engines - Oracle, SQL Server, DB2, etc. also have > fragmentation-minimizing algorithms. Most are more advanced that InnoDB's - > because they are more tightly integrated into the database instead of being a > separate engine. And all of them have a need for a REORG command to defragment > tables. This is not a valid proof. A defrag tool for Linux' ext2 file system exists as well. This does not proove that ext2 is susceptible to fragmentation. In fact it isn't. > And all of them agree that working with fixed length columns is more efficient > than variable length columns for the above reasons. Agreed. But again: the advantages from using fixed length records cannot easily be quantified. In most cases the positive effects are overestimated. My own measurements in the past showed effects next to nothing. Except quite heavy fragmentation for a "spool" table. This was easily fixed by a weekly OPTIMIZE TABLE. If you need total certainty, you should measure yourself. Every single case. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
| |||
| howachen@gmail.com wrote: > Jerry Stuckle 寫道: > > >>Tony Marston wrote: >> >>>"Jerry Stuckle" <jstucklex@attglobal.net> wrote in message >>>news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2d@comcast.c om... >>> >>> >>>>Axel Schwenke wrote: >>>> >>>> >>>>>Jerry Stuckle <jstucklex@attglobal.net> wrote: >>>>> >>>>> >>>>> >>>>>>Axel Schwenke wrote: >>>>> >>>>> >>>>>[MyISAM fragmentation] >>>>> >>> >>><snip> >>> >>>>>>More advanced databases have REORG commands. MySQL doesn't have one, but >>>>>>you >>>>>>can do a backup/delete/restore to get the same effect. >>> >>> >>>MySQL has the OPTIMIZE TABLE command for MyISAM tables which will reclaim >>>the unused space and to defragment the data file. Refer to >>>http://dev.mysql.com/doc/refman/5.0/...ize-table.html >>> >> >>Tony, >> >>Yes, I'm familiar with the OPTIMIZE TABLE command. But it only does part of >>what a REORG command does. >> >>-- >>================== >>Remove the "x" from my email address >>Jerry Stuckle >>JDS Computer Training Corp. >>jstucklex@attglobal.net >>================== > > > so what normally "REORG" do ? > It also reorganizes the data in a table according to a specific index to make access faster, compresses the table to get rid of data and optionally frees the extra space (it can also keep extra space so it doesn't need to be reallocated immediately on the next INSERT operation or UPDATE operation with longer variable length data). Also, depending on the underlying file system, it can defragment the table so that all clusters are together. Additionally, it will provide input to the internal statistics monitor to allow the database manager to more intelligently select the most efficient method of accessing the data. It really does a lot more than just optimize table - which basically only compresses the empty space out of the table. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| |||
| Axel Schwenke wrote: > Jerry Stuckle <jstucklex@attglobal.net> wrote: > >>Axel Schwenke wrote: > > >>>>More advanced databases have REORG commands. MySQL doesn't have one, but you >>>>can do a backup/delete/restore to get the same effect. >>> >>> >>>For InnoDB you can use ALTER TABLE ... ENGINE=InnoDB. Of course this >>>is documented in the manual: >>> >>>http://dev.mysql.com/doc/refman/5.0/...agmenting.html >> >>Yes, you can change the table to be managed by InnoDB. But that's not at all >>the same as reorganizing the table. > > > Jerry, > > maybe you should follow my pointers to the manual - at least sometimes. > If you ALTER TABLE an InnoDB table to ENGINE=InnoDB again, it will get > reorganized. > > >>All of the more advanced engines - Oracle, SQL Server, DB2, etc. also have >>fragmentation-minimizing algorithms. Most are more advanced that InnoDB's - >>because they are more tightly integrated into the database instead of being a >>separate engine. And all of them have a need for a REORG command to defragment >>tables. > > > This is not a valid proof. A defrag tool for Linux' ext2 file system > exists as well. This does not proove that ext2 is susceptible to > fragmentation. In fact it isn't. > If there was no need for the REORG command, why would all of the major databases put all the time and money into developing such a command? It's not a trivial job. REORG is much more than just compression. And, unfortunately, ext2 is susceptible to fragmentation. However, it's just a lot more efficient about how it handles fragmentation. The result being it really doesn't need defragging. But this isn't about the underlying file system's fragmentation. It's also about fragmentation WITHIN the file, since MySQL keeps most data types in a single file. So even if the file itself isn't fragmented in the file system, the data can (and very likely is after a few operations) be. > >>And all of them agree that working with fixed length columns is more efficient >>than variable length columns for the above reasons. > > > Agreed. But again: the advantages from using fixed length records > cannot easily be quantified. In most cases the positive effects are > overestimated. My own measurements in the past showed effects next to > nothing. Except quite heavy fragmentation for a "spool" table. This was > easily fixed by a weekly OPTIMIZE TABLE. If you need total certainty, > you should measure yourself. Every single case. > It also depends on the database manager, the actual data and a whole host of variables. But lets take an simple example. We have a table with five columns, all 40 chars long. We are doing a table scan, checking for a specific value in the 3rd column. If these are CHAR(40) columns, the size of a row will be 200 bytes. If the manager is doing a table scan, it can pull 200 bytes and compare starting 80 bytes in. If there is a match, it can transfer 200 bytes to its output buffer for further processing. When done, it can go 200 bytes past the current row to get the next one. Compare this to VARCHAR(40) columns. A row can be between 5 and 205 bytes (one byte for the length). The manager needs to fetch the first row and get the length of the first field (x1). It needs to go x2+1 bytes to get the length of the next field (x2). It needs to go x2 bytes past the second field to get to the third. It then needs to check the length of the third field (x3) to limit the length of the needed comparison. It has to remember if there is a match, then get x4 and x5 to get the length of the last two fields. If there is a match, it then transfers the row to its output buffer. It then needs to see if there is any leftover space at the end of this row, and if so add that to the length of the data. Finally it knows where the next record starts. It can fetch that row (which may have all, none or part of the row already in the buffer from the previous read). This is a significant amount of additional processing for VARCHAR fields. Now - if most of the data is generally much smaller than the max (i.e. average 5 bytes per column), the space savings will be significant - 30 bytes vs. 200, requiring fewer disk operations (and even fewer reads because it will get up to six rows in a single read). In a case like this the overhead of processing the VARCHAR may be more than compensated for by the fewer disk operations required. The result can be that VARCHAR access would be faster. However, if all the fields are very close to the maximum (i.e. 38 chars), processing VARCHAR fields may be slower. But I do agree with you on one thing. You need to test each individual case to determine which is the best for *this* instance. It is way too data and access dependent. > > XL > -- > Axel Schwenke, Senior Software Developer, MySQL AB > > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| |||
| Jerry Stuckle <jstucklex@attglobal.net> wrote: > Axel Schwenke wrote: >> again: the advantages from using fixed length records >> cannot easily be quantified. In most cases the positive effects are >> overestimated. > > lets take an simple example. We have a table with five columns, > all 40 chars long. We are doing a table scan, checking for a specific value in > the 3rd column. [lengthy description snipped] > This is a significant amount of additional processing for VARCHAR fields. Jerry, you are right. Processing variable length records is more expensive than processing fixed length records. But it's only CPU cycles. Real world databases are always I/O bound, never CPU bound. Lets do another calculation. Assume we have a 1GHz CPU and a hard disk with 5ms avg. seek time. Then for each seek of the hard disk we have 5.000.000 CPU cycles to spare. You can do a lot of processing with 5 million CPU cycles. Another calculation. Lets stay with 200 byte records. Assume we get 20MB/s from the hard disks. Thats 100.000 rows per second. Then we can spend 10.000 CPU cycles per row without losing speed. You don't need 10.000 cycles to read a few length fields from the record and add them. (remember: the record is already in memory) Also, nobody reads single rows for table scans. MyISAM reads hunks of 8MB (default read_buffer_size, can be increased). There should be many records in this 8MB buffer. The only uncomfortability from variable length records is the copying of the last, probably incomplete record to the buffer start to complete it with the next read. This effect will be negligible if 100 or more rows fit in the read buffer. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
| |||
| Axel Schwenke 寫道: > Jerry Stuckle <jstucklex@attglobal.net> wrote: > > Axel Schwenke wrote: > > >> again: the advantages from using fixed length records > >> cannot easily be quantified. In most cases the positive effects are > >> overestimated. > > > > lets take an simple example. We have a table with five columns, > > all 40 chars long. We are doing a table scan, checking for a specific value in > > the 3rd column. > > [lengthy description snipped] > > > This is a significant amount of additional processing for VARCHAR fields. > > Jerry, you are right. Processing variable length records is more > expensive than processing fixed length records. But it's only CPU > cycles. Real world databases are always I/O bound, never CPU bound. > > Lets do another calculation. Assume we have a 1GHz CPU and a hard disk > with 5ms avg. seek time. Then for each seek of the hard disk we have > 5.000.000 CPU cycles to spare. You can do a lot of processing with 5 > million CPU cycles. > > Another calculation. Lets stay with 200 byte records. Assume we get > 20MB/s from the hard disks. Thats 100.000 rows per second. Then we can > spend 10.000 CPU cycles per row without losing speed. You don't need > 10.000 cycles to read a few length fields from the record and add them. > (remember: the record is already in memory) > > Also, nobody reads single rows for table scans. MyISAM reads hunks of > 8MB (default read_buffer_size, can be increased). There should be many > records in this 8MB buffer. The only uncomfortability from variable > length records is the copying of the last, probably incomplete record > to the buffer start to complete it with the next read. This effect > will be negligible if 100 or more rows fit in the read buffer. > > > XL > -- > Axel Schwenke, Senior Software Developer, MySQL AB > > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ thanks.... in fact, currently our solution is always divide a table into two sub-tables - static part (fixed length) & dynamic part (varchar/text) table join with other primary tables will frequently use the static part so as to speed up the query of the overall system. Of coz there will additional overhead when you need to call the dynamic part for detail information, as you need to perform join on the static part & dynamic part instead of just a single query. |
| |||
| Axel Schwenke wrote: > Jerry Stuckle <jstucklex@attglobal.net> wrote: > >>Axel Schwenke wrote: > > >>>again: the advantages from using fixed length records >>>cannot easily be quantified. In most cases the positive effects are >>>overestimated. >> >>lets take an simple example. We have a table with five columns, >>all 40 chars long. We are doing a table scan, checking for a specific value in >>the 3rd column. > > > [lengthy description snipped] > > >>This is a significant amount of additional processing for VARCHAR fields. > > > Jerry, you are right. Processing variable length records is more > expensive than processing fixed length records. But it's only CPU > cycles. Real world databases are always I/O bound, never CPU bound. > Not necessarily. I've seen many cases where the system is running virtually 100% cpu for the database. Everything is in memory, so there is no I/O work being done. Also, remember the both the OS and hardware can buffer data, shorting data access time. And there is nothing such as "Only cpu cycles". Those same cpu cycles are used by the web server, MTA and all kinds of other things. The dbm isn't the only thing running on the machine. Even if it were - you'd still have the overhead of the networking to get data to and from the dbm. > Lets do another calculation. Assume we have a 1GHz CPU and a hard disk > with 5ms avg. seek time. Then for each seek of the hard disk we have > 5.000.000 CPU cycles to spare. You can do a lot of processing with 5 > million CPU cycles. > Yep. But that's also not 5,000,000 instructions. Virtually every instruction takes multiple cycles. Depending on the instruction, it could be as many as 7 or 8 CPU cycles. And don't forget - the OS needs instructions to perform I/O, also. And finally, the CPU must perform the transfer from the disk buffer to RAM, or, if DMA (Direct Memory Access) is available, the CPU must pause while the data bus is being used for the transfer. > Another calculation. Lets stay with 200 byte records. Assume we get > 20MB/s from the hard disks. Thats 100.000 rows per second. Then we can > spend 10.000 CPU cycles per row without losing speed. You don't need > 10.000 cycles to read a few length fields from the record and add them. > (remember: the record is already in memory) > Which is a lot less than 10K instructions - probably around 2-4K machine instructions, maybe fewer. And don't forget - many of those will be in the OS setting up and performing the I/O. In fact, such a request will most likely be CPU bound - it will take much more than 10K CPU cycles to do all the necessary work. > Also, nobody reads single rows for table scans. MyISAM reads hunks of > 8MB (default read_buffer_size, can be increased). There should be many > records in this 8MB buffer. The only uncomfortability from variable > length records is the copying of the last, probably incomplete record > to the buffer start to complete it with the next read. This effect > will be negligible if 100 or more rows fit in the read buffer. > Yes, 8M is the read buffer size. But that doesn't mean it reads 8M in at one time. That would be terribly inefficient. First of all, there is no reason to read 8M if all you want is 200 bytes. That's a lot of time spent transferring data unnecessarily from the disk. Secondly, now you've just wiped any previous operations in your buffer. Even more inefficient. Lets say you have 8K worth of data which is heavily used - in fact every other operation gets data from this 8K. That means you read 8M to get your 8K. Then another operation comes in and reads 8M to get 200 bytes, wiping out your original 8K in the buffer. Now you need to access the original 8K again. Whoops, it's been wiped out by the second read. Go get 8M so you can access your 8K again. I can't think of anything LESS optimal. Individual reads would be much more efficient! > > XL > -- > Axel Schwenke, Senior Software Developer, MySQL AB > > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| |||
| howachen@gmail.com wrote: > > > thanks.... > > in fact, currently our solution is always divide a table into two > sub-tables - static part (fixed length) & dynamic part (varchar/text) > > table join with other primary tables will frequently use the static > part so as to speed up the query of the overall system. > > Of coz there will additional overhead when you need to call the dynamic > part for detail information, as you need to perform join on the static > part & dynamic part instead of just a single query. > But joining two tables has its own overhead, also. It takes (at least) two seeks to retrieve the data from the disk, for instance. And comparing the extra primary key takes additional time. Your "solution" may in fact be significantly slower than having everything in one table. Again, you need to test using live data. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| ||||
| Jerry Stuckle <jstucklex@attglobal.net> wrote: > Axel Schwenke wrote: >> >> Jerry, you are right. Processing variable length records is more >> expensive than processing fixed length records. But it's only CPU >> cycles. Real world databases are always I/O bound, never CPU bound. > > Not necessarily. I've seen many cases where the system is running virtually 100% cpu for the > database. Everything is in memory, so there is no I/O work being done. I wouldn't call this a /database/ then. It's more like a cache. Also this discussion started with "there may be a severe impact on doing table scans" - your point in <d-2dndCiTaQXHzXZnZ2dnUVZ_vOdnZ2d@comcast.com>. Most of my arguments are specific to that. You're no longer talking about table scans. > the CPU must perform the transfer from the disk buffer to RAM, or, > if DMA (Direct Memory Access) is available, the CPU must pause while > the data bus is being used for the transfer. Are we talking about servers? There always is DMA. Also memory bandwidth is far above the assumed 20MB/s. IIRC it is somewhere in the 1GB/s range. So a running DMA does not hog the bus completely. BTW, my home "server" - in fact a desktop machine with some more RAM and disks - shows about 10% cpu time spent in kernel/interrupts when the I/O system is maxed out (rebuild of soft RAID). For the curious: it's an AMD Athlon64/3000+ with 2GB RAM and 4 SATA disks @400GB each. Running Debian/Sarge x64 edition. >> Another calculation. Lets stay with 200 byte records. Assume we get >> 20MB/s from the hard disks. Thats 100.000 rows per second. Then we can >> spend 10.000 CPU cycles per row without losing speed. You don't need >> 10.000 cycles to read a few length fields from the record and add them. >> (remember: the record is already in memory) >> > > Which is a lot less than 10K instructions - probably around 2-4K > machine instructions, maybe fewer. So how many instructions will it take to read two integers (length of record, length of empty space behind record) from the record buffer and add them to the record pointer? 20, 50 instructions? That's less than 1% of the available. Come on! >> Also, nobody reads single rows for table scans. MyISAM reads hunks of >> 8MB (default read_buffer_size, can be increased). There should be many >> records in this 8MB buffer. The only uncomfortability from variable >> length records is the copying of the last, probably incomplete record >> to the buffer start to complete it with the next read. This effect >> will be negligible if 100 or more rows fit in the read buffer. > > Yes, 8M is the read buffer size. But that doesn't mean it reads 8M > in at one time. Sure it does. For TABLE SCANS. > there is no reason to read 8M if all you want is 200 bytes. We're doing a TABLE SCAN. We want *all* the records. > Secondly, now you've just wiped any previous operations in your buffer. The read buffer is per thread. A MySQL thread will only do one thing at a time. This time it's doing a TABLE SCAN. Jerry, sorry to say so, but I don't see any progress is this discussion. I'm feeling I'm wasting my time. Unless you can come up with some hard facts, substanciating your claims - please stop it! Thanks. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |