This is a discussion on storage optimization for read-only table within the MySQL forums, part of the Database Server Software category; --> Hello, According to MySQL document: The expected row length for dynamic-sized rows is calculated using the following expression: 3 ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, According to MySQL document: The expected row length for dynamic-sized rows is calculated using the following expression: 3 + (number of columns + 7) / 8 + (number of char columns) + (packed size of numeric columns) + (length of strings) + (number of NULL columns + 7) / 8 However, I found that this formula is not accurate for the following trivial case. mysql> desc vc2; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c | varchar(64) | YES | | NULL | | +-------+-------------+------+-----+---------+-------+ 1 row in set (0.00 sec) mysql> select * from vc2; +------+ | c | +------+ | '' | | '' | | '' | | '' | | '' | | '' | | '' | | '' | | '' | | '' | +------+ 10 rows in set (0.00 sec) mysql> quit Bye $ ls -lt vc2* -rw-rw---- 1 mysql users 200 Nov 4 11:02 vc2.MYD -rw-rw---- 1 mysql users 1024 Nov 4 11:02 vc2.MYI -rw-rw---- 1 mysql users 8554 Nov 4 11:02 vc2.frm $ od -x vc2.MYD 0000000 0003 0b05 fe01 2702 0027 0000 0000 0000 0000020 0000 0000 0003 0b05 fe01 2702 0027 0000 0000040 0000 0000 0000 0000 0003 0b05 fe01 2702 0000060 0027 0000 0000 0000 0000 0000 0003 0b05 0000100 fe01 2702 0027 0000 0000 0000 0000 0000 0000120 0003 0b05 fe01 2702 0027 0000 0000 0000 0000140 0000 0000 0003 0b05 fe01 2702 0027 0000 0000160 0000 0000 0000 0000 0003 0b05 fe01 2702 0000200 0027 0000 0000 0000 0000 0000 0003 0b05 0000220 fe01 2702 0027 0000 0000 0000 0000 0000 0000240 0003 0b05 fe01 2702 0027 0000 0000 0000 0000260 0000 0000 0003 0b05 fe01 2702 0027 0000 0000300 0000 0000 0000 0000 0000310 Note that for 10 empty string values, the disk storage is 200 bytes. If we use the fomula, each row should have a length of 3 + (1 + 7) / 8 + (0) + (0) + (1) + (1 + 7) / 8 = 6 So for 10 rows, the expected storage size is 60 bytes. Where does the extra 140 bytes used for? I have a case where the table is used only for archiving so it is a write- once and read-only table. Is there any special syntax for the create table statement to optimize disk storage without doing compression? Thanks, Minghui |
| |||
| mhuiyang@yahoo.com wrote: > > According to MySQL document: What document? The most detailed document (besides the source code) is the internals manual: http://dev.mysql.com/doc/internals/en/index.html > mysql> desc vc2; > +-------+-------------+------+-----+---------+-------+ >| Field | Type | Null | Key | Default | Extra | > +-------+-------------+------+-----+---------+-------+ >| c | varchar(64) | YES | | NULL | | > +-------+-------------+------+-----+---------+-------+ > mysql> select * from vc2; > +------+ >| c | > +------+ >| '' | .... > +------+ > 10 rows in set (0.00 sec) > > $ ls -lt vc2* > -rw-rw---- 1 mysql users 200 Nov 4 11:02 vc2.MYD > -rw-rw---- 1 mysql users 1024 Nov 4 11:02 vc2.MYI > -rw-rw---- 1 mysql users 8554 Nov 4 11:02 vc2.frm > $ od -x vc2.MYD [snip] do not format with -x, use -t x1 for better formatting: ~ $od -Ax -t x1 .../t1.MYD 000000 03 00 04 0c fe 02 27 27 00 00 00 00 00 00 00 00 000010 00 00 00 00 03 00 04 0c fe 02 27 27 00 00 00 00 000020 00 00 00 00 00 00 00 00 03 00 04 0c fe 02 27 27 000030 00 00 00 00 00 00 00 00 00 00 00 00 03 00 04 0c 000040 fe 02 27 27 00 00 00 00 00 00 00 00 00 00 00 00 000050 03 00 04 0c fe 02 27 27 00 00 00 00 00 00 00 00 000060 00 00 00 00 03 00 04 0c fe 02 27 27 00 00 00 00 000070 00 00 00 00 00 00 00 00 03 00 04 0c fe 02 27 27 000080 00 00 00 00 00 00 00 00 00 00 00 00 03 00 04 0c 000090 fe 02 27 27 00 00 00 00 00 00 00 00 00 00 00 00 0000a0 03 00 04 0c fe 02 27 27 00 00 00 00 00 00 00 00 0000b0 00 00 00 00 03 00 04 0c fe 02 27 27 00 00 00 00 0000c0 00 00 00 00 00 00 00 00 > Note that for 10 empty string values, the disk storage is 200 bytes. Right. In fact there is a minumum row size of 2+2+16 = 20 bytes if you use dynamic row format. Also your strings are not empty but contain the value "''" (two single quote characters). So the first record is from 0x00 to 0x13. 03 00 ... is the record header 04 ...... number of bytes actually used by the record 0c ...... number of unused bytes, the sum of used + unused is >= 0x10 --- used bytes start here fe ...... overflow pointer + flags 02 ...... length of the VARCHAR column 27 27 ... the data itself: 0x27 = "'" --- unused bytes start here rest .... filler bytes 12x 00 XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
| |||
| Thanks a lot for the answer. > > What document? The most detailed document (besides the source code) is > the internals manual: http://dev.mysql.com/doc/internals/en/index.html > http://dev.mysql.com/doc/refman/4.1/...ic-format.html Section 14.1.3.2. Dynamic Table Characteristics I checked 5.0 and 5.1 manuals and they have the same formula. > do not format with -x, use -t x1 for better formatting: That does look better. > Right. In fact there is a minumum row size of 2+2+16 = 20 bytes if you > use dynamic row format. Also your strings are not empty but contain the > value "''" (two single quote characters). > You are right. I was using "load data infile" and I thought that I had put empty strings by writing '' in the file. I have another question: when adding a varchar column that mostly stores empty string (or NULL if that saves more space) does not cost more disk space? I guess it has to do with the existing number of columns and check whether it is on the verge of 8's multiple or not. It appears that in the worst case, adding a nullable varchar column that stores null or empty string will cost 1 byte per row. Am I right? |
| ||||
| mhuiyang@yahoo.com wrote: >> >> What document? The most detailed document (besides the source code) is >> the internals manual: http://dev.mysql.com/doc/internals/en/index.html >> > http://dev.mysql.com/doc/refman/4.1/...ic-format.html > Section 14.1.3.2. Dynamic Table Characteristics > I checked 5.0 and 5.1 manuals and they have the same formula. Hmm. This seems to be inaccurate at best. Could you please file a documentation bug at http://bugs.mysql.com ? >> In fact there is a minumum row size of 2+2+16 = 20 bytes if you >> use dynamic row format. > > I have another question: when adding a varchar column that mostly > stores empty string (or NULL if that saves more space) does not cost > more disk space? I guess it has to do with the existing number of > columns and check whether it is on the verge of 8's multiple or not. It > appears that in the worst case, adding a nullable varchar column that > stores null or empty string will cost 1 byte per row. Am I right? Each nullable column costs you another bit in the flags field. Since space in the record is allocated in bytes, you can have up to 7 unused bits here. If a nullable column *is* NULL, it does not take any additional space. However there is still the minimum length of 16 bytes for the data part. So your original table with a single, nullable VARCHAR(64) column will yield 20 bytes/row as long as - the column is NULL - the column contains no more than 14 characters HTH, XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |