This is a discussion on Help with table design ( over 2.000.000 row ) within the MySQL forums, part of the Database Server Software category; --> Hi All I've started a project connect with geolocation IP adresses I have 2 table, one with all the ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi All I've started a project connect with geolocation IP adresses I have 2 table, one with all the iprange and one with the location Table block over 2.800.000 row CREATE TABLE `block` ( `startIpNum` bigint(15) NOT NULL, `endIpNum` bigint(15) NOT NULL, `locId` int(11) NOT NULL, PRIMARY KEY (`startIpNum`,`endIpNum`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; Field name min value max value min / max lenght startIpNum 33996344 3741315072 8 /10 endIpnum 33996351 3741319167 8 /10 locId 2 196511 1/6 table location 200.000 row CREATE TABLE `location` ( `locId` int(9) NOT NULL, `country` text collate utf8_unicode_ci, `region` varchar(50) collate utf8_unicode_ci default NULL, `city` varchar(50) collate utf8_unicode_ci default NULL, `latitude` int(20) NOT NULL, `longitude` int(20) NOT NULL, KEY `locId` (`locId`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; the query I use SELECT country,region,city,latitude,longitude FROM location WHERE locId = ( SELECT locId FROM block WHERE startIpNum < INET_ATON( '74.208.9.80' ) AND endIpNum >= INET_ATON( '74.208.9.80' ) )LiMIT 0,1 at the moment on my developing machine I'm getting the result in 0,42 sec average. Is there any way to improve it ? The application I'm developing will make request between 50 and 70 ip at time. Also in my case is beater to make a request for each ip or i can pass an array of ip via the query ? The ip they will be different for each request as i will cache the requests Many Thanks |
| |||
| On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote: > Hi All > I've started a project connect with geolocation IP adresses > I have 2 table, one with all the iprange and one with the location > Table block over 2.800.000 row > CREATE TABLE `block` ( > `startIpNum` bigint(15) NOT NULL, > `endIpNum` bigint(15) NOT NULL, > `locId` int(11) NOT NULL, > PRIMARY KEY (`startIpNum`,`endIpNum`) > ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > Field name min value max value min / max lenght > startIpNum 33996344 3741315072 8 /10 > endIpnum 33996351 3741319167 8 /10 > locId 2 196511 1/6 > > table location 200.000 row > CREATE TABLE `location` ( > `locId` int(9) NOT NULL, > `country` text collate utf8_unicode_ci, > `region` varchar(50) collate utf8_unicode_ci default NULL, > `city` varchar(50) collate utf8_unicode_ci default NULL, > `latitude` int(20) NOT NULL, > `longitude` int(20) NOT NULL, > KEY `locId` (`locId`) > ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > > the query I use > SELECT country,region,city,latitude,longitude > FROM location > WHERE locId = ( > SELECT locId > FROM block > WHERE startIpNum < INET_ATON( '74.208.9.80' ) > AND endIpNum >= INET_ATON( '74.208.9.80' ) > )LiMIT 0,1 > > at the moment on my developing machine I'm getting the result in 0,42 > sec average. > Is there any way to improve it ? > The application I'm developing will make request between 50 and 70 ip at > time. > Also in my case is beater to make a request for each ip or i can pass an > array of ip via the query ? > The ip they will be different for each request as i will cache the requests > Many Thanks Use a JOIN instead of a sub-select. |
| |||
| Captain Paralytic wrote: > On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote: >> Hi All >> I've started a project connect with geolocation IP adresses >> I have 2 table, one with all the iprange and one with the location >> Table block over 2.800.000 row >> CREATE TABLE `block` ( >> `startIpNum` bigint(15) NOT NULL, >> `endIpNum` bigint(15) NOT NULL, >> `locId` int(11) NOT NULL, >> PRIMARY KEY (`startIpNum`,`endIpNum`) >> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; >> Field name min value max value min / max lenght >> startIpNum 33996344 3741315072 8 /10 >> endIpnum 33996351 3741319167 8 /10 >> locId 2 196511 1/6 >> >> table location 200.000 row >> CREATE TABLE `location` ( >> `locId` int(9) NOT NULL, >> `country` text collate utf8_unicode_ci, >> `region` varchar(50) collate utf8_unicode_ci default NULL, >> `city` varchar(50) collate utf8_unicode_ci default NULL, >> `latitude` int(20) NOT NULL, >> `longitude` int(20) NOT NULL, >> KEY `locId` (`locId`) >> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; >> >> the query I use >> SELECT country,region,city,latitude,longitude >> FROM location >> WHERE locId = ( >> SELECT locId >> FROM block >> WHERE startIpNum < INET_ATON( '74.208.9.80' ) >> AND endIpNum >= INET_ATON( '74.208.9.80' ) >> )LiMIT 0,1 >> >> at the moment on my developing machine I'm getting the result in 0,42 >> sec average. >> Is there any way to improve it ? >> The application I'm developing will make request between 50 and 70 ip at >> time. >> Also in my case is beater to make a request for each ip or i can pass an >> array of ip via the query ? >> The ip they will be different for each request as i will cache the requests >> Many Thanks > > Use a JOIN instead of a sub-select. I read about join is not advice on big table as it will create a instance of the 2 table in one but I will try thanks |
| |||
| On 6 Jun, 11:49, Daniele <dankan775removet...@yahoo.it> wrote: > Captain Paralytic wrote: > > On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote: > >> Hi All > >> I've started a project connect with geolocation IP adresses > >> I have 2 table, one with all the iprange and one with the location > >> Table block over 2.800.000 row > >> CREATE TABLE `block` ( > >> * *`startIpNum` bigint(15) NOT NULL, > >> * *`endIpNum` bigint(15) NOT NULL, > >> * *`locId` int(11) NOT NULL, > >> * *PRIMARY KEY *(`startIpNum`,`endIpNum`) > >> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > >> Field name * * *min value * * * max value * * * min /max lenght > >> startIpNum * * *33996344 * * * *3741315072 * * *8 /10 > >> endIpnum * * * *33996351 * * * *3741319167 * * *8/10 > >> locId * * * * * 2 * * * * * * * 196511 * * * * *1/6 > > >> table location 200.000 row > >> CREATE TABLE `location` ( > >> * *`locId` int(9) NOT NULL, > >> * *`country` text collate utf8_unicode_ci, > >> * *`region` varchar(50) collate utf8_unicode_ci default NULL, > >> * *`city` varchar(50) collate utf8_unicode_ci default NULL, > >> * *`latitude` int(20) NOT NULL, > >> * *`longitude` int(20) NOT NULL, > >> * *KEY `locId` (`locId`) > >> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > > >> the query I use > >> SELECT country,region,city,latitude,longitude > >> FROM location > >> WHERE locId = ( > >> * * * * SELECT locId > >> * * * * FROM block > >> * * * * WHERE startIpNum < INET_ATON( '74.208.9.80' ) > >> * * * * AND endIpNum >= INET_ATON( '74.208.9.80' ) > >> )LiMIT 0,1 > > >> at the moment on my developing machine I'm getting the result in 0,42 > >> sec average. > >> Is there any way to improve it ? > >> The application I'm developing will make request between 50 and 70 ip at > >> time. > >> Also in my case is beater to make a request for each ip or i can pass an > >> array of ip via the query ? > >> The ip they will be different for each request as i will cache the requests > >> Many Thanks > > > Use a JOIN instead of a sub-select. > > I read about join is not advice on big table as it will create a > instance of the 2 table in one > but I will try > thanks- Hide quoted text - > > - Show quoted text - WHAT??? Please could you point me to where you read that? |
| |||
| Captain Paralytic wrote: >> I read about join is not advice on big table as it will create a >> instance of the 2 table in one >> but I will try >> thanks- Hide quoted text - >> >> - Show quoted text - > > WHAT??? > > Please could you point me to where you read that? from:http://www.mysqlperformanceblog.com/...-large-tables/ I think that my speed reading jump an important part( the last line of this paragraph "Here is good example. As we saw my 30mil rows (12GB) table was scanned in less than 5 minutes. Now if we would do eq join of the table to other 30mil rows table and it will be completely random. We'll need to perform 30 millions of random row reads, which gives us 300.000 seconds with 100 rows/sec rate. So we would go from 5 minutes to almost 4 days if we need to do the join. Some people assume join would be close to two full table scans (as 60mil of rows need to be read) - this is way wrong." but i'm not sure about this "Avoid joins to large tables Joining of large data sets using nested loops is very expensive. Try to avoid it. Joins to smaller tables is OK but you might want to preload them to memory before join so there is no random IO needed to populate the caches." |
| |||
| On 6 Jun, 15:08, Daniele <dankan775removet...@yahoo.it> wrote: > Captain Paralytic wrote: > >> I read about join is not advice on big table as it will create a > >> instance of the 2 table in one > >> but I will try > >> thanks- Hide quoted text - > > >> - Show quoted text - > > > WHAT??? > > > Please could you point me to where you read that? > > from:http://www.mysqlperformanceblog.com/...l-could-be-slo.... > I think that my speed reading jump an important part( the last line of > this paragraph > > "Here is good example. As we saw my 30mil rows (12GB) table was scanned > in less than 5 minutes. Now if we would do eq join of the table to other > 30mil rows table and it will be completely random. We'll need to perform > 30 millions of random row reads, which gives us 300.000 seconds with 100 > rows/sec rate. So we would go from 5 minutes to almost 4 days if we need > to do the join. Some people assume join would be close to two full table > scans (as 60mil of rows need to be read) - this is way wrong." > > but i'm not sure about this > "Avoid joins to large tables *Joining of large data sets using nested > loops is very expensive. Try to avoid it. Joins to smaller tables is OK > but you might want to preload them to memory before join so there is no > random IO needed to populate the caches." Oh pleeeese! This is nothing like what you are trying to do. |
| |||
| Captain Paralytic wrote: > On 6 Jun, 15:08, Daniele <dankan775removet...@yahoo.it> wrote: >> Captain Paralytic wrote: >>>> I read about join is not advice on big table as it will create a >>>> instance of the 2 table in one >>>> but I will try >>>> thanks- Hide quoted text - >>>> - Show quoted text - >>> WHAT??? >>> Please could you point me to where you read that? >> from:http://www.mysqlperformanceblog.com/...l-could-be-slo... >> I think that my speed reading jump an important part( the last line of >> this paragraph >> >> "Here is good example. As we saw my 30mil rows (12GB) table was scanned >> in less than 5 minutes. Now if we would do eq join of the table to other >> 30mil rows table and it will be completely random. We'll need to perform >> 30 millions of random row reads, which gives us 300.000 seconds with 100 >> rows/sec rate. So we would go from 5 minutes to almost 4 days if we need >> to do the join. Some people assume join would be close to two full table >> scans (as 60mil of rows need to be read) - this is way wrong." >> >> but i'm not sure about this >> "Avoid joins to large tables Joining of large data sets using nested >> loops is very expensive. Try to avoid it. Joins to smaller tables is OK >> but you might want to preload them to memory before join so there is no >> random IO needed to populate the caches." > > Oh pleeeese! > > This is nothing like what you are trying to do. Sorry but do you mean I don't have to worry about of the size of my table ? Thanks |
| |||
| Captain Paralytic wrote: > On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote: >> Hi All >> I've started a project connect with geolocation IP adresses >> I have 2 table, one with all the iprange and one with the location >> Table block over 2.800.000 row >> CREATE TABLE `block` ( >> `startIpNum` bigint(15) NOT NULL, >> `endIpNum` bigint(15) NOT NULL, >> `locId` int(11) NOT NULL, >> PRIMARY KEY (`startIpNum`,`endIpNum`) >> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; >> Field name min value max value min / max lenght >> startIpNum 33996344 3741315072 8 /10 >> endIpnum 33996351 3741319167 8 /10 >> locId 2 196511 1/6 >> >> table location 200.000 row >> CREATE TABLE `location` ( >> `locId` int(9) NOT NULL, >> `country` text collate utf8_unicode_ci, >> `region` varchar(50) collate utf8_unicode_ci default NULL, >> `city` varchar(50) collate utf8_unicode_ci default NULL, >> `latitude` int(20) NOT NULL, >> `longitude` int(20) NOT NULL, >> KEY `locId` (`locId`) >> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; >> >> the query I use >> SELECT country,region,city,latitude,longitude >> FROM location >> WHERE locId = ( >> SELECT locId >> FROM block >> WHERE startIpNum < INET_ATON( '74.208.9.80' ) >> AND endIpNum >= INET_ATON( '74.208.9.80' ) >> )LiMIT 0,1 >> >> at the moment on my developing machine I'm getting the result in 0,42 >> sec average. >> Is there any way to improve it ? >> The application I'm developing will make request between 50 and 70 ip at >> time. >> Also in my case is beater to make a request for each ip or i can pass an >> array of ip via the query ? >> The ip they will be different for each request as i will cache the requests >> Many Thanks > > Use a JOIN instead of a sub-select. I'm new to the join i'm using this query but is slower (3.5sec ) than the subquery i posted on top (0.4sec ) SELECT location.country, location.region, location.city, location.latitude, location.longitude FROM location RIGHT JOIN block ON location.locId = block.locId WHERE block.startIpNum < INET_ATON( '74.208.9.81' ) AND block.endIpNum >= INET_ATON( '74.208.9.81' ) LIMIT 0 , 30 I'm writing it wrong ? any help ? Direction ? Thanks |
| |||
| Daniele wrote: > Captain Paralytic wrote: >>> I read about join is not advice on big table as it will create a >>> instance of the 2 table in one >>> but I will try >>> thanks- Hide quoted text - >>> >>> - Show quoted text - >> >> WHAT??? >> >> Please could you point me to where you read that? > from:http://www.mysqlperformanceblog.com/...-large-tables/ > > I think that my speed reading jump an important part( the last line of > this paragraph > > "Here is good example. As we saw my 30mil rows (12GB) table was scanned > in less than 5 minutes. Now if we would do eq join of the table to other > 30mil rows table and it will be completely random. We'll need to perform > 30 millions of random row reads, which gives us 300.000 seconds with 100 > rows/sec rate. So we would go from 5 minutes to almost 4 days if we need > to do the join. Some people assume join would be close to two full table > scans (as 60mil of rows need to be read) - this is way wrong." > > but i'm not sure about this > "Avoid joins to large tables Joining of large data sets using nested > loops is very expensive. Try to avoid it. Joins to smaller tables is OK > but you might want to preload them to memory before join so there is no > random IO needed to populate the caches." > In addition to what Paul said - we have no idea what the poster's setup is. I suspect there were a number of optimizations he could have done to make this a lot faster - faster than a subselect, even. I've never seen a query take 4 days on ANY MySQL system - no matter how large. For it to take this long indicates some severe problems with the system configuration. It could even be lack of the correct indexes. We don't know. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| ||||
| Daniele schreef: > Captain Paralytic wrote: >> On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote: >>> Hi All >>> I've started a project connect with geolocation IP adresses >>> I have 2 table, one with all the iprange and one with the location >>> Table block over 2.800.000 row >>> CREATE TABLE `block` ( >>> `startIpNum` bigint(15) NOT NULL, >>> `endIpNum` bigint(15) NOT NULL, >>> `locId` int(11) NOT NULL, >>> PRIMARY KEY (`startIpNum`,`endIpNum`) >>> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; >>> Field name min value max value min / max lenght >>> startIpNum 33996344 3741315072 8 /10 >>> endIpnum 33996351 3741319167 8 /10 >>> locId 2 196511 1/6 >>> >>> table location 200.000 row >>> CREATE TABLE `location` ( >>> `locId` int(9) NOT NULL, >>> `country` text collate utf8_unicode_ci, >>> `region` varchar(50) collate utf8_unicode_ci default NULL, >>> `city` varchar(50) collate utf8_unicode_ci default NULL, >>> `latitude` int(20) NOT NULL, >>> `longitude` int(20) NOT NULL, >>> KEY `locId` (`locId`) >>> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; >>> >>> the query I use >>> SELECT country,region,city,latitude,longitude >>> FROM location >>> WHERE locId = ( >>> SELECT locId >>> FROM block >>> WHERE startIpNum < INET_ATON( '74.208.9.80' ) >>> AND endIpNum >= INET_ATON( '74.208.9.80' ) >>> )LiMIT 0,1 >>> >>> at the moment on my developing machine I'm getting the result in 0,42 >>> sec average. >>> Is there any way to improve it ? >>> The application I'm developing will make request between 50 and 70 ip at >>> time. >>> Also in my case is beater to make a request for each ip or i can pass an >>> array of ip via the query ? >>> The ip they will be different for each request as i will cache the >>> requests >>> Many Thanks >> >> Use a JOIN instead of a sub-select. > I'm new to the join > i'm using this query but is slower (3.5sec ) than the subquery i posted > on top (0.4sec ) > > SELECT location.country, location.region, location.city, > location.latitude, location.longitude > FROM location > RIGHT JOIN block ON location.locId = block.locId > WHERE block.startIpNum < INET_ATON( '74.208.9.81' ) > AND block.endIpNum >= INET_ATON( '74.208.9.81' ) > LIMIT 0 , 30 > > I'm writing it wrong ? > any help ? Direction ? > Thanks probably an index on 'block.locId' could help... but please, also post the results of EXPLAIN <your query> (see: http://dev.mysql.com/doc/refman/5.0/en/explain.html) -- Luuk |