Unix Technical Forum

Help with table design ( over 2.000.000 row )

This is a discussion on Help with table design ( over 2.000.000 row ) within the MySQL forums, part of the Database Server Software category; --> Hi All I've started a project connect with geolocation IP adresses I have 2 table, one with all the ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 06-06-2008, 11:10 AM
Daniele
 
Posts: n/a
Default Help with table design ( over 2.000.000 row )

Hi All
I've started a project connect with geolocation IP adresses
I have 2 table, one with all the iprange and one with the location
Table block over 2.800.000 row
CREATE TABLE `block` (
`startIpNum` bigint(15) NOT NULL,
`endIpNum` bigint(15) NOT NULL,
`locId` int(11) NOT NULL,
PRIMARY KEY (`startIpNum`,`endIpNum`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Field name min value max value min / max lenght
startIpNum 33996344 3741315072 8 /10
endIpnum 33996351 3741319167 8 /10
locId 2 196511 1/6

table location 200.000 row
CREATE TABLE `location` (
`locId` int(9) NOT NULL,
`country` text collate utf8_unicode_ci,
`region` varchar(50) collate utf8_unicode_ci default NULL,
`city` varchar(50) collate utf8_unicode_ci default NULL,
`latitude` int(20) NOT NULL,
`longitude` int(20) NOT NULL,
KEY `locId` (`locId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

the query I use
SELECT country,region,city,latitude,longitude
FROM location
WHERE locId = (
SELECT locId
FROM block
WHERE startIpNum < INET_ATON( '74.208.9.80' )
AND endIpNum >= INET_ATON( '74.208.9.80' )
)LiMIT 0,1

at the moment on my developing machine I'm getting the result in 0,42
sec average.
Is there any way to improve it ?
The application I'm developing will make request between 50 and 70 ip at
time.
Also in my case is beater to make a request for each ip or i can pass an
array of ip via the query ?
The ip they will be different for each request as i will cache the requests
Many Thanks

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 06-06-2008, 11:10 AM
Captain Paralytic
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote:
> Hi All
> I've started a project connect with geolocation IP adresses
> I have 2 table, one with all the iprange and one with the location
> Table block over 2.800.000 row
> CREATE TABLE `block` (
> `startIpNum` bigint(15) NOT NULL,
> `endIpNum` bigint(15) NOT NULL,
> `locId` int(11) NOT NULL,
> PRIMARY KEY (`startIpNum`,`endIpNum`)
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
> Field name min value max value min / max lenght
> startIpNum 33996344 3741315072 8 /10
> endIpnum 33996351 3741319167 8 /10
> locId 2 196511 1/6
>
> table location 200.000 row
> CREATE TABLE `location` (
> `locId` int(9) NOT NULL,
> `country` text collate utf8_unicode_ci,
> `region` varchar(50) collate utf8_unicode_ci default NULL,
> `city` varchar(50) collate utf8_unicode_ci default NULL,
> `latitude` int(20) NOT NULL,
> `longitude` int(20) NOT NULL,
> KEY `locId` (`locId`)
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>
> the query I use
> SELECT country,region,city,latitude,longitude
> FROM location
> WHERE locId = (
> SELECT locId
> FROM block
> WHERE startIpNum < INET_ATON( '74.208.9.80' )
> AND endIpNum >= INET_ATON( '74.208.9.80' )
> )LiMIT 0,1
>
> at the moment on my developing machine I'm getting the result in 0,42
> sec average.
> Is there any way to improve it ?
> The application I'm developing will make request between 50 and 70 ip at
> time.
> Also in my case is beater to make a request for each ip or i can pass an
> array of ip via the query ?
> The ip they will be different for each request as i will cache the requests
> Many Thanks


Use a JOIN instead of a sub-select.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 06-06-2008, 11:10 AM
Daniele
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

Captain Paralytic wrote:
> On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote:
>> Hi All
>> I've started a project connect with geolocation IP adresses
>> I have 2 table, one with all the iprange and one with the location
>> Table block over 2.800.000 row
>> CREATE TABLE `block` (
>> `startIpNum` bigint(15) NOT NULL,
>> `endIpNum` bigint(15) NOT NULL,
>> `locId` int(11) NOT NULL,
>> PRIMARY KEY (`startIpNum`,`endIpNum`)
>> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>> Field name min value max value min / max lenght
>> startIpNum 33996344 3741315072 8 /10
>> endIpnum 33996351 3741319167 8 /10
>> locId 2 196511 1/6
>>
>> table location 200.000 row
>> CREATE TABLE `location` (
>> `locId` int(9) NOT NULL,
>> `country` text collate utf8_unicode_ci,
>> `region` varchar(50) collate utf8_unicode_ci default NULL,
>> `city` varchar(50) collate utf8_unicode_ci default NULL,
>> `latitude` int(20) NOT NULL,
>> `longitude` int(20) NOT NULL,
>> KEY `locId` (`locId`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>>
>> the query I use
>> SELECT country,region,city,latitude,longitude
>> FROM location
>> WHERE locId = (
>> SELECT locId
>> FROM block
>> WHERE startIpNum < INET_ATON( '74.208.9.80' )
>> AND endIpNum >= INET_ATON( '74.208.9.80' )
>> )LiMIT 0,1
>>
>> at the moment on my developing machine I'm getting the result in 0,42
>> sec average.
>> Is there any way to improve it ?
>> The application I'm developing will make request between 50 and 70 ip at
>> time.
>> Also in my case is beater to make a request for each ip or i can pass an
>> array of ip via the query ?
>> The ip they will be different for each request as i will cache the requests
>> Many Thanks

>
> Use a JOIN instead of a sub-select.

I read about join is not advice on big table as it will create a
instance of the 2 table in one
but I will try
thanks
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 06-06-2008, 05:22 PM
Captain Paralytic
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

On 6 Jun, 11:49, Daniele <dankan775removet...@yahoo.it> wrote:
> Captain Paralytic wrote:
> > On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote:
> >> Hi All
> >> I've started a project connect with geolocation IP adresses
> >> I have 2 table, one with all the iprange and one with the location
> >> Table block over 2.800.000 row
> >> CREATE TABLE `block` (
> >> * *`startIpNum` bigint(15) NOT NULL,
> >> * *`endIpNum` bigint(15) NOT NULL,
> >> * *`locId` int(11) NOT NULL,
> >> * *PRIMARY KEY *(`startIpNum`,`endIpNum`)
> >> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
> >> Field name * * *min value * * * max value * * * min /max lenght
> >> startIpNum * * *33996344 * * * *3741315072 * * *8 /10
> >> endIpnum * * * *33996351 * * * *3741319167 * * *8/10
> >> locId * * * * * 2 * * * * * * * 196511 * * * * *1/6

>
> >> table location 200.000 row
> >> CREATE TABLE `location` (
> >> * *`locId` int(9) NOT NULL,
> >> * *`country` text collate utf8_unicode_ci,
> >> * *`region` varchar(50) collate utf8_unicode_ci default NULL,
> >> * *`city` varchar(50) collate utf8_unicode_ci default NULL,
> >> * *`latitude` int(20) NOT NULL,
> >> * *`longitude` int(20) NOT NULL,
> >> * *KEY `locId` (`locId`)
> >> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

>
> >> the query I use
> >> SELECT country,region,city,latitude,longitude
> >> FROM location
> >> WHERE locId = (
> >> * * * * SELECT locId
> >> * * * * FROM block
> >> * * * * WHERE startIpNum < INET_ATON( '74.208.9.80' )
> >> * * * * AND endIpNum >= INET_ATON( '74.208.9.80' )
> >> )LiMIT 0,1

>
> >> at the moment on my developing machine I'm getting the result in 0,42
> >> sec average.
> >> Is there any way to improve it ?
> >> The application I'm developing will make request between 50 and 70 ip at
> >> time.
> >> Also in my case is beater to make a request for each ip or i can pass an
> >> array of ip via the query ?
> >> The ip they will be different for each request as i will cache the requests
> >> Many Thanks

>
> > Use a JOIN instead of a sub-select.

>
> I read about join is not advice on big table as it will create a
> instance of the 2 table in one
> but I will try
> thanks- Hide quoted text -
>
> - Show quoted text -


WHAT???

Please could you point me to where you read that?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 06-06-2008, 05:22 PM
Daniele
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

Captain Paralytic wrote:
>> I read about join is not advice on big table as it will create a
>> instance of the 2 table in one
>> but I will try
>> thanks- Hide quoted text -
>>
>> - Show quoted text -

>
> WHAT???
>
> Please could you point me to where you read that?

from:http://www.mysqlperformanceblog.com/...-large-tables/
I think that my speed reading jump an important part( the last line of
this paragraph

"Here is good example. As we saw my 30mil rows (12GB) table was scanned
in less than 5 minutes. Now if we would do eq join of the table to other
30mil rows table and it will be completely random. We'll need to perform
30 millions of random row reads, which gives us 300.000 seconds with 100
rows/sec rate. So we would go from 5 minutes to almost 4 days if we need
to do the join. Some people assume join would be close to two full table
scans (as 60mil of rows need to be read) - this is way wrong."

but i'm not sure about this
"Avoid joins to large tables Joining of large data sets using nested
loops is very expensive. Try to avoid it. Joins to smaller tables is OK
but you might want to preload them to memory before join so there is no
random IO needed to populate the caches."
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 06-06-2008, 05:22 PM
Captain Paralytic
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

On 6 Jun, 15:08, Daniele <dankan775removet...@yahoo.it> wrote:
> Captain Paralytic wrote:
> >> I read about join is not advice on big table as it will create a
> >> instance of the 2 table in one
> >> but I will try
> >> thanks- Hide quoted text -

>
> >> - Show quoted text -

>
> > WHAT???

>
> > Please could you point me to where you read that?

>
> from:http://www.mysqlperformanceblog.com/...l-could-be-slo....
> I think that my speed reading jump an important part( the last line of
> this paragraph
>
> "Here is good example. As we saw my 30mil rows (12GB) table was scanned
> in less than 5 minutes. Now if we would do eq join of the table to other
> 30mil rows table and it will be completely random. We'll need to perform
> 30 millions of random row reads, which gives us 300.000 seconds with 100
> rows/sec rate. So we would go from 5 minutes to almost 4 days if we need
> to do the join. Some people assume join would be close to two full table
> scans (as 60mil of rows need to be read) - this is way wrong."
>
> but i'm not sure about this
> "Avoid joins to large tables *Joining of large data sets using nested
> loops is very expensive. Try to avoid it. Joins to smaller tables is OK
> but you might want to preload them to memory before join so there is no
> random IO needed to populate the caches."


Oh pleeeese!

This is nothing like what you are trying to do.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 06-06-2008, 05:22 PM
Daniele
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

Captain Paralytic wrote:
> On 6 Jun, 15:08, Daniele <dankan775removet...@yahoo.it> wrote:
>> Captain Paralytic wrote:
>>>> I read about join is not advice on big table as it will create a
>>>> instance of the 2 table in one
>>>> but I will try
>>>> thanks- Hide quoted text -
>>>> - Show quoted text -
>>> WHAT???
>>> Please could you point me to where you read that?

>> from:http://www.mysqlperformanceblog.com/...l-could-be-slo...
>> I think that my speed reading jump an important part( the last line of
>> this paragraph
>>
>> "Here is good example. As we saw my 30mil rows (12GB) table was scanned
>> in less than 5 minutes. Now if we would do eq join of the table to other
>> 30mil rows table and it will be completely random. We'll need to perform
>> 30 millions of random row reads, which gives us 300.000 seconds with 100
>> rows/sec rate. So we would go from 5 minutes to almost 4 days if we need
>> to do the join. Some people assume join would be close to two full table
>> scans (as 60mil of rows need to be read) - this is way wrong."
>>
>> but i'm not sure about this
>> "Avoid joins to large tables Joining of large data sets using nested
>> loops is very expensive. Try to avoid it. Joins to smaller tables is OK
>> but you might want to preload them to memory before join so there is no
>> random IO needed to populate the caches."

>
> Oh pleeeese!
>
> This is nothing like what you are trying to do.

Sorry but do you mean I don't have to worry about of the size of my table ?
Thanks
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 06-06-2008, 05:22 PM
Daniele
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

Captain Paralytic wrote:
> On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote:
>> Hi All
>> I've started a project connect with geolocation IP adresses
>> I have 2 table, one with all the iprange and one with the location
>> Table block over 2.800.000 row
>> CREATE TABLE `block` (
>> `startIpNum` bigint(15) NOT NULL,
>> `endIpNum` bigint(15) NOT NULL,
>> `locId` int(11) NOT NULL,
>> PRIMARY KEY (`startIpNum`,`endIpNum`)
>> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>> Field name min value max value min / max lenght
>> startIpNum 33996344 3741315072 8 /10
>> endIpnum 33996351 3741319167 8 /10
>> locId 2 196511 1/6
>>
>> table location 200.000 row
>> CREATE TABLE `location` (
>> `locId` int(9) NOT NULL,
>> `country` text collate utf8_unicode_ci,
>> `region` varchar(50) collate utf8_unicode_ci default NULL,
>> `city` varchar(50) collate utf8_unicode_ci default NULL,
>> `latitude` int(20) NOT NULL,
>> `longitude` int(20) NOT NULL,
>> KEY `locId` (`locId`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>>
>> the query I use
>> SELECT country,region,city,latitude,longitude
>> FROM location
>> WHERE locId = (
>> SELECT locId
>> FROM block
>> WHERE startIpNum < INET_ATON( '74.208.9.80' )
>> AND endIpNum >= INET_ATON( '74.208.9.80' )
>> )LiMIT 0,1
>>
>> at the moment on my developing machine I'm getting the result in 0,42
>> sec average.
>> Is there any way to improve it ?
>> The application I'm developing will make request between 50 and 70 ip at
>> time.
>> Also in my case is beater to make a request for each ip or i can pass an
>> array of ip via the query ?
>> The ip they will be different for each request as i will cache the requests
>> Many Thanks

>
> Use a JOIN instead of a sub-select.

I'm new to the join
i'm using this query but is slower (3.5sec ) than the subquery i posted
on top (0.4sec )

SELECT location.country, location.region, location.city,
location.latitude, location.longitude
FROM location
RIGHT JOIN block ON location.locId = block.locId
WHERE block.startIpNum < INET_ATON( '74.208.9.81' )
AND block.endIpNum >= INET_ATON( '74.208.9.81' )
LIMIT 0 , 30

I'm writing it wrong ?
any help ? Direction ?
Thanks
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 06-13-2008, 01:59 PM
Jerry Stuckle
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

Daniele wrote:
> Captain Paralytic wrote:
>>> I read about join is not advice on big table as it will create a
>>> instance of the 2 table in one
>>> but I will try
>>> thanks- Hide quoted text -
>>>
>>> - Show quoted text -

>>
>> WHAT???
>>
>> Please could you point me to where you read that?

> from:http://www.mysqlperformanceblog.com/...-large-tables/
>
> I think that my speed reading jump an important part( the last line of
> this paragraph
>
> "Here is good example. As we saw my 30mil rows (12GB) table was scanned
> in less than 5 minutes. Now if we would do eq join of the table to other
> 30mil rows table and it will be completely random. We'll need to perform
> 30 millions of random row reads, which gives us 300.000 seconds with 100
> rows/sec rate. So we would go from 5 minutes to almost 4 days if we need
> to do the join. Some people assume join would be close to two full table
> scans (as 60mil of rows need to be read) - this is way wrong."
>
> but i'm not sure about this
> "Avoid joins to large tables Joining of large data sets using nested
> loops is very expensive. Try to avoid it. Joins to smaller tables is OK
> but you might want to preload them to memory before join so there is no
> random IO needed to populate the caches."
>


In addition to what Paul said - we have no idea what the poster's setup
is. I suspect there were a number of optimizations he could have done
to make this a lot faster - faster than a subselect, even.

I've never seen a query take 4 days on ANY MySQL system - no matter how
large. For it to take this long indicates some severe problems with the
system configuration. It could even be lack of the correct indexes. We
don't know.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 06-13-2008, 01:59 PM
Luuk
 
Posts: n/a
Default Re: Help with table design ( over 2.000.000 row )

Daniele schreef:
> Captain Paralytic wrote:
>> On 6 Jun, 09:32, Daniele <dankan775removet...@yahoo.it> wrote:
>>> Hi All
>>> I've started a project connect with geolocation IP adresses
>>> I have 2 table, one with all the iprange and one with the location
>>> Table block over 2.800.000 row
>>> CREATE TABLE `block` (
>>> `startIpNum` bigint(15) NOT NULL,
>>> `endIpNum` bigint(15) NOT NULL,
>>> `locId` int(11) NOT NULL,
>>> PRIMARY KEY (`startIpNum`,`endIpNum`)
>>> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>>> Field name min value max value min / max lenght
>>> startIpNum 33996344 3741315072 8 /10
>>> endIpnum 33996351 3741319167 8 /10
>>> locId 2 196511 1/6
>>>
>>> table location 200.000 row
>>> CREATE TABLE `location` (
>>> `locId` int(9) NOT NULL,
>>> `country` text collate utf8_unicode_ci,
>>> `region` varchar(50) collate utf8_unicode_ci default NULL,
>>> `city` varchar(50) collate utf8_unicode_ci default NULL,
>>> `latitude` int(20) NOT NULL,
>>> `longitude` int(20) NOT NULL,
>>> KEY `locId` (`locId`)
>>> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
>>>
>>> the query I use
>>> SELECT country,region,city,latitude,longitude
>>> FROM location
>>> WHERE locId = (
>>> SELECT locId
>>> FROM block
>>> WHERE startIpNum < INET_ATON( '74.208.9.80' )
>>> AND endIpNum >= INET_ATON( '74.208.9.80' )
>>> )LiMIT 0,1
>>>
>>> at the moment on my developing machine I'm getting the result in 0,42
>>> sec average.
>>> Is there any way to improve it ?
>>> The application I'm developing will make request between 50 and 70 ip at
>>> time.
>>> Also in my case is beater to make a request for each ip or i can pass an
>>> array of ip via the query ?
>>> The ip they will be different for each request as i will cache the
>>> requests
>>> Many Thanks

>>
>> Use a JOIN instead of a sub-select.

> I'm new to the join
> i'm using this query but is slower (3.5sec ) than the subquery i posted
> on top (0.4sec )
>
> SELECT location.country, location.region, location.city,
> location.latitude, location.longitude
> FROM location
> RIGHT JOIN block ON location.locId = block.locId
> WHERE block.startIpNum < INET_ATON( '74.208.9.81' )
> AND block.endIpNum >= INET_ATON( '74.208.9.81' )
> LIMIT 0 , 30
>
> I'm writing it wrong ?
> any help ? Direction ?
> Thanks


probably an index on 'block.locId' could help...

but please, also post the results of EXPLAIN <your query>
(see: http://dev.mysql.com/doc/refman/5.0/en/explain.html)


--
Luuk
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:32 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com