Unix Technical Forum

Slow query with MEMORY engine - why?

This is a discussion on Slow query with MEMORY engine - why? within the MySQL forums, part of the Database Server Software category; --> I'm doing this query repeatedly on a MEMORY database. SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE rating_state = ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 06-04-2008, 03:26 PM
John Nagle
 
Posts: n/a
Default Slow query with MEMORY engine - why?

I'm doing this query repeatedly on a MEMORY database.

SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
rating_state = 'queued' AND NOT EXISTS
(SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
!= 'queued') ORDER BY request_timestamp LIMIT 1

Here's the table:

describe ratingqueue;
+-------------------+-------------------------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default
| Extra |
+-------------------+-------------------------------------+------+-----+---------------------+-------+
| domain | varchar(255) | NO | PRI |
| |
| requestor_ip_hash | int(11) | YES | MUL | NULL
| |
| rating_state | enum('queued','starting','running') | NO | MUL | queued
| |
| server | varchar(63) | YES | | NULL
| |
| priority | smallint(6) | YES | | 0
| |
| update_timestamp | timestamp | NO | |
CURRENT_TIMESTAMP | |
| request_timestamp | timestamp | NO | MUL |
0000-00-00 00:00:00 | |
+-------------------+-------------------------------------+------+-----+---------------------+-------+


It's actually a scheduler; a number of machines are looking for the next
work queue item. There's no need for the table to persist over restarts,
so it's in the MEMORY engine. The point of doing this in the database
is that multiple machines can use the same work queue.

This works fine when the table is small, but with
about 1000 entries, it seems to choke. Each query takes about 2 seconds.
And that's in the MEMORY database. This suggests that it's doing an O(N^2)
operation.

Here's an EXPLAIN:

explain SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
rating_state = 'queued' AND NOT EXISTS
(SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
!= 'queued') ORDER BY request_timestamp LIMIT 1;
+----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys
| key | key_len | ref | rows | Extra |
+----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
| 1 | PRIMARY | ratingqueue | ref | rating_state
| rating_state | 1 | const | 1097 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | ratingqueue | ALL | requestor_ip_hash,rating_state
| NULL | NULL | NULL | 823 | Using where |
+----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+

All the fields in the subquery are keys. What's the problem?

There must be a more efficient way to express this query.
Suggestions?

John Nagle

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 06-04-2008, 03:26 PM
Captain Paralytic
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
> I'm doing this query repeatedly on a MEMORY database.
>
> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
> rating_state = 'queued' AND NOT EXISTS
> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
> != 'queued') ORDER BY request_timestamp LIMIT 1
>
> Here's the table:
>
> describe ratingqueue;
> +-------------------+-------------------------------------+------+-----+---------------------+-------+
> | Field | Type | Null | Key | Default
> | Extra |
> +-------------------+-------------------------------------+------+-----+---------------------+-------+
> | domain | varchar(255) | NO | PRI |
> | |
> | requestor_ip_hash | int(11) | YES | MUL | NULL
> | |
> | rating_state | enum('queued','starting','running') | NO | MUL | queued
> | |
> | server | varchar(63) | YES | | NULL
> | |
> | priority | smallint(6) | YES | | 0
> | |
> | update_timestamp | timestamp | NO | |
> CURRENT_TIMESTAMP | |
> | request_timestamp | timestamp | NO | MUL |
> 0000-00-00 00:00:00 | |
> +-------------------+-------------------------------------+------+-----+---------------------+-------+
>
> It's actually a scheduler; a number of machines are looking for the next
> work queue item. There's no need for the table to persist over restarts,
> so it's in the MEMORY engine. The point of doing this in the database
> is that multiple machines can use the same work queue.
>
> This works fine when the table is small, but with
> about 1000 entries, it seems to choke. Each query takes about 2 seconds.
> And that's in the MEMORY database. This suggests that it's doing an O(N^2)
> operation.
>
> Here's an EXPLAIN:
>
> explain SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
> rating_state = 'queued' AND NOT EXISTS
> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
> != 'queued') ORDER BY request_timestamp LIMIT 1;
> +----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra |
> +----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
> | 1 | PRIMARY | ratingqueue | ref | rating_state
> | rating_state | 1 | const | 1097 | Using where; Using filesort |
> | 2 | DEPENDENT SUBQUERY | ratingqueue | ALL | requestor_ip_hash,rating_state
> | NULL | NULL | NULL | 823 | Using where |
> +----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
>
> All the fields in the subquery are keys. What's the problem?
>
> There must be a more efficient way to express this query.
> Suggestions?
>
> John Nagle


Having all the fields as keys may not help that much when you have
more than one criteria in a WHERE or JOIN clause. In this case an
appropriate composite key is much more useful.

Also, I would rewrite this as a JOIN instead of a dependent subquery.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 06-05-2008, 05:17 AM
John Nagle
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

Captain Paralytic wrote:
> On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
>> I'm doing this query repeatedly on a MEMORY database.
>>
>> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
>> rating_state = 'queued' AND NOT EXISTS
>> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
>> != 'queued') ORDER BY request_timestamp LIMIT 1

>
> Having all the fields as keys may not help that much when you have
> more than one criteria in a WHERE or JOIN clause. In this case an
> appropriate composite key is much more useful.


> Also, I would rewrite this as a JOIN instead of a dependent subquery.


How do you express "not exists" in a JOIN?

John Nagle
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 06-05-2008, 05:39 PM
Captain Paralytic
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

On 5 Jun, 02:25, John Nagle <na...@animats.com> wrote:
> Captain Paralytic wrote:
> > On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
> >> I'm doing this query repeatedly on a MEMORY database.

>
> >> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
> >> rating_state = 'queued' AND NOT EXISTS
> >> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
> >> != 'queued') ORDER BY request_timestamp LIMIT 1

>
> > Having all the fields as keys may not help that much when you have
> > more than one criteria in a WHERE or JOIN clause. In this case an
> > appropriate composite key is much more useful.
> > Also, I would rewrite this as a JOIN instead of a dependent subquery.

>
> How do you express "not exists" in a JOIN?
>
> John Nagle


I'm not totally sure about what the data rules are in your table, but
something this should do it.

SELECT
r1.domain,
r1.requestor_ip_hash
FROM ratingqueue r1
LEFT JOIN ratingqueue r2 ON r1.requestor_ip_hash =
r2.requestor_ip_hash AND r2.rating_state != 'queued'
WHERE r1.rating_state = 'queued' AND r2.requestor_ip_hash IS NULL

Note that setting your query out nicely as above before posting makes
it much easier for us to offer help.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 06-13-2008, 01:59 PM
John Nagle
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

Captain Paralytic wrote:
> On 5 Jun, 02:25, John Nagle <na...@animats.com> wrote:
>> Captain Paralytic wrote:
>>> On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
>>>> I'm doing this query repeatedly on a MEMORY database.
>>>> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
>>>> rating_state = 'queued' AND NOT EXISTS
>>>> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
>>>> != 'queued') ORDER BY request_timestamp LIMIT 1
>>> Having all the fields as keys may not help that much when you have
>>> more than one criteria in a WHERE or JOIN clause. In this case an
>>> appropriate composite key is much more useful.
>>> Also, I would rewrite this as a JOIN instead of a dependent subquery.

>> How do you express "not exists" in a JOIN?
>>
>> John Nagle

>
> I'm not totally sure about what the data rules are in your table, but
> something this should do it.
>
> SELECT
> r1.domain,
> r1.requestor_ip_hash
> FROM ratingqueue r1
> LEFT JOIN ratingqueue r2 ON r1.requestor_ip_hash =
> r2.requestor_ip_hash AND r2.rating_state != 'queued'
> WHERE r1.rating_state = 'queued' AND r2.requestor_ip_hash IS NULL


That's worse. Tried the query in this form:

SELECT
r1.domain,
r1.requestor_ip_hash
FROM ratingqueue r1
LEFT JOIN ratingqueue r2 ON r1.requestor_ip_hash =
r2.requestor_ip_hash AND r2.rating_state != 'queued'
WHERE r1.rating_state = 'queued' AND r2.requestor_ip_hash IS NULL
ORDER BY request_timestamp LIMIT 1;

Timing results: (MySQL query browser, idle Windows PC, idle
database, MySQL 5 running on same machine as query browser.)

Original query: 0 rows fetched in 0.0007s. (4.3616s)
Left join query: 0 rows fetched in 0.0165s. (5.0268s)

In either case, it's taking 4 to 5 seconds to process this
query on a MEMORY table of modest size.

Table def is:

CREATE TABLE ratingqueue
(
domain VARCHAR(255) NOT NULL UNIQUE PRIMARY KEY,
requestor_ip_hash INT, rating_state ENUM ('queued','starting','running')
NOT NULL DEFAULT 'queued',
server VARCHAR(63) NULL, priority SMALLINT DEFAULT 0,
update_timestamp TIMESTAMP NOT NULL, request_timestamp TIMESTAMP NOT NULL,
INDEX(requestor_ip_hash, rating_state), INDEX(request_timestamp)
) ENGINE=MEMORY;

The table contains 1222 entries, 2 of which are 'rating_state="running"'
and all of which have the same 'requestor_ip_hash'. That's the problem.
The query is to return a row for which the requestor_ip_hash does not
match any row that is in "running" state. (It's a fair load balancer.)
When no such row exists, the query is incredibly inefficient.

There must be some better way to do this.

John Nagle
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 12:24 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com