Unix Technical Forum

Slow Subquery

This is a discussion on Slow Subquery within the MySQL General forum forums, part of the MySQL category; --> I'm trying to determine why a subquery is slower than running two separate queries. I have a simple many-to-many ...


Go Back   Unix Technical Forum > Database Server Software > MySQL > MySQL General forum

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 07:01 AM
Ryan Bates
 
Posts: n/a
Default Slow Subquery

I'm trying to determine why a subquery is slower than running two
separate queries. I have a simple many-to-many association using 3
tables: projects, tags and projects_tags. Here's the query I'm using
to find the projects with a given tag:

SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id
FROM tags, projects_tags WHERE tags.name='foo' AND
projects_tags.project_id=projects.id);
(0.36 sec)

Compare that with splitting it into two queries:

SELECT projects_tags.project_id FROM tags, projects_tags WHERE
tags.name='foo' AND projects_tags.project_id=projects.id
(0.00 sec) /* returns 1, 2, 3 */

SELECT * FROM projects WHERE id IN (1, 2, 3);
(0.00 sec)

Why is it so much faster? Looking at the explain statement (below) of
the one with the subquery, it appears it's not using the primary key
index on the projects table. Why is it that MySQL doesn't perform
this simple optimization? And is there a solution that will allow me
to still use a subquery?

I realize I can use a join instead of a subquery, but this is a
simplified example.


Here's the explain statement:

*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: projects
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 15433
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: tags
type: ref
possible_keys: PRIMARY,index_tags_on_name
key: index_tags_on_name
key_len: 258
ref: const
rows: 1
Extra: Using where; Using index
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: projects_tags
type: ref
possible_keys: tag_id
key: tag_id
key_len: 5
ref: my_database.tags.id
rows: 10
Extra: Using where


Here's the table dumps:

CREATE TABLE `projects` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `tags` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `index_tags_on_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `projects_tags` (
`project_id` int(11) default NULL,
`tag_id` int(11) default NULL,
KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


I'm using MySQL 5.0.37. Thanks in advance.

Ryan
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 07:01 AM
Peter Brawley
 
Posts: n/a
Default Re: Slow Subquery

Ryan,

>Why is it so much faster?


Subquery optimisation in MySQL is a problem. For ideas see 'The
unbearable slowness of IN()' at
http://www.artfulsoftware.com/infotree/queries.php.

PB

Ryan Bates wrote:
> I'm trying to determine why a subquery is slower than running two
> separate queries. I have a simple many-to-many association using 3
> tables: projects, tags and projects_tags. Here's the query I'm using
> to find the projects with a given tag:
>
> SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id
> FROM tags, projects_tags WHERE tags.name='foo' AND
> projects_tags.project_id=projects.id);
> (0.36 sec)
>
> Compare that with splitting it into two queries:
>
> SELECT projects_tags.project_id FROM tags, projects_tags WHERE
> tags.name='foo' AND projects_tags.project_id=projects.id
> (0.00 sec) /* returns 1, 2, 3 */
>
> SELECT * FROM projects WHERE id IN (1, 2, 3);
> (0.00 sec)
>
> Why is it so much faster? Looking at the explain statement (below) of
> the one with the subquery, it appears it's not using the primary key
> index on the projects table. Why is it that MySQL doesn't perform this
> simple optimization? And is there a solution that will allow me to
> still use a subquery?
>
> I realize I can use a join instead of a subquery, but this is a
> simplified example.
>
>
> Here's the explain statement:
>
> *************************** 1. row ***************************
> id: 1
> select_type: PRIMARY
> table: projects
> type: ALL
> possible_keys: NULL
> key: NULL
> key_len: NULL
> ref: NULL
> rows: 15433
> Extra: Using where
> *************************** 2. row ***************************
> id: 2
> select_type: DEPENDENT SUBQUERY
> table: tags
> type: ref
> possible_keys: PRIMARY,index_tags_on_name
> key: index_tags_on_name
> key_len: 258
> ref: const
> rows: 1
> Extra: Using where; Using index
> *************************** 3. row ***************************
> id: 2
> select_type: DEPENDENT SUBQUERY
> table: projects_tags
> type: ref
> possible_keys: tag_id
> key: tag_id
> key_len: 5
> ref: my_database.tags.id
> rows: 10
> Extra: Using where
>
>
> Here's the table dumps:
>
> CREATE TABLE `projects` (
> `id` int(11) NOT NULL auto_increment,
> `name` varchar(255) default NULL,
> PRIMARY KEY (`id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> CREATE TABLE `tags` (
> `id` int(11) NOT NULL auto_increment,
> `name` varchar(255) default NULL,
> PRIMARY KEY (`id`),
> KEY `index_tags_on_name` (`name`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> CREATE TABLE `projects_tags` (
> `project_id` int(11) default NULL,
> `tag_id` int(11) default NULL,
> KEY `tag_id` (`tag_id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
>
> I'm using MySQL 5.0.37. Thanks in advance.
>
> Ryan
>

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 07:01 AM
Baron Schwartz
 
Posts: n/a
Default Re: Slow Subquery

Ryan Bates wrote:
> I'm trying to determine why a subquery is slower than running two
> separate queries. I have a simple many-to-many association using 3
> tables: projects, tags and projects_tags. Here's the query I'm using to
> find the projects with a given tag:
>
> SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id FROM
> tags, projects_tags WHERE tags.name='foo' AND
> projects_tags.project_id=projects.id);
> (0.36 sec)


As another poster said, this kind of subquery runs slowly, but just to
elaborate on it: it's not every subquery that's a problem, just IN() and
NOT IN(). Use EXPLAIN EXTENDED followed by SHOW WARNINGS to see what's
happening.

Baron
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 07:01 AM
Brent Baisley
 
Posts: n/a
Default Re: Slow Subquery

You are using a correlated subquery, which MySQL is terrible at.
Whenever you find yourself doing a correlated subquery, see if you
can switch it to a derived table with a join, which MySQL is far
better at. A derived table is like a "virtual" table you create on
the fly. It's very simple, just assign a name to your query and then
treat it as if it is a regular table.

So your query would look something like this:
SELECT projects.* FROM projects
JOIN (SELECT projects_tags.project_id FROM tags, projects_tags WHERE
tags.name='foo' AND projects_tags.project_id=projects.id) AS ptagids
ON project.id=ptagids.project_id

Your IN has become a JOIN and mysql optimizes it far better.

On Oct 19, 2007, at 6:57 PM, Ryan Bates wrote:

> I'm trying to determine why a subquery is slower than running two
> separate queries. I have a simple many-to-many association using 3
> tables: projects, tags and projects_tags. Here's the query I'm
> using to find the projects with a given tag:
>
> SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id
> FROM tags, projects_tags WHERE tags.name='foo' AND
> projects_tags.project_id=projects.id);
> (0.36 sec)
>
> Compare that with splitting it into two queries:
>
> SELECT projects_tags.project_id FROM tags, projects_tags WHERE
> tags.name='foo' AND projects_tags.project_id=projects.id
> (0.00 sec) /* returns 1, 2, 3 */
>
> SELECT * FROM projects WHERE id IN (1, 2, 3);
> (0.00 sec)
>
> Why is it so much faster? Looking at the explain statement (below)
> of the one with the subquery, it appears it's not using the primary
> key index on the projects table. Why is it that MySQL doesn't
> perform this simple optimization? And is there a solution that will
> allow me to still use a subquery?
>
> I realize I can use a join instead of a subquery, but this is a
> simplified example.
>
>
> Here's the explain statement:
>
> *************************** 1. row ***************************
> id: 1
> select_type: PRIMARY
> table: projects
> type: ALL
> possible_keys: NULL
> key: NULL
> key_len: NULL
> ref: NULL
> rows: 15433
> Extra: Using where
> *************************** 2. row ***************************
> id: 2
> select_type: DEPENDENT SUBQUERY
> table: tags
> type: ref
> possible_keys: PRIMARY,index_tags_on_name
> key: index_tags_on_name
> key_len: 258
> ref: const
> rows: 1
> Extra: Using where; Using index
> *************************** 3. row ***************************
> id: 2
> select_type: DEPENDENT SUBQUERY
> table: projects_tags
> type: ref
> possible_keys: tag_id
> key: tag_id
> key_len: 5
> ref: my_database.tags.id
> rows: 10
> Extra: Using where
>
>
> Here's the table dumps:
>
> CREATE TABLE `projects` (
> `id` int(11) NOT NULL auto_increment,
> `name` varchar(255) default NULL,
> PRIMARY KEY (`id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> CREATE TABLE `tags` (
> `id` int(11) NOT NULL auto_increment,
> `name` varchar(255) default NULL,
> PRIMARY KEY (`id`),
> KEY `index_tags_on_name` (`name`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> CREATE TABLE `projects_tags` (
> `project_id` int(11) default NULL,
> `tag_id` int(11) default NULL,
> KEY `tag_id` (`tag_id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
>
> I'm using MySQL 5.0.37. Thanks in advance.
>
> Ryan
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?
> unsub=brenttech@gmail.com
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 07:01 AM
Jay Pipes
 
Posts: n/a
Default Re: Slow Subquery

Indeed, as you say, Brent, correlated subqueries are not well-optimized
in MySQL. The specific subquery (the IN() subquery) demonstrated in the
original post is, however, optimized in MySQL 6.0

More comments inline.

Brent Baisley wrote:
> You are using a correlated subquery, which MySQL is terrible at.
> Whenever you find yourself doing a correlated subquery, see if you can
> switch it to a derived table with a join, which MySQL is far better at.
> A derived table is like a "virtual" table you create on the fly. It's
> very simple, just assign a name to your query and then treat it as if it
> is a regular table.


Actually, in this case, no need for a derived table. A simple join will
suffice:

SELECT * FROM projects p
JOIN project_tags pt ON p.project_id = pt.project_id
JOIN tags t ON pt.tag_id = t.tag_id
WHERE tags.name='foo';

Make sure you've got indexes on p (project_id), pt (project_id, tag_id),
t (name)

Cheers,

Jay

> So your query would look something like this:
> SELECT projects.* FROM projects
> JOIN (SELECT projects_tags.project_id FROM tags, projects_tags WHERE
> tags.name='foo' AND projects_tags.project_id=projects.id) AS ptagids
> ON project.id=ptagids.project_id
>
> Your IN has become a JOIN and mysql optimizes it far better.
>
> On Oct 19, 2007, at 6:57 PM, Ryan Bates wrote:
>
>> I'm trying to determine why a subquery is slower than running two
>> separate queries. I have a simple many-to-many association using 3
>> tables: projects, tags and projects_tags. Here's the query I'm using
>> to find the projects with a given tag:
>>
>> SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id
>> FROM tags, projects_tags WHERE tags.name='foo' AND
>> projects_tags.project_id=projects.id);
>> (0.36 sec)
>>
>> Compare that with splitting it into two queries:
>>
>> SELECT projects_tags.project_id FROM tags, projects_tags WHERE
>> tags.name='foo' AND projects_tags.project_id=projects.id
>> (0.00 sec) /* returns 1, 2, 3 */
>>
>> SELECT * FROM projects WHERE id IN (1, 2, 3);
>> (0.00 sec)
>>
>> Why is it so much faster? Looking at the explain statement (below) of
>> the one with the subquery, it appears it's not using the primary key
>> index on the projects table. Why is it that MySQL doesn't perform this
>> simple optimization? And is there a solution that will allow me to
>> still use a subquery?
>>
>> I realize I can use a join instead of a subquery, but this is a
>> simplified example.
>>
>>
>> Here's the explain statement:
>>
>> *************************** 1. row ***************************
>> id: 1
>> select_type: PRIMARY
>> table: projects
>> type: ALL
>> possible_keys: NULL
>> key: NULL
>> key_len: NULL
>> ref: NULL
>> rows: 15433
>> Extra: Using where
>> *************************** 2. row ***************************
>> id: 2
>> select_type: DEPENDENT SUBQUERY
>> table: tags
>> type: ref
>> possible_keys: PRIMARY,index_tags_on_name
>> key: index_tags_on_name
>> key_len: 258
>> ref: const
>> rows: 1
>> Extra: Using where; Using index
>> *************************** 3. row ***************************
>> id: 2
>> select_type: DEPENDENT SUBQUERY
>> table: projects_tags
>> type: ref
>> possible_keys: tag_id
>> key: tag_id
>> key_len: 5
>> ref: my_database.tags.id
>> rows: 10
>> Extra: Using where
>>
>>
>> Here's the table dumps:
>>
>> CREATE TABLE `projects` (
>> `id` int(11) NOT NULL auto_increment,
>> `name` varchar(255) default NULL,
>> PRIMARY KEY (`id`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>>
>> CREATE TABLE `tags` (
>> `id` int(11) NOT NULL auto_increment,
>> `name` varchar(255) default NULL,
>> PRIMARY KEY (`id`),
>> KEY `index_tags_on_name` (`name`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>>
>> CREATE TABLE `projects_tags` (
>> `project_id` int(11) default NULL,
>> `tag_id` int(11) default NULL,
>> KEY `tag_id` (`tag_id`)
>> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>>
>>
>> I'm using MySQL 5.0.37. Thanks in advance.
>>
>> Ryan
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe: http://lists.mysql.com/mysql?unsub=brenttech@gmail.com
>>

>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:25 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com