Unix Technical Forum

join advice needed

This is a discussion on join advice needed within the MySQL forums, part of the Database Server Software category; --> OK, I think I have solved my problem but wanted to know if there was a better way to ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 11:25 AM
Jason S
 
Posts: n/a
Default join advice needed

OK, I think I have solved my problem but wanted to know if there was a
better way to find rows that are NOT referenced by a 2nd table:

I have 2 tables:
Table "item_def" with an integer field "item_id" and some other stuff.
(primary key = item_id)
Table "tag_item_list" with only integer fields "item_id" and "tag_id".
(primary key = item_id, tag_id)

The tables are relatively large, on the order of 10,000 - 20,000
members, and will get larger.

I would like to get the list of all items (rows in the item_def table)
that do NOT match something in the tag_item_list table.

This seems like a textbook example; the MySQL help mentions it:

======
If there is no matching row for the right table in the ON or USING
part in a LEFT JOIN, a row with all columns set to NULL is used for
the right table. You can use this fact to find rows in a table that
have no counterpart in another table:

SELECT table1.* FROM table1
LEFT JOIN table2 ON table1.id=table2.id
WHERE table2.id IS NULL;
This example finds all rows in table1 with an id value that is not
present in table2 (that is, all rows in table1 with no corresponding
row in table2). This assumes that table2.id is declared NOT NULL
======

So I should do this:

SELECT item_def.* FROM item_def LEFT JOIN tag_item_list ON
item_def.item_id = tag_item_list.item_id WHERE tag_item_list.item_id
IS NULL

But my server hangs on a loooong query (makes me think of the computer
making tea in Hitchhiker's Guide). I used "EXPLAIN" and got this:
mysql> explain SELECT item_def.* FROM item_def LEFT JOIN tag_item_list
ON item_def.item_id = tag_item_list.item_id WHERE
tag_item_list.item_id IS NULL;
+----+-------------+---------------+-------+---------------+---------
+---------+------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------
+---------+------+-------+--------------------------------------+
| 1 | SIMPLE | item_def | ALL | NULL | NULL |
NULL | NULL | 13401 | |
| 1 | SIMPLE | tag_item_list | index | NULL | PRIMARY |
8 | NULL | 24664 | Using where; Using index; Not exists |
+----+-------------+---------------+-------+---------------+---------
+---------+------+-------+--------------------------------------+

So I figured I should add indices on tag_id and item_id for the table
tag_item_list. That seems to have fixed the problem.

My questions:
(a) Is there a better way to do this?
(b) for the indices, I used BTREE. But now that I think of it, maybe
it should be a HASH index; there's no real reason to care about
item_id or tag_id ordering relations. Am I thinking correctly?

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 11:25 AM
Paul Lautman
 
Posts: n/a
Default Re: join advice needed

Jason S wrote:
> OK, I think I have solved my problem but wanted to know if there was a
> better way to find rows that are NOT referenced by a 2nd table:
>
> I have 2 tables:
> Table "item_def" with an integer field "item_id" and some other stuff.
> (primary key = item_id)
> Table "tag_item_list" with only integer fields "item_id" and "tag_id".
> (primary key = item_id, tag_id)
>
> The tables are relatively large, on the order of 10,000 - 20,000
> members, and will get larger.
>
> I would like to get the list of all items (rows in the item_def table)
> that do NOT match something in the tag_item_list table.
>
> This seems like a textbook example; the MySQL help mentions it:
>
> ======
> If there is no matching row for the right table in the ON or USING
> part in a LEFT JOIN, a row with all columns set to NULL is used for
> the right table. You can use this fact to find rows in a table that
> have no counterpart in another table:
>
> SELECT table1.* FROM table1
> LEFT JOIN table2 ON table1.id=table2.id
> WHERE table2.id IS NULL;
> This example finds all rows in table1 with an id value that is not
> present in table2 (that is, all rows in table1 with no corresponding
> row in table2). This assumes that table2.id is declared NOT NULL
> ======
>
> So I should do this:
>
> SELECT item_def.* FROM item_def LEFT JOIN tag_item_list ON
> item_def.item_id = tag_item_list.item_id WHERE tag_item_list.item_id
> IS NULL
>
> But my server hangs on a loooong query (makes me think of the computer
> making tea in Hitchhiker's Guide). I used "EXPLAIN" and got this:
> mysql> explain SELECT item_def.* FROM item_def LEFT JOIN tag_item_list
> ON item_def.item_id = tag_item_list.item_id WHERE
> tag_item_list.item_id IS NULL;
> +----+-------------+---------------+-------+---------------+---------
> +---------+------+-------+--------------------------------------+
>> id | select_type | table | type | possible_keys | key |

> key_len | ref | rows | Extra |
> +----+-------------+---------------+-------+---------------+---------
> +---------+------+-------+--------------------------------------+
>> 1 | SIMPLE | item_def | ALL | NULL | NULL |

> NULL | NULL | 13401 | |
>> 1 | SIMPLE | tag_item_list | index | NULL | PRIMARY |

> 8 | NULL | 24664 | Using where; Using index; Not exists |
> +----+-------------+---------------+-------+---------------+---------
> +---------+------+-------+--------------------------------------+
>
> So I figured I should add indices on tag_id and item_id for the table
> tag_item_list. That seems to have fixed the problem.
>
> My questions:
> (a) Is there a better way to do this?
> (b) for the indices, I used BTREE. But now that I think of it, maybe
> it should be a HASH index; there's no real reason to care about
> item_id or tag_id ordering relations. Am I thinking correctly?


You have the best way to do it. Your indexes should have been primary keys
in the first place.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 11:26 AM
Jason S
 
Posts: n/a
Default Re: join advice needed

> You have the best way to do it. Your indexes should have been primary keys
> in the first place.


??? They were primary keys. What I don't quite understand is the
subtleties of indexing. I assume that the primary keys create an index
for the keys as a tuple in aggregate, but do *not* create indices for
each primary key on its own. (otherwise it should have been fast w/o
my additional indexes)


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 11:26 AM
Jerry Stuckle
 
Posts: n/a
Default Re: join advice needed

Jason S wrote:
>> You have the best way to do it. Your indexes should have been primary keys
>> in the first place.

>
> ??? They were primary keys. What I don't quite understand is the
> subtleties of indexing. I assume that the primary keys create an index
> for the keys as a tuple in aggregate, but do *not* create indices for
> each primary key on its own. (otherwise it should have been fast w/o
> my additional indexes)
>
>


Jason,

Sorry, I'm coming in late on this one.

What you say is true. If you have an index on (a,b), then the index
will be constructed for (a,b), but not (a) or (b). This index can be
used for selecting by (a), but not by (b).

The reason an index isn't also created on (b) is because indexes slow
down INSERT and UPDATE operations where (b) is updated, as well as
DELETE operations.

Like everything else, indexes are a tradeoff. You're trading faster
SELECT operations for slower operations which change the columns being
indexed.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 11:26 AM
Jason S
 
Posts: n/a
Default Re: join advice needed

On Aug 13, 10:24 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> What you say is true. If you have an index on (a,b), then the index
> will be constructed for (a,b), but not (a) or (b). This index can be
> used for selecting by (a), but not by (b).


Ah -- thanks! So I didn't need to add 2 indexes, I just needed to add
an index on the 2nd primary key. (where the 1st primary key gets an
index for free)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:27 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com