This is a discussion on join advice needed within the MySQL forums, part of the Database Server Software category; --> OK, I think I have solved my problem but wanted to know if there was a better way to ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| OK, I think I have solved my problem but wanted to know if there was a better way to find rows that are NOT referenced by a 2nd table: I have 2 tables: Table "item_def" with an integer field "item_id" and some other stuff. (primary key = item_id) Table "tag_item_list" with only integer fields "item_id" and "tag_id". (primary key = item_id, tag_id) The tables are relatively large, on the order of 10,000 - 20,000 members, and will get larger. I would like to get the list of all items (rows in the item_def table) that do NOT match something in the tag_item_list table. This seems like a textbook example; the MySQL help mentions it: ====== If there is no matching row for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table: SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL; This example finds all rows in table1 with an id value that is not present in table2 (that is, all rows in table1 with no corresponding row in table2). This assumes that table2.id is declared NOT NULL ====== So I should do this: SELECT item_def.* FROM item_def LEFT JOIN tag_item_list ON item_def.item_id = tag_item_list.item_id WHERE tag_item_list.item_id IS NULL But my server hangs on a loooong query (makes me think of the computer making tea in Hitchhiker's Guide). I used "EXPLAIN" and got this: mysql> explain SELECT item_def.* FROM item_def LEFT JOIN tag_item_list ON item_def.item_id = tag_item_list.item_id WHERE tag_item_list.item_id IS NULL; +----+-------------+---------------+-------+---------------+--------- +---------+------+-------+--------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+---------------+--------- +---------+------+-------+--------------------------------------+ | 1 | SIMPLE | item_def | ALL | NULL | NULL | NULL | NULL | 13401 | | | 1 | SIMPLE | tag_item_list | index | NULL | PRIMARY | 8 | NULL | 24664 | Using where; Using index; Not exists | +----+-------------+---------------+-------+---------------+--------- +---------+------+-------+--------------------------------------+ So I figured I should add indices on tag_id and item_id for the table tag_item_list. That seems to have fixed the problem. My questions: (a) Is there a better way to do this? (b) for the indices, I used BTREE. But now that I think of it, maybe it should be a HASH index; there's no real reason to care about item_id or tag_id ordering relations. Am I thinking correctly? |
| |||
| Jason S wrote: > OK, I think I have solved my problem but wanted to know if there was a > better way to find rows that are NOT referenced by a 2nd table: > > I have 2 tables: > Table "item_def" with an integer field "item_id" and some other stuff. > (primary key = item_id) > Table "tag_item_list" with only integer fields "item_id" and "tag_id". > (primary key = item_id, tag_id) > > The tables are relatively large, on the order of 10,000 - 20,000 > members, and will get larger. > > I would like to get the list of all items (rows in the item_def table) > that do NOT match something in the tag_item_list table. > > This seems like a textbook example; the MySQL help mentions it: > > ====== > If there is no matching row for the right table in the ON or USING > part in a LEFT JOIN, a row with all columns set to NULL is used for > the right table. You can use this fact to find rows in a table that > have no counterpart in another table: > > SELECT table1.* FROM table1 > LEFT JOIN table2 ON table1.id=table2.id > WHERE table2.id IS NULL; > This example finds all rows in table1 with an id value that is not > present in table2 (that is, all rows in table1 with no corresponding > row in table2). This assumes that table2.id is declared NOT NULL > ====== > > So I should do this: > > SELECT item_def.* FROM item_def LEFT JOIN tag_item_list ON > item_def.item_id = tag_item_list.item_id WHERE tag_item_list.item_id > IS NULL > > But my server hangs on a loooong query (makes me think of the computer > making tea in Hitchhiker's Guide). I used "EXPLAIN" and got this: > mysql> explain SELECT item_def.* FROM item_def LEFT JOIN tag_item_list > ON item_def.item_id = tag_item_list.item_id WHERE > tag_item_list.item_id IS NULL; > +----+-------------+---------------+-------+---------------+--------- > +---------+------+-------+--------------------------------------+ >> id | select_type | table | type | possible_keys | key | > key_len | ref | rows | Extra | > +----+-------------+---------------+-------+---------------+--------- > +---------+------+-------+--------------------------------------+ >> 1 | SIMPLE | item_def | ALL | NULL | NULL | > NULL | NULL | 13401 | | >> 1 | SIMPLE | tag_item_list | index | NULL | PRIMARY | > 8 | NULL | 24664 | Using where; Using index; Not exists | > +----+-------------+---------------+-------+---------------+--------- > +---------+------+-------+--------------------------------------+ > > So I figured I should add indices on tag_id and item_id for the table > tag_item_list. That seems to have fixed the problem. > > My questions: > (a) Is there a better way to do this? > (b) for the indices, I used BTREE. But now that I think of it, maybe > it should be a HASH index; there's no real reason to care about > item_id or tag_id ordering relations. Am I thinking correctly? You have the best way to do it. Your indexes should have been primary keys in the first place. |
| |||
| > You have the best way to do it. Your indexes should have been primary keys > in the first place. ??? They were primary keys. What I don't quite understand is the subtleties of indexing. I assume that the primary keys create an index for the keys as a tuple in aggregate, but do *not* create indices for each primary key on its own. (otherwise it should have been fast w/o my additional indexes) |
| |||
| Jason S wrote: >> You have the best way to do it. Your indexes should have been primary keys >> in the first place. > > ??? They were primary keys. What I don't quite understand is the > subtleties of indexing. I assume that the primary keys create an index > for the keys as a tuple in aggregate, but do *not* create indices for > each primary key on its own. (otherwise it should have been fast w/o > my additional indexes) > > Jason, Sorry, I'm coming in late on this one. What you say is true. If you have an index on (a,b), then the index will be constructed for (a,b), but not (a) or (b). This index can be used for selecting by (a), but not by (b). The reason an index isn't also created on (b) is because indexes slow down INSERT and UPDATE operations where (b) is updated, as well as DELETE operations. Like everything else, indexes are a tradeoff. You're trading faster SELECT operations for slower operations which change the columns being indexed. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
| ||||
| On Aug 13, 10:24 am, Jerry Stuckle <jstuck...@attglobal.net> wrote: > What you say is true. If you have an index on (a,b), then the index > will be constructed for (a,b), but not (a) or (b). This index can be > used for selecting by (a), but not by (b). Ah -- thanks! So I didn't need to add 2 indexes, I just needed to add an index on the 2nd primary key. (where the 1st primary key gets an index for free) |