Please Help Obi-Wan: Efficient Join/Cursor/Something/Anything?
(SQL Server forums, Microsoft SQL Server category)
Hi all,

I have a bit of a dilemma that I am hoping some of you smart dudes might be able to help me with.

1. I have a table with about 50 million records in it and quite a few columns. [Table A]
2. I have another table with just over 300 records in it and a single column (besides the id). [Table B]
3. I want to select all of those records from Table A where [Table A].description does NOT contain any of (select color from [Table B]).
4. An example:

Table A
    id   ... [other columns] ...   description
    1                              the green hornet
    2                              a red ball
    3                              a green dog
    4                              the yellow submarine
    5                              the pink panther

Table B
    id   color
    55   blue
    56   gold
    57   green
    58   purple
    59   pink
    60   white

So I want to select all those rows in Table A where none of the words from Table B.color appear in the description field in Table A. I.e., the query would return the following from Table A:

    2    a red ball
    4    the yellow submarine

The real-life problem has more variables and is a little more complicated than this, but this should suffice to give me the right idea.

Due to the number of rows involved I need this to be relatively efficient. Can someone suggest the most efficient way to proceed?

PS. Please excuse my ignorance.

Cheers,
Sean
On 7 Nov 2003 09:36:26 -0800, sfarrow@efinancialnews.com (Sean) wrote:

>Due to the number of rows involved I need this to be relatively
>efficient. Can someone suggest the most efficient way to proceed.

Well, I don't know exactly about efficient. I came up with this method, using LIKE instead of NOT LIKE (then reversing the results) because LIKE is supposedly faster, and NOT IN on integers isn't a big deal. Maybe the reverse is faster (NOT LIKE in the cursor, IN on the integers), but I doubt it.

I couldn't think of a way around a big honkin' SQL statement built by a cursor, executed in nicely limited 8000-character chunks. At least this cuts down the number of executions (versus 300 separate ones...).

Test bed:

    create table TableA (id int, description varchar(50))
    insert into TableA values (1, 'the green hornet')
    insert into TableA values (2, 'a red ball')
    insert into TableA values (3, 'a green dog')
    insert into TableA values (4, 'the yellow submarine')
    insert into TableA values (5, 'the pink panther')

    create table TableB (id int, color varchar(50))
    insert into TableB values (55, 'blue')
    insert into TableB values (56, 'gold')
    insert into TableB values (57, 'green')
    insert into TableB values (58, 'purple')
    insert into TableB values (59, 'pink')
    insert into TableB values (60, 'white')

SQL to get values:

    declare @SQLStr varchar(8000)
    declare @Color varchar(50)

    create table #resultstable (id int)

    declare color_cursor cursor for
        select distinct color from TableB

    open color_cursor
    fetch next from color_cursor into @Color

    -- (1=0) is a harmless seed so every color can be appended with "or"
    select @SQLStr = 'select id from TableA where ((1=0) '

    while @@fetch_status = 0
    begin
        -- flush the statement before it outgrows varchar(8000)
        if (len(@SQLStr) > 7000)
        begin
            select @SQLStr = @SQLStr + ')'
            insert into #resultstable(id)
                execute(@SQLStr)
            select @SQLStr = 'select id from TableA where ((1=0) '
        end

        -- always append the current color, including right after a flush;
        -- single quotes are doubled so the generated literal stays well formed
        select @SQLStr = @SQLStr + ' or description like ''%'
                       + replace(@Color, '''', '''''') + '%'' '

        fetch next from color_cursor into @Color
    end

    -- run whatever is left in the final chunk
    select @SQLStr = @SQLStr + ')'
    insert into #resultstable(id)
        execute(@SQLStr)

    close color_cursor
    deallocate color_cursor

    -- reverse the result: keep the rows that matched no color
    select * from TableA
    where id not in (select distinct id from #resultstable)

    drop table #resultstable

___________
To reply by email, chop off the head!
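For comparison, the same "match with LIKE, then reverse with NOT IN" idea can be written as a single set-based statement, with no cursor or dynamic SQL. This is only a sketch against the test bed above. It assumes TableA.id is unique and never NULL, which the test bed does not declare, and it makes no claim about how it would behave on 50 million rows.

```sql
-- Sketch only: assumes TableA.id is unique and NOT NULL
-- (the test bed does not declare either).
select *
from TableA
where id not in
      (select a.id
       from TableA as a
       join TableB as b
         on a.description like '%' + b.color + '%')
```

Against the test bed this should return ids 2 and 4, since 'green' matches ids 1 and 3 and 'pink' matches id 5.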
sfarrow@efinancialnews.com (Sean) wrote in message news:<da32d894.0311070936.718756f6@posting.google.com>...
> Hi all

>> 1. I have a table with about 50 million records [sic] in it and quite a few columns. [Table A] <<

Rows are not records -- big difference.

>> I want to select all of those records [sic] from Table A where TableA.description does NOT contain any of (SELECT color FROM TableB) <<

Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, datatypes, etc. in your schema are. I think that description is free text, Latin alphabet and lowercased, but who knows?

>> So I want to select all those rows in Table A where none of the words from TableB.color appear in the description field [sic] in TableA. <<

    SELECT *
      FROM TableA AS A1
     WHERE NOT EXISTS
           (SELECT *
              FROM TableB AS B1
             WHERE A1.description LIKE '%' + B1.color + '%');

>> Due to the number of rows involved I need this to be relatively efficient. Can someone suggest the most efficient way to proceed. <<

Don't use SQL for text searches; there are better products for that. This is simply going to be a slow table scan.
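One caveat on the LIKE '%' + color + '%' pattern: it matches substrings, so a color such as 'red' would also exclude a description like 'a bored cat'. If whole words are what is wanted, a common workaround is to pad both sides with spaces. The sketch below rests on assumptions the thread does not confirm: words are separated by single spaces, descriptions contain no punctuation, and no color contains a LIKE wildcard character (%, _ or [). Whether the match is case sensitive depends on the column collation.

```sql
-- Sketch only: whole-word matching by padding with spaces.
-- Assumes space-separated words, no punctuation, no wildcard characters in color.
SELECT *
  FROM TableA AS A1
 WHERE NOT EXISTS
       (SELECT *
          FROM TableB AS B1
         WHERE ' ' + A1.description + ' ' LIKE '% ' + B1.color + ' %');
```

The leading wildcard is still there, so this remains a scan of TableA; it changes which rows match, not how they are found.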
Hi Sean,

You can use ... A left outer join B ... where B is null.

-- Louis

    create table #A (X varchar(100))
    insert into #A values ('the green hornet')
    insert into #A values ('a red ball')
    insert into #A values ('a green dog')
    insert into #A values ('the yellow submarine')
    insert into #A values ('the pink panther')

    create table #B (X varchar(100))
    insert into #B values ('green')
    insert into #B values ('pink')

    select a.*
    from #A as a
    left outer join #B as b
      on a.x like '%' + b.x + '%'
    where b.x is null

returns:

    x
    ---------
    a red ball
    the yellow submarine
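Since the thread offers both a NOT EXISTS form and a LEFT OUTER JOIN form of the same anti-join, one way to judge which is cheaper is to run both with SET STATISTICS IO and SET STATISTICS TIME switched on and compare the reads and elapsed times SQL Server reports. The sketch below assumes the #A and #B temp tables from the post above still exist in the session. Numbers from a five-row table say little about 50 million rows, so the comparison is only meaningful on realistic data.

```sql
-- Sketch only: compare the two anti-join forms on the #A / #B test tables.
set statistics io on
set statistics time on

-- left outer join form
select a.*
from #A as a
left outer join #B as b
  on a.x like '%' + b.x + '%'
where b.x is null

-- NOT EXISTS form of the same query
select a.*
from #A as a
where not exists
      (select *
       from #B as b
       where a.x like '%' + b.x + '%')

set statistics time off
set statistics io off
```

Both statements should return the same two rows; the statistics output appears in the messages pane, not in the result set.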