View Single Post

   
  #4 (permalink)  
Old 02-29-2008, 03:38 AM
John Bell
 
Posts: n/a
Default Re: optimizing a query to delete duplicates

Hi

You could try only selecting the correct data when you move it into the
permanent tables. But the following may work better:

DELETE FROM X1
FROM X X1 JOIN X X2
ON x2.Row < x1.Row
AND x1.Fieldvalue = x2.Fieldvalue
AND x1.FieldName = x2.FieldName

John


"Alexander Anderson" <no@spam.com> wrote in message
news:slrncfe0ft.mk1.alex@Toronto-HSE-ppp3682122.sympatico.ca...
> I have a DELETE statement that deletes duplicate data from a table. It
> takes a long time to execute, so I thought I'd seek advice here. The
> structure of the table is little funny. The following is NOT the table,
> but the representation of the data in the table:
>
> +-----------+
> | a | b |
> +-----+-----+
> | 123 | 234 |
> | 345 | 456 |
> | 123 | 123 |
> +-----+-----+
>
> As you can see, the data is tabular. This is how it is stored in the

table:
>
> +-----+-----------+------------+
> | Row | FieldName | FieldValue |
> +-----+-----------+------------+
> | 1 | a | 123 |
> | 1 | b | 234 |
> | 2 | a | 345 |
> | 2 | b | 456 |
> | 3 | a | 123 |
> | 3 | b | 234 |
> +-----+-----------+------------+
>
> What I need is to delete all records having the same "Row" when there

exists
> the same set of records with a different (smaller, to be precise) "Row".
> Using the example above, what I need to get is:
>
> +-----+-----------+------------+
> | Row | FieldName | FieldValue |
> +-----+-----------+------------+
> | 1 | a | 123 |
> | 1 | b | 234 |
> | 2 | a | 345 |
> | 2 | b | 456 |
> +-----+-----------+------------+
>
> A slow way of doing this seem to be:
>
> DELETE FROM X
> WHERE Row IN
> (SELECT DISTINCT Row FROM X x1
> WHERE EXISTS
> (SELECT * FROM X x2
> WHERE x2.Row < x1.Row
> AND NOT EXISTS
> (SELECT * FROM X x3
> WHERE x3.Row = x2.Row
> AND x3.FieldName = x2.FieldName
> AND x3.FieldValue <> x1.FieldValue)))
>
> Can this be done faster, better, and cheaper?



Reply With Quote