Re: optimizing a query to delete duplicates my knee-jerk reaction is:
Why is it important to optimize it? I think you should delete the
duplicates, then create a constraint that prevents them from recurring.
If, for some reason, you are unable to fix the application that creates
these duplicates, and creating a constraint causes errors in the application
that you can't tolerate, then I suppose an alternative would be to create a
trigger that deletes them upon entry. Having a composite index on the
columns that are being duplicated would enable such a trigger to run
quickly.
But looking at your query, I find it strangely complex.
Why not just:
DELETE FROM X
WHERE EXISTS (SELECT * FROM X x2
WHERE x2.Row < x.Row
AND X.FieldName = x2.FieldName
AND X.FieldValue = x2.FieldValue)
Am I missing something? Your NOT EXISTS has me a bit confused... I think it
might delete data in situations other than described.
Also, NOT EXISTS is generally slow. |