This is a discussion on Removing duplicates within the SQL Server forums, part of the Microsoft SQL Server category; --> Hi I have inherited a web app with the following table structure, and need to produce a table without ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi I have inherited a web app with the following table structure, and need to produce a table without any duplicates. Email seems like the best unique identifier - so only one of each e-mail address should be in the table. Following http://www.sqlteam.com/item.asp?ItemID=3331 I have been able to get a duplicate count working: select Email, count(*) as UserCount from dbo.Members group by Email having count(*) > 1 order by UserCount desc But the methods for create a new table without duplicates fail. My code for the 2nd method is: sp_rename 'Members', 'temp_Members' select distinct * into Members from temp_Members Table.... CREATE TABLE [dbo].[Members] ( [MemberID] [int] IDENTITY (1, 1) NOT NULL , [Username] [varchar] (10) COLLATE Latin1_General_CI_AS NOT NULL , [Password] [varchar] (10) COLLATE Latin1_General_CI_AS NOT NULL , [email] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL , [Title] [varchar] (10) COLLATE Latin1_General_CI_AS NOT NULL , [FirstName] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL , [Surname] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL , [Address1] [varchar] (35) COLLATE Latin1_General_CI_AS NOT NULL , [Address2] [varchar] (35) COLLATE Latin1_General_CI_AS NOT NULL , [City] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL , [Country] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL , [Profession] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL , [Publication] [varchar] (40) COLLATE Latin1_General_CI_AS NOT NULL , [DateAdded] [smalldatetime] NOT NULL , [SendMail] [smallint] NOT NULL ) ON [PRIMARY] GO Thanks B. |
| |||
| Can I assume that memberid is unique? If so, try this: INSERT INTO NewTable (username, password, email, title, firstname, surname, address1, address2, city, country, profession, publication, dateadded, sendmail) SELECT username, password, email, title, firstname, surname, address1, address2, city, country, profession, publication, dateadded, sendmail FROM Members AS M1 JOIN (SELECT MIN(memberid) AS memberid FROM Members GROUP BY email) AS M2 ON M1.memberid = M2.memberid -- David Portas ------------ Please reply only to the newsgroup -- |
| ||||
| Microsoft has a great article on this at http://support.microsoft.com/default...&Product=sql2k I use a code block for doing this that I load as required, I only run it when I have to. Once you've cleaned up your data, put a unique constraint to prevent dupes from getting back in. This is the code that I use for cleaning up dupes, you'll have to modify the table and field names obviously. This config assumes you are allowing Select/Into's. /* -- --WARNING! EXECUTE THIS SCRIPT WITH EXTREME CAUTION!, --AND THEN ONLY ONE STEP AT A TIME! -- --In the TotalTransit (DDS) database, this needs to run against --ddsTrip and ttVoucher_Trip. Since the two tables have different --layouts, the table that holds dupe rows must be dropped and --recreated every time this process is run. -- --There are now two seperate scripts, one for each table. -- --Run the process one step at a time by highlighting each step. -- --Wayne West, 10/13/03 -- WW 10/17/03 -- split into two procs, added more comments -- ------------------------------------------------------------------------------- -- --this is here to help find if there are any dupes select tripid, count(*) from ddsTrip group by tripid having count(*) > 1 select tripid, count(*) from ttVoucher_Trip group by tripid having count(*) > 1 -- ------------------------------------------------------------------------------- */ /* --Step 1 -- Verify there are dupes in the table select tripid, count(*) from ddsTrip group by tripid having count(*) > 1 --Step 2a -- Drop table that will hold dupe key values drop table zzzholdkey --Step 2b -- Collect key values, save in zzzHoldKey select RecKey = tripid, KeyCount = count(*) into zzzHoldKey from ddsTrip group by tripid having count(*) > 1 --Step 3a -- Drop table that will hold one instance of duplicate rows drop table zzzholddupes --Step 3b -- Collect one instance of duplicate rows select DISTINCT t.* into zzzHoldDupes from ddsTrip t, zzzholdkey hld where t.tripid = hld.reckey --Step 4 -- See if more than one field in addition to TripID is duplicated --This will indicate additional steps must be taken to clean up the table. select count(*), tripid from zzzholddupes group by tripid having count(*) > 1 --Step 5 -- Delete ALL rows based on key value of duplicate records delete ddsTrip from ddsTrip t, zzzholdkey hld where t.tripid = hld.reckey --Step 6 -- Reinsert the row captured in Step 3b insert ddsTrip select * from zzzholddupes --You probably ought to rerun Step 1 to make sure the file is clean. */ |