This is a discussion on How can I remove duplicate entries in a sql query? within the SQL Server forums, part of the Microsoft SQL Server category; --> I have a database being populated by hits to a program on a server. The problem is each client ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a database being populated by hits to a program on a server. The problem is each client connection may require a few hits in a 1-2 second time frame. This is resulting in multiple database entries - all exactly the same, except the event_id field, which is auto-numbered. I need a way to query the record w/out duplicates. That is, any records exactly the same except event_id should only return one record. Is this possible?? Thank you, Barry |
| |||
| try this, not tested :-) SELECT * FROM table1 a, (SELECT min(id) FROM table1) AS b WHERE a.id = b.id BarrySDCA wrote: > I have a database being populated by hits to a program on a server. > The problem is each client connection may require a few hits in a 1-2 > second time frame. This is resulting in multiple database entries - > all exactly the same, except the event_id field, which is > auto-numbered. > > I need a way to query the record w/out duplicates. That is, any > records exactly the same except event_id should only return one record. > > Is this possible?? > > Thank you, > > Barry |
| |||
| hrm....no luck. I ran this: SELECT * from `playback_log` a,(SELECT min(EVENT_ID) FROM `playback_log` ) AS b WHERE a.EVENT_ID = b.EVENT_ID Can you see anything? I appreciate the help a bunch. thank you |
| |||
| Haven't tested this: SELECT * FROM playback_log a WHERE a.event_id = (select min(event_id) from playback_log b where a.field1 = b.field1) ; You will need to list as many fields as need to match the identical rows in the where clause of the sub-select. Thomas BarrySDCA wrote on 03.03.2006 00:33: > hrm....no luck. I ran this: > > > SELECT * from `playback_log` a,(SELECT min(EVENT_ID) FROM > `playback_log` ) AS b WHERE a.EVENT_ID = b.EVENT_ID > > > Can you see anything? I appreciate the help a bunch. thank you > |
| |||
| "Thomas Kellerer" <WVIJEVPANEHT@spammotel.com> wrote in message news:46petmFc69qnU1@individual.net... > SELECT * > FROM playback_log a > WHERE a.event_id = (select min(event_id) from playback_log b > where a.field1 = b.field1) Here's a similar possibility, without using a correlated subquery: SELECT a.* FROM playback_log AS a WHERE a.event_id IN ( SELECT MIN(b.event_id) FROM playpack_log AS b GROUP BY b.field1, b.field2, b.field3, ...) What I've seen missing in the several solutions proposed is any use of GROUP BY. You'll need to GROUP BY all the fields of the table _except_ for event_id. Regards, Bill K. |
| |||
| Bill Karwin (bill@karwin.com) writes: > "Thomas Kellerer" <WVIJEVPANEHT@spammotel.com> wrote in message > news:46petmFc69qnU1@individual.net... >> SELECT * >> FROM playback_log a >> WHERE a.event_id = (select min(event_id) from playback_log b >> where a.field1 = b.field1) > > Here's a similar possibility, without using a correlated subquery: > > SELECT a.* > FROM playback_log AS a > WHERE a.event_id IN ( > SELECT MIN(b.event_id) > FROM playpack_log AS b > GROUP BY b.field1, b.field2, b.field3, ...) > > What I've seen missing in the several solutions proposed is any use of > GROUP BY. You'll need to GROUP BY all the fields of the table _except_ > for event_id. One more variation, using a derived table: SELECT a.* FROM playback_log AS a JOIN (SELECT MIN(event_id) FROM playback_log GROUP BY field1, field2, ...) AS b ON a.event_id = b.event_id -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| |||
| I'm not sure why you need a nested query as others have suggested. I see two options: Select Distinct blah1, blah2, [all fields except event_id], blah99 >From LoginTable or Select blah1, blah2, [all fields except event_id], blah99 >From LoginTable Group by blah1, blah2, [all fields except event_id], blah99 if you want an event_id, just for kicks, you could add an aggregator: Select blah1, blah2, max(event_id), blah99 >From LoginTable Group by blah1, blah2, [all fields except event_id], blah99 |
| |||
| BarrySDCA wrote: > I have a database being populated by hits to a program on a server. > The problem is each client connection may require a few hits in a 1-2 > second time frame. This is resulting in multiple database entries - > all exactly the same, except the event_id field, which is > auto-numbered. > > I need a way to query the record w/out duplicates. That is, any > records exactly the same except event_id should only return one record. > > Is this possible?? > > Thank you, > > Barry Hopefully you found the suggestions on how to filter out the duplicates helpful. I just want to point out that this shows the weakness of using a pseudokey as the primary key of a table. If they really are the same events, they shouldn't be duplicated. Actually, I don't think you have duplicate events, since at least some of the connection hits must have failed or timed out. Otherwise why were there retries? So they are not really the same. Are you maybe missing a status or result attribute? (but the ID field is still a poor crutch for a possibly weak design.) just some food for thought. Ed. |
| |||
| >(but the ID field is still a poor crutch for a possibly weak design.) hmmmm. i'd prefer to think of it as the table stores ALL hits to the table, but this particular data consumer only wants to see ONE. The way I think about it, your natural key would be the composite of all fields in the table; perhaps "natural", but certainly awkword. I like figital's solution. |
| ||||
| It's exactly what I'm doing.....I'm recording hits to an advertising engine, before they get to the media server. Some client players hit more than once before they connect, causing a duplicate entry. I just want to see it sorted out w/out the duplicates. I have the distribution server logs for accurate playback reporting, I'm only interested in seeing cleaned up hits. I setup code to remove the duplicates, but it's not fool proof. I will try the suggestions here and let the group know how it goes. Here is a link to what I'm doing: http://www.sundiegolive.com/advertising.htm notice if two duplicates come in now and are staggered between eachother, it shows twice still. I will try and let you all know... I would like to say that I didn't expect such great help from the group. I've thrown a few questions out to different groups but usually don't have so many great replies. Thank you everyone! |