This is a discussion on Duplicate Records within the MySQL forums, part of the Database Server Software category; --> I am using this query to find duplicate records (first_name, last_name and i_active) . However, this query takes a ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I am using this query to find duplicate records (first_name, last_name and i_active) . However, this query takes a lot of time to execute when executed for the first time. When I execute it for the second time, it is very fast.What could be the reason ? and can anyone recommend a way to optimise this? SELECT * FROM talent WHERE (((talent.first_name) In (SELECT first_name FROM talent As Tmp GROUP BY first_name,last_name,i_active HAVING Count(*)>1 And last_name = talent.last_name and i_active != '0'))) ORDER BY talent.first_name, talent.last_name TIA |
| |||
| On Feb 5, 8:39 am, "Pankaj" <panah...@gmail.com> wrote: > I am using this query to find duplicate records (first_name, last_name > and i_active) . However, this query takes a lot of time to execute > when executed for the first time. When I execute it for the second > time, it is very fast.What could be the reason ? and can anyone > recommend a way to optimise this? > > SELECT * FROM talent WHERE (((talent.first_name) In (SELECT first_name > FROM talent As Tmp GROUP BY first_name,last_name,i_active HAVING > Count(*)>1 And last_name = talent.last_name and i_active != '0'))) > ORDER BY talent.first_name, talent.last_name > > TIA The query is being cached - which is why it's quick the second time. Use a join instead of a sub select. It will be much faster. There are many examples of using joins in this way within these NGs. |
| |||
| > The query is being cached - which is why it's quick the second time. > > Use a join instead of a sub select. It will be much faster. There are > many examples of using joins in this way within these NGs. Thanks.... I use this query and the results are already faster SELECT * FROM talent as e1 LEFT JOIN talent AS e2 ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name and e1.i_active=e2.i_active) WHERE (e2.i_talent_id IS NOT NULL) GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active HAVING COUNT(*) > 1 but this displays the results only once. I want to view the duplicates as well. Any suggestions ? |
| |||
| On 5 Feb, 11:59, "Pankaj" <panah...@gmail.com> wrote: > > The query is being cached - which is why it's quick the second time. > > > Use a join instead of a sub select. It will be much faster. There are > > many examples of using joins in this way within these NGs. > > Thanks.... I use this query and the results are already faster > > SELECT * > FROM talent as e1 > LEFT JOIN talent AS e2 > ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name > and e1.i_active=e2.i_active) > WHERE (e2.i_talent_id IS NOT NULL) > GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active > HAVING COUNT(*) > 1 > > but this displays the results only once. I want to view the duplicates > as well. Any suggestions ? What happens if you leave out the GROUP BY clause? |
| |||
| On 5 Feb, 11:59, "Pankaj" <panah...@gmail.com> wrote: > > The query is being cached - which is why it's quick the second time. > > > Use a join instead of a sub select. It will be much faster. There are > > many examples of using joins in this way within these NGs. > > Thanks.... I use this query and the results are already faster > > SELECT * > FROM talent as e1 > LEFT JOIN talent AS e2 > ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name > and e1.i_active=e2.i_active) > WHERE (e2.i_talent_id IS NOT NULL) > GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active > HAVING COUNT(*) > 1 > > but this displays the results only once. I want to view the duplicates > as well. Any suggestions ? Could you supply an exort of the structure and some data for us to play with. |
| |||
| Pankaj <panahuja@gmail.com> wrote: > I am using this query to find duplicate records (first_name, last_name > and i_active). Make sure to use a primary key. In this way duplicate records will never occur. -- Per Erik Rønne http://www.RQNNE.dk |
| |||
| On Feb 6, 3:42 am, p...@RQNNE.invalid (Per Rønne) wrote: > Pankaj <panah...@gmail.com> wrote: > > I am using this query to find duplicate records (first_name, last_name > > and i_active). > > Make sure to use a primary key. In this way duplicate records will never > occur. > -- > Per Erik Rønnehttp://www.RQNNE.dk I am using primary keys. but some fields like first_name and last_name can be duplicate :-) |
| |||
| Here's one record ........... exported this from phpmyadmin so that you can simply import in your table -- phpMyAdmin SQL Dump -- version 2.8.2.4 -- http://www.phpmyadmin.net -- -- Server version: 5.0.24 -- PHP Version: 5.1.6 -- -- Database: `temp` -- -- -------------------------------------------------------- -- -- Table structure for table `edx_talent` -- CREATE TABLE `edx_talent` ( `i_talent_id` mediumint(9) NOT NULL auto_increment, `s_legal_name` varchar(60) default NULL, `s_first_name` varchar(150) default NULL, `s_last_name` varchar(150) default NULL, `s_middle_name` varchar(150) default NULL, `s_long_description` text, `s_short_description` varchar(255) default NULL, `d_dob` date NOT NULL default '0000-00-00', `s_sex` varchar(6) NOT NULL default '', `s_address_1` varchar(60) default NULL, `s_address_2` varchar(60) default NULL, `s_city` varchar(30) default NULL, `s_state_province` varchar(30) default NULL, `s_zip_route_code` varchar(10) default NULL, `s_country` varchar(20) default NULL, `s_drivers_license` varchar(15) default NULL, `s_drivers_license_state` varchar(5) default NULL, `i_view_on_website` int(1) default '0', `s_notes` text, `s_company` varchar(255) default NULL, `i_active` int(1) NOT NULL default '-1', PRIMARY KEY (`i_talent_id`), KEY `s_first_name` (`s_first_name`), KEY `s_last_name` (`s_last_name`), KEY `d_dob` (`d_dob`) ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ; -- -- Dumping data for table `edx_talent` -- INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`, `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`, `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`, `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`, `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`, `s_notes`, `s_company`, `i_active`) VALUES (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0), (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1), (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1); |
| |||
| Pankaj <panahuja@gmail.com> wrote: > On Feb 6, 3:42 am, p...@RQNNE.invalid (Per Rønne) wrote: > > Pankaj <panah...@gmail.com> wrote: > > > I am using this query to find duplicate records (first_name, last_name > > > and i_active). > > > > Make sure to use a primary key. In this way duplicate records will never > > occur. > I am using primary keys. but some fields like first_name and last_name > can be duplicate :-) Yes, but if you use primary keys, /per definition/ no duplicate records can exist. But of course, in a person database with let's say one milliard records, there may be more than one male holding the name John Smith :-). In your country you may not have a Central Person Register number uniquely defining every citizen and metic, and for other reasons it may not be wise to use such a field as the primary key. Then just define one column or a set of columns as being 'unique'. Perhaps I should add that I'm more used to Oracle than to MySQL. -- Per Erik Rønne http://www.RQNNE.dk |
| ||||
| On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote: > Here's one record ........... exported this from phpmyadmin so that > you can simply import in your table > > -- phpMyAdmin SQL Dump > -- version 2.8.2.4 > --http://www.phpmyadmin.net > -- > -- Server version: 5.0.24 > -- PHP Version: 5.1.6 > -- > -- Database: `temp` > -- > > -- -------------------------------------------------------- > > -- > -- Table structure for table `edx_talent` > -- > > CREATE TABLE `edx_talent` ( > `i_talent_id` mediumint(9) NOT NULL auto_increment, > `s_legal_name` varchar(60) default NULL, > `s_first_name` varchar(150) default NULL, > `s_last_name` varchar(150) default NULL, > `s_middle_name` varchar(150) default NULL, > `s_long_description` text, > `s_short_description` varchar(255) default NULL, > `d_dob` date NOT NULL default '0000-00-00', > `s_sex` varchar(6) NOT NULL default '', > `s_address_1` varchar(60) default NULL, > `s_address_2` varchar(60) default NULL, > `s_city` varchar(30) default NULL, > `s_state_province` varchar(30) default NULL, > `s_zip_route_code` varchar(10) default NULL, > `s_country` varchar(20) default NULL, > `s_drivers_license` varchar(15) default NULL, > `s_drivers_license_state` varchar(5) default NULL, > `i_view_on_website` int(1) default '0', > `s_notes` text, > `s_company` varchar(255) default NULL, > `i_active` int(1) NOT NULL default '-1', > PRIMARY KEY (`i_talent_id`), > KEY `s_first_name` (`s_first_name`), > KEY `s_last_name` (`s_last_name`), > KEY `d_dob` (`d_dob`) > ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 > AUTO_INCREMENT=10 ; > > -- > -- Dumping data for table `edx_talent` > -- > > INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`, > `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`, > `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`, > `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`, > `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`, > `s_notes`, `s_company`, `i_active`) VALUES > (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0), > (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1), > (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1); Interesting, the Js are all missing their closing quote. |