Unix Technical Forum

Select unique based on 4 field values

This is a discussion on Select unique based on 4 field values within the MySQL forums, part of the Database Server Software category; --> I have 'inherited' a database that has a lot of redundant info. The (only) table consists of 10 char(1) ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 10:07 AM
semi_evil@inbox.com
 
Posts: n/a
Default Select unique based on 4 field values

I have 'inherited' a database that has a lot of redundant info. The
(only) table consists of 10 char(1) fields followed by 4 unsigned
floats. There's over 3 million records in it and a quick study learned
that while there are no full doubles in it several of the char
combinations have the same related float values. Average selections
take between 2 and 5 seconds at the moment, which leads to user
complaints.

The char fields can be reduced to a more comprehensive form. Before I
remove the data I have to assert my theory is correct.

How can I make a selection that leaves those records for which all of
the 4 floats are equal to those in other records out?

Maybe I am not explaining this very clearly, I am not that familiar
with this material. Perhaps an example helps explain better:

row1 a b c d e f g h i j 0.123 0.456 0.78 0.90
row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out

The table is MyISAM, no indices, no primary key. There are no rows for
which all 10 char fields are equal. I am aware their are additional
ways to improve performance, I'd first like to get rid of (what I
think amounts to) 70 to 80% redundancy. The floats have up to 6
decimals, they all are in the range 0.0 - 1.0
Thanks for your kind assistance.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 10:07 AM
lark
 
Posts: n/a
Default Re: Select unique based on 4 field values

== Quote from semi_evil ( semi_evil@inbox.com)'s article
> I have 'inherited' a database that has a lot of redundant info. The
> (only) table consists of 10 char(1) fields followed by 4 unsigned
> floats. There's over 3 million records in it and a quick study learned
> that while there are no full doubles in it several of the char
> combinations have the same related float values. Average selections
> take between 2 and 5 seconds at the moment, which leads to user
> complaints.
> The char fields can be reduced to a more comprehensive form. Before I
> remove the data I have to assert my theory is correct.
> How can I make a selection that leaves those records for which all of
> the 4 floats are equal to those in other records out?
> Maybe I am not explaining this very clearly, I am not that familiar
> with this material. Perhaps an example helps explain better:
> row1 a b c d e f g h i j 0.123 0.456 0.78 0.90
> row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out
> The table is MyISAM, no indices, no primary key. There are no rows for
> which all 10 char fields are equal. I am aware their are additional
> ways to improve performance, I'd first like to get rid of (what I
> think amounts to) 70 to 80% redundancy. The floats have up to 6
> decimals, they all are in the range 0.0 - 1.0
> Thanks for your kind assistance.



what do you want to do with them? delete them or leave them out of your result sets!!!
--
POST BY: lark with PHP News Reader
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 10:07 AM
Paul Lautman
 
Posts: n/a
Default Re: Select unique based on 4 field values

semi_evil@inbox.com wrote:
> I have 'inherited' a database that has a lot of redundant info. The
> (only) table consists of 10 char(1) fields followed by 4 unsigned
> floats. There's over 3 million records in it and a quick study learned
> that while there are no full doubles in it several of the char
> combinations have the same related float values. Average selections
> take between 2 and 5 seconds at the moment, which leads to user
> complaints.
>
> The char fields can be reduced to a more comprehensive form. Before I
> remove the data I have to assert my theory is correct.
>
> How can I make a selection that leaves those records for which all of
> the 4 floats are equal to those in other records out?
>
> Maybe I am not explaining this very clearly, I am not that familiar
> with this material. Perhaps an example helps explain better:
>
> row1 a b c d e f g h i j 0.123 0.456 0.78 0.90
> row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out
>
> The table is MyISAM, no indices, no primary key. There are no rows for
> which all 10 char fields are equal. I am aware their are additional
> ways to improve performance, I'd first like to get rid of (what I
> think amounts to) 70 to 80% redundancy. The floats have up to 6
> decimals, they all are in the range 0.0 - 1.0
> Thanks for your kind assistance.


Can you add a primary key? If so add an auto-increment primary key column
and then you can use the Strawberry Query to do what you want. You haven't
given any names for the columns so I am going to assume that the table is
called bad_table the float columns are f1, f2, f3, f4 and the primary key
you will add is called id:

SELECT
`b2`.`*`
FROM `bad_table` `b1`
LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` =
`b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND
`b1`.`id` < `b2`.`id`
WHERE `b1`.`id` IS NULL


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 10:08 AM
semi_evil@inbox.com
 
Posts: n/a
Default Re: Select unique based on 4 field values

On Jun 26, 10:41 pm, "Paul Lautman" <paul.laut...@btinternet.com>
wrote:

>
> Can you add a primary key? If so add an auto-increment primary key column
> and then you can use the Strawberry Query to do what you want. You haven't
> given any names for the columns so I am going to assume that the table is
> called bad_table the float columns are f1, f2, f3, f4 and the primary key
> you will add is called id:
>
> SELECT
> `b2`.`*`
> FROM `bad_table` `b1`
> LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` =
> `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND
> `b1`.`id` < `b2`.`id`
> WHERE `b1`.`id` IS NULL


That worked! Funny name. The table shrunk about 75%, I've combined all
the char fields into a single field now and added some PHP logic to
map user selections to a corresponding db row. Performance is
acceptable now.

Thanks for the help.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 10:08 AM
strawberry
 
Posts: n/a
Default Re: Select unique based on 4 field values

On Jun 28, 6:36 am, semi_e...@inbox.com wrote:
> On Jun 26, 10:41 pm, "Paul Lautman" <paul.laut...@btinternet.com>
> wrote:
>
>
>
> > Can you add a primary key? If so add an auto-increment primary key column
> > and then you can use the Strawberry Query to do what you want. You haven't
> > given any names for the columns so I am going to assume that the table is
> > called bad_table the float columns are f1, f2, f3, f4 and the primary key
> > you will add is called id:

>
> > SELECT
> > `b2`.`*`
> > FROM `bad_table` `b1`
> > LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` =
> > `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND
> > `b1`.`id` < `b2`.`id`
> > WHERE `b1`.`id` IS NULL

>
> That worked! Funny name. The table shrunk about 75%, I've combined all
> the char fields into a single field now and added some PHP logic to
> map user selections to a corresponding db row. Performance is
> acceptable now.
>
> Thanks for the help.



What's 'funny' about it? ;-)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:03 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com