Re: Count unique values from overlapping values Kim schreef:
> On Mar 11, 11:25 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>> On Tue, 11 Mar 2008 11:20:06 +0100, Kim <kims...@gmail.com> wrote:
>>> Sample data:
>>> colA,colB,sum
>>> 33996344,33996351,8
>>> 50331648,67276831,16945184
>>> 50331648,68257567,17925920
>>> 67276832,67276847,16
>>> 67276848,67277023,176
>>> 67277024,67277031,8
>>> Wanted output:
>>> 33996344,33996351,8
>>> 50331648,68257567,17925920
>>> Sum is done as follows (colB-colA+1).
>>> Any idea how to get this outcome ?
>>> This will give incorrect unique count, because it simply sums all up.
>>> SELECT sum(colB-colA+1) FROM tbl
>> If this is not what you want:
>> SELECT sum(colB-colA+1) FROM tbl
>> GROUP BY colA,colB
>>
>> ... you'll have to explain the logic further, as I can't really see it.
>> What overlaps what, how, an how should that be handled?
>> --
>> Rik Wasmus
>
> To Rik Wasmus:
> Its not what I want.
> If you look at the sample when you should see 6 rows. Of these only
> the 2 listed in "wanted output" is unique.
>
> Here is why:
> Row 1 is unique already, so no filtering needed.
> Row 2 is not unique because row 3 contains the same + more and
> therefor needs to be filtered out.
> Row 3 is unique because it has the largest range with no overlapping.
> Row 4-6 are all in the range which row 3 has and therefor needs to be
> filtered out.
that was totaly no obvious from you original post ;-)
>
>
> To Luuk:
> Thanks. Its step in the right direction as it can filter out rows with
> the same colA value but different colB value. Result:
> 33996344,33996351,8
> 50331648,68257567,17925920
> 67276832,67276847,16
> 67276848,67277023,176
> 67277024,67277031,8
>
>
> To all:
> How I get the wanted outcome has no restrictions as long as it can be
> done using one or more queries. If all fails, then I must resort to
> looping thru all rows in tbl, which I would prefer not to do because
> there is millions of rows.
select colA, max(colB) , (max(colB)-ColA+1) as sum
from tbl T1
where colA>ifnull((select max(colB) from tbl where colA<T1.colA),0)
group by colA;
--
Luuk |