Unix Technical Forum

#1
03-17-2008, 06:13 AM
Kim

Count unique values from overlapping values

Sample data:
colA,colB,sum
33996344,33996351,8
50331648,67276831,16945184
50331648,68257567,17925920
67276832,67276847,16
67276848,67277023,176
67277024,67277031,8

Wanted output:
33996344,33996351,8
50331648,68257567,17925920

The sum column is computed as colB-colA+1.
Any idea how to get this result?

This gives an incorrect unique count, because it simply sums up all rows:
SELECT sum(colB-colA+1) FROM tbl
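To make the over-counting concrete, here is the arithmetic on the six sample rows as a small Python sketch (the names SAMPLE, WANTED and range_size are illustrative only). The naive sum counts the numbers shared by rows 2-6 more than once. The two rows in the wanted output do not overlap each other, so their sizes add up to the number of distinct values.

```python
# Sample rows from the post as (colA, colB) pairs; each covers colB-colA+1 integers.
SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]
WANTED = [(33996344, 33996351), (50331648, 68257567)]


def range_size(a, b):
    """Number of integers in the inclusive range [a, b]."""
    return b - a + 1


naive = sum(range_size(a, b) for a, b in SAMPLE)   # what SELECT sum(colB-colA+1) returns
wanted = sum(range_size(a, b) for a, b in WANTED)  # sum over the wanted output rows

print(naive)   # 34871312
print(wanted)  # 17925928
```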
#2
03-17-2008, 06:13 AM
Rik Wasmus

Re: Count unique values from overlapping values

On Tue, 11 Mar 2008 11:20:06 +0100, Kim <kimslot@gmail.com> wrote:

> This will give incorrect unique count, because it simply sums all up.
> SELECT sum(colB-colA+1) FROM tbl


If this is not what you want:
SELECT sum(colB-colA+1) FROM tbl
GROUP BY colA,colB

... you'll have to explain the logic further, as I can't really see it.
What overlaps what, how, and how should that be handled?
--
Rik Wasmus
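Running the GROUP BY suggestion against the sample rows shows why it cannot remove the overlap: every (colA, colB) pair in the sample is distinct, so it simply returns six separate sums. A minimal check in Python, using the built-in sqlite3 module as a stand-in for MySQL (an assumption; the in-memory table tbl is only for illustration):

```python
import sqlite3

# The six sample rows as (colA, colB) pairs.
SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", SAMPLE)

# One result row per distinct (colA, colB) pair; overlapping ranges are still summed separately.
sums = [
    row[0]
    for row in conn.execute(
        "SELECT sum(colB-colA+1) FROM tbl GROUP BY colA, colB ORDER BY colA, colB"
    )
]
print(len(sums), sum(sums))  # 6 34871312
```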
#3
03-17-2008, 06:13 AM
Luuk

Re: Count unique values from overlapping values

Kim wrote:
> Wanted output:
> 33996344,33996351,8
> 50331648,68257567,17925920


select colA, max(colB) as colB, (max(colB)-colA+1) as sum
from tbl group by colA;


--
Luuk
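As a check on this suggestion, the sketch below runs it on the six sample rows with Python's sqlite3 module standing in for MySQL (an assumption; the select-list aliases are renamed to maxB and total only for readability). Grouping on colA merges only the two rows that start at 50331648, so five rows remain, the same five Kim reports further down in the thread.

```python
import sqlite3

# The six sample rows as (colA, colB) pairs.
SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", SAMPLE)

# One row per distinct colA, keeping the largest colB seen for that start value.
rows = conn.execute(
    "SELECT colA, max(colB) AS maxB, max(colB)-colA+1 AS total "
    "FROM tbl GROUP BY colA ORDER BY colA"
).fetchall()
for row in rows:
    print(row)
```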
#4
03-17-2008, 06:13 AM
Kim

Re: Count unique values from overlapping values

On Mar 11, 11:25 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> If this is not what you want:
> SELECT sum(colB-colA+1) FROM tbl
> GROUP BY colA,colB
>
> ... you'll have to explain the logic further, as I can't really see it.
> What overlaps what, how, and how should that be handled?


To Rik Wasmus:
That's not what I want.
If you look at the sample, you should see 6 rows. Of these, only
the 2 listed under "wanted output" are unique.

Here is why:
Row 1 is already unique, so no filtering is needed.
Row 2 is not unique because row 3 covers the same range plus more, so
it needs to be filtered out.
Row 3 is unique because it has the largest range and no other row
encloses it.
Rows 4-6 all fall within the range of row 3 and therefore need to be
filtered out.
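Kim's rule can be restated as: keep a row unless some other row's range completely encloses it. A minimal in-memory Python sketch of that rule follows; the function name maximal_ranges is hypothetical, and collapsing exact duplicate rows first is an assumption about how duplicates should be treated. It compares every pair of rows, so it illustrates the rule rather than offering a way to handle millions of rows.

```python
# The six sample rows as (colA, colB) pairs.
SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]


def maximal_ranges(rows):
    """Return the (colA, colB) ranges that no other distinct range encloses."""
    unique = sorted(set(rows))  # collapse exact duplicates (assumption)
    kept = []
    for a, b in unique:
        # Another range (c, d) encloses (a, b) if it starts no later and ends no earlier.
        enclosed = any(
            c <= a and d >= b and (c, d) != (a, b)
            for c, d in unique
        )
        if not enclosed:
            kept.append((a, b))
    return kept


print(maximal_ranges(SAMPLE))
# [(33996344, 33996351), (50331648, 68257567)]
```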


To Luuk:
Thanks. It's a step in the right direction, as it filters out rows with
the same colA value but a different colB value. Result:
33996344,33996351,8
50331648,68257567,17925920
67276832,67276847,16
67276848,67277023,176
67277024,67277031,8


To all:
There are no restrictions on how I get the wanted outcome, as long as it
can be done using one or more queries. If all else fails, I must resort
to looping through all rows in tbl, which I would prefer not to do
because there are millions of rows.
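Looping does not have to mean checking every row against every other row. If what is ultimately wanted is the number of distinct values covered by all ranges, a common technique (not one proposed in this thread) is to sort the rows by colA once and sweep through them, merging ranges that overlap or touch. A minimal Python sketch, assuming the rows fit in memory as (colA, colB) pairs; count_distinct is a hypothetical name:

```python
# The six sample rows as (colA, colB) pairs.
SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]


def count_distinct(rows):
    """Count the integers covered by at least one inclusive range (colA, colB)."""
    total = 0
    cur_start = cur_end = None
    for a, b in sorted(rows):  # one sort, then a single pass
        if cur_end is None or a > cur_end + 1:
            # Gap before this range: close off the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = a, b
        else:
            # Overlapping or adjacent range: extend the current block if needed.
            cur_end = max(cur_end, b)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


print(count_distinct(SAMPLE))  # 17925928
```

The sort is O(n log n) and the sweep is a single pass. In MySQL the same idea would mean reading the rows once in colA order, for example from a client program or a cursor; that part is not shown here.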
#5
03-17-2008, 06:13 AM
Rik Wasmus

Re: Count unique values from overlapping values

On Tue, 11 Mar 2008 13:57:37 +0100, Kim <kimslot@gmail.com> wrote:

> To Rik Wasmus:
> Its not what I want.
> If you look at the sample when you should see 6 rows. Of these only
> the 2 listed in "wanted output" is unique.
>
> Here is why:
> Row 1 is unique already, so no filtering needed.
> Row 2 is not unique because row 3 contains the same + more and
> therefor needs to be filtered out.
> Row 3 is unique because it has the largest range with no overlapping.
> Row 4-6 are all in the range which row 3 has and therefor needs to be
> filtered out.


So, if I understand correctly, you want the 'largest' range, and skip all
records that fall in a bigger range?

If that's the case, what should happen with:
1 4
2 6
1 3
8 12
9 10

The problem here is that row 2 falls partly within row 1, but I assume the result should be:
1 4
2 6
8 12

If these should count as separate rows, I'd suggest this query:
SELECT
a.colA, a.colB, (a.colB-a.colA+1)
FROM tablename a
LEFT JOIN tablename b
ON b.colA <= a.colA
AND b.colA >= a.colA
# having a pk would help
# now we have to make sure it
# doesn't match on its own row:
AND NOT (a.colA = b.colA AND a.colB = b.colB)
WHERE b.colA IS NULL
--
Rik Wasmus
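With partially overlapping rows like 1 4 and 2 6 in the example above, keeping the non-enclosed rows and summing colB-colA+1 is not the same as counting distinct numbers, since 2, 3 and 4 are covered by both. A brute-force Python check on this toy data (only sensible for tiny ranges; the variable names are illustrative):

```python
rows = [(1, 4), (2, 6), (1, 3), (8, 12), (9, 10)]
kept = [(1, 4), (2, 6), (8, 12)]  # the rows assumed above to be kept

# Sum of range sizes over the kept rows.
kept_sum = sum(b - a + 1 for a, b in kept)

# Distinct integers covered by any row (brute force: expand every range).
distinct = len({n for a, b in rows for n in range(a, b + 1)})

print(kept_sum, distinct)  # 14 11
```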
#6
03-17-2008, 06:13 AM
Luuk

Re: Count unique values from overlapping values

Kim wrote:
> To Rik Wasmus:
> Its not what I want.
> If you look at the sample when you should see 6 rows. Of these only
> the 2 listed in "wanted output" is unique.
>
> Here is why:
> Row 1 is unique already, so no filtering needed.
> Row 2 is not unique because row 3 contains the same + more and
> therefor needs to be filtered out.
> Row 3 is unique because it has the largest range with no overlapping.
> Row 4-6 are all in the range which row 3 has and therefor needs to be
> filtered out.


That was totally not obvious from your original post ;-)

>
>
> To Luuk:
> Thanks. Its step in the right direction as it can filter out rows with
> the same colA value but different colB value. Result:
> 33996344,33996351,8
> 50331648,68257567,17925920
> 67276832,67276847,16
> 67276848,67277023,176
> 67277024,67277031,8
>
>
> To all:
> How I get the wanted outcome has no restrictions as long as it can be
> done using one or more queries. If all fails, then I must resort to
> looping thru all rows in tbl, which I would prefer not to do because
> there is millions of rows.


select colA, max(colB), (max(colB)-colA+1) as sum
from tbl T1
where colA>ifnull((select max(colB) from tbl where colA<T1.colA),0)
group by colA;


--
Luuk
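Below, Luuk's correlated-subquery version is run on the sample rows and on Rik's partial-overlap example (1 4, 2 6, 1 3, 8 12, 9 10), with Python's sqlite3 standing in for MySQL (an assumption; the select-list aliases are renamed to maxB and total only for readability). On the sample it returns the two wanted rows. On Rik's example it keeps 1-4 and 8-12 but drops 2-6, so 5 and 6 would be missing from the count: a row is discarded whenever it starts at or before the end of a range that begins earlier, even if it extends beyond it.

```python
import sqlite3

SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]
PARTIAL = [(1, 4), (2, 6), (1, 3), (8, 12), (9, 10)]  # Rik's partial-overlap example

# Keep a row only if it starts after every range that begins earlier has ended.
QUERY = """
SELECT colA, max(colB) AS maxB, max(colB)-colA+1 AS total
FROM tbl T1
WHERE colA > ifnull((SELECT max(colB) FROM tbl WHERE colA < T1.colA), 0)
GROUP BY colA
ORDER BY colA
"""


def run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
    conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(run(SAMPLE))   # [(33996344, 33996351, 8), (50331648, 68257567, 17925920)]
print(run(PARTIAL))  # [(1, 4, 4), (8, 12, 5)]
```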
#7
03-17-2008, 06:13 AM
Kim

Re: Count unique values from overlapping values

On Mar 11, 2:11 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> So, if I understand correctly, you want the 'largest' range, and skip all
> records that fall in a bigger range?
>
> If that's the case, what should happen with:
> 1 4
> 2 6
> 1 3
> 8 12
> 9 10
>
> The problem here is row 2 falls partly in row 1, but I assume:
> 1 4
> 2 6
> 8 12
>
> If I assume these should count as different rows, I'd say this one:
> SELECT
> a.colA, a.colB, (a.colB-a.colA+1)
> FROM tablename a
> LEFT JOIN tablename b
> ON b.colA <= a.colA
> AND b.colA >= a.colA
> # having a pk would help
> # now we have to make sure it
> # doesn't match on it's own row:
> AND NOT (a.colA = b.colA AND a.colB = b.colB)
> WHERE b.colA IS NULL


Your understanding is correct, as is your assumption.
I had not yet considered the case you outline, where a range partially
overlaps another range. I now see a dark future for doing this without
looping...

When I run your query on the sample data, this is what I get:
33996344,33996351,8
67276832,67276847,16
67276848,67277023,176
67277024,67277031,8

The table does not have a PK, since tbl is sorted upon creation, which
happens once a month.
As I see it, only a stored procedure using a temporary table (TT)
will work. The TT will have only 1 field, containing a single
number.
A number is inserted only after a check has confirmed that it is not
already in TT. After looping, a simple count(*) on TT will give the
desired result.

Other suggestions are welcome.
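Kim's temporary-table plan can be sketched as follows, with Python driving sqlite3 instead of a MySQL stored procedure (an assumption; the table name tt and the function name are hypothetical). Making the single column the primary key lets INSERT OR IGNORE perform the "is it already in TT?" check. The table ends up with one row per distinct number, which for the six sample rows alone would be 17,925,928 rows, so the demo uses Rik's small example instead.

```python
import sqlite3


def count_via_temp_table(rows):
    """Count distinct integers by inserting every covered number into a one-column table."""
    conn = sqlite3.connect(":memory:")
    # The PRIMARY KEY rejects numbers already present; OR IGNORE skips them silently.
    conn.execute("CREATE TEMPORARY TABLE tt (n INTEGER PRIMARY KEY)")
    for a, b in rows:
        conn.executemany(
            "INSERT OR IGNORE INTO tt (n) VALUES (?)",
            ((n,) for n in range(a, b + 1)),
        )
    (count,) = conn.execute("SELECT count(*) FROM tt").fetchone()
    conn.close()
    return count


print(count_via_temp_table([(1, 4), (2, 6), (1, 3), (8, 12), (9, 10)]))  # 11
```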
#8
03-17-2008, 06:13 AM
Rik Wasmus

Re: Count unique values from overlapping values

On Tue, 11 Mar 2008 14:55:10 +0100, Kim <kimslot@gmail.com> wrote:

> Regarding your query on sample data, this is what I get:
> 33996344,33996351,8
> 67276832,67276847,16
> 67276848,67277023,176
> 67277024,67277031,8


Hmmz, just an error in typing it up: the second JOIN condition should
of course be on colB:
mysql> SELECT
    -> a.colA, a.colB, (a.colB-a.colA+1)
    -> FROM tester a
    -> LEFT JOIN tester b
    -> ON b.colA <= a.colA
    -> AND b.colB >= a.colB
    -> # having a pk would help
    -> # now we have to make sure it
    -> # doesn't match on its own row:
    -> AND NOT (a.colA = b.colA AND a.colB = b.colB)
    -> WHERE b.colA IS NULL;
+----------+----------+-------------------+
| colA     | colB     | (a.colB-a.colA+1) |
+----------+----------+-------------------+
| 33996344 | 33996351 |                 8 |
| 50331648 | 68257567 |          17925920 |
+----------+----------+-------------------+
2 rows in set (0.00 sec)
--
Rik Wasmus
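To see both versions side by side, the sketch below runs Rik's first query (second condition on colA, the typo) and the corrected one (on colB) against the sample rows, with sqlite3 standing in for MySQL and tbl in place of Rik's tester table (assumptions). The typo version reduces the join condition to b.colA = a.colA, so it only removes rows that share a start value with another row. It reproduces the four rows Kim reported, while the corrected version returns the two wanted rows.

```python
import sqlite3

SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]

# Rik's anti-join; {second} is filled in with the second join condition.
ANTI_JOIN = """
SELECT a.colA, a.colB, a.colB-a.colA+1
FROM tbl a
LEFT JOIN tbl b
  ON b.colA <= a.colA
 AND {second}
 AND NOT (a.colA = b.colA AND a.colB = b.colB)  -- never match a row (or exact copy) with itself
WHERE b.colA IS NULL                             -- keep rows for which no match was found
ORDER BY a.colA
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", SAMPLE)

typo = conn.execute(ANTI_JOIN.format(second="b.colA >= a.colA")).fetchall()
fixed = conn.execute(ANTI_JOIN.format(second="b.colB >= a.colB")).fetchall()

print(typo)   # 4 rows, as in Kim's report
print(fixed)  # [(33996344, 33996351, 8), (50331648, 68257567, 17925920)]
```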
#9
03-17-2008, 06:13 AM
Kim

Re: Count unique values from overlapping values

On Mar 11, 3:17 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> Hmmz, just an error in typing it up, the second JOIN clause should
> offcourse be on colB:
> mysql> SELECT
> -> a.colA, a.colB, (a.colB-a.colA+1)
> -> FROM tester a
> -> LEFT JOIN tester b
> -> ON b.colA <= a.colA
> -> AND b.colB >= a.colB
> -> # having a pk would help
> -> # now we have to make sure it
> -> # doesn't match on it's own row:
> -> AND NOT (a.colA = b.colA AND a.colB = b.colB)
> -> WHERE b.colA IS NULL;


Yours actually works faster than Luuk's.
However, summing up the unique values on 10,000 rows takes 94 sec,
which isn't fast in any way. I dare not think of how long it will take
for millions of rows.
SELECT
sum(a.colB-a.colA+1)
FROM tbl a
LEFT JOIN tbl b
ON b.colA <= a.colA
AND b.colB >= a.colB
AND NOT (a.colA = b.colA AND a.colB = b.colB)
WHERE b.colA IS NULL

Why is the WHERE clause there? If it's to avoid NULL values, that's not
a problem, since there are none in tbl.

Therefore I am trying the SP approach I described in my last post.
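For reference, here is Kim's SUM version run on the sample rows, again with sqlite3 standing in for MySQL (an assumption, so nothing here says anything about MySQL timings). It returns 17925928 for the sample. On Rik's partial-overlap example (1 4, 2 6, 1 3, 8 12, 9 10) it returns 14, although only 11 distinct numbers are covered, because the surviving rows 1-4 and 2-6 still overlap each other. Exact duplicate rows would likewise both survive the NOT (...) condition.

```python
import sqlite3

SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]
PARTIAL = [(1, 4), (2, 6), (1, 3), (8, 12), (9, 10)]  # Rik's partial-overlap example

# Sum the sizes of all rows that no other row encloses.
SUM_QUERY = """
SELECT sum(a.colB-a.colA+1)
FROM tbl a
LEFT JOIN tbl b
  ON b.colA <= a.colA
 AND b.colB >= a.colB
 AND NOT (a.colA = b.colA AND a.colB = b.colB)
WHERE b.colA IS NULL
"""


def run_sum(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
    conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", rows)
    (total,) = conn.execute(SUM_QUERY).fetchone()
    conn.close()
    return total


print(run_sum(SAMPLE))   # 17925928
print(run_sum(PARTIAL))  # 14
```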
#10
03-17-2008, 06:13 AM
Rik Wasmus

Re: Count unique values from overlapping values

On Wed, 12 Mar 2008 10:57:54 +0100, Kim <kimslot@gmail.com> wrote:

> Yours actually work faster than Luuk's.
> However the time it takes to sum up unique values on 10.000 rows takes
> 94sec, which isnt fast in any way. I dare not think of how much time
> it will take for millions of rows.


Could you run it with an EXPLAIN before the SELECT and give us the output?

> SELECT
> sum(a.colB-a.colA+1)
> FROM tbl a
> LEFT JOIN tbl b
> ON b.colA <= a.colA
> AND b.colB >= a.colB
> AND NOT (a.colA = b.colA AND a.colB = b.colB)
> WHERE b.colA IS NULL
>
> Why is the where clause there ? If its avoid NULL values, then its not
> a problem since there are none in tbl.


That is exactly why the WHERE clause is there, and it works because colA
is never NULL in tbl: after the LEFT JOIN, b.colA is NULL only when no
matching row b was found.
Translated into human-readable language, this query is something like:

Give the sum of (a.colB-a.colA+1) over the rows a in tbl for which there
is no other row b in tbl (b.colA IS NULL) whose colA-colB range encloses
the range of a (the join clause).
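The same "no enclosing row exists" test can also be written with NOT EXISTS, which reads closer to the English description above. This is an equivalent rewrite, not something posted in the thread; the sketch checks it against the LEFT JOIN ... IS NULL form on the sample rows and on Rik's partial-overlap example, again using sqlite3 as a stand-in for MySQL (an assumption).

```python
import sqlite3

SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]
PARTIAL = [(1, 4), (2, 6), (1, 3), (8, 12), (9, 10)]

LEFT_JOIN = """
SELECT sum(a.colB-a.colA+1)
FROM tbl a
LEFT JOIN tbl b
  ON b.colA <= a.colA
 AND b.colB >= a.colB
 AND NOT (a.colA = b.colA AND a.colB = b.colB)
WHERE b.colA IS NULL
"""

NOT_EXISTS = """
SELECT sum(a.colB-a.colA+1)
FROM tbl a
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl b
    WHERE b.colA <= a.colA
      AND b.colB >= a.colB
      AND NOT (a.colA = b.colA AND a.colB = b.colB)  -- ignore the row itself
)
"""


def both(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
    conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", rows)
    left = conn.execute(LEFT_JOIN).fetchone()[0]
    exists = conn.execute(NOT_EXISTS).fetchone()[0]
    conn.close()
    return left, exists


print(both(SAMPLE))   # (17925928, 17925928)
print(both(PARTIAL))  # (14, 14)
```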

> Therefor am I trying the SP way I described in my last post.


One query with the right indexes (I sure hope you have some...) is often
way faster than an SP.
--
Rik Wasmus
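As a rough illustration of the index advice, the sketch below adds a composite index on (colA, colB) and prints SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for MySQL's EXPLAIN (an assumption; the index name idx_tbl_a_b is hypothetical). The plan wording varies between SQLite versions and differs from MySQL's output, so only the query result is checked. Even with an index, each row a can still match a long range of candidate rows b, so this join may remain expensive on millions of rows.

```python
import sqlite3

SAMPLE = [
    (33996344, 33996351),
    (50331648, 67276831),
    (50331648, 68257567),
    (67276832, 67276847),
    (67276848, 67277023),
    (67277024, 67277031),
]

QUERY = """
SELECT sum(a.colB-a.colA+1)
FROM tbl a
LEFT JOIN tbl b
  ON b.colA <= a.colA
 AND b.colB >= a.colB
 AND NOT (a.colA = b.colA AND a.colB = b.colB)
WHERE b.colA IS NULL
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (colA INTEGER NOT NULL, colB INTEGER NOT NULL)")
conn.executemany("INSERT INTO tbl (colA, colB) VALUES (?, ?)", SAMPLE)
before = conn.execute(QUERY).fetchone()[0]

# Hypothetical composite index on both join columns.
conn.execute("CREATE INDEX idx_tbl_a_b ON tbl (colA, colB)")
after = conn.execute(QUERY).fetchone()[0]

# Print SQLite's plan; the last column of each row is the human-readable detail.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[-1])

print(before, after)  # 17925928 17925928
```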
All times are GMT.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com
