Unix Technical Forum

Re: Syntax bug? Group by?

This is a discussion on Re: Syntax bug? Group by? within the pgsql Hackers forums, part of the PostgreSQL category; --> * Mark Woodward (pgsql@mohawksoft.com) wrote: > If I am asking for a specific column value, should I, technically > ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 05:20 AM
Stephen Frost
 
Posts: n/a
Default Re: Syntax bug? Group by?

* Mark Woodward (pgsql@mohawksoft.com) wrote:
> If I am asking for a specific column value, should I, technically
> speaking, need to group by that column?


Technically speaking, if you're asking for a specific tuple, should you
be allowed to request an aggregation?

Thanks,

Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFNP/nrzgMPqB3kigRAmIRAJ90a2kZc0NtBPyQP3A30XIT3ehAQACgk NBa
3Xnb2j1CYRfD9Gb+s/91gJg=
=kIHr
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 05:20 AM
D'Arcy J.M. Cain
 
Posts: n/a
Default Re: Syntax bug? Group by?

On Tue, 17 Oct 2006 12:08:07 -0400
Stephen Frost <sfrost@snowman.net> wrote:
> * Mark Woodward (pgsql@mohawksoft.com) wrote:
> > If I am asking for a specific column value, should I, technically
> > speaking, need to group by that column?

>
> Technically speaking, if you're asking for a specific tuple, should you
> be allowed to request an aggregation?


One column value doesn't necessarily mean one tuple unless it has a
unique index on that column.

SELECT COUNT(*) FROM table WHERE field = 'value';

That's perfectly reasonable. You don't need the GROUP BY clause.

However, this doesn't sound like a hackers question. Next time, please
ask on another list such as pgsql-sql or even pgsql-novice. You can
review the mailing lists and their purpose at
http://www.postgresql.org/community/lists/

--
D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 05:21 AM
Shane Ambler
 
Posts: n/a
Default Re: Syntax bug? Group by?

Stephen Frost wrote:
> * Mark Woodward (pgsql@mohawksoft.com) wrote:
>> If I am asking for a specific column value, should I, technically
>> speaking, need to group by that column?

>
> Technically speaking, if you're asking for a specific tuple, should you
> be allowed to request an aggregation?
>

Only with the assumption that the value in the where clause is for a
unique column.

If you want min(col2) and avg(col2) where col1=x you can get it without
a group by, the same as if you put col1<x - if you want an aggregate of
all records returned not the aggregate based on each value of col1.


> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;


But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.


--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 05:21 AM
Mark Woodward
 
Posts: n/a
Default Re: Syntax bug? Group by?

> Stephen Frost wrote:
>
>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

>
> But back to the query the issue comes in that the ycis_id value is
> included with the return values requested (a single row value with
> aggregate values that isn't grouped) - if ycis_id is not unique you will
> get x number of returned tuples with ycis_id=15 and the same min() and
> avg() values for each row.
> Removing the ycis_id after the select will return the aggregate values
> you want without the group by.


I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 05:21 AM
Nolan Cafferky
 
Posts: n/a
Default Re: Syntax bug? Group by?

Mark Woodward wrote:
>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
>>>

>
> I still assert that there will always only be one row to this query. This
> is an aggregate query, so all the rows with ycis_id = 15, will be
> aggregated. Since ycis_id is the identifying part of the query, it should
> not need to be grouped.
>
> My question, is it a syntactic technicality that PostgreSQL asks for a
> "group by," or a bug in the parser?
>

I think your point is that every non-aggregate column in the results of
the query also appears in the where clause and is given a single value
there, so conceivably, an all-knowing, all-powerful postgres could
recognize this and do the implied GROUP by on these columns.

I'm not in a position to give a definitive answer on this, but I suspect
that adjusting the query parser/planner to allow an implied GROUP BY
either gets prohibitively complicated, or fits too much of a special
case to be worth implementing.

select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = 15
group by
some_other_id;


Here, postgres would have to use the group by you specified, and also
recognize the single-valued constant assigned to ycis_id. Maybe not too
bad, but:

select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = some_single_valued_constant(foo, bar)
group by
some_other_id;

In this case, postgres doesn't know whether
some_single_valued_constant() will really return the same single value
for every tuple. Ultimately, as more complex queries are introduced, it
would become a lot simpler for the query writer to just specify the
group by columns instead of trying to guess it from the where clause.

Final note: I could also see situations where an implied group by would
silently allow a poorly written query to execute, instead of throwing an
error that suggests to the query writer that they did something wrong.

--
Nolan Cafferky
Software Developer
IT Department
RBS Interactive
nolan.cafferky@rbsinteractive.com


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 05:21 AM
Andrew Dunstan
 
Posts: n/a
Default Re: Syntax bug? Group by?

Mark Woodward wrote:
>> Stephen Frost wrote:
>>
>>
>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
>>>

>> But back to the query the issue comes in that the ycis_id value is
>> included with the return values requested (a single row value with
>> aggregate values that isn't grouped) - if ycis_id is not unique you will
>> get x number of returned tuples with ycis_id=15 and the same min() and
>> avg() values for each row.
>> Removing the ycis_id after the select will return the aggregate values
>> you want without the group by.
>>

>
> I still assert that there will always only be one row to this query. This
> is an aggregate query, so all the rows with ycis_id = 15, will be
> aggregated. Since ycis_id is the identifying part of the query, it should
> not need to be grouped.
>
> My question, is it a syntactic technicality that PostgreSQL asks for a
> "group by," or a bug in the parser?
>
>


AFAIK what you want is not per sql spec. What if you had instead written


select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id) = 15;


? I think you are expecting too much reasoning from the engine.

cheers

andrew


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 05:21 AM
Markus Schaber
 
Posts: n/a
Default Re: Syntax bug? Group by?

Hi, Mark,

Mark Woodward wrote:
>> Stephen Frost wrote:
>>
>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

>> But back to the query the issue comes in that the ycis_id value is
>> included with the return values requested (a single row value with
>> aggregate values that isn't grouped) - if ycis_id is not unique you will
>> get x number of returned tuples with ycis_id=15 and the same min() and
>> avg() values for each row.
>> Removing the ycis_id after the select will return the aggregate values
>> you want without the group by.

>
> I still assert that there will always only be one row to this query. This
> is an aggregate query, so all the rows with ycis_id = 15, will be
> aggregated. Since ycis_id is the identifying part of the query, it should
> not need to be grouped.
>
> My question, is it a syntactic technicality that PostgreSQL asks for a
> "group by," or a bug in the parser?


I think that it's a lack of special-casing the = operator. Imagine
"where ycis_id>15" or "where ycis_id @|< $RECTANGLE" or other (probably
user defined) operators on (probably user defined) datatypes.

The parser has no real knowledge what the operators do, it simply
requests one that returns a bool.

One could make the parser to special case the = operator, and maybe some
others, however I doubt it's worth the effort.


HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFNRo2yHQIGEs7eeARAw91AJ9vxylO4ATMp1phsANQIA 8J5n/GvACgnwI3
pQ/kAlCQYE1PbvvEaOmYTvs=
=3qk4
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 05:21 AM
Mark Woodward
 
Posts: n/a
Default Re: Syntax bug? Group by?

> Hi, Mark,
>
> Mark Woodward wrote:
>>> Stephen Frost wrote:
>>>
>>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
>>> But back to the query the issue comes in that the ycis_id value is
>>> included with the return values requested (a single row value with
>>> aggregate values that isn't grouped) - if ycis_id is not unique you
>>> will
>>> get x number of returned tuples with ycis_id=15 and the same min() and
>>> avg() values for each row.
>>> Removing the ycis_id after the select will return the aggregate values
>>> you want without the group by.

>>
>> I still assert that there will always only be one row to this query.
>> This
>> is an aggregate query, so all the rows with ycis_id = 15, will be
>> aggregated. Since ycis_id is the identifying part of the query, it
>> should
>> not need to be grouped.
>>
>> My question, is it a syntactic technicality that PostgreSQL asks for a
>> "group by," or a bug in the parser?

>
> I think that it's a lack of special-casing the = operator. Imagine
> "where ycis_id>15" or "where ycis_id @|< $RECTANGLE" or other (probably
> user defined) operators on (probably user defined) datatypes.
>
> The parser has no real knowledge what the operators do, it simply
> requests one that returns a bool.
>
> One could make the parser to special case the = operator, and maybe some
> others, however I doubt it's worth the effort.


I understand the SQL, and this isn't a "sql" question else it would be on
a different list, it is a PostgreSQL internals question and IMHO potential
bug.

The original query:
select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

Should NOT require a "group by" to get ycis_id in the results.





---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 05:21 AM
Mark Woodward
 
Posts: n/a
Default Re: Syntax bug? Group by?

> Mark Woodward wrote:
>>> Stephen Frost wrote:
>>>
>>>
>>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
>>>>
>>> But back to the query the issue comes in that the ycis_id value is
>>> included with the return values requested (a single row value with
>>> aggregate values that isn't grouped) - if ycis_id is not unique you
>>> will
>>> get x number of returned tuples with ycis_id=15 and the same min() and
>>> avg() values for each row.
>>> Removing the ycis_id after the select will return the aggregate values
>>> you want without the group by.
>>>

>>
>> I still assert that there will always only be one row to this query.
>> This
>> is an aggregate query, so all the rows with ycis_id = 15, will be
>> aggregated. Since ycis_id is the identifying part of the query, it
>> should
>> not need to be grouped.
>>
>> My question, is it a syntactic technicality that PostgreSQL asks for a
>> "group by," or a bug in the parser?
>>
>>

>
> AFAIK what you want is not per sql spec. What if you had instead written
>
>
> select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id)
> = 15;
>
>
> ? I think you are expecting too much reasoning from the engine.
>

Regardless, I can get the results I need and have already worked around
this. The reason why I posted the question to hackers was that I think it
is a bug.

The output column "ycis_id" is unabiguously a single value with regards to
the query. Shouldn't PostgreSQL "know" this? AFAIR, I think I've used this
exact type of query before either on PostgreSQL or another system, maybe
Oracle, and it did work.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 05:21 AM
Mark Woodward
 
Posts: n/a
Default Re: Syntax bug? Group by?

> Mark Woodward wrote:
>>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;
>>>>

>>
>> I still assert that there will always only be one row to this query.
>> This
>> is an aggregate query, so all the rows with ycis_id = 15, will be
>> aggregated. Since ycis_id is the identifying part of the query, it
>> should
>> not need to be grouped.
>>
>> My question, is it a syntactic technicality that PostgreSQL asks for a
>> "group by," or a bug in the parser?
>>

> I think your point is that every non-aggregate column in the results of
> the query also appears in the where clause and is given a single value
> there, so conceivably, an all-knowing, all-powerful postgres could
> recognize this and do the implied GROUP by on these columns.


Not exactly.
>
> I'm not in a position to give a definitive answer on this, but I suspect
> that adjusting the query parser/planner to allow an implied GROUP BY
> either gets prohibitively complicated, or fits too much of a special
> case to be worth implementing.
>
> select
> ycis_id,
> some_other_id,
> min(tindex),
> avg(tindex)
> from
> y
> where
> ycis_id = 15
> group by
> some_other_id;


This is not, in fact, like the example I gave and confuses the point I am
trying to make.


The original query:
select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ycis_id is unambiguous and MUST be only one value, there should be no
requirement of grouping. In fact, a "group by" implies multiple result
rows in an aggregate query.

As I said in other branches of this thread, this isn't a SQL question, it
is a question of whether or not the PostgreSQL parser is correct or not,
and I do not believe that it is working correctly.


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:47 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com