This is a discussion on Re: Syntax bug? Group by? within the pgsql Hackers forums, part of the PostgreSQL category; --> * Mark Woodward (pgsql@mohawksoft.com) wrote: > If I am asking for a specific column value, should I, technically > ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| * Mark Woodward (pgsql@mohawksoft.com) wrote: > If I am asking for a specific column value, should I, technically > speaking, need to group by that column? Technically speaking, if you're asking for a specific tuple, should you be allowed to request an aggregation? Thanks, Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFNP/nrzgMPqB3kigRAmIRAJ90a2kZc0NtBPyQP3A30XIT3ehAQACgk NBa 3Xnb2j1CYRfD9Gb+s/91gJg= =kIHr -----END PGP SIGNATURE----- |
| |||
| On Tue, 17 Oct 2006 12:08:07 -0400 Stephen Frost <sfrost@snowman.net> wrote: > * Mark Woodward (pgsql@mohawksoft.com) wrote: > > If I am asking for a specific column value, should I, technically > > speaking, need to group by that column? > > Technically speaking, if you're asking for a specific tuple, should you > be allowed to request an aggregation? One column value doesn't necessarily mean one tuple unless it has a unique index on that column. SELECT COUNT(*) FROM table WHERE field = 'value'; That's perfectly reasonable. You don't need the GROUP BY clause. However, this doesn't sound like a hackers question. Next time, please ask on another list such as pgsql-sql or even pgsql-novice. You can review the mailing lists and their purpose at http://www.postgresql.org/community/lists/ -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Stephen Frost wrote: > * Mark Woodward (pgsql@mohawksoft.com) wrote: >> If I am asking for a specific column value, should I, technically >> speaking, need to group by that column? > > Technically speaking, if you're asking for a specific tuple, should you > be allowed to request an aggregation? > Only with the assumption that the value in the where clause is for a unique column. If you want min(col2) and avg(col2) where col1=x you can get it without a group by, the same as if you put col1<x - if you want an aggregate of all records returned not the aggregate based on each value of col1. > select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; But back to the query the issue comes in that the ycis_id value is included with the return values requested (a single row value with aggregate values that isn't grouped) - if ycis_id is not unique you will get x number of returned tuples with ycis_id=15 and the same min() and avg() values for each row. Removing the ycis_id after the select will return the aggregate values you want without the group by. -- Shane Ambler Postgres@007Marketing.com Get Sheeky @ http://Sheeky.Biz ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| > Stephen Frost wrote: > >> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; > > But back to the query the issue comes in that the ycis_id value is > included with the return values requested (a single row value with > aggregate values that isn't grouped) - if ycis_id is not unique you will > get x number of returned tuples with ycis_id=15 and the same min() and > avg() values for each row. > Removing the ycis_id after the select will return the aggregate values > you want without the group by. I still assert that there will always only be one row to this query. This is an aggregate query, so all the rows with ycis_id = 15, will be aggregated. Since ycis_id is the identifying part of the query, it should not need to be grouped. My question, is it a syntactic technicality that PostgreSQL asks for a "group by," or a bug in the parser? ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Mark Woodward wrote: >>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; >>> > > I still assert that there will always only be one row to this query. This > is an aggregate query, so all the rows with ycis_id = 15, will be > aggregated. Since ycis_id is the identifying part of the query, it should > not need to be grouped. > > My question, is it a syntactic technicality that PostgreSQL asks for a > "group by," or a bug in the parser? > I think your point is that every non-aggregate column in the results of the query also appears in the where clause and is given a single value there, so conceivably, an all-knowing, all-powerful postgres could recognize this and do the implied GROUP by on these columns. I'm not in a position to give a definitive answer on this, but I suspect that adjusting the query parser/planner to allow an implied GROUP BY either gets prohibitively complicated, or fits too much of a special case to be worth implementing. select ycis_id, some_other_id, min(tindex), avg(tindex) from y where ycis_id = 15 group by some_other_id; Here, postgres would have to use the group by you specified, and also recognize the single-valued constant assigned to ycis_id. Maybe not too bad, but: select ycis_id, some_other_id, min(tindex), avg(tindex) from y where ycis_id = some_single_valued_constant(foo, bar) group by some_other_id; In this case, postgres doesn't know whether some_single_valued_constant() will really return the same single value for every tuple. Ultimately, as more complex queries are introduced, it would become a lot simpler for the query writer to just specify the group by columns instead of trying to guess it from the where clause. Final note: I could also see situations where an implied group by would silently allow a poorly written query to execute, instead of throwing an error that suggests to the query writer that they did something wrong. -- Nolan Cafferky Software Developer IT Department RBS Interactive nolan.cafferky@rbsinteractive.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Mark Woodward wrote: >> Stephen Frost wrote: >> >> >>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; >>> >> But back to the query the issue comes in that the ycis_id value is >> included with the return values requested (a single row value with >> aggregate values that isn't grouped) - if ycis_id is not unique you will >> get x number of returned tuples with ycis_id=15 and the same min() and >> avg() values for each row. >> Removing the ycis_id after the select will return the aggregate values >> you want without the group by. >> > > I still assert that there will always only be one row to this query. This > is an aggregate query, so all the rows with ycis_id = 15, will be > aggregated. Since ycis_id is the identifying part of the query, it should > not need to be grouped. > > My question, is it a syntactic technicality that PostgreSQL asks for a > "group by," or a bug in the parser? > > AFAIK what you want is not per sql spec. What if you had instead written select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id) = 15; ? I think you are expecting too much reasoning from the engine. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Hi, Mark, Mark Woodward wrote: >> Stephen Frost wrote: >> >>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; >> But back to the query the issue comes in that the ycis_id value is >> included with the return values requested (a single row value with >> aggregate values that isn't grouped) - if ycis_id is not unique you will >> get x number of returned tuples with ycis_id=15 and the same min() and >> avg() values for each row. >> Removing the ycis_id after the select will return the aggregate values >> you want without the group by. > > I still assert that there will always only be one row to this query. This > is an aggregate query, so all the rows with ycis_id = 15, will be > aggregated. Since ycis_id is the identifying part of the query, it should > not need to be grouped. > > My question, is it a syntactic technicality that PostgreSQL asks for a > "group by," or a bug in the parser? I think that it's a lack of special-casing the = operator. Imagine "where ycis_id>15" or "where ycis_id @|< $RECTANGLE" or other (probably user defined) operators on (probably user defined) datatypes. The parser has no real knowledge what the operators do, it simply requests one that returns a bool. One could make the parser to special case the = operator, and maybe some others, however I doubt it's worth the effort. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFNRo2yHQIGEs7eeARAw91AJ9vxylO4ATMp1phsANQIA 8J5n/GvACgnwI3 pQ/kAlCQYE1PbvvEaOmYTvs= =3qk4 -----END PGP SIGNATURE----- |
| |||
| > Hi, Mark, > > Mark Woodward wrote: >>> Stephen Frost wrote: >>> >>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; >>> But back to the query the issue comes in that the ycis_id value is >>> included with the return values requested (a single row value with >>> aggregate values that isn't grouped) - if ycis_id is not unique you >>> will >>> get x number of returned tuples with ycis_id=15 and the same min() and >>> avg() values for each row. >>> Removing the ycis_id after the select will return the aggregate values >>> you want without the group by. >> >> I still assert that there will always only be one row to this query. >> This >> is an aggregate query, so all the rows with ycis_id = 15, will be >> aggregated. Since ycis_id is the identifying part of the query, it >> should >> not need to be grouped. >> >> My question, is it a syntactic technicality that PostgreSQL asks for a >> "group by," or a bug in the parser? > > I think that it's a lack of special-casing the = operator. Imagine > "where ycis_id>15" or "where ycis_id @|< $RECTANGLE" or other (probably > user defined) operators on (probably user defined) datatypes. > > The parser has no real knowledge what the operators do, it simply > requests one that returns a bool. > > One could make the parser to special case the = operator, and maybe some > others, however I doubt it's worth the effort. I understand the SQL, and this isn't a "sql" question else it would be on a different list, it is a PostgreSQL internals question and IMHO potential bug. The original query: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; Should NOT require a "group by" to get ycis_id in the results. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| > Mark Woodward wrote: >>> Stephen Frost wrote: >>> >>> >>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; >>>> >>> But back to the query the issue comes in that the ycis_id value is >>> included with the return values requested (a single row value with >>> aggregate values that isn't grouped) - if ycis_id is not unique you >>> will >>> get x number of returned tuples with ycis_id=15 and the same min() and >>> avg() values for each row. >>> Removing the ycis_id after the select will return the aggregate values >>> you want without the group by. >>> >> >> I still assert that there will always only be one row to this query. >> This >> is an aggregate query, so all the rows with ycis_id = 15, will be >> aggregated. Since ycis_id is the identifying part of the query, it >> should >> not need to be grouped. >> >> My question, is it a syntactic technicality that PostgreSQL asks for a >> "group by," or a bug in the parser? >> >> > > AFAIK what you want is not per sql spec. What if you had instead written > > > select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id) > = 15; > > > ? I think you are expecting too much reasoning from the engine. > Regardless, I can get the results I need and have already worked around this. The reason why I posted the question to hackers was that I think it is a bug. The output column "ycis_id" is unabiguously a single value with regards to the query. Shouldn't PostgreSQL "know" this? AFAIR, I think I've used this exact type of query before either on PostgreSQL or another system, maybe Oracle, and it did work. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| > Mark Woodward wrote: >>>> select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; >>>> >> >> I still assert that there will always only be one row to this query. >> This >> is an aggregate query, so all the rows with ycis_id = 15, will be >> aggregated. Since ycis_id is the identifying part of the query, it >> should >> not need to be grouped. >> >> My question, is it a syntactic technicality that PostgreSQL asks for a >> "group by," or a bug in the parser? >> > I think your point is that every non-aggregate column in the results of > the query also appears in the where clause and is given a single value > there, so conceivably, an all-knowing, all-powerful postgres could > recognize this and do the implied GROUP by on these columns. Not exactly. > > I'm not in a position to give a definitive answer on this, but I suspect > that adjusting the query parser/planner to allow an implied GROUP BY > either gets prohibitively complicated, or fits too much of a special > case to be worth implementing. > > select > ycis_id, > some_other_id, > min(tindex), > avg(tindex) > from > y > where > ycis_id = 15 > group by > some_other_id; This is not, in fact, like the example I gave and confuses the point I am trying to make. The original query: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; ycis_id is unambiguous and MUST be only one value, there should be no requirement of grouping. In fact, a "group by" implies multiple result rows in an aggregate query. As I said in other branches of this thread, this isn't a SQL question, it is a question of whether or not the PostgreSQL parser is correct or not, and I do not believe that it is working correctly. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| Thread Tools | |
| Display Modes | |
|
|