vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I've looked into cutting back on the implicit casts to text, which exposed the following little gem. The expressions 'abc' || 34 34 || 'abc' would no longer work, with the following error message: ERROR: 22P02: array value must start with "{" or dimension information That's because the best matches are now respectively anyarray || anyelement anyelement || anyarray Now either this is just too bad and users of a system with reduced casts to text will have to live with this odd error message, or coercing any old unknown constant to anyarray isn't such a good idea. Comments? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Peter Eisentraut <peter_e@gmx.net> writes: > I've looked into cutting back on the implicit casts to text, which > exposed the following little gem. > The expressions > 'abc' || 34 > 34 || 'abc' > would no longer work, with the following error message: > ERROR: 22P02: array value must start with "{" or dimension information Hm, that's annoying. Not that the expressions fail --- we want them to --- but that the error message is so unhelpful. Since ANYARRAY is already special to the type system, I don't have a problem with inserting some special case to prevent this, but I'm not sure what the special case should be. Is it too klugy to say "don't implicitly cast unknown to anyarray unless the literal's value starts with { or ["? We've never made casting decisions depend on the contents of strings before, and I'm really loath to make 'em do so now. Seems basically we'd want to not cast unknown to anyarray unless there is some additional bit of context suggesting that that's the right thing. But what should that extra requirement be? Can we go as far as not doing this cast implicitly at all? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Am Dienstag, 27. Februar 2007 19:50 schrieb Tom Lane: > Seems basically we'd want to not cast unknown to anyarray unless there > is some additional bit of context suggesting that that's the right thing. > But what should that extra requirement be? Can we go as far as not > doing this cast implicitly at all? We could say that unknown is not taken as anyarray input if the entire function/operator argument list consists of anyelement or anyarray. But that might be even harder to comprehend. With the ARRAY[...] syntax available, converting unknown to anyarray might be altogether unnecessary. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Awhile back, I wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I've looked into cutting back on the implicit casts to text, which >> exposed the following little gem. >> The expressions >> 'abc' || 34 >> 34 || 'abc' >> would no longer work, with the following error message: >> ERROR: 22P02: array value must start with "{" or dimension information > Hm, that's annoying. Not that the expressions fail --- we want them to > --- but that the error message is so unhelpful. I've looked into this more closely. The problem basically is that the parser sees these alternatives for binary || operators: select oid,oid::regoperator,oprcode from pg_operator where oprname = '||'; oid | oid | oprcode ------+-------------------------+--------------- 349 | ||(anyarray,anyelement) | array_append 374 | ||(anyelement,anyarray) | array_prepend 375 | ||(anyarray,anyarray) | array_cat 654 | ||(text,text) | textcat 1797 | ||(bit,bit) | bitcat 2018 | ||(bytea,bytea) | byteacat (6 rows) If there is no implicit cast from int to text, then operator 349 is the *only* candidate that is not immediately eliminated by the lack of any way to cast an integer 34 to its right argument type. So as far as the parser is concerned there is no ambiguity. If we hack things to prevent matching unknown to anyarray, as was suggested in the previous discussion, we'll get "operator does not exist: "unknown" || integer". Which is better than the 22P02 error, but still not great. It furthermore seems that the two operators anyarray || anyelement and anyelement || anyarray are really the only cases where an undesirable match to anyarray might occur. The other operators that take anyarray take it on both sides, which means that they'd not be preferred unless the other operand was discernibly an array. I don't think we want a solution that causes "knownarraycolumn = '{1,2,3}'" to start failing. That argument is even more compelling on the function side, because for instance there isn't a lot of doubt about the user's intent if he writes "array_append('{1,2,3}', 34)". So after reflecting on all that, it doesn't seem like a good idea to hack the type-coercion code to discriminate against matching unknown to anyarray. It looks to me like we have a very narrow problem and we should tailor a very narrow solution. What I am currently thinking we should do is make oper() specifically test for the case of operator 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, and throw a custom error message hinting that the other operand needs to be cast to text. In the long run maybe we should choose some other name for the array_append and array_prepend operators to avoid the confusion with concatenation. It seems to me that "concatenation" normally implies "stringing together similar objects", which these two operators definitely don't do, and so you could argue that || was a bad name for them from the get-go. But compatibility worries would mean we couldn't eliminate the old names for quite a long time, so maybe it's too late for that. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes: > So after reflecting on all that, it doesn't seem like a good idea to > hack the type-coercion code to discriminate against matching unknown > to anyarray. It looks to me like we have a very narrow problem and > we should tailor a very narrow solution. What I am currently thinking > we should do is make oper() specifically test for the case of operator > 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, > and throw a custom error message hinting that the other operand > needs to be cast to text. Wouldn't that mean that 'foo'||'bar' would *still* fail? It really seems to me that at some point down the line we're going to cave and admit that users do expect 'foo' to be a string first and cast to other types only if the context requires it. That would mean we should be considering matching "unknown" as text first without casting and only if that fails looking for other types. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| I wrote: > It looks to me like we have a very narrow problem and > we should tailor a very narrow solution. What I am currently thinking > we should do is make oper() specifically test for the case of operator > 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, > and throw a custom error message hinting that the other operand > needs to be cast to text. I've been experimenting with another solution, which is to not add any weird error cases but instead add operators that will capture the problem cases back away from the anyelement||anyarray operators. My current prototype is create function catany(text, anyelement) returns text as $$ select $1 || $2::text $$ language sql; create function catany(anyelement, text) returns text as $$ select $1::text || $2 $$ language sql; create operator || (procedure = catany, leftarg = text, rightarg = anyelement); create operator || (procedure = catany, leftarg = anyelement, rightarg = text); which seems to mostly do the "right" thing. This approach would have one nice property, namely eliminating the single biggest point of push-back we are likely to get from removing the implicit casts to text. I have no doubt that practically the only reasonable use-case for that behavior was to let people concatenate stuff without being too picky about casts, and this mostly preserves that ability. It's not perfect, because it only fixes cases in which at least one operand is either unknown or implicitly coercible to text. But in practice I think that would cover 99% of cases, since typical usages tend to alternate literals and data values. Thoughts? Is this too klugy for words? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> we should do is make oper() specifically test for the case of operator >> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, >> and throw a custom error message hinting that the other operand >> needs to be cast to text. > Wouldn't that mean that 'foo'||'bar' would *still* fail? No, because that would preferentially match to text || text, it being a preferred-type case. The current behavior with the implicit casts removed is template1=# select 'abc' || '34'; ?column? ---------- abc34 (1 row) ie, this was matched to the text || text operator; template1=# select 'abc' || 34; ERROR: array value must start with "{" or dimension information ie, this was matched to the anyarray || anyelement operator --- because it clearly can't match text || text. > It really seems to me that at some point down the line we're going to > cave and admit that users do expect 'foo' to be a string first and > cast to other types only if the context requires it. We already do that to some extent, as shown above; and it's got approximately nothing to do with this problem anyway. The cases where we have got a problem are where the other argument is clearly *not* text. But having said that, I'm currently leaning to the other solution of generalizing the || operator (and only that operator) instead of fooling with the type resolution rules. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| I wrote: > I've been experimenting with another solution, which is to not add any > weird error cases but instead add operators that will capture the > problem cases back away from the anyelement||anyarray operators. > My current prototype is > create function catany(text, anyelement) returns text as > $$ select $1 || $2::text $$ language sql; > create function catany(anyelement, text) returns text as > $$ select $1::text || $2 $$ language sql; > create operator || (procedure = catany, leftarg = text, rightarg = anyelement); > create operator || (procedure = catany, leftarg = anyelement, rightarg = text); > which seems to mostly do the "right" thing. This approach would have > one nice property, namely eliminating the single biggest point of > push-back we are likely to get from removing the implicit casts to text. I've been testing this approach more, and finding that it really captures a bit too much: some cases that you'd prefer went to anyelement||anyarray will be captured by the text||anyelement operator. For example in 8.2 this is mapped to array_prepend: regression=# select 'x'::text || array['aa','bb','cc']; ?column? -------------- {x,aa,bb,cc} (1 row) but with the experimental code you get textcat: catany=# select 'x'::text || array['aa','bb','cc']; ?column? ------------- x{aa,bb,cc} (1 row) Basically the textcat operators will capture any case where the scalar side is implicitly coercible to text, because the type resolution rules will prefer that. There are some hacks we could make to make this less probable (eg, declare the capturing operators as taking varchar instead of text) but I can't find any complete solution short of changing the resolution rules themselves. Which I'm loath to do since it might have unexpected side-effects. What I would like to propose is that we deprecate use of || as the operator name for array_prepend and array_append, and invent new recommended names for them. As I said earlier, these operators aren't exactly concatenation in any normal sense anyway, since they don't treat their operands symmetrically. My first thought is to suggest using the shifting symbols: anyelement >> anyarray anyarray << anyelement but perhaps someone will have a better suggestion. If we do that, then we have a solution for anyone whose array prepend or append operator is unexpectedly captured by text concatenation: use the new names instead. Now this is only going to seem like a good idea if you agree that we should have some capturing operators like these. But if we don't, I think we are going to get a lot of push-back from people whose concatenations of random datatypes suddenly stopped working. Essentially this proposal is putting the compatibility hit of tightening the implicit cast rules onto people who are using array append/prepend instead of people who are using concatenation without explicit casts. I think there are a lot fewer of the former than the latter. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Tom Lane wrote: > >>> The expressions >>> 'abc' || 34 >>> 34 || 'abc' >>> would no longer work, with the following error message: >>> ERROR: 22P02: array value must start with "{" or dimension information > >> Hm, that's annoying. Not that the expressions fail --- we want them to >> --- but that the error message is so unhelpful. indeed > In the long run maybe we should choose some other name for the > array_append and array_prepend operators to avoid the confusion with > concatenation. It seems to me that "concatenation" normally implies > "stringing together similar objects", which these two operators > definitely don't do, and so you could argue that || was a bad name > for them from the get-go. But compatibility worries would mean we > couldn't eliminate the old names for quite a long time, so maybe > it's too late for that. > > Comments? Originally I saw this situation as as requiring the concatenation operator per SQL 2003: <array value expression> ::= <array concatenation> | <array primary> <array concatenation> ::= <array value expression 1> <concatenation operator> <array primary> <concatenation operator> ::= || <array value expression 1> ::= <array value expression> <array primary> ::= <value expression primary> <value expression primary> ::= <parenthesized value expression> | <nonparenthesized value expression primary> <parenthesized value expression> ::= <left paren> <value expression> <right paren> <value expression> ::= <common value expression> | <boolean value expression> | <row value expression> <common value expression> ::= <numeric value expression> | <string value expression> | <datetime value expression> | <interval value expression> | <user-defined type value expression> | <reference value expression> | <collection value expression> <nonparenthesized value expression primary> ::= <unsigned value specification> | <column reference> | <set function specification> | <window function> | <scalar subquery> | <case expression> | <cast specification> | <field reference> | <subtype treatment> | <method invocation> | <static method invocation> | <new specification> | <attribute or method reference> | <reference resolution> | <collection value constructor> | <array element reference> | <multiset element reference> | <routine invocation> | <next value expression> <collection value constructor> ::= <array value constructor> | <multiset value constructor> <unsigned value specification> ::= <unsigned literal> | <general value specification> <unsigned literal> ::= <unsigned numeric literal> | <general literal> <general literal> ::= <character string literal> | <national character string literal> | <Unicode character string literal> | <binary string literal> | <datetime literal> | <interval literal> | <boolean literal> What I can't decide now is whether all the above means the anyelement in this operation ought to be in parens or not. It seems to me that the anyelement can be any literal _except_ a <signed numeric literal>. In that case the spec seems to require parenthesis. Joe ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| ||||
| > For example in 8.2 this is mapped to array_prepend: > > regression=# select 'x'::text || array['aa','bb','cc']; > ?column? > -------------- > {x,aa,bb,cc} > (1 row) > > but with the experimental code you get textcat: > > catany=# select 'x'::text || array['aa','bb','cc']; > ?column? > ------------- > x{aa,bb,cc} > (1 row) This is what I would have expected || to give, and not what 8.2 does. So disregarding the rest of the argument I think that array_[pre|ap]pend should have other operators. Andreas ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |