Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 07:24 AM
Peter Eisentraut
 
Posts: n/a
Default Implicit casts with generic arrays

I've looked into cutting back on the implicit casts to text, which
exposed the following little gem.

The expressions

'abc' || 34
34 || 'abc'

would no longer work, with the following error message:

ERROR: 22P02: array value must start with "{" or dimension information

That's because the best matches are now respectively

anyarray || anyelement
anyelement || anyarray

Now either this is just too bad and users of a system with reduced casts
to text will have to live with this odd error message, or coercing any
old unknown constant to anyarray isn't such a good idea.

Comments?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 07:24 AM
Tom Lane
 
Posts: n/a
Default Re: Implicit casts with generic arrays

Peter Eisentraut <peter_e@gmx.net> writes:
> I've looked into cutting back on the implicit casts to text, which
> exposed the following little gem.


> The expressions
> 'abc' || 34
> 34 || 'abc'
> would no longer work, with the following error message:
> ERROR: 22P02: array value must start with "{" or dimension information


Hm, that's annoying. Not that the expressions fail --- we want them to
--- but that the error message is so unhelpful.

Since ANYARRAY is already special to the type system, I don't have a
problem with inserting some special case to prevent this, but I'm not
sure what the special case should be. Is it too klugy to say "don't
implicitly cast unknown to anyarray unless the literal's value starts
with { or ["? We've never made casting decisions depend on the contents
of strings before, and I'm really loath to make 'em do so now.

Seems basically we'd want to not cast unknown to anyarray unless there
is some additional bit of context suggesting that that's the right thing.
But what should that extra requirement be? Can we go as far as not
doing this cast implicitly at all?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 07:26 AM
Peter Eisentraut
 
Posts: n/a
Default Re: Implicit casts with generic arrays

Am Dienstag, 27. Februar 2007 19:50 schrieb Tom Lane:
> Seems basically we'd want to not cast unknown to anyarray unless there
> is some additional bit of context suggesting that that's the right thing.
> But what should that extra requirement be? Can we go as far as not
> doing this cast implicitly at all?


We could say that unknown is not taken as anyarray input if the entire
function/operator argument list consists of anyelement or anyarray. But that
might be even harder to comprehend. With the ARRAY[...] syntax available,
converting unknown to anyarray might be altogether unnecessary.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 08:57 AM
Tom Lane
 
Posts: n/a
Default Re: Implicit casts with generic arrays

Awhile back, I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> I've looked into cutting back on the implicit casts to text, which
>> exposed the following little gem.


>> The expressions
>> 'abc' || 34
>> 34 || 'abc'
>> would no longer work, with the following error message:
>> ERROR: 22P02: array value must start with "{" or dimension information


> Hm, that's annoying. Not that the expressions fail --- we want them to
> --- but that the error message is so unhelpful.


I've looked into this more closely. The problem basically is that the
parser sees these alternatives for binary || operators:

select oid,oid::regoperator,oprcode from pg_operator where oprname = '||';
oid | oid | oprcode
------+-------------------------+---------------
349 | ||(anyarray,anyelement) | array_append
374 | ||(anyelement,anyarray) | array_prepend
375 | ||(anyarray,anyarray) | array_cat
654 | ||(text,text) | textcat
1797 | ||(bit,bit) | bitcat
2018 | ||(bytea,bytea) | byteacat
(6 rows)

If there is no implicit cast from int to text, then operator 349 is the
*only* candidate that is not immediately eliminated by the lack of any
way to cast an integer 34 to its right argument type. So as far as the
parser is concerned there is no ambiguity. If we hack things to prevent
matching unknown to anyarray, as was suggested in the previous
discussion, we'll get "operator does not exist: "unknown" || integer".
Which is better than the 22P02 error, but still not great.

It furthermore seems that the two operators anyarray || anyelement and
anyelement || anyarray are really the only cases where an undesirable
match to anyarray might occur. The other operators that take anyarray
take it on both sides, which means that they'd not be preferred unless
the other operand was discernibly an array. I don't think we want a
solution that causes "knownarraycolumn = '{1,2,3}'" to start failing.

That argument is even more compelling on the function side, because
for instance there isn't a lot of doubt about the user's intent if he
writes "array_append('{1,2,3}', 34)".

So after reflecting on all that, it doesn't seem like a good idea to
hack the type-coercion code to discriminate against matching unknown
to anyarray. It looks to me like we have a very narrow problem and
we should tailor a very narrow solution. What I am currently thinking
we should do is make oper() specifically test for the case of operator
349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
and throw a custom error message hinting that the other operand
needs to be cast to text.

In the long run maybe we should choose some other name for the
array_append and array_prepend operators to avoid the confusion with
concatenation. It seems to me that "concatenation" normally implies
"stringing together similar objects", which these two operators
definitely don't do, and so you could argue that || was a bad name
for them from the get-go. But compatibility worries would mean we
couldn't eliminate the old names for quite a long time, so maybe
it's too late for that.

Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 08:57 AM
Gregory Stark
 
Posts: n/a
Default Re: Implicit casts with generic arrays

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> So after reflecting on all that, it doesn't seem like a good idea to
> hack the type-coercion code to discriminate against matching unknown
> to anyarray. It looks to me like we have a very narrow problem and
> we should tailor a very narrow solution. What I am currently thinking
> we should do is make oper() specifically test for the case of operator
> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
> and throw a custom error message hinting that the other operand
> needs to be cast to text.


Wouldn't that mean that 'foo'||'bar' would *still* fail?

It really seems to me that at some point down the line we're going to cave and
admit that users do expect 'foo' to be a string first and cast to other types
only if the context requires it. That would mean we should be considering
matching "unknown" as text first without casting and only if that fails
looking for other types.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 08:57 AM
Tom Lane
 
Posts: n/a
Default Re: Implicit casts with generic arrays

I wrote:
> It looks to me like we have a very narrow problem and
> we should tailor a very narrow solution. What I am currently thinking
> we should do is make oper() specifically test for the case of operator
> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
> and throw a custom error message hinting that the other operand
> needs to be cast to text.


I've been experimenting with another solution, which is to not add any
weird error cases but instead add operators that will capture the
problem cases back away from the anyelement||anyarray operators.
My current prototype is

create function catany(text, anyelement) returns text as
$$ select $1 || $2::text $$ language sql;
create function catany(anyelement, text) returns text as
$$ select $1::text || $2 $$ language sql;

create operator || (procedure = catany, leftarg = text, rightarg = anyelement);
create operator || (procedure = catany, leftarg = anyelement, rightarg = text);

which seems to mostly do the "right" thing. This approach would have
one nice property, namely eliminating the single biggest point of
push-back we are likely to get from removing the implicit casts to text.
I have no doubt that practically the only reasonable use-case for that
behavior was to let people concatenate stuff without being too picky
about casts, and this mostly preserves that ability. It's not perfect,
because it only fixes cases in which at least one operand is either
unknown or implicitly coercible to text. But in practice I think that
would cover 99% of cases, since typical usages tend to alternate
literals and data values.

Thoughts? Is this too klugy for words?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 08:57 AM
Tom Lane
 
Posts: n/a
Default Re: Implicit casts with generic arrays

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> we should do is make oper() specifically test for the case of operator
>> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
>> and throw a custom error message hinting that the other operand
>> needs to be cast to text.


> Wouldn't that mean that 'foo'||'bar' would *still* fail?


No, because that would preferentially match to text || text, it being
a preferred-type case. The current behavior with the implicit casts
removed is

template1=# select 'abc' || '34';
?column?
----------
abc34
(1 row)

ie, this was matched to the text || text operator;

template1=# select 'abc' || 34;
ERROR: array value must start with "{" or dimension information

ie, this was matched to the anyarray || anyelement operator --- because
it clearly can't match text || text.

> It really seems to me that at some point down the line we're going to
> cave and admit that users do expect 'foo' to be a string first and
> cast to other types only if the context requires it.


We already do that to some extent, as shown above; and it's got
approximately nothing to do with this problem anyway. The cases where
we have got a problem are where the other argument is clearly *not*
text.

But having said that, I'm currently leaning to the other solution of
generalizing the || operator (and only that operator) instead of
fooling with the type resolution rules.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 08:57 AM
Tom Lane
 
Posts: n/a
Default Re: Implicit casts with generic arrays

I wrote:
> I've been experimenting with another solution, which is to not add any
> weird error cases but instead add operators that will capture the
> problem cases back away from the anyelement||anyarray operators.
> My current prototype is


> create function catany(text, anyelement) returns text as
> $$ select $1 || $2::text $$ language sql;
> create function catany(anyelement, text) returns text as
> $$ select $1::text || $2 $$ language sql;


> create operator || (procedure = catany, leftarg = text, rightarg = anyelement);
> create operator || (procedure = catany, leftarg = anyelement, rightarg = text);


> which seems to mostly do the "right" thing. This approach would have
> one nice property, namely eliminating the single biggest point of
> push-back we are likely to get from removing the implicit casts to text.


I've been testing this approach more, and finding that it really
captures a bit too much: some cases that you'd prefer went to
anyelement||anyarray will be captured by the text||anyelement operator.
For example in 8.2 this is mapped to array_prepend:

regression=# select 'x'::text || array['aa','bb','cc'];
?column?
--------------
{x,aa,bb,cc}
(1 row)

but with the experimental code you get textcat:

catany=# select 'x'::text || array['aa','bb','cc'];
?column?
-------------
x{aa,bb,cc}
(1 row)

Basically the textcat operators will capture any case where the scalar
side is implicitly coercible to text, because the type resolution rules
will prefer that. There are some hacks we could make to make this less
probable (eg, declare the capturing operators as taking varchar instead
of text) but I can't find any complete solution short of changing the
resolution rules themselves. Which I'm loath to do since it might have
unexpected side-effects.

What I would like to propose is that we deprecate use of || as the
operator name for array_prepend and array_append, and invent new
recommended names for them. As I said earlier, these operators
aren't exactly concatenation in any normal sense anyway, since they
don't treat their operands symmetrically. My first thought is to
suggest using the shifting symbols:
anyelement >> anyarray
anyarray << anyelement
but perhaps someone will have a better suggestion. If we do that, then
we have a solution for anyone whose array prepend or append operator is
unexpectedly captured by text concatenation: use the new names instead.

Now this is only going to seem like a good idea if you agree that we
should have some capturing operators like these. But if we don't,
I think we are going to get a lot of push-back from people whose
concatenations of random datatypes suddenly stopped working.
Essentially this proposal is putting the compatibility hit of tightening
the implicit cast rules onto people who are using array append/prepend
instead of people who are using concatenation without explicit casts.
I think there are a lot fewer of the former than the latter.

Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 08:57 AM
Joe Conway
 
Posts: n/a
Default Re: Implicit casts with generic arrays

Tom Lane wrote:
>
>>> The expressions
>>> 'abc' || 34
>>> 34 || 'abc'
>>> would no longer work, with the following error message:
>>> ERROR: 22P02: array value must start with "{" or dimension information

>
>> Hm, that's annoying. Not that the expressions fail --- we want them to
>> --- but that the error message is so unhelpful.


indeed

> In the long run maybe we should choose some other name for the
> array_append and array_prepend operators to avoid the confusion with
> concatenation. It seems to me that "concatenation" normally implies
> "stringing together similar objects", which these two operators
> definitely don't do, and so you could argue that || was a bad name
> for them from the get-go. But compatibility worries would mean we
> couldn't eliminate the old names for quite a long time, so maybe
> it's too late for that.
>
> Comments?


Originally I saw this situation as as requiring the concatenation
operator per SQL 2003:

<array value expression> ::=
<array concatenation>
| <array primary>
<array concatenation> ::=
<array value expression 1> <concatenation operator> <array primary>
<concatenation operator> ::= ||

<array value expression 1> ::= <array value expression>
<array primary> ::= <value expression primary>
<value expression primary> ::=
<parenthesized value expression>
| <nonparenthesized value expression primary>
<parenthesized value expression> ::=
<left paren> <value expression> <right paren>
<value expression> ::=
<common value expression>
| <boolean value expression>
| <row value expression>
<common value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>
| <user-defined type value expression>
| <reference value expression>
| <collection value expression>
<nonparenthesized value expression primary> ::=
<unsigned value specification>
| <column reference>
| <set function specification>
| <window function>
| <scalar subquery>
| <case expression>
| <cast specification>
| <field reference>
| <subtype treatment>
| <method invocation>
| <static method invocation>
| <new specification>
| <attribute or method reference>
| <reference resolution>
| <collection value constructor>
| <array element reference>
| <multiset element reference>
| <routine invocation>
| <next value expression>
<collection value constructor> ::=
<array value constructor>
| <multiset value constructor>
<unsigned value specification> ::=
<unsigned literal>
| <general value specification>
<unsigned literal> ::=
<unsigned numeric literal>
| <general literal>
<general literal> ::=
<character string literal>
| <national character string literal>
| <Unicode character string literal>
| <binary string literal>
| <datetime literal>
| <interval literal>
| <boolean literal>


What I can't decide now is whether all the above means the anyelement in
this operation ought to be in parens or not. It seems to me that the
anyelement can be any literal _except_ a <signed numeric literal>. In
that case the spec seems to require parenthesis.

Joe

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 08:57 AM
Zeugswetter Andreas ADI SD
 
Posts: n/a
Default Re: Implicit casts with generic arrays


> For example in 8.2 this is mapped to array_prepend:
>
> regression=# select 'x'::text || array['aa','bb','cc'];
> ?column?
> --------------
> {x,aa,bb,cc}
> (1 row)
>
> but with the experimental code you get textcat:
>
> catany=# select 'x'::text || array['aa','bb','cc'];
> ?column?
> -------------
> x{aa,bb,cc}
> (1 row)


This is what I would have expected || to give, and not what 8.2 does.
So disregarding the rest of the argument I think that array_[pre|ap]pend
should have other operators.

Andreas

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:38 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498