Unix Technical Forum

Escape handling in COPY, strings, psql

This is a discussion on Escape handling in COPY, strings, psql within the pgsql Hackers forums, part of the PostgreSQL category; --> Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Here is an updated version of the COPY \x ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 04:06 AM
Bruce Momjian
 
Posts: n/a
Default Escape handling in COPY, strings, psql

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Here is an updated version of the COPY \x patch. It is the first patch
> > attached.
> > Also, I realized that if we support \x in COPY, we should also support
> > \x in strings to the backend. This is the second patch.

>
> Do we really want to do any of these things? We've been getting beaten
> up recently about the fact that we have non-SQL-spec string escapes
> (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> especially when there's little if any demand for it.


I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

I have wanted to post on this issue so now is a good time. I think we
have been validly beaten up in that we pride ourselves on standards
compliance but have escape requirement on all strings. Our string
escapes are a major problem --- not the number of them but the
requirement to double backslashes on input, like 'C:\\tmp'. I am
thinking the only clean solution is to add a special keyword like ESCAPE
before strings that contain escape information. I think a GUC is too
general. You know if the string is a constant if it contains escapes
just by looking at it, and if it is a variable, hopefully you know if it
has escapes.

Basically, I think we have to deal with this somehow. I think it could
be implemented by looking for the ESCAPE keyword in parser/scan.l and
handling it all in there by ignoring backslash escapes if ESCAPE
preceeds the string. By the time you are in gram.y, it is too late.

> I don't object too much to the COPY addition, since that's outside any
> spec anyway, but I do think we ought to think twice about adding this
> to SQL literal handling.
>
> > Third, I found out that psql has some unusual handling of escaped
> > numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
> > is octal, and \0xddd is decimal. It is basically following the strtol()
> > rules for an escaped value. This seems confusing and contradicts how
> > the rest of our system works.

>
> I agree, that's just going to confuse people.
>
> > ! xqescape [\\][^0-7x]

>
> If you are going to insist on this, at least make it case-insensitive.


The submitted COPY patch also was case-insensitive, \x and \X, but I
changed that because we are case-sensitive for all backslashes in COPY,
and C is the same (\n and \N are different too, so we actually use the
case-sensitivity). Should we allow \X just so it is case-insensitive
like the SQL specification X'4f'? That is the only logic I can think of
for it to be case-insensitive, but we have to then do that at all
levels, and I am not sure it makes sense.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 04:06 AM
Peter Eisentraut
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:
> I thought about that, but adding additional escape letters isn't our
> problem --- it is the escape mechanism itself that is the issue.


In a random-encoding environment, the option to specify byte values
directly -- at any level -- is of limited value anyway and is a
potential source of errors. So let's stay away from that.

I did not find the original posts that your quotations came from, but it
has to be considered that COPY is also subject to encoding processing.
Overall, I find this proposal to be a dangerous option.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-11-2008, 04:06 AM
Bruce Momjian
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Here is an updated version of the COPY \x patch. It is the first patch
> > > attached.
> > > Also, I realized that if we support \x in COPY, we should also support
> > > \x in strings to the backend. This is the second patch.

> >
> > Do we really want to do any of these things? We've been getting beaten
> > up recently about the fact that we have non-SQL-spec string escapes
> > (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> > especially when there's little if any demand for it.

>
> I thought about that, but adding additional escape letters isn't our
> problem --- it is the escape mechanism itself that is the issue.
>
> I have wanted to post on this issue so now is a good time. I think we
> have been validly beaten up in that we pride ourselves on standards
> compliance but have escape requirement on all strings. Our string
> escapes are a major problem --- not the number of them but the
> requirement to double backslashes on input, like 'C:\\tmp'. I am
> thinking the only clean solution is to add a special keyword like ESCAPE
> before strings that contain escape information. I think a GUC is too
> general. You know if the string is a constant if it contains escapes
> just by looking at it, and if it is a variable, hopefully you know if it
> has escapes.
>
> Basically, I think we have to deal with this somehow. I think it could
> be implemented by looking for the ESCAPE keyword in parser/scan.l and
> handling it all in there by ignoring backslash escapes if ESCAPE
> preceeds the string. By the time you are in gram.y, it is too late.


One other idea would be to remove escape processing for single-quoted
strings but keep it for our $$ strings, becuase they are not ANSI
standard.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-11-2008, 04:06 AM
Tom Lane
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> One other idea would be to remove escape processing for single-quoted
> strings but keep it for our $$ strings, becuase they are not ANSI
> standard.


There is *no* escape processing within $$, and never will be, because
that would destroy the entire point. You'd be right back having to
double backslashes.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-11-2008, 04:06 AM
Peter Eisentraut
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:
> > I am thinking the only clean solution is to add a special keyword
> > like ESCAPE before strings that contain escape information. I
> > think a GUC is too general. You know if the string is a constant
> > if it contains escapes just by looking at it, and if it is a
> > variable, hopefully you know if it has escapes.


I do support gradually phasing out backslash escapes in standard string
literals in the interest of portability. Most of the current escape
sequences are of limited value anyway. Let's think about ways to get
there:

Enabling escape sequences in string literals controls the formatting of
input (and output?) data, so it is akin to, say, the client encoding
and the date style, so a GUC variable isn't out of the question in my
mind. It makes most sense, though, if we want to eventually make users
switch it off all the time, that is, as a transition aid. But before
that can happen, we need to come up with an alternative mechanism to
enter weird characters.

One such way may be to provide functions (say, chr(), tab(), etc.) to
give access to unprintable characters, but that will result in terrible
performance for long strings and it also won't help with COPY or places
where only literals are allowed.

Another way would be to allow escape sequences only in specially marked
strings. The proposal above doing 'foo' ESCAPE 'x' seems fairly
elegant for SQL linguists but would be pretty weird to implement in the
lexer. It won't help with COPY either, but that is really the case for
all solutions.

A more compact represenation may be using a prefix letter, like E'foo'.
This fits the SQL syntax, is familiar with Python programmers (although
in the other direction), and can be implemented efficiently in the
lexer. I like that the best, personally.

For COPY, we would probably have to use a flag in the COPY command
itself either way (like already done for NULL AS).

Comments? Other ideas? Keep the escapes?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-11-2008, 04:06 AM
Tom Lane
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Peter Eisentraut <peter_e@gmx.net> writes:
> I do support gradually phasing out backslash escapes in standard string
> literals in the interest of portability. Most of the current escape
> sequences are of limited value anyway. Let's think about ways to get
> there:


I really don't think there is any way to get there without creating
gaping security holes in all kinds of client code :-(. If we change
the escaping rules, then a client that is expecting some other rule
than happens to be in force will be subject to trivial SQL-injection
attacks. This will make the autocommit fiasco pale by comparison ...

> For COPY, we would probably have to use a flag in the COPY command
> itself either way (like already done for NULL AS).


The spec-compatibility argument for removing escapes does not apply to
COPY at all, so I see no need to fool with the COPY definition in any
case.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-11-2008, 04:06 AM
Bruce Momjian
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > > I am thinking the only clean solution is to add a special keyword
> > > like ESCAPE before strings that contain escape information. I
> > > think a GUC is too general. You know if the string is a constant
> > > if it contains escapes just by looking at it, and if it is a
> > > variable, hopefully you know if it has escapes.

>
> I do support gradually phasing out backslash escapes in standard string
> literals in the interest of portability. Most of the current escape
> sequences are of limited value anyway. Let's think about ways to get
> there:
>
> Enabling escape sequences in string literals controls the formatting of
> input (and output?) data, so it is akin to, say, the client encoding
> and the date style, so a GUC variable isn't out of the question in my
> mind. It makes most sense, though, if we want to eventually make users
> switch it off all the time, that is, as a transition aid. But before
> that can happen, we need to come up with an alternative mechanism to
> enter weird characters.
>
> One such way may be to provide functions (say, chr(), tab(), etc.) to
> give access to unprintable characters, but that will result in terrible
> performance for long strings and it also won't help with COPY or places
> where only literals are allowed.
>
> Another way would be to allow escape sequences only in specially marked
> strings. The proposal above doing 'foo' ESCAPE 'x' seems fairly
> elegant for SQL linguists but would be pretty weird to implement in the
> lexer. It won't help with COPY either, but that is really the case for
> all solutions.


I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to be
before the string so scan.l can alter its processing of the string ---
after the string is too late --- there is no way to undo any escaping
that has happened, and it might already be used by gram.y.

I could probably hack up a sample implementation if people are
interested.

> A more compact representation may be using a prefix letter, like E'foo'.
> This fits the SQL syntax, is familiar with Python programmers (although
> in the other direction), and can be implemented efficiently in the
> lexer. I like that the best, personally.
>
> For COPY, we would probably have to use a flag in the COPY command
> itself either way (like already done for NULL AS).


I agree with Tom that COPY has to be left unchanged. The fundamental
problem is the representation of NULL values, that I don't think we can
do without some escape mechanism. Single-quote escapes works by
doubling them, but once you need to represent something more like
null's, I can't think of a solution without escapes.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-11-2008, 04:06 AM
Bruce Momjian
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I do support gradually phasing out backslash escapes in standard string
> > literals in the interest of portability. Most of the current escape
> > sequences are of limited value anyway. Let's think about ways to get
> > there:

>
> I really don't think there is any way to get there without creating
> gaping security holes in all kinds of client code :-(. If we change
> the escaping rules, then a client that is expecting some other rule
> than happens to be in force will be subject to trivial SQL-injection
> attacks. This will make the autocommit fiasco pale by comparison ...


I looked at PQescapeString() and fortunately it escapes single quotes
by doing double-single quotes, not by using a backslash. This was
probably chosen for standards compliance.

Basically, I think our current behavior is not sustainable. I think we
are going to need to do something, and I think we should consider a
solution now rather than later. I don't think we can be as serious a
contender for portability without some kind of solution.

I am thinking we should first tell people in 8.1 that they should start
using only double-single quotes, and perhaps support the ESCAPE phrase
as a no-op, and then consider some kind of solution in 8.2 or later.

I don't think fixing this is going to be a huge security problem, but it
might be a small one. The good thing is that double-single quotes work,
so if people use only that for quote escaping, if you forget the ESCAPE
clause, you just get literal backslashes, not a security problem.

I ran the following test:

test=> select $$\$$;
?column?
----------
\
(1 row)

test=> create table test (x TEXT);
CREATE TABLE
test=> INSERT INTO test VALUES ($$\$$);
INSERT 0 1
test=> SELECT * FROM test;
x
---
\
(1 row)

and the good news is that output of backslashes is fine --- it is just
input that is the issue, and the security problem is only using \',
which we would have to tell people to avoid and start using only ''.

I think we can tell people in 8.1 that they should modify their
applications to only use '', and that \' might be a security problem in
the future. If we get to that then using ESC or not only affects input
of values and literal backslashes being entered, and my guess is that
90% of the backslash entries that want escaping are literal in the
application and not supplied by program variables. In fact, if we
disable backslash by default then strings coming in only have to deal
with single quotes (like other databases) and the system is more secure
because there is no special backslash handling by default.

> > For COPY, we would probably have to use a flag in the COPY command
> > itself either way (like already done for NULL AS).

>
> The spec-compatibility argument for removing escapes does not apply to
> COPY at all, so I see no need to fool with the COPY definition in any
> case.


Agreed.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-11-2008, 04:06 AM
Christopher Kings-Lynne
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

> I think we can tell people in 8.1 that they should modify their
> applications to only use '', and that \' might be a security problem in
> the future. If we get to that then using ESC or not only affects input
> of values and literal backslashes being entered, and my guess is that
> 90% of the backslash entries that want escaping are literal in the
> application and not supplied by program variables. In fact, if we
> disable backslash by default then strings coming in only have to deal
> with single quotes (like other databases) and the system is more secure
> because there is no special backslash handling by default.


I can tell you right now this will be a problem There are loads of
PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

Chris


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-11-2008, 04:06 AM
Peter Eisentraut
 
Posts: n/a
Default Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:
> I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to
> be before the string so scan.l can alter its processing of the string
> --- after the string is too late --- there is no way to undo any
> escaping that has happened, and it might already be used by gram.y.


That pretty much corresponds to my E'string' proposal. Both are
probably equally trivial to implement.

> I agree with Tom that COPY has to be left unchanged. The fundamental
> problem is the representation of NULL values, that I don't think we
> can do without some escape mechanism. Single-quote escapes works by
> doubling them, but once you need to represent something more like
> null's, I can't think of a solution without escapes.


Yes, I now realize that COPY has a whole set of different rules anyway,
so we can leave that out of this discussion.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:19 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com