Attached is a patch to the scanner and the COPY code that checks for invalidly encoded data that can currently leak into our system via \ escapes in quoted literals or text mode copy fields, as recently discussed. That would still leave holes via chr(), convert() and possibly other functions, but these two paths are the biggest holes that need plugging.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
| |||
Andrew Dunstan <andrew@dunslane.net> writes:
> Attached is a patch to the scanner and the COPY code that checks for
> invalidly encoded data that can currently leak into our system via \
> escapes in quoted literals or text mode copy fields, as recently
> discussed. That would still leave holes via chr(), convert() and
> possibly other functions, but these two paths are the biggest holes that
> need plugging.

The COPY code looks sane. On the scan.l change, I believe two out of three of those calls are useless, because we do not do backslash processing in dollar-quoted strings nor in quoted identifiers. Also, I'd kinda like to have the check-for-high-bit optimization in scan.l too --- some people do throw big literals at the thing.

regards, tom lane
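For readers following along, here is a minimal sketch of what a check-for-high-bit optimization could look like. It is not the patch under review. The `Literal` struct, `literal_add()`, `utf8_wellformed()` and `literal_is_acceptable()` are all hypothetical names, the validator is a simplified UTF-8 structural check rather than the backend's own verification code, and the sketch assumes that bytes not produced by an escape were already validated on input.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the IS_HIGHBIT_SET test mentioned in the thread. */
#define HIGHBIT(ch) ((((unsigned char) (ch)) & 0x80) != 0)

/* Hypothetical literal buffer; not the scanner's real data structure. */
typedef struct
{
    char   data[256];
    size_t len;
    bool   saw_high_bit;    /* set when an escape yields a byte >= 0x80 */
} Literal;

/* Append one byte; from_escape says whether it came from a backslash escape. */
static void
literal_add(Literal *lit, char c, bool from_escape)
{
    if (lit->len >= sizeof(lit->data))
        return;             /* sketch only: silently drop overflow */
    lit->data[lit->len++] = c;
    if (from_escape && HIGHBIT(c))
        lit->saw_high_bit = true;
}

/*
 * Simplified structural UTF-8 check: each lead byte must be followed by the
 * right number of continuation bytes. Overlong forms and surrogates are not
 * rejected, so this is weaker than a real encoding verifier.
 */
static bool
utf8_wellformed(const char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = (unsigned char) s[i];
        size_t need;

        if (c < 0x80)
            need = 0;
        else if ((c & 0xE0) == 0xC0)
            need = 1;
        else if ((c & 0xF0) == 0xE0)
            need = 2;
        else if ((c & 0xF8) == 0xF0)
            need = 3;
        else
            return false;   /* stray continuation or invalid lead byte */

        if (need > len - i - 1)
            return false;   /* sequence truncated */
        for (size_t k = 1; k <= need; k++)
            if ((((unsigned char) s[i + k]) & 0xC0) != 0x80)
                return false;
        i += need + 1;
    }
    return true;
}

/* Run the costly check only when an escape produced a high-bit byte. */
static bool
literal_is_acceptable(const Literal *lit)
{
    if (!lit->saw_high_bit)
        return true;        /* fast path: nothing suspicious was added */
    return utf8_wellformed(lit->data, lit->len);
}
```

The point of the flag is that a large literal with no high-bit escapes never pays for a second pass over its bytes.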
| |||
Tom Lane wrote:
> Also, I'd kinda like to have the check-for-high-bit optimization in
> scan.l too --- some people do throw big literals at the thing.

OK, will do. Am I correct in thinking I don't need to worry about the <xeescape> case, only the <xeoctesc> and <xehexesc> cases?

cheers

andrew
| |||
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Also, I'd kinda like to have the check-for-high-bit optimization in
>> scan.l too --- some people do throw big literals at the thing.
>
> OK, will do. Am I correct in thinking I don't need to worry about the
> <xeescape> case, only the <xeoctesc> and <xehexesc> cases?

[ squint ... ] Hm, wouldn't bet on it. That leads to unescape_single_char(), which is fine for the cases that it explicitly knows about (\b and so on), but what if the following byte has the high bit set? Not only would that pass through a high bit to the output, but very possibly this results in disassembling a multibyte input character.

So it looks like you need to recheck if unescape_single_char sees a high-bit-set char.

You should take a second look at the COPY code to see if there's a similar case there --- I forget what it does with backslash followed by non-digit.

regards, tom lane
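To make the concern concrete, here is a rough sketch of a single-character unescape routine that reports when it passes a high-bit byte through. The name `unescape_single_char_sketch()`, its out-parameter signature, and the exact set of recognized letters are assumptions made for illustration and are not taken from scan.l.

```c
#include <stdbool.h>

/*
 * Hypothetical variant of unescape_single_char(): translate the byte that
 * follows a backslash, and report through *saw_high_bit when the result has
 * its high bit set. The flag is only ever set here, never cleared.
 */
static unsigned char
unescape_single_char_sketch(unsigned char c, bool *saw_high_bit)
{
    switch (c)
    {
        case 'b':
            return '\b';
        case 'f':
            return '\f';
        case 'n':
            return '\n';
        case 'r':
            return '\r';
        case 't':
            return '\t';
        default:
            /*
             * Any other byte is passed through unchanged. If it is the first
             * byte of a multibyte character, the escape has split that
             * character, so the caller should re-verify the whole literal.
             */
            if (c & 0x80)
                *saw_high_bit = true;
            return c;
    }
}
```

With this shape, a backslash followed by byte 0xC3 returns 0xC3 and raises the flag, while `\n` or `\q` leave it untouched.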
| |||
Tom Lane wrote:
> So it looks like you need to recheck if unescape_single_char sees a
> high-bit-set char.
>
> You should take a second look at the COPY code to see if there's a
> similar case there --- I forget what it does with backslash followed
> by non-digit.

It's covered.

Revised patch attached. I'll probably apply this some time tomorrow.

cheers

andrew
| ||||
Andrew Dunstan <andrew@dunslane.net> writes:
> addlitchar(unescape_single_char(yytext[1]));
> + if (IS_HIGHBIT_SET(literalbuf[literallen]))
> + saw_high_bit = true;

Isn't that array subscript off-by-one? Probably better to put the test inside unescape_single_char(), anyway.

Otherwise it looks sane, though maybe shy a comment or so.

regards, tom lane
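A small sketch of the subscript question, assuming the append helper stores the byte and then increments the length, as is conventional. `LitBuf`, `litbuf_add()` and `last_byte_has_high_bit()` are hypothetical names and not the scanner's real ones. Under that assumption the byte just added sits at index `len - 1`, and index `len` is the next free slot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer; len counts stored bytes, data[len] is the next free slot. */
typedef struct
{
    char   data[64];
    size_t len;
} LitBuf;

/* Store the byte first, then advance len (assumed append convention). */
static void
litbuf_add(LitBuf *b, char c)
{
    if (b->len < sizeof(b->data))
        b->data[b->len++] = c;
}

/* Test the byte most recently appended, which lives at data[len - 1]. */
static bool
last_byte_has_high_bit(const LitBuf *b)
{
    return b->len > 0 &&
        (((unsigned char) b->data[b->len - 1]) & 0x80) != 0;
}
```

Testing `data[len]` right after an append would inspect a slot that has not been written yet, which is why doing the check on the value itself, before it is stored, avoids the indexing question entirely.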