vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| The following bug has been logged online: Bug reference: 1671 Logged by: Mark Dilger Email address: markdilger@yahoo.com PostgreSQL version: 8.0.2 Operating system: Mandrake Linux 9.1 (2.4.21-0.13mdkenterprise #1 SMP) Description: Long interval string representation rejected Details: <QUOTE> bash-2.05b$ psql Welcome to psql 8.0.2, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit mark=# select '4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17 minutes 31 seconds'::interval; ERROR: invalid input syntax for type interval: "4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17 minutes 31 seconds" mark=# select '4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17 minutes'::interval; interval ----------------------------------- 4541 years 4 mons 4 days 00:17:00 (1 row) </QUOTE> It appears that any string representation of an interval of length greater than 76 is rejected. (76 = 51 + 25 = MAXDATELEN + MAXDATEFIELDS). This appears to be a limitation enforced within function interval_in() in the file src/backend/utils/adt/timestamp.c ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Tom Lane wrote: > There are a lot of fixed-size local buffers in that code. The ones > used in output routines seem defensible since the string to be generated > is predictable. The ones that are used for processing input are likely > wrong. OTOH I'm not eager to throw a palloc into each of those code > paths ... can we avoid that? I'm not sure offhand what the upper bounds on legal input for each of the datetime types is. Why not just allocate a larger but still fixed-size buffer -- say, 256 bytes? (While we're on the subject, it seems rather silly for ParseDateTime() not to do its own bounds checking -- all of its call sites do a strlen() on the input buffer before calling it, which could be avoided if ParseDateTime() we passed the size of `lowstr') -Neil ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: >> There are a lot of fixed-size local buffers in that code. The ones >> used in output routines seem defensible since the string to be generated >> is predictable. The ones that are used for processing input are likely >> wrong. > I'm not sure offhand what the upper bounds on legal input for each of > the datetime types is. Well, if you allow for whitespace between tokens then it's immediately clear that there is no fixed upper bound. Perhaps it would work to downcase just one token at a time, so that the max buffer length equals the max acceptable token? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Tom Lane wrote: > Well, if you allow for whitespace between tokens then it's immediately > clear that there is no fixed upper bound. Good point -- there is no upper bound on the input string, but since we skip whitespace, AFAICS this shouldn't affect the requirements for the size of the working buffer (lowstr). So if we passed the size of the working buffer to ParseDateTime() (per earlier gripe), we could only bail out when we actually need to use more working space than was allocated, not simply when strlen(input) >= sizeof(buffer). The implementation might be a bit ugly, though. > Perhaps it would work to downcase just one token at a time, so that > the max buffer length equals the max acceptable token? Not sure I follow you. -Neil ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Neil Conway <neilc@samurai.com> writes: >> Perhaps it would work to downcase just one token at a time, so that >> the max buffer length equals the max acceptable token? > Not sure I follow you. I believe that the reason for the local buffer is to hold a downcased version of the input, which we can compare to the all-lower-case tables of relevant keywords. So plan A for fixing it is to downcase and compare one token at a time, instead of downcasing the whole string at once. Plan B is to make the token comparison routines case-insensitive themselves, so that there's no need for a temporary copy of the input. Probably someone can think of plans C and D too ... I don't have any strong allegiance to any one way to fix it. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > I believe that the reason for the local buffer is to hold a downcased > version of the input, which we can compare to the all-lower-case tables > of relevant keywords. Well, that's one of the reasons, but not the only one. For example, how do you parse '17 minutes31 seconds'::interval without either a working buffer or the ability to resize the input buffer? (We need to split the input into two fields, each of which must be a NUL-terminated string. These fields are currently implemented as into the working buffer. If we changed that to be pointers into the input string, we have no where to put the NUL terminator. So you either need a separate buffer, repalloc() plus shuffling everything down, or palloc'd allocation for the fields themselves.) So I don't see that fixing the casing issue is sufficient. On closer inspection, the current code seems to get this wrong as well :-( For example, given the query: select '4millenniums5centuries4decades1year4months4days17 minutes31 seconds'::interval; attaching via gdb, with a breakpoint at datetime.c:890 -- (gdb) p lp $1 = 0xbfffdde1 "ÿÞ@" (gdb) p lowstr + 25 + 51 $4 = 0xbfffdddc "onds" (i.e. lp points past the end of lowstr) -Neil ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: >> I believe that the reason for the local buffer is to hold a downcased >> version of the input, which we can compare to the all-lower-case tables >> of relevant keywords. > Well, that's one of the reasons, but not the only one. For example, how > do you parse '17 minutes31 seconds'::interval without either a working > buffer or the ability to resize the input buffer? Sorry, s/downcased/downcased and null-terminated/. I have not read the parsing code in question lately, but offhand it seems like transferring one token at a time into a work buffer isn't a completely broken idea... > On closer inspection, the current code seems to get this wrong as well > :-( Wouldn't be surprised :-( regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Tom Lane wrote: > Sorry, s/downcased/downcased and null-terminated/. I have not read the > parsing code in question lately, but offhand it seems like transferring > one token at a time into a work buffer isn't a completely broken idea... I wouldn't call it "broken", but I don't see how it could be done without a significant refactoring of how datetime parsing is done, and your handwaving has yet to convince me 1. parse the input string into fields, which are arrays of pointers into a working buffer, via ParseDateTime() 2. decode the fields as appropriate for the datatype via DecodeInterval(), DecodeDatetime(), DecodeTimeOnly(), etc. i.e. I don't see an easy way to do field decoding one field at a time. -Neil ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| Neil Conway <neilc@samurai.com> writes: > your handwaving has yet to convince me Zing ;-) I'm too tired to think about this more tonight, but it seems like the basic point is that we can't just construct a list of pointers into (a copy of) the original input string. A certain amount of data copying is going to have to occur; or else we change the token matching code to not assume null-termination. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| ||||
| Tom Lane wrote: > I'm too tired to think about this more tonight, but it seems like > the basic point is that we can't just construct a list of pointers > into (a copy of) the original input string. I think we _can_ do it that way, it's just a question of whether that is the best approach. I think the solution I outlined before would work fine: pass the length of the working buffer to ParseDateTime(), and reject the input only if the parsing process actually requires more working space than was provided. ParseDateTime() can easily skip whitespace without storing it in the temporary buffer -- and once we've done that, AFAIK we _can_ place an upper bound on the amount of working space that is needed (each field has a maximum length and there are a maximum number of fields). Or we could rewrite how we do parsing, which seems to be what you're suggesting. I agree the code could do with cleanup, although we will also need some kind of minimally-invasive fix for back branches. -Neil ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |