vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > -----Original Message----- > From: Martijn van Oosterhout [mailto:kleptog@svana.org] > Sent: 13 April 2006 11:54 > To: Dave Page; pgsql-hackers@postgresql.org; Hiroshi Inoue > Subject: Re: [HACKERS] Practical impediment to supporting > multiple SSL libraries > > On Thu, Apr 13, 2006 at 06:44:12AM -0400, Stephen Frost wrote: > > Instead of having it hijack the libpq connection and implement the > > wireline protocol itself, why don't we work on fixing the problems > > (such as the double-copying that libpq requires) in libpq > to allow the > > driver (and others!) to use it in the 'orthodox' way? > > Ok. I'm not sure what this "double copying" you're referring > to is, The libpq driver copies results out of the PGresult struct into the internal QueryResult classes. With libpq out of the loop, data can go straight from the wire into the QR. > but I'd certaintly like to know why people are > reimplementing the protocol (psqlODBC is hardly the only one). There are elements of the wire protocol that libpq doesn't actually implement from what I recall. IIRC, they were added specifically for JDBC but also intended to be used by psqlODBC as well. I forget the details though as I wasn't so involved with the ODBC development back then. In addition of course, implementing the protocol natively does allow for maximum flexibility. Regards, Dave. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Thu, Apr 13, 2006 at 12:12:25PM +0100, Dave Page wrote: > > Ok. I'm not sure what this "double copying" you're referring > > to is, > > The libpq driver copies results out of the PGresult struct into the > internal QueryResult classes. With libpq out of the loop, data can go > straight from the wire into the QR. Hmm, the simplest improvement I can think of is one where you register a callback that libpq calls whenever it has received a new tuple. However, w.r.t. the copying, the pointers in get PGresult are in memory belonging to that result. As long as that PGresult hangs around, you should be able to just copy the pointers rather than the data? Or is this unacceptable? The only alternative I can think of is let users provide a callback that is given the number of bytes and it returns memory to store the data into. But that just seems unnecessarily complex, considering you could just copy the pointers. > There are elements of the wire protocol that libpq doesn't actually > implement from what I recall. IIRC, they were added specifically for > JDBC but also intended to be used by psqlODBC as well. I forget the > details though as I wasn't so involved with the ODBC development back > then. Ugh, that's terrible. How do these features get tested if nothing within the main tree implements them. > In addition of course, implementing the protocol natively does allow for > maximum flexibility. Maybe, but it should be possible to have a lot of flexibility without having many projects jump through all sorts of hoops everytime a new protocol version is created. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEPjctIB7bNG8LQkwRAhaGAJ4+NKfCL9X9KqegNU/wb0Pw4cOCeACdHaxf 1oj/PduccD/Qo1yUd5DNOlI= =D9JT -----END PGP SIGNATURE----- |
| |||
| * Martijn van Oosterhout (kleptog@svana.org) wrote: > On Thu, Apr 13, 2006 at 12:12:25PM +0100, Dave Page wrote: > > > Ok. I'm not sure what this "double copying" you're referring > > > to is, > > > > The libpq driver copies results out of the PGresult struct into the > > internal QueryResult classes. With libpq out of the loop, data can go > > straight from the wire into the QR. > > Hmm, the simplest improvement I can think of is one where you > register a callback that libpq calls whenever it has received a new > tuple. You wouldn't want it on every tuple as that'd get expensive through function calls. > However, w.r.t. the copying, the pointers in get PGresult are in memory > belonging to that result. As long as that PGresult hangs around, you > should be able to just copy the pointers rather than the data? Or is > this unacceptable? It's actually pretty common (or seems to be anyway) to want to store the data from the query result into your own data structure. Yes, you could just use pointers all over the place but that means you're going to have to use things which understand PQresult everywhere as opposed to having a generic 'storage manager' with other generic things (index creator, aggregator, etc) which can be used with more than just PQresults. > The only alternative I can think of is let users provide a callback > that is given the number of bytes and it returns memory to store the > data into. But that just seems unnecessarily complex, considering you > could just copy the pointers. You don't provide a callback, you have the user provide a memory region to libpq which libpq can then fill in. It's really not that difficult, the API would really look quite a bit like PQexecParams, ie: int PQgetTuples(PGresult *res, // Returns number of tuples populated const int max_ntuples, // Basically buffer size char *result_set, // Destination buffer const int *columnOffsets, // integer array of offsets const int *columnLengths, // integer array of lengths, for checks const int record_len, // Length of each structure int *columnNulls, // 0/1 for is not null / is null int resultFormat); // Or maybe just binary? If we want to do conversion of the data in some way then we may need to expand this to include that ability (but I don't think PQgetvalue does, so...). > > There are elements of the wire protocol that libpq doesn't actually > > implement from what I recall. IIRC, they were added specifically for > > JDBC but also intended to be used by psqlODBC as well. I forget the > > details though as I wasn't so involved with the ODBC development back > > then. > > Ugh, that's terrible. How do these features get tested if nothing > within the main tree implements them. I fully agree with this sentiment... > > In addition of course, implementing the protocol natively does allow for > > maximum flexibility. > > Maybe, but it should be possible to have a lot of flexibility without > having many projects jump through all sorts of hoops everytime a new > protocol version is created. Indeed. Thanks, Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQFEPkTirzgMPqB3kigRAlgbAJ0fg9rQhSy5AgdMUnDmEL dv3du/rQCfUAZa rqM8sbwrzvwMv+X3muKHdZI= =hnq+ -----END PGP SIGNATURE----- |
| |||
| On Thu, Apr 13, 2006 at 08:32:34AM -0400, Stephen Frost wrote: > * Martijn van Oosterhout (kleptog@svana.org) wrote: > > Hmm, the simplest improvement I can think of is one where you > > register a callback that libpq calls whenever it has received a new > > tuple. > > You wouldn't want it on every tuple as that'd get expensive through > function calls. Why not? Internally we call pqAddTuple for every tuple, calling a user function instead is hardly going to be more expensive. Also, I was thinking of the situation where the user function could set a flag so the eventual caller of (perhaps) PQconsumeInput knows that it's got enough for now. > It's actually pretty common (or seems to be anyway) to want to store the > data from the query result into your own data structure. Yes, you could > just use pointers all over the place but that means you're going to have > to use things which understand PQresult everywhere as opposed to having a > generic 'storage manager' with other generic things (index creator, > aggregator, etc) which can be used with more than just PQresults. <snip> > You don't provide a callback, you have the user provide a memory region > to libpq which libpq can then fill in. It's really not that difficult, > the API would really look quite a bit like PQexecParams, ie: Except in the case of psqlODBC, it wants to be able to malloc/free() each field, which your method doesn't solve. Also, it doesn't solve the duplicate memory use, nor the retreiving of rows before the resultset is complete. > If we want to do conversion of the data in some way then we may need to > expand this to include that ability (but I don't think PQgetvalue does, > so...). I think a callback is much easier. As a bonus the user could specify that libpq doesn't need to remember the rows. Memory savings. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEPk0sIB7bNG8LQkwRApSxAJ0WUTzfSiWVC97AIZVolT RhSUFSsACfXzHY LlD5AiLEq/8+grbGpnf7MtM= =FGrf -----END PGP SIGNATURE----- |
| |||
| * Martijn van Oosterhout (kleptog@svana.org) wrote: > On Thu, Apr 13, 2006 at 08:32:34AM -0400, Stephen Frost wrote: > > You wouldn't want it on every tuple as that'd get expensive through > > function calls. > > Why not? Internally we call pqAddTuple for every tuple, calling a user > function instead is hardly going to be more expensive. Also, I was > thinking of the situation where the user function could set a flag > so the eventual caller of (perhaps) PQconsumeInput knows that it's got > enough for now. Hrmpf, the fact that we have a different call we make for every tuple anyway isn't exactly encouraging to me. > > You don't provide a callback, you have the user provide a memory region > > to libpq which libpq can then fill in. It's really not that difficult, > > the API would really look quite a bit like PQexecParams, ie: > > Except in the case of psqlODBC, it wants to be able to malloc/free() > each field, which your method doesn't solve. Also, it doesn't solve the > duplicate memory use, nor the retreiving of rows before the resultset > is complete. I don't entirely follow why you think it wouldn't solve the duplicate memory use (except perhaps in the psqlODBC case if they decide to just grab a bunch of tuples into one area and then go through and malloc/free each one after that, not exactly what I'd suggest...). The basic idea was actually modeled off of 'read'- you get back what's currently available, which might not be the full set you asked for so far. I think perhaps you're assuming that my suggestion would just be an overlay on top of the existing libpq PQgetReult which would just turn around and call PQgetResult to fill in the memory region provided by the user- entirely *not* the case... Perhaps I should have used 'PQconn' instead of 'PQresult' as the first argument and that would have been clearer. Additionally, honestly, this is very similar to how Oracle's multi-row retrival works... It uses two functions (one for setup into its own structure and then one for actually getting rows) but the basic idea is the same. > > If we want to do conversion of the data in some way then we may need to > > expand this to include that ability (but I don't think PQgetvalue does, > > so...). > > I think a callback is much easier. As a bonus the user could specify > that libpq doesn't need to remember the rows. Memory savings. My solution didn't have libpq remembering the rows... Thanks, Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQFEPlNSrzgMPqB3kigRAul3AJ0cTDFJpb5Paw7CBS8rrw A3cXQdwACfV3iz ivh5AgOWEHh9JTZawmtspYs= =ngBV -----END PGP SIGNATURE----- |
| |||
| Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Apr 13, 2006 at 12:12:25PM +0100, Dave Page wrote: > > > Ok. I'm not sure what this "double copying" you're referring > > > to is, > > > > The libpq driver copies results out of the PGresult struct into the > > internal QueryResult classes. With libpq out of the loop, data can go > > straight from the wire into the QR. > > Hmm, the simplest improvement I can think of is one where you > register a callback that libpq calls whenever it has received a new > tuple. That could be useful for applications but I think a driver really wants to retain control of the flow of control. To make use of a callback it would have to have an awkward dance of calling whatever function gives libpq license to call the callback, having the callback stuff the data in a temporary space, then checking for new data in the temporary space, and returning it to the user. -- greg ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| * Martijn van Oosterhout (kleptog@svana.org) wrote: > Why not? Internally we call pqAddTuple for every tuple, calling a user > function instead is hardly going to be more expensive. Also, I was > thinking of the situation where the user function could set a flag > so the eventual caller of (perhaps) PQconsumeInput knows that it's got > enough for now. I went ahead and looked through the libpq source a bit. What I was suggesting looks like it would change primairly getAnotherTuple to, instead of allocating the result memory itself, just store the result into the appropriate place in the user-provided memory space. Thus, getAnotherTuple wouldn't do any allocation and wouldn't call pqAddTuple at all. It would need to keep track of where it is in the user-provided memory area and if it runs out of space return back through the 'outofmemory' mechanism. The new function would basically set up the appropriate structures in the PGconn and then call 'parseInput()' which would then handle any recently-arrived data, call getAnotherTuple, which would then detect that it's dumping data into a user-provided area and would do so until it's finished being called by parseInput() or it runs out of user memory space. This would be used with the async command processing. A drawback, of course, is that this degenerates to busy-waiting if the application has nothing better to do. Any clue as to if the PQsocket could safely be used in a select()-based system? I'm guessing it could, just never tried that myself. data which needs to be sent and hasn't been yet for some reason. Thanks! Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQFEPmzErzgMPqB3kigRAg4+AJ9iaJLT8tcWLJ2SRB4xfz IPi43r1QCcC7nJ RfjuSW8oS5zHfH3u3GL3Z0k= =Y2rD -----END PGP SIGNATURE----- |
| |||
| On Thu, Apr 13, 2006 at 09:34:10AM -0400, Stephen Frost wrote: > * Martijn van Oosterhout (kleptog@svana.org) wrote: > > Except in the case of psqlODBC, it wants to be able to malloc/free() > > each field, which your method doesn't solve. Also, it doesn't solve the > > duplicate memory use, nor the retreiving of rows before the resultset > > is complete. > > I don't entirely follow why you think it wouldn't solve the duplicate > memory use (except perhaps in the psqlODBC case if they decide to just > grab a bunch of tuples into one area and then go through and malloc/free > each one after that, not exactly what I'd suggest...). Right, I didn't understand that you meant to be doing this synchronously, as the data came in. I thought it was just another way of retreiving the data already received. But given that a stated reason that psqlODBC didn't use the libpq interface was due to the copying of all the data, it would be nice if we had something for that. From looking at your declaration: int PQgetTuples(PGresult *res, // Returns number of tuples populated const int max_ntuples, // Basically buffer size char *result_set, // Destination buffer const int *columnOffsets, // integer array of offsets const int *columnLengths, // integer array of lengths, for checks const int record_len, // Length of each structure int *columnNulls, // 0/1 for is not null / is null int resultFormat); // Or maybe just binary? you seem to be suggesting that all the data be stored in one big memory block at resultset. What do you do if the data is longer than the given length? What does record_len mean (what structures)? Also, you can't specify binary/non-binary here, that's done in the query request. libpq doesn't handle the data differently depending on binaryness. Also, how can you find out the actual length of each value after the call? Frankly I'm not seeing much improvement over normal processing. It just seems like yet another data-model that won't fit most users. The definition of PQgetvalue is merely: return res->tuples[tup_num][field_num].value; So we could acheive the same effect by letting people look into PQresult before the query is finished. The function you suggest would be especially difficult for something like psqlODBC which has no idea beforehand how long a value could be. I'm still of the opinion that letting people supply an alternative to pqAddTuple would be cleaner. The interface would look like: typedef struct pgresAttValue { int len; /* length in bytes of the value */ char *value; /* actual value, plus terminating zero byte */ } PGresAttValue; typedef int (*PQtuplecallback)( PQresult *res, PGresAttValue *fields ); int PQsettuplecallback( PQresult *res, PQtuplecallback cb ); fields is simply a pointer to an array of nfields such structures. Users can do whatever they want with the info, store it in their own structure, parse it, throw it away, send it over a network, etc. With this callback I could probably implement your function above fairly straightforwardly. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEPmzYIB7bNG8LQkwRAj7oAJ4mOR6tiCDkkrcjBSF98M YtTimZegCeP5fv ANN3noi3g6+59e0kvOBTWM4= =HuPb -----END PGP SIGNATURE----- |
| |||
| * Greg Stark (gsstark@mit.edu) wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Hmm, the simplest improvement I can think of is one where you > > register a callback that libpq calls whenever it has received a new > > tuple. > > That could be useful for applications but I think a driver really wants to > retain control of the flow of control. To make use of a callback it wouldhave > to have an awkward dance of calling whatever function gives libpq licenseto > call the callback, having the callback stuff the data in a temporary space, > then checking for new data in the temporary space, and returning it to the > user. I doubt the callback would be called at some inopportune time... Probably the callback would be passed into a libpq call which then directly calls the callback and is done with it when it returns. The libpq function would certainly need a parameter which is just passed to the callback to allow the system to maintain state (such as how many tuples the callback has processed so far) to avoid ugly global variables but otherwise I don't really see that this is changing the flow of control all that much... I can see how having a callback would be useful though I think for a good number of cases it's just going to be populating a memory region with it and we could cover that common case by providing an API for exactly that. The other issue with a callback is that libpq would have to either call the callback for each value (not my preference) or have some way to pass a whole variable-length tuple to the callback, which would require libpq to allocate memory for the tuple (hopefully only once and not per-tuple) and then build up whatever structure it's going to give to the callback in memory (copy once) and then call the callback which would be required to copy the tuple somewhere else (copy again). Of course, all of this is after an initial copy from read() into the read buffer, but I doubt that could be helped (and read()'ing small enough amounts to make it happen wouldn't really improve things). Thanks, Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQFEPm9QrzgMPqB3kigRAlQnAKCELso0VPETK+R1JkI1vZ 1Sb2OCXwCeKOS+ VjH+DHYi4wxqrSAiXbt8eU4= =LIx/ -----END PGP SIGNATURE----- |
| ||||
| On Thu, Apr 13, 2006 at 11:14:57AM -0400, Greg Stark wrote: > That could be useful for applications but I think a driver really wants to > retain control of the flow of control. To make use of a callback it wouldhave > to have an awkward dance of calling whatever function gives libpq licenseto > call the callback, having the callback stuff the data in a temporary space, > then checking for new data in the temporary space, and returning it to the > user. We have an asyncronous interface. I was thinking like: PQsendQuery( conn, query ); res = PQgetResult( conn ); gotenough = FALSE; PQsetcallback( res, mycallback ); while( !gotenough ) PQconsumeinput(conn); /* When we reach here we have at least five rows in our data structure */ sub mycallback(res,data) { /* stuff data in memory structure */ if( row_count > 5 ) gotenough = TRUE; } If you set non-blocking you can even go off and do other things while waiting. No need for temporary space... Does this seem too complex? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEPm8EIB7bNG8LQkwRAiw2AJ4nMsZJxCVCBw07LmkQcX pL2bxSXgCeITOv X1+iZIzpsotaiUYeyd6bRls= =9B89 -----END PGP SIGNATURE----- |