Re: seq scan cache vs. index cache smackdown
(Pgsql Performance forums, PostgreSQL category)
> Magnus Hagander wrote:
> > I don't think that's correct either. Scatter/Gather I/O is used so SQL
> > Server can issue reads for several blocks from disks into its own
> > buffer cache with a single syscall even if these buffers are not
> > sequential. It did make significant performance improvements when they
> > added it, though.
> >
> > (For those not knowing - it's ReadFile/WriteFile where you pass an array
> > of "this many bytes to this address" as parameters)
>
> Isn't that like the BSD writev()/readv() that Linux supports also? Is
> that something we should be using on Unix if it is supported by the OS?

readv and writev are in the Single UNIX Specification, and yes, they are
basically just like the win32 versions except that they are synchronous
(and therefore better, IMO).

On some systems they might just be implemented as a loop inside the
library, or even as a macro.

http://www.opengroup.org/onlinepubs/.../sysuio.h.html

On operating systems that optimize vectored read operations, it's pretty
reasonable to assume good or even great performance gains, in addition
to (or instead of) recent changes to xlog.c to group writes together for
a file...it just takes things one step further.

Is there a reason why readv/writev have not been considered in the past?

Merlin

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
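For readers who have not used the calls being discussed, here is a minimal sketch of gathering several non-adjacent memory buffers into one writev() call and scattering one readv() call back into separate buffers. The function names, the MAX_BLOCKS cap, and the block_size parameter are assumptions made for illustration; nothing here is taken from PostgreSQL or from any patch mentioned in this thread.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_BLOCKS 16   /* hypothetical cap; equals the POSIX minimum for IOV_MAX */

/* Describe nblocks separate buffers of block_size bytes each. */
static void
fill_iov(struct iovec *iov, char *const blocks[], int nblocks, size_t block_size)
{
    int i;

    for (i = 0; i < nblocks; i++)
    {
        iov[i].iov_base = blocks[i];
        iov[i].iov_len = block_size;
    }
}

/*
 * Write buffers that need not be adjacent in memory with ONE system call.
 * The data lands contiguously at the current file offset.
 * Returns bytes written (possibly fewer than requested) or -1 on error.
 */
ssize_t
write_blocks_gathered(int fd, char *const blocks[], int nblocks, size_t block_size)
{
    struct iovec iov[MAX_BLOCKS];

    assert(nblocks > 0 && nblocks <= MAX_BLOCKS);
    fill_iov(iov, blocks, nblocks, block_size);
    return writev(fd, iov, nblocks);
}

/*
 * Read contiguous data from fd into buffers that need not be adjacent in
 * memory, again with ONE system call. Buffers are filled in array order.
 * Returns bytes read (possibly fewer than requested) or -1 on error.
 */
ssize_t
read_blocks_scattered(int fd, char *const blocks[], int nblocks, size_t block_size)
{
    struct iovec iov[MAX_BLOCKS];

    assert(nblocks > 0 && nblocks <= MAX_BLOCKS);
    fill_iov(iov, blocks, nblocks, block_size);
    return readv(fd, iov, nblocks);
}
```

A caller would pass an array of page pointers, for example write_blocks_gathered(fd, pages, 3, 8192), and would still have to handle a short return value just as with plain write(). Whether this is faster than three separate calls depends on the operating system, which is the open question in this thread.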
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Is there a reason why readv/writev have not been considered in the past?

Lack of portability, and lack of obvious usefulness that would justify
dealing with the lack of portability.

I don't think there's any value in trying to write ordinary buffers this
way; making the buffer manager able to write multiple buffers at once
sounds like a great deal of complexity and deadlock risk in return for
not much. It might be an alternative to the existing proposed patch for
writing multiple WAL buffers at once, but frankly I consider that patch
a waste of effort. In real scenarios you seldom get to write more than
one WAL page without a forced sync occurring because someone committed.
Even if *your* transaction is long, any other backend committing a small
transaction still fsyncs. On top of that, the bgwriter will be flushing
WAL in order to maintain the write-ahead rule any time it dumps a dirty
buffer.

I have a personal to-do item to make the bgwriter explicitly responsible
for writing completed WAL pages as part of its duties, but I haven't done
anything about it because I think that it will write lots of such pages
without any explicit code, thanks to the bufmgr's LSN interlock. Even if
it doesn't get it done that way, the simplest answer is to add a little
bit of code to make sure bgwriter generally does the writes, and then we
don't care.

If you want to experiment with writev, feel free, but I'll want to see
demonstrable performance benefits before any such code actually goes in.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Merlin Moncure wrote:
>
> readv and writev are in the single unix spec...and yes ...
>
> On some systems they might just be implemented as a loop inside the
> library, or even as a macro.

You sure? Requirements like this:

http://www.opengroup.org/onlinepubs/...xsh/write.html

"Write requests of {PIPE_BUF} bytes or less will not be interleaved
with data from other processes doing writes on the same pipe."

make me think that it couldn't be just a macro; and if it were a loop
in the library it seems it'd still have to make sure it's done with
a single write system call.

(yeah, I know that requirement is just for pipes; and I suppose they
could write a loop for normal files and a different special case for
pipes; but I'd be surprised).
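To make the last point concrete, here is a hypothetical sketch of how a library could emulate writev() and still issue a single write() system call: copy every segment into one temporary buffer, then write that buffer once. The name writev_coalesced is invented for illustration, and this is not claimed to be how any particular libc does it. A naive loop calling write() once per segment would be simpler, but the segments could then be interleaved with other writers' data on a pipe.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical user-space fallback for writev(): gather all segments into
 * one heap buffer and hand it to the kernel with a single write() call.
 * Returns bytes written (possibly short) or -1 with errno set.
 */
ssize_t
writev_coalesced(int fd, const struct iovec *iov, int iovcnt)
{
    size_t  total = 0;
    size_t  off = 0;
    char   *buf;
    ssize_t written;
    int     saved_errno;
    int     i;

    assert(iov != NULL && iovcnt > 0);

    /* First pass: how many bytes are there in all segments together? */
    for (i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    if (total == 0)
        return 0;

    buf = malloc(total);
    if (buf == NULL)
    {
        errno = ENOMEM;
        return -1;
    }

    /* Second pass: copy the segments back to back, in array order. */
    for (i = 0; i < iovcnt; i++)
    {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    /* One system call, so the pipe rule quoted above applies to the whole
     * request as long as total does not exceed PIPE_BUF. */
    written = write(fd, buf, total);

    saved_errno = errno;    /* keep write()'s errno across free() */
    free(buf);
    errno = saved_errno;
    return written;
}
```

The cost of this approach is an extra allocation and memory copy per call, which is part of why a kernel-level implementation is the interesting case for performance.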