vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Friday, December 16, 2005 10:41 PM > To: Dann Corbit > Cc: Qingqing Zhou; Bruce Momjian; Luke Lonergan; Neil Conway; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] Re: Which qsort is used > > "Dann Corbit" <DCorbit@connx.com> writes: > >> I've still got a problem with these checks; I think they are a net > >> waste of cycles on average. > > > The benchmarks say that they (order checks) are a good idea on average > > for ordered data, random data, and partly ordered data. > > There are lies, damn lies, and benchmarks ;-) > > The problem with citing a benchmark for this discussion is that a > benchmark can't tell you anything about real-world probabilities; > it only tells you about the probabilities occuring in the benchmark > case. You need to make the case that the benchmark reflects the > real world, which you didn't. > > > If you trace the algorithms in a debugger you will be surprised at how > > often the partitions are ordered, even with random sets as input. > > Well, I do agree that checking for orderedness on small partitions would > succeed more often than on larger partitions or the whole file --- but > the code-as-given checks all the way down. Moreover, the argument given > for spending these cycles is that insertion sort sucks on reverse-order > input ... where "sucks" means that it spends O(N^2) time. But it spends > O(N^2) in the average case, too. I agree that in general these checks are not important and they complicate what is a simple and elegant algorithm. The cases where the checks are important (highly ordered data sets) are rare and so the value added is minimal. I am actually quite impressed with the excellence of Bentley's sort out of the box. It's definitely the best library implementation of a sort I have seen. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Fri, Dec 16, 2005 at 10:43:58PM -0800, Dann Corbit wrote: > I am actually quite impressed with the excellence of Bentley's sort out > of the box. It's definitely the best library implementation of a sort I > have seen. I'm not sure whether we have a conclusion here, but I do have one question: is there a significant difference in the number of times the comparison routines are called? Comparisons in PostgreSQL are fairly expensive given the fmgr overhead and when comparing tuples it's even worse. We don't want to accedently pick a routine that saves data shuffling by adding extra comparisons. The stats at [1] don't say. They try to factor in CPU cost but they seem to use unrealistically small values. I would think a number around 50 (or higher) would be more representative. [1] http://www.cs.toronto.edu/~zhouqq/po...sort/sort.html Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQFDpptzIB7bNG8LQkwRAmC6AJ4qYrIm3SYnBV3BybSmm+ Gl4vpEywCfRDxg bnIK4INRqOVFNBAKR/gDPcM= =92qA -----END PGP SIGNATURE----- |
| ||||
| Martin, On 12/19/05 3:37 AM, "Martijn van Oosterhout" <kleptog@svana.org> wrote: > I'm not sure whether we have a conclusion here, but I do have one > question: is there a significant difference in the number of times the > comparison routines are called? Comparisons in PostgreSQL are fairly > expensive given the fmgr overhead and when comparing tuples it's even > worse. It would be interesting to note the comparison count of the different routines. Something that really grabbed me about the results though is that the relative performance of the routines dramatically shifted when the indirect references in the comparators went in. The first test I did sorted an array of int4 - these tests that Qingqing did sorted arrays using an indirect pointer list, at which point the same distributions performed very differently. I suspect that it is the number of comparisons that caused this, and further that the indirection has disabled the compiler optimizations for memory prefetch and other things that it could normally recognize. Given the usage pattern in Postgres, where sorted things are a mix of strings and intrinsic types, I'm not sure those optimizations could be done by one routine. I haven't verified this, but it certainly seems that the NetBSD routine is the overall winner for the type of use that Postgres has (sorting the using a pointer list). - Luke ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| Thread Tools | |
| Display Modes | |
|
|