vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > On Wed, 2 Nov 2005, Merlin Moncure wrote: > If you put client/server on the same machine, then we don't know how the > CPU is splitted. Can you take a look at the approximate number by > observing the task manager data while running? ok, I generated a test case which was 250k inserts to simple two column table all in single transaction. Every 50k inserts, time is recorded via timeofday(). Running from remote, Time progression is: First 50k: 20 sec Second : 29 sec [...] final: : 66 sec so, clear upward progression of time/rec. Initial time is 2.5k inserts/sec which is decent but not great for such a narrow table. CPU time on server starts around 50% and drops in exact proportion to insert performance. My earlier gprof test also suggest there is no smoking gun sucking down all the cpu time. cpu time on the client is very volatile but with a clear increase over time starting around 20 and ending perhaps 60. My client box is pretty quick, 3ghz p4. Running the script locally, from the server, cpu time is pegged at 100% and stays...first 50k is 23 sec with a much worse decomposition to almost three minutes for final 50k. Merlin > If communication code is the suspect, can we measure the difference if we > disable the redefinition of recv()/send() etc in port/win32.h (may require > change related code a little bit as well). In this way, the socket will > not be able to pickup signals, but let see if there is any performance > difference first. > > Regards, > Qingqing > > > > > > [OK, I'm bringing this back on-list, and bringing it to QingQing's > > attention, who I secretly hope is the right person to be looking at this > > problem > > > P.s. You scared me ;-) ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| ""Merlin Moncure"" <merlin.moncure@rcsonline.com> wrote > > Running from remote, Time progression is: > First 50k: 20 sec > Second : 29 sec > [...] > final: : 66 sec > This may due to the maintainence cost of a big transaction, I am not sure .... Tom? > so, clear upward progression of time/rec. Initial time is 2.5k > inserts/sec which is decent but not great for such a narrow table. CPU > time on server starts around 50% and drops in exact proportion to insert > performance. My earlier gprof test also suggest there is no smoking gun > sucking down all the cpu time. > Not to 100%, so this means the server is always starving. It is waiting on something -- of couse not lock. That's why I think there is some problem on network communication. Another suspect will be the write - I knwo NTFS system will issue an internal log when extending a file. To remove the second suspect, I will try to hack the source to do a "fake" write ... Regards, Qingqing |
| |||
| "Qingqing Zhou" <zhouqq@cs.toronto.edu> wrote > > Not to 100%, so this means the server is always starving. It is waiting on > something -- of couse not lock. That's why I think there is some problem > on network communication. Another suspect will be the write - I knwo NTFS > system will issue an internal log when extending a file. To remove the > second suspect, I will try to hack the source to do a "fake" write ... > To patch: ------------------------- Here is a quite straight hack to implement "fake" write for both relation and xlog. Now the server becomes pure CPU play. 1. RelationGetBufferForTuple()/hio.c: remove line (if you do not enable cassert, then doesn't matter): - Assert(PageIsNew((PageHeader) pageHeader)); 2. ReadBuffer()/bufmgr.c: remove line - smgrextend(reln->rd_smgr, blockNum, (char *) bufBlock, - reln->rd_istemp); 3. XLogWrite()/xlog.c errno = 0; + goto fake; if (write(openLogFile, from, nbytes) != nbytes) { /* if write didn't set errno, assume no disk space */ ... } + + fake: /* Update state for write */ To use it: ------------------------- 1. have several copies of a correct data; 2. patch the server; 3. when you startup postmaster, use the following parameters: postmaster -c"checkpoint_timeout=3600" -c"bgwriter_all_percent=0" -Ddata Note now the database server is one-shoot usable -- after you shutdown, it won't startup again. Just run begin; many inserts; end; To observe: ------------------------- (1) In this case, what's the remote server CPU usage -- 100%? I don't have several machines to test it. In my single machine, I run 35000 insert commands from psql by cut and paste into it and could observe that: --- 25% kernel time 75% user time 20% postgresql (--enable-debug --enable-cassert) 65% psql (as same above) 10% csrss (system process, manage graphics commands (not sure, just googled it), etc) 5% system (system process) --- (2) In this case, Linux still keeps almost 10 times faster? After this, we may need more observations like comparison of simple "select 1;" to reduce the code space we may want to explore ... Regards, Qingqing |
| |||
| "Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > ok, I generated a test case which was 250k inserts to simple two column > table all in single transaction. Every 50k inserts, time is recorded > via timeofday(). You mean something like the attached? > Running from remote, Time progression is: > First 50k: 20 sec > Second : 29 sec > [...] > final: : 66 sec On Unix I get a dead flat line (within measurement noise), both local loopback and across my LAN. after 50000 30.20 sec after 100000 31.67 sec after 150000 30.98 sec after 200000 29.64 sec after 250000 29.83 sec "top" shows nearly constant CPU usage over the run, too. With a local connection it's pretty well pegged, with LAN connection the server's about 20% idle and the client about 90% (client machine is much faster than server which may affect this, but I'm too lazy to try it in the other direction). I think it's highly likely that you are looking at some strange behavior of the Windows TCP stack. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| On Thu, 3 Nov 2005, Tom Lane wrote: > > On Unix I get a dead flat line (within measurement noise), both local > loopback and across my LAN. > > after 50000 30.20 sec > after 100000 31.67 sec > after 150000 30.98 sec > after 200000 29.64 sec > after 250000 29.83 sec > Confirmed in Linux. And on a winxp machine(sp2) with server, client together, with (see almost no performance difference) or without my "fake" write, the observation is still hold for both cases: after 50000 25.21 sec after 100000 26.26 sec after 150000 25.23 sec after 200000 26.25 sec after 250000 26.58 sec In both cases, postgres 67% cpu, psql 15~20%, rest: system process. Kernel time is 40+% -- where from? Regards, Qingqing ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |