vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > -----Original Message----- > From: Joel Fradkin [mailto:jfradkin@wazagua.com] > Sent: Thursday, April 14, 2005 11:39 AM > To: 'Tom Lane'; 'Dawid Kuroczko' > Cc: 'PERFORM' > Subject: Re: [PERFORM] speed of querry? > > > I did as described to alter table and did not see any > difference in speed. I am trying to undo the symbolic > link to the data array and set it up on raid 5 disks in > the machine just to test if there is an issue with the > config of the raid 10 array or a problem with the controller. > > I am kinda lame at Linux so not sure I have got it yet still > testing. Still kind puzzled why it chose tow different option, > but one is running windows version of postgres, so maybe that > has something to do with it. That sounds like a plausible explanation. However, it could simply be that the statistics gathered on each box are sufficiently different to cause different plans. > The data bases and configs (as far as page cost) are the same. Did you do as Dawid suggested? > [...] > Then do a query couple of times (EXPLAIN ANALYZE also > do: > SET enable_seqscan = off; > and rerun the query -- if it was significantly faster, you will > want to do: > SET enable_seqscan = on; > and tweak: > SET random_page_cost = 2.1; > ...and play with values. When you reach the random_page_cost > which suits your data, you will want to put it into > postgresql.conf > [...] This is above and beyond toying with the column statistics. You are basically telling the planner to use an index. Try this, and post the EXPLAIN ANALYZE for the seqscan = off case on the slow box if it doesn't speed things up for you. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| It is still slower on the Linux box. (included is explain with SET enable_seqscan = off; explain analyze select * from viwassoclist where clientnum ='SAKS') See below. I did a few other tests (changing drive arrays helped by 1 second was slower on my raid 10 on the powervault). Pulling just raw data is much faster on the Linux box. "Seq Scan on tblresponse_line (cost=100000000.00..100089717.78 rows=4032078 width=67) (actual time=0.028..4600.431 rows=4032078 loops=1)" "Total runtime: 6809.399 ms" Windows box "Seq Scan on tblresponse_line (cost=0.00..93203.68 rows=4031968 width=67) (actual time=16.000..11316.000 rows=4031968 loops=1)" "Total runtime: 16672.000 ms" I am going to reload the data bases, just to see what I get. I am thinking I may have to flatten the files for postgres (eliminate joins of any kind for reporting etc). Might make a good deal more data, but I think from the app's point of view it is a good idea anyway, just not sure how to handle editing. Joel Fradkin "Merge Join (cost=49697.60..50744.71 rows=14987 width=113) (actual time=11301.160..12171.072 rows=160593 loops=1)" " Merge Cond: ("outer".locationid = "inner".locationid)" " -> Sort (cost=788.81..789.89 rows=432 width=49) (actual time=3.318..3.603 rows=441 loops=1)" " Sort Key: l.locationid" " -> Index Scan using ix_location on tbllocation l (cost=0.00..769.90 rows=432 width=49) (actual time=0.145..2.283 rows=441 loops=1)" " Index Cond: ('SAKS'::text = (clientnum)::text)" " -> Sort (cost=48908.79..49352.17 rows=177352 width=75) (actual time=11297.774..11463.780 rows=160594 loops=1)" " Sort Key: a.locationid" " -> Merge Right Join (cost=26247.95..28942.93 rows=177352 width=75) (actual time=8357.010..9335.362 rows=177041 loops=1)" " Merge Cond: ((("outer".clientnum)::text = "inner"."?column10?") AND ("outer".id = "inner".jobtitleid))" " -> Index Scan using ix_tbljobtitle_id on tbljobtitle jt (cost=0.00..243.76 rows=6604 width=37) (actual time=0.122..12.049 rows=5690 loops=1)" " Filter: (1 = presentationid)" " -> Sort (cost=26247.95..26691.33 rows=177352 width=53) (actual time=8342.271..8554.943 rows=177041 loops=1)" " Sort Key: (a.clientnum)::text, a.jobtitleid" " -> Index Scan using ix_associate_clientnum on tblassociate a (cost=0.00..10786.17 rows=177352 width=53) (actual time=0.166..1126.052 rows=177041 loops=1)" " Index Cond: ((clientnum)::text = 'SAKS'::text)" "Total runtime: 12287.502 ms" This is above and beyond toying with the column statistics. You are basically telling the planner to use an index. Try this, and post the EXPLAIN ANALYZE for the seqscan = off case on the slow box if it doesn't speed things up for you. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| "Joel Fradkin" <jfradkin@wazagua.com> writes: > "Merge Join (cost=49697.60..50744.71 rows=14987 width=113) (actual > time=11301.160..12171.072 rows=160593 loops=1)" > " Merge Cond: ("outer".locationid = "inner".locationid)" > " -> Sort (cost=788.81..789.89 rows=432 width=49) (actual > time=3.318..3.603 rows=441 loops=1)" > " Sort Key: l.locationid" > " -> Index Scan using ix_location on tbllocation l > (cost=0.00..769.90 rows=432 width=49) (actual time=0.145..2.283 rows=441 > loops=1)" > " Index Cond: ('SAKS'::text = (clientnum)::text)" > " -> Sort (cost=48908.79..49352.17 rows=177352 width=75) (actual > time=11297.774..11463.780 rows=160594 loops=1)" > " Sort Key: a.locationid" > " -> Merge Right Join (cost=26247.95..28942.93 rows=177352 > width=75) (actual time=8357.010..9335.362 rows=177041 loops=1)" > " Merge Cond: ((("outer".clientnum)::text = > "inner"."?column10?") AND ("outer".id = "inner".jobtitleid))" > " -> Index Scan using ix_tbljobtitle_id on tbljobtitle jt > (cost=0.00..243.76 rows=6604 width=37) (actual time=0.122..12.049 rows=5690 > loops=1)" > " Filter: (1 = presentationid)" > " -> Sort (cost=26247.95..26691.33 rows=177352 width=53) > (actual time=8342.271..8554.943 rows=177041 loops=1)" > " Sort Key: (a.clientnum)::text, a.jobtitleid" > " -> Index Scan using ix_associate_clientnum on > tblassociate a (cost=0.00..10786.17 rows=177352 width=53) (actual > time=0.166..1126.052 rows=177041 loops=1)" > " Index Cond: ((clientnum)::text = 'SAKS'::text)" > "Total runtime: 12287.502 ms" It strikes me as odd that the thing isn't considering hash joins for at least some of these steps. Can you force it to (by setting enable_mergejoin off)? If not, what are the datatypes of the join columns exactly? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Joel Fradkin Turning off merg joins seems to of done it but what do I need to do so I am not telling the system explicitly not to use them, I must be missing some setting? On linux box. explain analyze select * from viwassoclist where clientnum ='SAKS' "Hash Join (cost=988.25..292835.36 rows=15773 width=113) (actual time=23.514..3024.064 rows=160593 loops=1)" " Hash Cond: ("outer".locationid = "inner".locationid)" " -> Hash Left Join (cost=185.57..226218.77 rows=177236 width=75) (actual time=21.147..2221.098 rows=177041 loops=1)" " Hash Cond: (("outer".jobtitleid = "inner".id) AND (("outer".clientnum)::text = ("inner".clientnum)::text))" " -> Seq Scan on tblassociate a (cost=0.00..30851.25 rows=177236 width=53) (actual time=0.390..1095.385 rows=177041 loops=1)" " Filter: ((clientnum)::text = 'SAKS'::text)" " -> Hash (cost=152.55..152.55 rows=6604 width=37) (actual time=20.609..20.609 rows=0 loops=1)" " -> Seq Scan on tbljobtitle jt (cost=0.00..152.55 rows=6604 width=37) (actual time=0.033..12.319 rows=6603 loops=1)" " Filter: (1 = presentationid)" " -> Hash (cost=801.54..801.54 rows=454 width=49) (actual time=2.196..2.196 rows=0 loops=1)" " -> Index Scan using ix_location on tbllocation l (cost=0.00..801.54 rows=454 width=49) (actual time=0.111..1.755 rows=441 loops=1)" " Index Cond: ('SAKS'::text = (clientnum)::text)" "Total runtime: 3120.366 ms" here are the table defs and view if that helps. I posted the config a while back, but can do it again if you need to see it. CREATE OR REPLACE VIEW viwassoclist AS SELECT a.clientnum, a.associateid, a.associatenum, a.lastname, a.firstname, jt.value AS jobtitle, l.name AS "location", l.locationid AS mainlocationid, l.divisionid, l.regionid, l.districtid, (a.lastname::text || ', '::text) || a.firstname::text AS assocname, a.isactive, a.isdeleted FROM tblassociate a LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id AND jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid JOIN tbllocation l ON a.locationid = l.locationid AND l.clientnum::text = a.clientnum::text; CREATE TABLE tblassociate ( clientnum varchar(16) NOT NULL, associateid int4 NOT NULL, associatenum varchar(10), firstname varchar(50), middleinit varchar(5), lastname varchar(50), ssn varchar(18), dob timestamp, address varchar(100), city varchar(50), state varchar(50), country varchar(50), zip varchar(10), homephone varchar(14), cellphone varchar(14), pager varchar(14), associateaccount varchar(50), doh timestamp, dot timestamp, rehiredate timestamp, lastdayworked timestamp, staffexecid int4, jobtitleid int4, locationid int4, deptid int4, positionnum int4, worktypeid int4, sexid int4, maritalstatusid int4, ethnicityid int4, weight float8, heightfeet int4, heightinches int4, haircolorid int4, eyecolorid int4, isonalarmlist bool NOT NULL DEFAULT false, isactive bool NOT NULL DEFAULT true, ismanager bool NOT NULL DEFAULT false, issecurity bool NOT NULL DEFAULT false, createdbyid int4, isdeleted bool NOT NULL DEFAULT false, militarybranchid int4, militarystatusid int4, patrontypeid int4, identificationtypeid int4, workaddress varchar(200), testtypeid int4, testscore int4, pin int4, county varchar(50), CONSTRAINT pk_tblassociate PRIMARY KEY (clientnum, associateid), CONSTRAINT ix_tblassociate UNIQUE (clientnum, associatenum) ) CREATE TABLE tbljobtitle ( clientnum varchar(16) NOT NULL, id int4 NOT NULL, value varchar(50), code varchar(16), isdeleted bool DEFAULT false, presentationid int4 NOT NULL DEFAULT 1, CONSTRAINT pk_tbljobtitle PRIMARY KEY (clientnum, id, presentationid) ) CREATE TABLE tbllocation ( clientnum varchar(16) NOT NULL, locationid int4 NOT NULL, districtid int4 NOT NULL, regionid int4 NOT NULL, divisionid int4 NOT NULL, locationnum varchar(8), name varchar(50), clientlocnum varchar(50), address varchar(100), address2 varchar(100), city varchar(50), state varchar(2) NOT NULL DEFAULT 'zz'::character varying, zip varchar(10), countryid int4, phone varchar(15), fax varchar(15), payname varchar(40), contact char(36), active bool NOT NULL DEFAULT true, coiprogram text, coilimit text, coiuser varchar(255), coidatetime varchar(32), ec_note_field varchar(1050), locationtypeid int4, open_time timestamp, close_time timestamp, insurance_loc_id varchar(50), lpregionid int4, sic int4, CONSTRAINT pk_tbllocation PRIMARY KEY (clientnum, locationid), CONSTRAINT ix_tbllocation_1 UNIQUE (clientnum, locationnum, name), CONSTRAINT ix_tbllocation_unique_number UNIQUE (clientnum, divisionid, regionid, districtid, locationnum) ) It strikes me as odd that the thing isn't considering hash joins for at least some of these steps. Can you force it to (by setting enable_mergejoin off)? If not, what are the datatypes of the join columns exactly? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| ||||
| "Joel Fradkin" <jfradkin@wazagua.com> writes: > Turning off merg joins seems to of done it but what do I need to do so I am > not telling the system explicitly not to use them, I must be missing some > setting? > " -> Hash Left Join (cost=185.57..226218.77 rows=177236 width=75) (actual > time=21.147..2221.098 rows=177041 loops=1)" > " Hash Cond: (("outer".jobtitleid = "inner".id) AND > (("outer".clientnum)::text = ("inner".clientnum)::text))" It's overestimating the cost of this join for some reason ... and I think I see why. It's not accounting for the combined effect of the two hash clauses, only for the "better" one. What are the statistics for tbljobtitle.id and tbljobtitle.clientnum --- how many distinct values of each, and are the distributions skewed to a few popular values? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| Thread Tools | |
| Display Modes | |
|
|