Re: resource utilization / hardware selection Hi Sybrand ,
-There is no such thing as async_io possible on an ufs filesysyem.
-You can easily verify this by trussing dbwr.
Okay , i mean threaded io . This is a method which simulates kernel
async io which is only available on raw devices . For ufs , some number
of threads are spawned , i can see the related system create thread
call in truss output .(i can also see this after the kaio read system
call is failed .) It can be argued whether this type of aio has any
benefit .
-Not high, but disastrous.
-Parallel query is only useful when you are striped your data across
-multiple disks. Try removing the parallel hint, and it will be faster.
I have tested this:
SQL> set timing on
SQL> set autotrace on
SQL> select /*+PARALLEL(T,8) FULL(T)*/ count(*) from
MERKEZ.mrk_musteri_TELEFON
T;
COUNT(*)
----------
46545797
Elapsed: 00:04:42.02
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3053 Card=1)
1 0 SORT (AGGREGATE)
2 1 SORT* (AGGREGATE)
:Q182157
7000
3 2 PARTITION RANGE* (ALL)
:Q182157
7000
4 3 PARTITION HASH* (ALL)
:Q182157
7000
5 4 TABLE ACCESS* (FULL) OF 'MRK_MUSTERI_TELEFON' (Cos
:Q182157 t=3053 Card=28170840)
7000
2 PARALLEL_TO_SERIAL SELECT /*+ PIV_SSF */
SYS_OP_MSR(COUNT(*)) FROM (SELECT /*+ NO_EXPAND ROWID(A2)
3 PARALLEL_COMBINED_WITH_PARENT
4 PARALLEL_COMBINED_WITH_PARENT
5 PARALLEL_COMBINED_WITH_PARENT
Statistics
----------------------------------------------------------
7461 recursive calls
3 db block gets
424453 consistent gets
401403 physical reads
804 redo size
494 bytes sent via SQL*Net to client
652 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
178 sorts (memory)
0 sorts (disk)
1 rows processed
This parallel sql takes 4:42 min. with 401403 disk reads.
The nonparallel version of the same sql:
SQL> set autotrace on;
SQL> set timing on;
SQL> select /*+noPARALLEL(T) FULL(T)*/ count(*) from
MERKEZ.mrk_musteri_TELEFON T;
COUNT(*)
----------
46545797
Elapsed: 00:10:11.64
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=24418 Card=1)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (ALL)
3 2 PARTITION HASH (ALL)
4 3 TABLE ACCESS (FULL) OF 'MRK_MUSTERI_TELEFON' (Cost=2
4418 Card=28170840)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
420260 consistent gets
401107 physical reads
2291832 redo size
494 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
This takes 10 min. with 401107 disk reads
During this test , there exists no other users connected to the db.
For the parallel sql , iostat , vmstat , sar shows:
avg-cpu: %user %nice %system %iowait %idle
2.15 0.00 1.80 0.00 96.05
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sdp2 43501.50 0.00 885.60 1.30 51122.40 1.30 25561.20
0.65 57.64 114.44 129.43 1.13 100.00
22:52:33 CPU %user %nice %system %idle
22:52:43 all 17.00 0.00 15.25 991.75
22:52:53 all 19.65 0.00 16.55 987.80
22:53:03 all 22.30 0.00 14.53 987.17
22:53:13 all 17.12 0.00 13.28 993.60
procs memory swap io system
cpu
r b w swpd free buff cache si so bi bo in cs us
sy id
0 8 1 12384 1168976 234272 1798800 1 1 2 1 2 0
1 0 0
0 8 1 12384 1168976 234272 1798800 0 0 22746 62 5122 5058
2 2 96
0 8 0 12384 1168976 234272 1798800 0 0 22921 15 5106 5079
2 2 96
for nonparallel sql:
11:11:42 PM CPU %user %nice %system %idle
11:11:52 PM all 18.95 0.00 14.40 990.65
11:12:02 PM all 18.95 0.00 13.12 991.92
11:12:12 PM all 18.52 0.00 13.50 991.98
avg-cpu: %user %nice %system %iowait %idle
1.70 0.00 1.50 0.00 96.80
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sdp2 20787.00 0.00 670.70 2.00 21457.60 2.00 10728.80
1.00 31.90 0.89 1.33 1.32 89.07
procs memory swap io system
cpu
r b w swpd free buff cache si so bi bo in cs us
sy id
0 1 0 564032 833040 129936 4278928 1 0 0 0 0 0
1 0 0
0 1 0 564032 833040 129936 4278928 0 0 11982 16 6465 10554
2 2 97
0 1 0 564032 833040 129936 4278928 0 0 12014 15 6389 10535
2 2 96
0 1 0 564032 833040 129936 4278928 0 0 12412 127 6524 10811
2 2 96
For both these cases , here is my comment and understanding :
For both case , cpus are idle . Parallel query utilized the disks 100
% , has 1.13 msec service time , 129.43 msec await . ( wait in disk
queue) 1.13 msec. service time seems to me a very good value for the
emc . High await shows that lots of io reuqests are sent to the emc and
they wait their turn in queue.
Nonparallel sql also have 1.32 msec service time , very low 1.33 await
.. This time i think , since there is only one oracle process that
requests data , the await time is low. Since the number of io requests
are not high , all of them are processed . The disk are utilized at
89.07 percent.
So what these results show to me that , parallel sql altough utilized
the disks 100% runs faster than the nonparallel sql . So utilizing the
disks for more than 95 is not a bad thing , for this case , which
everything is idle (cpu-memory) and slaves wait more data from the
disks. If i optimize the emc part better , such as striping to more
disks , the await value may be decreased.
Do you agree with me?
-Please answer the following questions
Yes, sure.
- as this is the second time you post this story, and you have been
asked to post the execution plans, why don't you do so? This is an
Oracle newsgroup, not a car dealer.
I have posted in this mail.
- Are your statistics current?
Yes. Also , both sqls uses full table scans , no index , and i also put
hint.
- Did you gather system statistics?
yes
- Or did you set the optimizer parameters appropiately?
If you consider setting db_file_multiblock_read_count or parallel query
parameters , i can say that those parameters are set correct.
- Are you aware you are forking off 16 query slaves for this query on
a box with 4 and 2 cpus for your second query?
Yes i am aware. Running a sql with 8 parallel degree does not put any
load into the cpu. So why does not do this? Yes i have 4 cpus , but if
i see that cpu resources are idle , i think i can use this resource.
Since sar or vmstat show that cpu util is very low , i think i can add
more parallelism. On the other hand , running this sql by using 4
slaves instead of 8 takes more time .
- Are you aware those query slaves need to be coordinated?
Yes. I monitor sar -q in order to see the cpu queue. Nothing high there
.. If there is a way to measure whether oracle parallel query slaves
coordination is bottleneck or not , i can check this.
- Did you stripe the data across 8 spindles?
yes
- Aren't you just *creating* contention *by design*?
I dont think so
- So what do you expect of the outcome?
- Do you realize replacing the hardware for an untuned database is
probably not going to help you *at all*?
I dont believe that my db is untuned. But ,based on the above optimizer
plans and other outputs , i am open to all advices. I dont only
monitor the os part , i check both db and os .
Mainly I try to optimize random seq.scans and try to use the server
resources as efficient as possible. I can replace the hardware , add
more cpus for example. But since my 8 way parallel query only utilizes
2 percent of the 4 ia64 cpus , adding 4 more cpus does not change
anything.
Kind Regards,
tolga |