Unix Technical Forum

Re: High load and iowait but no disk access

This is a discussion on Re: High load and iowait but no disk access within the Pgsql Performance forums, part of the PostgreSQL category; --> On 30-Aug-05, at 14:46, Anjan Dave wrote: > I have seen references of changing the kernel io scheduler at ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 12:20 PM
=?ISO-8859-1?Q?R=E9my_Beaumont?=
 
Posts: n/a
Default Re: High load and iowait but no disk access


On 30-Aug-05, at 14:46, Anjan Dave wrote:

> I have seen references of changing the kernel io scheduler at boot
> time…not sure if it applies to RHEL3.0, or will help, but try setting
> ‘elevator=deadline’ during boot time or via grub.conf.

That's only for RHEL 4.0.

> Have you tried running a simple ‘dd’ on the LUN?

We get amazing performance using dd.
> The drives are in RAID10 configuration, right?

NetApp has their own type of raid format (RAID4 aka WAFL)

Rémy
> *
> Thanks,
> Anjan
>
> From: Woody Woodring [mailto:george.woodring@iglass.net]
> Sent: Tuesday, August 30, 2005 2:30 PM
> To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] High load and iowait but no disk access
> *
> Have you tried a different kernel?* We run with a netapp over NFS
> without any issues, but we have seen high IO-wait on other Dell boxes
> (running* and not running postgres) and RHES 3.* We have replaced a
> Dell PowerEdge 350 running RH 7.3 *with a PE750 with more memory
> running RHES3 and it be bogged down with IO waits due to syslog
> messages writing to the disk, the old slower server could handle it
> fine.* I don't know if it is a Dell thing or a RH kernel, but we try
> different kernels on our boxes to try to find one that works better.*
> We have not found one that stands out over another*consistently but we
> have been moving away from Update 2 kernel (2.4.21-15.ELsmp) due to
> server lockup issues.* Unfortunately we get the best disk throughput
> on our few remaining 7.3 boxes.
> *
> Woody
> *
> IGLASS Networks
> www.iglass.net
> *
>
>
> From: pgsql-performance-owner@postgresql.org
> [mailtogsql-performance-owner@postgresql.org] On Behalf Of Rémy
> Beaumont
> Sent: Monday, August 29, 2005 9:43 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] High load and iowait but no disk access
> We have been trying to pinpoint what originally seem to be a I/O
> bottleneck but which now seems to be an issue with either Postgresql
> or RHES 3.
>
> We have the following test environment on which we can reproduce the
> problem:
>
> 1) Test System A
> Dell 6650 Quad Xeon Pentium 4
> 8 Gig of RAM
> OS: RHES 3 update 2
> Storage: NetApp FAS270 connected using an FC card using 10 disks
>
> 2) Test System B
> Dell Dual Xeon Pentium III
> 2 Gig o RAM
> OS: RHES 3 update 2
> Storage: NetApp FAS920 connected using an FC card using 28 disks
>
> Our Database size is around 30G.
>
> The behavior we see is that when running queries that do random reads
> on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
> throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000
> kB/s on sequential read operations on the netapps)
>
> The stats of the NetApp do confirm that it is sitting idle. Doing an
> strace on the Postgresql process shows that is it doing seeks and
> reads.
>
> So my question is where is this iowait time spent ?
> Is there a way to pinpoint the problem in more details ?
> We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3
>
> I have included the output of top,vmstat,strace and systat from the
> Netapp from System B while running a single query that generates this
> behavior.
>
> Rémy
>
> top output:
> 06:27:28 up 5 days, 16:59, 6 users, load average: 1.04, 1.30, 1.01
> 72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states: cpu user nice system irq softirq iowait idle
> total 2.7% 0.0% 1.0% 0.1% 0.2% 46.0% 49.5%
> cpu00 0.2% 0.0% 0.2% 0.0% 0.2% 2.2% 97.2%
> cpu01 5.3% 0.0% 1.9% 0.3% 0.3% 89.8% 1.9%
> Mem: 2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
> 1566332k actv, 296648k in_d, 30504k in_c
> Swap: 16771584k av, 21552k used, 16750032k free 1933772k cached
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
> 30960 postgres 15 0 13424 10M 9908 D 2.7 0.5 2:00 1 postmaster
> 30538 root 15 0 1080 764 524 S 0.7 0.0 0:43 0 sshd
> 1 root 15 0 496 456 436 S 0.0 0.0 0:08 0 init
> 2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
> 3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
> 4 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 keventd
> 5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
> 6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
> 9 root 15 0 0 0 0 SW 0.0 0.0 0:24 1 bdflush
> 7 root 15 0 0 0 0 SW 0.0 0.0 6:53 1 kswapd
> 8 root 15 0 0 0 0 SW 0.0 0.0 8:44 1 kscand
> 10 root 15 0 0 0 0 SW 0.0 0.0 0:13 0 kupdated
> 11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
> 17 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 ahc_dv_0
>
>
> vmstat output
> procs memory swap io system cpu
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 21552 17796 4872 1931928 2 3 3 1 27 6 2 1 7 3
> 0 1 21552 18044 4880 1931652 0 0 1652 0 397 512 1 2 50 47
> 0 1 21552 17976 4896 1931664 0 0 2468 0 407 552 2 2 50 47
> 1 0 21552 17984 4896 1931608 0 0 2124 0 418 538 3 3 48 46
> 0 1 21552 18028 4900 1931536 0 0 1592 0 385 509 1 3 50 46
> 0 1 21552 18040 4916 1931488 0 0 1620 820 419 581 2 2 50 46
> 0 1 21552 17968 4916 1931536 0 4 1708 4 402 554 3 1 50 46
> 1 1 21552 18052 4916 1931388 0 0 1772 0 409 531 3 1 49 47
> 0 1 21552 17912 4924 1931492 0 0 1772 0 408 565 3 1 48 48
> 0 1 21552 17932 4932 1931440 0 4 1356 4 391 545 5 0 49 46
> 0 1 21552 18320 4944 1931016 0 4 1500 840 414 571 1 1 48 50
> 0 1 21552 17872 4944 1931440 0 0 2116 0 392 496 1 5 46 48
> 0 1 21552 18060 4944 1931232 0 0 2232 0 423 597 1 2 48 49
> 1 1 21552 17684 4944 1931584 0 0 1752 0 395 537 1 1 50 48
> 0 1 21552 18000 4944 1931240 0 0 1576 0 401 549 0 1 50 49
>
>
> NetApp stats:
> CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP
> CP Disk DAFS FCP iSCSI FCP kB/s
> in out read write read write age hit time ty util in out
> 2% 0 0 0 139 0 0 2788 0 0 0 3 96% 0% - 15% 0 139 0 3 2277
> 2% 0 0 0 144 0 0 2504 0 0 0 3 96% 0% - 18% 0 144 0 3 2150
> 2% 0 0 0 130 0 0 2212 0 0 0 3 96% 0% - 13% 0 130 0 3 1879
> 3% 0 0 0 169 0 0 2937 80 0 0 3 96% 0% - 13% 0 169 0 4 2718
> 2% 0 0 0 139 0 0 2448 0 0 0 3 96% 0% - 12% 0 139 0 3 2096
> 2% 0 0 0 137 0 0 2116 0 0 0 3 96% 0% - 10% 0 137 0 3 1892
> 3% 0 0 0 107 0 0 2660 812 0 0 3 96% 24% T 20% 0 107 0 3 1739
> 2% 0 0 0 118 0 0 1788 0 0 0 3 96% 0% - 13% 0 118 0 3 1608
> 2% 0 0 0 136 0 0 2228 0 0 0 3 96% 0% - 11% 0 136 0 3 2018
> 2% 0 0 0 119 0 0 1940 0 0 0 3 96% 0% - 13% 0 119 0 3 1998
> 2% 0 0 0 136 0 0 2175 0 0 0 3 96% 0% - 14% 0 136 0 3 1929
> 2% 0 0 0 133 0 0 1924 0 0 0 3 96% 0% - 19% 0 133 0 3 2292
> 2% 0 0 0 115 0 0 2044 0 0 0 3 96% 0% - 11% 0 115 0 3 1682
> 2% 0 0 0 134 0 0 2256 0 0 0 3 96% 0% - 12% 0 134 0 3 2096
> 2% 0 0 0 112 0 0 2184 0 0 0 3 96% 0% - 12% 0 112 0 3 1633
> 2% 0 0 0 163 0 0 2348 0 0 0 3 96% 0% - 13% 0 163 0 4 2421
> 2% 0 0 0 120 0 0 2056 184 0 0 3 96% 8% T 14% 0 120 0 3 1703
>
> strace output:
> read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"...,
> 8192) = 8192
> _llseek(55, 857997312, [857997312], SEEK_SET) = 0
> read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"...,
> 8192) = 8192
> _llseek(55, 883220480, [883220480], SEEK_SET) = 0
> read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2
> \270\236\220\2D\235"..., 8192) = 8192
> _llseek(55, 858005504, [858005504], SEEK_SET) = 0
> read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2
> \260\236\240"..., 8192) = 8192
> _llseek(55, 857964544, [857964544], SEEK_SET) = 0
> read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"...,
> 8192) = 8192
> _llseek(55, 857956352, [857956352], SEEK_SET) = 0
> read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2
> \260\236\240\2\\"..., 8192) = 8192
> _llseek(55, 910802944, [910802944], SEEK_SET) = 0
> read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"...,
> 8192) = 8192
> _llseek(55, 857948160, [857948160], SEEK_SET) = 0
> read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"...,
> 8192) = 8192
> _llseek(56, 80371712, [80371712], SEEK_SET) = 0
> read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2
> \250\236\260\2T\235"..., 8192) = 8192
> read(102,
> "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0 B\6\0"..., 8192) =
> 8192
> _llseek(55, 857939968, [857939968], SEEK_SET) = 0
> read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2
> \244\236\270"..., 8192) = 8192
> _llseek(55, 857923584, [857923584], SEEK_SET) = 0
> read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2
> \234\236\310\002"..., 8192) = 8192
> _llseek(55, 57270272, [57270272], SEEK_SET) = 0
> read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2
> \310\236j\2\214\235"..., 8192) = 8192
> _llseek(55, 870727680, [870727680], SEEK_SET) = 0
> read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"...,
> 8192) = 8192
> _llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
> read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2
> \264\236\230"..., 8192) = 8192
> _llseek(55, 857874432, [857874432], SEEK_SET) = 0
> read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2
> \224\236\330"..., 8192) = 8192
> _llseek(55, 760872960, [760872960], SEEK_SET) = 0
> read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2
> \234\236\310"..., 8192) = 8192
> _llseek(55, 891715584, [891715584], SEEK_SET) = 0
> read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2
> \230\236\320\2"..., 8192) = 8192
> _llseek(55, 857858048, [857858048], SEEK_SET) = 0
> read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2
> \254\236\250"..., 8192) = 8192
> _llseek(55, 666910720, [666910720], SEEK_SET) = 0
> read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2
> \254\236\242\2P\235"..., 8192) = 8192
> _llseek(55, 857841664, [857841664], SEEK_SET) = 0
> read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2
> \214\236\350\2\30"..., 8192) = 8192
>
>
>



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:48 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com