This is a discussion on Gigaswift adapter in an old e250 within the Sun Solaris Hardware forums, part of the Solaris Operating System category.
| There was an unused E250 sitting in a closet and I thought, hey, that could be a solid fileserver. So I bought some disks, memory, and a GigaSwift adapter, threw Solaris 10 on it, and created a nice ZFS NFS server for all our Solaris and Linux clients: about 10 clients and 20 users.

Everything seemed to work fine in my tests, so I rolled it out into production, and then things started slowing down. Not all the time, just transient slowdowns where all the users' interactive response went horribly wrong. Then it would speed back up and be fine.

I believe I tracked it down to the GigaSwift board being too fast for the two 400 MHz CPUs. The old rule of thumb that I heard was 1 MHz of CPU power for every 1 Mbit/s of Ethernet throughput, so I knew I was cutting it close.

What happens is that during an iperf test, or multiple users doing a large cvs checkout, or multiple users doing a build, or any single large (>500 MB) file transfer, the machine spends 50% to 70% of its CPU time in the kernel.

I've updated to the latest ce driver, applied the latest cluster patch for Solaris 10, tried the board in both the 33 MHz and the 66 MHz PCI slots, played with /etc/system parameters, changed the ce.conf file, and changed other ce parameters with ndd, and nothing really works. I can slow my iperf down from 350 Mbit/s to 120 Mbit/s by increasing rx_intr_time and rx_intr_pkts to 300 and 600 respectively. This decreases the kernel CPU utilization to around 35%, but it still slows down other users' access to their home directories. Also, kstat reports no overflows, failures, drops, errors, or anything else that screams out at me.

I'm guessing that I shouldn't have put a 1 Gbit Ethernet board in an old E250 with two 400 MHz CPUs, or that Solaris 10 is optimized for newer machines and its defaults just don't match my case, or it's something I don't know about. I had a lousy weekend trying to get things to work.

Any ideas on how I can get this to work?

Thanks,
Mark |
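To make the rule of thumb in the post above concrete, here is a minimal sketch of the arithmetic. It assumes the rule means roughly 1 MHz of CPU per 1 Mbit/s of network throughput and that the two CPUs add up perfectly, which is optimistic. The function name `cpu_limited_mbps` is made up for this illustration.

```python
def cpu_limited_mbps(cpu_mhz: float, n_cpus: int, mhz_per_mbps: float = 1.0) -> float:
    """Throughput ceiling (Mbit/s) implied by the rule of thumb.

    Assumes perfect scaling across CPUs, which real network stacks
    rarely achieve, so treat the result as an upper bound.
    """
    return cpu_mhz * n_cpus / mhz_per_mbps


LINE_RATE_MBPS = 1000  # nominal gigabit Ethernet line rate

ceiling = cpu_limited_mbps(cpu_mhz=400, n_cpus=2)
print(f"rule-of-thumb ceiling: {ceiling:.0f} Mbit/s")
print(f"share of gigabit line rate: {ceiling / LINE_RATE_MBPS:.0%}")
```

By this estimate the box falls short of gigabit line rate even before ZFS, NFS, and NIS compete for the same CPUs, which fits the "cutting it close" remark.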
| |||
| On Apr 9, 7:18 pm, Mark Valery <meval...@comcast.net> wrote:
> I believe I tracked it down to the gigaswift board being too fast for
> the 2 400MHz CPUs.
> .. .. ..
> What happens is that during an iperf test, or multiple users doing a large
> cvs checkout, or multiple users doing a build, or any single large(>500MB)
> file transfer, the machine goes from 50% to 70% kernel.
> .. .. ..
> Any ideas on how I can get this to work?

Maybe http://freshmeat.net/projects/slowdown/ will help?

"slowdown is a program that contends with I/O-hungry processes. The "nice" program does a good job of handling CPU priorities, but doesn't help much when you have a process that is moving tons of data; other processes can continue to starve for I/O, making a system painful to use, as during a backup, while tripwire is running, etc. slowdown manages another process by sleeping for a user-specified number of seconds or fractions of seconds, each time some data is moved using, for example, read(), write(), send(), recv(), etc." |
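The idea in the quoted description, pausing after every data move so other I/O gets a turn, can be sketched in a few lines. This is only an illustration of the principle inside a single copy loop; it is not how slowdown itself is implemented, since that tool manages a separate process. The name `throttled_copy` and its parameters are assumptions made for this example.

```python
import io
import time


def throttled_copy(src, dst, chunk_size=64 * 1024, pause_s=0.01, sleep=time.sleep):
    """Copy src to dst in chunks, sleeping pause_s after each chunk moved.

    `sleep` is injectable so the pause can be stubbed out in tests.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        sleep(pause_s)  # yield the disk/network to other processes
    return total


# Small in-memory demonstration: 1,000,000 bytes in 100,000-byte chunks.
source = io.BytesIO(b"x" * 1_000_000)
target = io.BytesIO()
copied = throttled_copy(source, target, chunk_size=100_000, pause_s=0.001)
print(copied, "bytes copied")
```

A larger `pause_s` or a smaller `chunk_size` lowers the copy's share of I/O bandwidth at the cost of a longer total run time.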
| |||
| Hi,

Mark Valery wrote:
> There was an unused E250 sitting in a closet and I thought, hey that could
> be a solid fileserver. So I bought some disks, memory, and a gigaswift
> adapter, threw solaris 10 on it and created a nice zfs nfs server for all
> our solaris and linux clients. About 10 clients and 20 users.
> Everything seemed to work fine in my tests and I rolled it out into
> production and things started slowing down. Not all the time, just
> transient slow downs where all the users interactive response went
> horribly wrong. Then it would speed back up and be fine.
>
> I believe I tracked it down to the gigaswift board being too fast for
> the 2 400MHz CPUs. The old rule of thumb that I heard was a 1MHz of
> cpu power for 1 MHz of ethernet speed, so I knew I was cutting it
> close.

I wouldn't say that the GbE board is too fast, but as I understand your posting, moving from 100 Mb/s to 1000 Mb/s will simply expose the next bottleneck.

> What happens is that during an iperf test, or multiple users doing a large
> cvs checkout, or multiple users doing a build, or any single large(>500MB)
> file transfer, the machine goes from 50% to 70% kernel.
>
> I've updated to the latest ce driver, done the latest cluster patch for
> solaris 10, placed the board in the 33MHz and 66MHz PCI slot, played with
> /etc/system parameters, changed the ce.conf file, changed other ce
> parameters with ndd and nothing really works. I can slow my iperf down
> from 350Mbits/sec to 120Mbits/sec by increasing rx_intr_time and
> rx_intr_pkts to 300 and 600 respectively. This decreases the kernel
> cpu utilization to around 35%. But this still will slow down any
> other users to their home directories. Also the kstat program reports
> no overflows, failures, drops, errors, or anything that screams out
> at me.
>
> I'm guessing that I shouldn't have put a 1Gbit ethernet in an old 2
> 400MHz CPU E250, or solaris 10 is optimized for the newer machines and
> its defaults just don't match my case, or I don't know. I had a lousy
> weekend trying to get things to work.
>
> Any ideas on how I can get this to work?
> Thanks,
> Mark

What about disk I/O? Check the output of iostat -xtcn 1 and vmstat 1 when performance goes down.

Also, isn't ZFS more CPU-intensive than UFS?

We only have an E250 as a fileserver. Sure, we can bring it to its knees by using the GbE interface and transferring lots of files, but that's just the way it is.

I would guess that the SCSI interface is the next bottleneck before the CPU; if the disk is blocked, then nothing happens.

/michael |
| |||
| On Thu, 12 Apr 2007 00:19:42 +0200, Michael Laajanen wrote:
> Hi,
>
> What about disk IO, check iostat -xtcn 1 and vmstat 1 outputs when
> performance goes down.
>
> Also, is not ZFS more CPU intensive that UFS?

Yes, and I was making it worse by running compression on the raidz pool.

> We only have a E250 as fileserver, sure we can bring it down on its
> knees if using GbE interface and transfering lot of files, but thats
> just the way it is.

I like the E250 as a fileserver for a group our size. These old machines are like tanks: a bit slow, but they just keep running, and I feel much safer with all the home dirs in a raidz pool.

> I would guess that the SCSI interface is the next bottleneck before the
> CPU, if disk is blocked then nothing happends.

I don't get close to this bottleneck yet when using NFS. I can max it out when doing copies on the fileserver itself, but no other users log into this machine, so it's only a test to let me know what speeds I can get into the raidz pool.

This is what I ended up doing:

I've got it to a manageable level now until we get a real system administrator. I turned off compression on the ZFS raidz storage and cleaned up a user's Windows machine that was causing samba on another Solaris machine to constantly communicate with ypserv running on the E250. According to ethereal, ypserv was sending 1/4 as much data as NFS was. I also slowed down the 1 Gbit/s Ethernet board to around 125 Mbit/s of throughput, as measured by iperf, by increasing the values of rx_intr_time and rx_intr_pkts to 300 and 600 respectively. All of these together have now decreased the kernel CPU percentage to around 20% during peak usage, and I haven't had any complaints about interactive response going to hell. |
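Why raising rx_intr_time and rx_intr_pkts lowers kernel CPU usage can be illustrated with a simplified model of receive interrupt coalescing. The sketch below assumes an interrupt fires when either rx_intr_pkts packets are pending or rx_intr_time timer ticks have elapsed, whichever comes first. The 4.5 microsecond tick length, the 1500-byte frame size, and the "no coalescing" baseline of 1/1 are assumptions for illustration, not values taken from the thread; check the ce driver documentation for the real semantics.

```python
def interrupts_per_second(pkts_per_s, rx_intr_pkts, rx_intr_time, tick_s):
    """Estimate receive interrupts per second under a simplified model.

    An interrupt is assumed to fire when rx_intr_pkts packets are pending
    or rx_intr_time ticks (of tick_s seconds each) have elapsed,
    whichever happens first.
    """
    if pkts_per_s <= 0:
        return 0.0
    time_to_fill = rx_intr_pkts / pkts_per_s   # seconds until the packet threshold
    time_limit = rx_intr_time * tick_s         # seconds until the timer threshold
    interval = min(time_to_fill, time_limit)
    # There is nothing to signal until at least one packet has arrived.
    interval = max(interval, 1.0 / pkts_per_s)
    return 1.0 / interval


TICK_S = 4.5e-6          # assumed tick length; verify against driver docs
FRAME_BITS = 1500 * 8    # assumed full-size Ethernet frames

pps = 350e6 / FRAME_BITS  # packet rate at the 350 Mbit/s iperf figure
baseline = interrupts_per_second(pps, rx_intr_pkts=1, rx_intr_time=1, tick_s=TICK_S)
tuned = interrupts_per_second(pps, rx_intr_pkts=600, rx_intr_time=300, tick_s=TICK_S)
print(f"no coalescing: {baseline:,.0f} interrupts/s")
print(f"tuned (300/600): {tuned:,.0f} interrupts/s")
```

Under these assumptions the timer threshold dominates after tuning, so the interrupt rate drops by well over an order of magnitude. Each packet also waits longer before it is serviced, which is one plausible reason the measured iperf throughput fell.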
| |||
| On Wed, 11 Apr 2007 11:57:53 -0700, dan.nygre wrote:
> Maybe http://freshmeat.net/projects/slowdown/ will help?
>
> "slowdown is a program that contends with I/O-hungry processes. The
> "nice" program does a good job of handling CPU priorities, but doesn't
> help much when you have a process that is moving tons of data; other
> processes can continue to starve for I/O, making a system painful to
> use, as during a backup, while tripwire is running, etc. slowdown
> manages another process by sleeping for a user-specified number of
> seconds or fractions of seconds, each time some data is moved using,
> for example, read(), write(), send(), recv(), etc."

Thanks. I'm not sure how I can specifically apply this to the Ethernet board's TCP communications. If I apply it to nfsd, then I slow all connections.

I've got it to a manageable level now until we get a real system administrator. I turned off compression on the ZFS raidz storage and cleaned up a user's Windows machine that was causing samba on another Solaris machine to constantly communicate with ypserv running on the E250. According to ethereal, ypserv was sending 1/4 as much data as NFS was. I also slowed down the 1 Gbit/s Ethernet board to around 125 Mbit/s of throughput, as measured by iperf, by increasing the values of rx_intr_time and rx_intr_pkts to 300 and 600 respectively. All of these together have now decreased the kernel CPU percentage to around 20% during peak usage, and I haven't had any complaints about interactive response going to hell. |
| |||
| > On Wed, 11 Apr 2007 11:57:53 -0700, dan.nygre wrote:
>> Maybe http://freshmeat.net/projects/slowdown/ will help?
>>
>> "slowdown is a program that contends with I/O-hungry processes. The
>> "nice" program does a good job of handling CPU priorities, but doesn't
>> help much when you have a process that is moving tons of data; other
>> processes can continue to starve for I/O, making a system painful to
>> use, as during a backup, while tripwire is running, etc. slowdown
>> manages another process by sleeping for a user-specified number of
>> seconds or fractions of seconds, each time some data is moved using,
>> for example, read(), write(), send(), recv(), etc."

On Apr 18, 6:41 pm, Mark Valery <meval...@comcast.net> wrote:
> Thanks. Not sure how I can specifically apply this to the ethernet
> board tcp communications. If I do it to nfsd, then I slow all
> connections.
>
> I've got it to a manageable level now until we get a real system
> administrator.

I'm glad you have the problem resolved, and thanks for the follow-up. I was curious how it turned out.

As for "slowdown", I thought you could apply it to CVS or other applications that were causing I/O-related problems. It seemed like you knew which I/O-intensive applications were causing the problem, and by just capping the I/O allowed to them, the rest of the I/O accesses would not be starved. |
| |||
| On Apr 18, 6:41 pm, Mark Valery <meval...@comcast.net> wrote:
> Thanks. Not sure how I can specifically apply this to the ethernet
> board tcp communications. If I do it to nfsd, then I slow all
> connections.

In other words, I didn't realize this E250 was configured only as a fileserver. I thought it was also running some client applications accessing the filesystem, which you could throttle with "slowdown". |
| ||||
| On Mon, 23 Apr 2007 12:46:52 -0700, dan.nygre wrote:
> On Apr 18, 6:41 pm, Mark Valery <meval...@comcast.net> wrote:
>> Thanks. Not sure how I can specifically apply this to the ethernet
>> board tcp communications. If I do it to nfsd, then I slow all
>> connections.
>
> In other words, I didn't realize this E250 was configured only as
> fileserver. I thought it was also running some client applications
> accessing the filesystem which you could throttle with "slowdown".

I wasn't thinking beyond the fileserver. I could slow down the client applications on the users' computers that are accessing their home directories, where their cvs workspaces reside. |