Hi All,

We are using an FTP program to transfer files between two AIX servers over an MPLS (Multi-Protocol Label Switching) WAN connection.

Our AIX is V5.2, and the WAN bandwidth is 45Mbps. We can get almost full throughput (45Mbps) using UDP, but only 3Mbps for FTP transfers. The ping response time between these two hosts is 40ms.

When we ran a trace analysis (Ethereal) on the network switch, we found a latency of about 45ms on the receiving host, and no network fragmentation.

We have adjusted the following parameters on both hosts via the no command:

1. Changed rfc1323 from 0 to 1.
2. Changed tcp_sendspace from 16384 to 262144.
3. Changed tcp_recvspace from 16384 to 262144.
4. Changed tcp_pmtu_discover from 1 to 0.
5. Adjusted the MTU size to 1400, as the MPLS maximum size is 1476. This is done with the route command (adding a specific route with the -mtu option).
6. Restarted inetd.

However, we still get the same performance (3Mbps) after adjusting these parameters. I also found an article about a similar situation:
http://www.llnl.gov/asci/discom/sc2000_king.html

Could you please advise a solution to make full use of the WAN bandwidth for large file transfers?

Thanks in advance.

Hunter Zhou
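As a rough check on how much the MTU change in the list above can matter, here is a minimal sketch of the payload carried per TCP segment. The helper name `tcp_payload` is hypothetical, and the sketch assumes a 20-byte IPv4 header, a 20-byte TCP header, no IP options, and 12 bytes per segment for the TCP timestamp option when timestamps are in use.

```shell
#!/bin/sh
# Hypothetical helper: payload bytes per TCP segment for a given IPv4 MTU.
# Assumes 20-byte IPv4 and 20-byte TCP headers with no IP options; the
# TCP timestamp option costs 12 more bytes per segment when it is in use.
tcp_payload() {
    mtu=$1
    timestamps=$2   # 1 if TCP timestamps are in use, 0 otherwise
    payload=$(( mtu - 20 - 20 ))
    if [ "$timestamps" -eq 1 ]; then
        payload=$(( payload - 12 ))
    fi
    echo "$payload"
}

tcp_payload 1400 0   # route MTU from the post, no timestamps
tcp_payload 1400 1   # route MTU from the post, with timestamps
tcp_payload 1476 1   # stated MPLS maximum, with timestamps
```

Under these assumptions the difference between an MTU of 1400 and 1476 is roughly 5% of payload per segment, which is small next to the gap between 3Mbps and 45Mbps.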
| |||
hunter.zhough@gmail.com wrote:
> Hi All,
> We are using FTP program to to transfer files between two AIX servers
> via MPLS(Multi-Protocol Label Switching) WAN connection.
>
>
> Our AIX is V5.2, and the WAN bandwidth is 45Mbps. We can almost get
> full throughput(45Mbps) by using UDP protocol, but only get 3Mbps for
> FTP transfers. The ping response between these two hosts are 40ms.
>
> When we had a trace analysis (Ethereal) on the network switch, we found
> there was a latency on receiving host, about 45ms, and there is no
> network fragmention.
>
> We have adjust the following parameters in both hosts via no command:
>
> 1. Change rfc1323 from 0 to 1
> 2. Change tcp_sendspace from 16384 to 262144.
> 3. Change tcp_recvspace from 16384 to 262144.
> 3. Change tcp_pmtu_discover from 1 to 0.
> 4. Adjust the MTU size to 1400, as the MPLS maximum size if 1476.
> This is done by route command (add specific routing with -mtu
> option).
> 5. Restarted inetd.
>
> However, we still get the same performance (3Mbps) after we adjust
> these parameters. I also found an article about similar situation:
> http://www.llnl.gov/asci/discom/sc2000_king.html
>
> Could you please advise a solution to make full use of WAN bandwidth
> for large file transfer?
>
> Thanks in advance.
>
> Hunter Zhou

A couple of things:

A very good article in IBM eServer mag for Unix took an exhaustive look at TCP send and receive space a couple of months ago, and pretty much made a case for a tcp_sendspace of 131072 and a tcp_recvspace of 65536 when using GigE NICs (local LAN).

How PMTU discovery works seems to change with every release of AIX. Good info below:

http://publib.boulder.ibm.com/infoce...cp_pathmtu.htm

One last question: is throughput the same no matter which system initiates the FTP? On my systems, an FTP put from AIX 5.3 to AIX 5.2 is much faster than an FTP get initiated on the AIX 5.2 system (for the same file).
| |||
<aixdude@yahoo.com> wrote in message news:1129838865.145676.224380@z14g2000cwz.googlegroups.com...
>
> hunter.zhough@gmail.com wrote:
> > Hi All,
> > We are using FTP program to transfer files between two AIX servers
> > via MPLS(Multi-Protocol Label Switching) WAN connection.
> >
> >
> > Our AIX is V5.2, and the WAN bandwidth is 45Mbps. We can almost get
> > full throughput(45Mbps) by using UDP protocol, but only get 3Mbps for
> > FTP transfers. The ping response between these two hosts are 40ms.
> >
> > When we had a trace analysis (Ethereal) on the network switch, we found
> > there was a latency on receiving host, about 45ms, and there is no
> > network fragmentation.
> >
> > We have adjust the following parameters in both hosts via no command:
> >
> > 1. Change rfc1323 from 0 to 1
> > 2. Change TCP_sendspace from 16384 to 262144.
> > 3. Change TCP_recvspace from 16384 to 262144.
> > 3. Change TCP_PMTU_discover from 1 to 0.
> > 4. Adjust the MTU size to 1400, as the MPLS maximum size if 1476.
> > This is done by route command (add specific routing with -MTU
> > option).
> > 5. Restarted inetd.
> >
> > However, we still get the same performance (3Mbps) after we adjust
> > these parameters. I also found an article about similar situation:
> > http://www.llnl.gov/asci/discom/sc2000_king.html
> >
> > Could you please advise a solution to make full use of WAN bandwidth
> > for large file transfer?
> >
> > Thanks in advance.
> >
> > Hunter Zhou
>
> A couple of things:
>
> a very good article in IBM eServer mag. for Unix took an exhaustive look
> at TCP send and receive space a couple of months ago, and pretty much
> made a case for tcp_sendspace of 131072 and tcp_recvspace of 65536 when
> using GigE NICs. (local LAN)
>
> How PMTU discover works seems to change with every release of AIX.
>
> Good info below:
>
> http://publib.boulder.ibm.com/infoce...cp_pathmtu.htm
>
> One last question. Is throughput the same no matter which system
> initiates the ftp? On my systems, a FTP put from AIX 5.3 to AIX 5.2 is
> much faster than FTP get initiated on the AIX 5.2 system. (for the same
> file)

Although I'm an AIX newbie, I am a widely experienced Network Specialist. What you are experiencing is a general TCP/IP "tuning and customisation" issue. There are a number of factors involved in any network transport that uses a "connection-oriented, with error-recovery" protocol like TCP (as opposed to a "datagram-oriented" protocol like UDP). Specifically, with a high latency, you should consider the following:

1) implementing "Jumbo Packets" (> 1492-byte or so), if possible.
2) increasing the "Send Window" and "Receive Window" sizes.
3) implementing "Piggy-Back ACKs", if not already done.
4) reducing the low-level "Retries" (say, to 3) and "Timeout" values (to 2 x estimated propagation time, plus latency/turn-around) for high-reliability links/connections.
5) increasing the low-level "Receives before ACK", if implemented as a separate, tuneable parameter (other than by "Receive Window").

The interaction of these parameters can have a significant impact on throughput, but may also have an undesired impact on important small packets for interactive connections/datagrams if one gets too carried away with tuning for "batch" traffic. Bear in mind any "store-and-forward" routers in the path, as packets may get dropped if buffering is exceeded there.

"Network Performance Monitoring and Identification" measures should be put into place, if not already implemented, as many application designers have a somewhat misguided "bandwidth is free" approach to networks.

HTTH
--
Regards, Tim Clarke (a.k.a. WBST)
| |||
The two systems are running at the same level of AIX, version 5.2 ML4. I tried both directions and got the same throughput.

I also tried changing delayack and delayackports to enable piggy-back ACKs, and the throughput improved from 3Mbps to 5Mbps, but that is still well below the capacity.

Thanks.
| |||
Hi Tim,

Thank you for your valuable response. I am trying these tunable parameters.

While these parameters should be changeable via the AIX no command, I don't know how to change item 4 (Retries and Timeout) or item 5 ("Receives before ACK").

Could you please advise which parameters correspond to items 4 and 5?

Here are the TCP tunable parameters:

http://publib.boulder.ibm.com/infoce...ixcmds4/no.htm

All the best!

Hunter Zhou
| |||
hunter.zhough@gmail.com wrote:
> The two system running at the same level of AIX, version 5.2 ML4.
> I tried both direction and got the same throughput.
>
> I also tried change teh delayack and delayackports to enable piggy back
> ACK, and the throughput is enhanced from 3Mbps to 5Mbps, but still well
> below the capacity.
>

The problem could be that the router is saturated and the TCP ACKs do not get through the router fast enough. Read -> http://www.benzedrine.cx/ackpri.html

The router may reset some IP information (I do not remember which) about the TCP window size, so the client will use only a small TCP window.
-> Check the TCP window size with tcpdump.

On a WAN with high latency you should have a larger TCP window size, so that ACKs are needed less often. Hopefully the packets will not be corrupted.

I would also do some testing with Linux clients, because there you can configure the TCP window scale.

On AIX check:

Is ISNO enabled?
$ no -o isno

Is rfc1323 enabled?
$ no -o rfc1323

Check that tcp_recvspace and tcp_sendspace have a large value (> 128 KB). For values larger than 65536, you must enable rfc1323 (rfc1323=1) to enable TCP window scaling. For the calculation see http://www16.boulder.ibm.com/pseries...ixcmds4/no.htm and search for tcp_sendspace.
$ no -o tcp_recvspace
$ no -o tcp_sendspace

Read also about performance: http://www16.boulder.ibm.com/pseries...d/netperf3.htm

Hint: If ISNO is enabled, the configuration on the adapter will override the 'no' settings, so you can configure only the adapter that is used for the WAN.

The main goal should be a large TCP window and high-priority ACKs, in case it is not a problem with the router, the OS or anything else.

hth
Hajo

P.S. I am not a networker, so please read the documentation (links) carefully so that you know what YOU are doing.
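To put a number on "a larger TCP window" for this link, here is a minimal sketch of the bandwidth-delay product for the figures given in the thread (45Mbps, 40ms round trip). The function names `bdp_bytes` and `scale_shift` are hypothetical, the link rate is taken as decimal megabits, and the shift calculation only shows the smallest RFC 1323 scale factor that could cover a window; a real stack picks its own value.

```shell
#!/bin/sh
# Bandwidth-delay product: bytes that must be in flight to keep the link full.
# Arguments: link rate in Mbit/s (decimal megabits), round-trip time in ms.
bdp_bytes() {
    mbps=$1
    rtt_ms=$2
    # Mbit/s -> bytes/s, then multiply by the round-trip time in seconds
    echo $(( mbps * 1000000 / 8 * rtt_ms / 1000 ))
}

# Smallest window-scale shift such that (65535 << shift) covers the window.
scale_shift() {
    window=$1
    shift_count=0
    while [ $(( 65535 << shift_count )) -lt "$window" ]; do
        shift_count=$(( shift_count + 1 ))
    done
    echo "$shift_count"
}

bdp_bytes 45 40       # window needed for 45 Mbit/s at 40 ms
scale_shift 225000    # shift needed to advertise that window
scale_shift 524288    # shift needed for a 512 KB buffer
```

With these inputs the product is 225000 bytes, which is above the 65535-byte limit of an unscaled TCP window, so window scaling has to be in effect on both ends before the link can be filled by a single connection.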
| |||
Hi Hajo,

Our network staff told me that the core switches are far from saturated. The round trip time is always about 40ms between the servers in these two data centres. Probably the high latency caused the low TCP throughput.

Here is the script we used to change the TCP settings on both AIX hosts. The receiving server does not have the route commands. We did turn on isno for the test.

#!/usr/bin/ksh
no -D
no -o rfc1323=1
no -o tcp_sendspace=524288
no -o tcp_recvspace=524288
no -o tcp_pmtu_discover=0
no -o tcp_nodelayack=0
no -o delayack=3
no -o delayackports={5001,21}
no -o rto_length=6
no -o isno=1
route delete 10.155.15.92 192.168.158.246
route add 10.155.15.92 192.168.158.246 -mtu 1400
ifconfig en5 tcp_recvspace 524288 tcp_sendspace 524288 tcp_nodelay 0 rfc1323 1
ifconfig en5
stopsrc -s inetd; startsrc -s inetd

We got 5Mbps out of 45Mbps throughput with these settings.

Thanks

Hunter
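The link between latency and throughput can be made concrete with a small sketch: if at most one window of data is in flight per round trip, throughput cannot exceed window / RTT. The function names `max_kbps` and `implied_window` are hypothetical, and the arithmetic is integer-only, so results are truncated.

```shell
#!/bin/sh
# Upper bound on TCP throughput when at most one window is in flight per
# round trip. Arguments: window in bytes, round-trip time in ms.
max_kbps() {
    window_bytes=$1
    rtt_ms=$2
    # bits per millisecond is numerically the same as kbit/s
    echo $(( window_bytes * 8 / rtt_ms ))
}

# Inverse: the window implied by an observed rate (kbit/s) and RTT (ms).
implied_window() {
    kbps=$1
    rtt_ms=$2
    echo $(( kbps * rtt_ms / 8 ))
}

max_kbps 16384 40        # original 16384-byte buffers at 40 ms
max_kbps 65535 40        # largest unscaled window at 40 ms
max_kbps 524288 40       # the 512 KB buffers from the script above
implied_window 5000 40   # window that would yield the observed 5 Mbit/s
```

The original 16384-byte buffer gives a ceiling of about 3.3Mbps at 40ms, close to the 3Mbps first reported, and 5Mbps corresponds to only about 25000 bytes in flight. That is consistent with, though it does not prove, the larger buffers not being in effect on the FTP connection; the window advertised in a tcpdump trace would show whether they are.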
| |||
hunter.zho...@gmail.com wrote:
> Hi Hajo,
>
> Our network staff told me that the core switches are far from
> saturated. The round trip time is always about 40ms between the servers
> in these two data centres. Probably the high latency caused the low TCP
> throughput.
>
> Here is the script we used to change the TCP settings in both AIX. The
> receiver server does not have that route command. We do turn on the
> isno for the test.
>
> #!/usr/bin/ksh
> no -D
> no -o rfc1323=1
> no -o tcp_sendspace=524288
> no -o tcp_recvspace=524288
> no -o tcp_pmtu_discover=0
> no -o tcp_nodelayack=0
> no -o delayack=3
> no -o delayackports={5001,21}
> no -o rto_length=6
> no -o isno=1
> route delete 10.155.15.92 192.168.158.246
> route add 10.155.15.92 192.168.158.246 -mtu 1400
> ifconfig en5 tcp_recvspace 524288 tcp_sendspace 524288 tcp_nodelay 0
> rfc1323 1
> ifconfig en5
> stopsrc -s inetd; startsrc -s inetd
>
> We got 5Mbps out of 45Mbps throughput based on this settings.
>
> Thanks
>
> Hunter

According to the man page for no, setting the rfc1323 parameter should PRECEDE setting the TCP send and receive spaces to values larger than 64K. rfc1323 enables the larger settings and must be specified first.
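The ordering rule can be sketched as a dry run that only prints the no commands in the required order and executes nothing. The function name `emit_no_commands` is hypothetical; the `no -o name=value` form and the values are the ones used in the quoted script.

```shell
#!/bin/sh
# Dry run: print the no commands with rfc1323 enabled before any buffer
# larger than 64 KB is requested. Nothing is executed on the system.
emit_no_commands() {
    sendspace=$1
    recvspace=$2
    if [ "$sendspace" -gt 65536 ] || [ "$recvspace" -gt 65536 ]; then
        echo "no -o rfc1323=1"
    fi
    echo "no -o tcp_sendspace=$sendspace"
    echo "no -o tcp_recvspace=$recvspace"
}

emit_no_commands 524288 524288
```

The quoted script already lists rfc1323=1 ahead of the two buffer settings, so the order there matches this rule.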
| |||
<hunter.zhough@gmail.com> wrote in message news:1129910259.550381.218250@g49g2000cwa.googlegroups.com...
> Hi Tim,
>
> Thank you for your valuable response. I am trying these tuneable
> parameters.
>
> While these parameters should be able to change via AIX no command, I
> don't know how to change the item 4 for Retires and Timeout, and 5
> "Receives before ACK".
>
> Could you please help to advice which parameters may reflect those in
> the item 4 and 5?
>
> Here is the TCP tuneable parameters:
>
> http://publib.boulder.ibm.com/infoce...ixcmds4/no.htm

OK, please consider any "compatibility mode" setting etc. for V5.2, see:
http://publib.boulder.ibm.com/infoce...htm#migrcompat
and/or "enhancements" in V5.2, see:
http://publib.boulder.ibm.com/infoce....htm#noandnfso

4) reducing the low-level "Retries" (say, to 3) and "Timeout" values (to 2 x estimated propagation time, plus latency/turn-around) for high-reliability links/connections.

"Retries" appears to default to 3 as "ndp_umaxtries", although that tunable is documented for Neighbor Discovery (NDP) unicasts rather than for TCP itself (TCP runs directly over IP, not over UDP). The timeouts are dynamically learned, it would seem, but there are a lot of parameters in there and I may have missed it. The related high-level (TCP connection-related) values are set/changed via the rto_min=, rto_max= and rto_limit= values, it would seem.

5) increasing the low-level "Receives before ACK", if implemented as a separate, tuneable parameter (other than by "Receive Window").

This appears to be two parameters:
a) delayack=1 or 3 on the "client side" and
b) delayackports=21 on the "client side" (21-control and 20-data for FTP, by default, on the "server side").
This will only send an ACK to the server when requested by a SYN request from the server (has reached its max send window) or a FIN (end-of-transfer).

Additionally, review the sack= , tcp_init_window= , rfc1323= and rfc2414= values and their meanings and inter-relationships.

I think that covers the points adequately for now.

HTTH
--
Regards, Tim Clarke (a.k.a. WBST)
| ||||
hunter.zhough@gmail.com wrote:
> Hi Hajo,
>
> Our network staff told me that the core switches are far from
> saturated.

As I understand it, your 45Mbps means megabits? If so, then you have a maximum throughput of about 6 Mbytes/sec. Nowadays servers often work with 1Gbit networks (100 Mbytes).

But I cannot really say any more than to do the tests I suggested, because you must narrow down the problem. Everything else is digging in the dark.

Create a drawing of the network and check every hop. Also try a different path for your FTP transfer and check again.

AIX1 --- switch1 ---- router1 --- router2 --- AIX2
TESTHost----------|
TESTHost----------------------|

> The round trip time is always about 40ms between the servers
> in these two data centres. Probably the high latency caused the low TCP
> throughput.

If so, take the calculation for the window provided by IBM, set it on the network adapter, and check with ftp and tcpdump. tcpdump is needed to verify ON both sites that the data on the wire is packed as expected.

Sorry for not being able to help any further.

Hajo