This is a discussion on 10g RAC - fail over test issue within the Oracle Database forums, part of the Database Server Software category.
Hello,

While running some TAF tests on a 10g (10.1.0.3) SE 2-node RAC, the delay to switch from one node to the other is 15-20 minutes when we unplug the network cable. Also, after plugging the network back in, network connectivity on the node is lost. We have to restart network services and cluster services, or just reboot if they don't restart properly.

Another test we did was shutting down one node while running a long query (following the TAF failover test notes from Metalink), and failover takes 5 minutes in that case.

Has anyone else experienced this type of behavior, where unplugging the network and plugging it back in actually kills all network services? TIA.

SG
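One way to put failover times like "15-20 minutes" and "5 minutes" on a repeatable footing is to poll the database from a test client every few seconds during the test and record whether each probe succeeded. Below is a minimal sketch of a hypothetical helper (not part of any Oracle tool) that turns such a probe log into the longest outage. The probe itself, for example a trivial query through the TAF-enabled service, is assumed and not shown.

```python
from typing import Optional, Sequence, Tuple


def longest_outage(samples: Sequence[Tuple[float, bool]]) -> float:
    """Longest gap (seconds) between the last successful probe before a
    failure run and the first successful probe after it.

    `samples` is a time-ordered list of (timestamp, probe_succeeded) pairs.
    An outage that has not ended by the last sample is not counted, and a
    failure run with no earlier success is ignored.
    """
    longest = 0.0
    last_ok: Optional[float] = None
    in_outage = False
    for t, ok in samples:
        if ok:
            # A success after one or more failures closes an outage window.
            if in_outage and last_ok is not None:
                longest = max(longest, t - last_ok)
            last_ok = t
            in_outage = False
        else:
            in_outage = True
    return longest


# Hypothetical probe log: the client loses its node shortly after t=10 s
# and gets through again (via the surviving node) at t=930 s.
log = [(0.0, True), (10.0, True), (20.0, False), (30.0, False), (930.0, True), (940.0, True)]
print(longest_outage(log))  # 920.0
```

The measured gap is bounded by the polling interval, so probe more often than the failover time you hope to achieve.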
SG wrote:
> While running some TAF tests on a 10g (10.1.0.3) SE 2 node RAC, the delay to
> switch from one node to other takes 15-20 minutes when unplugging the
> network cable.

Testing failover by destroying the Vulcan mind-meld (the interconnect) is not reasonable, as this is one place where creating redundant paths is straightforward and not too expensive. I have yet to see a RAC cluster fail this way.

A more reasonable test is SHUTDOWN ABORT or, if you are brave, pulling the power cord out with a quick tug.

That said, 15-20 minutes is about right for a NIC whose TCP/IP settings leave it trying to do a keep-alive. Check your settings, and get the dumbest NIC cards you can find that are certified by the vendor for RAC.

HTH
--
Daniel A. Morgan
University of Washington
damorgan@x.washington.edu
(replace 'x' with 'u' to respond)
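To see why keep-alive settings matter when checking "your settings", it helps to add up the timers. A TCP keep-alive only kicks in on an idle connection: the stack waits an idle period, then sends a fixed number of probes at a fixed interval before declaring the peer dead. The sketch below assumes the Linux sysctl defaults (`net.ipv4.tcp_keepalive_time` = 7200 s, `tcp_keepalive_intvl` = 75 s, `tcp_keepalive_probes` = 9); the thread does not say which OS SG runs, and the "tuned" values are purely hypothetical.

```python
def keepalive_detection_seconds(idle: float, interval: float, probes: int) -> float:
    """Worst-case time for TCP keep-alive to declare an idle peer dead:
    wait `idle` seconds, then send `probes` unanswered probes `interval` apart."""
    return idle + interval * probes


# Assumed Linux defaults: tcp_keepalive_time / tcp_keepalive_intvl / tcp_keepalive_probes.
linux_default = keepalive_detection_seconds(idle=7200, interval=75, probes=9)
# Hypothetical tuned values, for comparison only.
tuned = keepalive_detection_seconds(idle=60, interval=10, probes=5)

print(linux_default, linux_default / 60)  # 7875 131.25
print(tuned)                              # 110
```

Defaults differ between operating systems and drivers, so plug in the values actually configured on your nodes. Keep-alive only governs idle connections; a session that is actively sending is limited by the retransmission timeout instead.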
Going to look into TCP/IP tweaks. I have come across other notes about reducing TCP parameters to possibly speed up failover. Thanks for the input.

SG

"DA Morgan" <damorgan@x.washington.edu> wrote in message news:1114201450.507554@yasure...
> That said 15-20 minutes is about right for a NIC card for TCP/IP
> setting trying to do a keep-alive. Check out your settings and get
> the dumbest NIC cards you can find that are certified by the vendor
> for RAC.
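One TCP parameter often mentioned in this context is the retransmission retry limit. When a peer disappears mid-conversation, TCP retransmits with exponential backoff and only gives up after a set number of retries. The sketch below assumes Linux-style behaviour: `net.ipv4.tcp_retries2` (default 15), a retransmission timeout (RTO) starting at the 200 ms minimum, and a 120 s cap. Real RTOs depend on measured round-trip times, so treat the results as approximations; the thread does not confirm that this timer is what SG is hitting.

```python
def retransmission_timeout(retries: int, rto_initial: float = 0.2, rto_max: float = 120.0) -> float:
    """Approximate seconds before TCP gives up on an unacknowledged segment.

    Models the original transmission plus `retries` retransmissions, each
    followed by an RTO wait that doubles every time and is capped at
    `rto_max`. Assumes the RTO starts at `rto_initial`.
    """
    total = 0.0
    rto = rto_initial
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff with a ceiling
    return total


for retries in (15, 8, 5):
    print(retries, round(retransmission_timeout(retries), 1))
# 15 -> 924.6 s (about 15.4 minutes), 8 -> 102.2 s, 5 -> 12.6 s
```

With the assumed defaults the give-up time lands in the same range as the 15-20 minutes reported above, and lowering the retry count shortens it sharply. Lowering it too far also makes healthy connections give up during brief network hiccups, so change such settings with care and only as your OS vendor documents.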
On Sat, 23 Apr 2005 14:39:06 GMT, "SG" <ab@cd.com> wrote:
>Going to look into tcp/ip tweaks. I have come across other notes about
>reducing tcp parameters to possibly speed up failover. Thanks for the input.
>
>SG

Please refrain from top-posting and including the entire message.
--
Sybrand Bakker, Senior Oracle DBA
SG wrote:
> Going to look into tcp/ip tweaks. I have come across other notes about
> reducing tcp parameters to possibly speed up failover. Thanks for the input.
>
> SG

I hate to recommend any vendor's NIC cards, but I'm currently using NetGear 311s in the lab and, even with a crossover cable, I'm getting failovers that never exceed 2 seconds with tens of connected sessions and hundreds of transactions per second.
--
Daniel A. Morgan
I would have to agree with Daniel's original thought. We have a two-node RAC system (9.2.0.4 on Win2k3), and each node has four 1 Gb ports (two onboard and two on PCI cards). Two ports are used for the public interface and two for the cluster interconnect. Both pairs are load-balanced/fault-tolerant using the HP NIC software, and the members of each pair are connected to separate switches (Cisco 3750 series). We'd have to have a pretty significant hardware failure for the servers to completely lose their connectivity (and at that point we'd have a lot of other problems... the database would be the least of our troubles).

Long story short... even using cheap hardware, you should be able to provide redundant NICs for each node.
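The redundancy argument above can be checked with a toy availability model: a fault-tolerant pair keeps the node connected as long as at least one member port is working and cabled to a working switch. This is a minimal sketch with hypothetical port and switch names mirroring the layout described (each pair split across two switches); it models the idea only and is not a description of how the HP teaming software behaves.

```python
from dataclasses import dataclass
from typing import AbstractSet, Sequence


@dataclass(frozen=True)
class Port:
    name: str
    switch: str  # switch this port is cabled to


def team_is_up(
    team: Sequence[Port],
    failed_ports: AbstractSet[str] = frozenset(),
    failed_switches: AbstractSet[str] = frozenset(),
) -> bool:
    """A fault-tolerant team keeps connectivity while at least one member
    port is working and cabled to a working switch."""
    return any(p.name not in failed_ports and p.switch not in failed_switches for p in team)


# Hypothetical layout: public and interconnect pairs, each split across two switches.
public = (Port("public-1", "switch-A"), Port("public-2", "switch-B"))
interconnect = (Port("ic-1", "switch-A"), Port("ic-2", "switch-B"))

print(team_is_up(public, failed_switches={"switch-A"}))                               # True
print(team_is_up(interconnect, failed_ports={"ic-1", "ic-2"}))                        # False
print(team_is_up(interconnect, failed_ports={"ic-1"}, failed_switches={"switch-B"}))  # False
```

In this model any single port, cable, or switch failure leaves both pairs up; connectivity is lost only when two independent components fail together, which is the "pretty significant hardware failure" the post describes.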