Unix Technical Forum

10g RAC - fail over test issue

#1
02-24-2008, 12:51 PM
SG
10g RAC - fail over test issue

Hello,

While running some TAF tests on a 10g (10.1.0.3) SE two-node RAC, we found
that switching from one node to the other takes 15-20 minutes when we unplug
the network cable. Also, after the cable is plugged back in, the node has lost
network connectivity. We have to restart the network services and cluster
services, or just reboot if they don't restart properly. In another test we
shut down one node while running a long query (following the TAF failover
test notes from Metalink), and failover took 5 minutes in that case. Has
anyone else seen this kind of behavior, where unplugging the network and
plugging it back in actually kills all network services? TIA.

SG


#2
02-24-2008, 12:51 PM
DA Morgan
Re: 10g RAC - fail over test issue

SG wrote:

> Hello,
>
> While running some TAF tests on a 10g (10.1.0.3) SE 2 node RAC, the delay to
> switch from one node to other takes 15-20 minutes when unplugging the
> network cable. Also, after plugging the network back in, network
> connectivity on the node is lost. We have to restart network services and
> cluster services or just reboot if they don't restart properly. Another test
> we did was shutting down one node while running a long query (the TAF
> failover test notes from metalink) and failover takes 5 minutes in that
> case. Has anyone else experienced this type of behavior when the network
> being unplugged and plugged back in actually kills all network services? TIA.
>
> SG


Testing failover by destroying the Vulcan mind-meld (the interconnect) is
not reasonable, as this is one place where creating redundant paths is
straightforward and not too expensive. I have yet to see a RAC cluster
fail this way.

A more reasonable test is SHUTDOWN ABORT, or if you are brave, pulling
the power cord out with a quick tug.

That said, 15-20 minutes is about right for a NIC whose TCP/IP settings
have it trying to do a keep-alive. Check your settings, and get the
dumbest NICs you can find that are certified by the vendor for RAC.
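As a rough sanity check on that 15-20 minute figure: the thread doesn't say which OS the nodes run, but assuming Linux, a connection with unacknowledged data is abandoned only after net.ipv4.tcp_retries2 retransmissions (default 15), with the retransmission timeout starting near 200 ms, doubling on each retry and capped at 120 s. The sketch below is a simplified model (it ignores RTT-based adjustments, and the function name is hypothetical) that adds up those waits. The total is a lower bound and lands right around 15 minutes. Note this is the retransmission timer rather than keep-alive proper, so treat it as one candidate explanation, not a diagnosis.

```python
def linux_tcp_retransmit_timeout(retries2: int = 15,
                                 rto_min: float = 0.2,
                                 rto_max: float = 120.0) -> float:
    """Approximate seconds before Linux gives up on unacknowledged data.

    Simplified model (assumption): the first retransmission timeout equals
    rto_min, doubles after every retransmission (exponential backoff) and
    is capped at rto_max. Real timeouts can be longer if the measured RTT
    raises the RTO, so this is a lower bound.
    """
    total = 0.0
    rto = rto_min
    # One wait after the original send, plus one after each retransmission.
    for _ in range(retries2 + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


timeout = linux_tcp_retransmit_timeout()
print(f"{timeout:.1f} s = {timeout / 60:.1f} min")  # 924.6 s = 15.4 min
```

Lowering tcp_retries2 shortens this window, at the cost of dropping connections sooner on a merely congested network.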

HTH
--
Daniel A. Morgan
University of Washington
damorgan@x.washington.edu
(replace 'x' with 'u' to respond)
#3
02-24-2008, 12:52 PM
SG
Re: 10g RAC - fail over test issue

Going to look into TCP/IP tweaks. I have come across other notes about
reducing TCP parameters to possibly speed up failover. Thanks for the input.
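For the TCP parameters mentioned here, keep-alive is the usual knob. Assuming Linux, an idle socket with SO_KEEPALIVE enabled is declared dead after net.ipv4.tcp_keepalive_time seconds of silence plus net.ipv4.tcp_keepalive_probes unanswered probes sent net.ipv4.tcp_keepalive_intvl seconds apart (defaults 7200, 75 and 9). The hypothetical helper below only does that arithmetic; the "tuned" values are illustrative, not a recommendation for RAC.

```python
def keepalive_detection_time(idle_s: int, interval_s: int, probes: int) -> int:
    """Seconds before Linux TCP keep-alive drops a silent peer.

    idle_s     ~ net.ipv4.tcp_keepalive_time   (silence before first probe)
    interval_s ~ net.ipv4.tcp_keepalive_intvl  (gap between probes)
    probes     ~ net.ipv4.tcp_keepalive_probes (unanswered probes tolerated)
    """
    return idle_s + interval_s * probes


# Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
default_s = keepalive_detection_time(7200, 75, 9)
# Hypothetical tuned values, for illustration only.
tuned_s = keepalive_detection_time(60, 10, 5)
print(default_s, tuned_s)  # 7875 110
```

These sysctls are host-wide, so lowering them affects every keep-alive socket on the node, not just Oracle Net connections.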

SG

"DA Morgan" <damorgan@x.washington.edu> wrote in message
news:1114201450.507554@yasure...
> SG wrote:
>
>> Hello,
>>
>> While running some TAF tests on a 10g (10.1.0.3) SE 2 node RAC, the delay
>> to switch from one node to other takes 15-20 minutes when unplugging the
>> network cable. Also, after plugging the network back in, network
>> connectivity on the node is lost. We have to restart network services and
>> cluster services or just reboot if they don't restart properly. Another
>> test we did was shutting down one node while running a long query (the
>> TAF failover test notes from metalink) and failover takes 5 minutes in
>> that case. Has anyone else experienced this type of behavior when the
>> network being unplugged and plugged back in actually kills all network
>> services? TIA.
>>
>> SG

>
> Testing failover with destruction of the Vulcan mind-meld is not
> reasonable as this is one place where the creation of redundant paths
> is straight-forward and not too expensive. I've yet to ever see a RAC
> cluster fail this way.
>
> A more reasonable test is SHUTDOWN ABORT, or if you are brave, pulling
> the power cord out with a quick tug.
>
> That said, 15-20 minutes is about right for a NIC card for TCP/IP
> setting trying to do a keep-alive. Check out your settings and get
> the dumbest NIC cards you can find that are certified by the vendor
> for RAC.
>
> HTH
> --
> Daniel A. Morgan
> University of Washington
> damorgan@x.washington.edu
> (replace 'x' with 'u' to respond)



#4
02-24-2008, 12:52 PM
Sybrand Bakker
Re: 10g RAC - fail over test issue

On Sat, 23 Apr 2005 14:39:06 GMT, "SG" <ab@cd.com> wrote:

>Going to look into tcp/ip tweaks. I have come across other notes about
>reducing tcp parameters to possibly speed up failover. Thanks for the input.
>
>SG


Please refrain from top-posting and from quoting the entire message.


--
Sybrand Bakker, Senior Oracle DBA
#5
02-24-2008, 12:52 PM
DA Morgan
Re: 10g RAC - fail over test issue

SG wrote:

> Going to look into tcp/ip tweaks. I have come across other notes about
> reducing tcp parameters to possibly speed up failover. Thanks for the input.
>
> SG


I hate to recommend any vendor's NICs, but I'm currently using NetGear
311s in the lab and, even with a cross-over cable, I'm getting failovers
that never exceed 2 seconds, even with tens of connected sessions and
hundreds of transactions per second.
--
Daniel A. Morgan
University of Washington
damorgan@x.washington.edu
(replace 'x' with 'u' to respond)
#6
02-24-2008, 12:52 PM
Jesse
Re: 10g RAC - fail over test issue

I would have to agree with Daniel's original point. We have a two-node
RAC system (9.2.0.4 on Win2k3), and each node has four 1 Gb ports (2
onboard and 2 on PCI cards). Two ports are used for the public interface
and 2 for the cluster interconnect. Both pairs are
load-balanced/fault-tolerant using the HP NIC software, and within each
pair the members are connected to separate switches (Cisco 3750 series).
We'd have to have a pretty significant hardware failure for the servers
to lose connectivity completely (and at that point we'd have a lot of
other problems... the database would be the least of our troubles).

Long story short... even using cheap hardware, you should be able to
provide redundant NICs for each node.
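The value of that redundancy can be put in rough numbers. If each path (a NIC plus the switch it plugs into) is down with some probability p, and the paths fail independently, which separate NICs on separate switches roughly approximate, the node loses connectivity only when every path is down at once: p raised to the number of paths. The helper name and the 1% figure below are hypothetical, chosen purely for illustration.

```python
def all_paths_down(p_fail: float, paths: int) -> float:
    """Probability that every redundant path is down at the same time.

    Assumes each path fails independently with the same probability
    p_fail. Shared components (one switch, one power feed) break this
    assumption and make the real risk higher.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    return p_fail ** paths


# Hypothetical per-path failure probability, not a measured value.
p = 0.01
print(all_paths_down(p, 1), all_paths_down(p, 2))
```

Going from one path to two turns a 1-in-100 outage into roughly 1-in-10,000, which is why pulling a single cable is a weak failover test for a properly built cluster.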


