Unix Technical Forum

HACMP and failed network

#1
01-04-2008, 07:52 PM
St. Claire
HACMP and failed network

Over the weekend we had a complete network failure, and the HA servers
crashed. I've been tasked with finding out why. My understanding is
that HA is supposed to detect network failures and take action. What
could have caused the crash?

Thanks for any replies.
#2
01-04-2008, 07:52 PM
Darrell Frappier
Re: HACMP and failed network

On 21 Jul 2003 04:46:59 -0700, tstclaire@hotmail.com (St. Claire)
wrote:
>Over the weekend we had a complete network failure, and the HA servers
>crashed. I've been tasked with finding out why. My understanding is
>that HA is supposed to detect network failures and take action.


Your understanding is wrong. HA can detect the failure of a single
adapter or a dropped network connection and switch to a standby
adapter. A complete network failure leaves no working standby
adapters, so there is nothing to switch to. Even so, in my experience
HA should survive a network failure without the nodes crashing. Work
with IBM support to make sure your HA code levels are up to date.


Darrell Frappier Detroit MI N3JWJ darrell at frappier dot us
#3
01-04-2008, 07:53 PM
St. Claire
Re: HACMP and failed network

That makes sense. However, does this warrant a dead man switch
condition if the system is still operating normally without a network?


#4
01-04-2008, 07:53 PM
Pete's
Re: HACMP and failed network

This does not warrant a dead man switch. A dead man switch occurs when
HACMP can't get enough CPU to send heartbeats over the IP and non-IP
networks. I think you're missing something here. If you experienced a
total network failure, was it just the IP network? If only the IP
network fails, the non-IP network is still available and HACMP sends
its heartbeats over that, so it knows the servers are still alive and
that only the IP network is down. If you experienced a total failure
of both IP and non-IP networks, did you have a power failure in your
computer room, or does the private network between your cluster nodes
run over the IP network?
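
To make the reasoning above concrete, here is a rough Python sketch of how heartbeat results from the IP and non-IP networks could be combined into a diagnosis. This is only an illustration, not HACMP's actual code: the `Diagnosis` enum, the `diagnose` function and its parameters are all made-up names.

```python
from enum import Enum
from typing import Sequence


class Diagnosis(Enum):
    # Hypothetical labels, not HACMP event names.
    PEER_ALIVE = "peer alive, heartbeats on IP and non-IP"
    IP_NETWORK_DOWN = "peer alive, all IP networks down"
    NON_IP_NETWORK_DOWN = "peer alive, non-IP network down"
    PEER_UNREACHABLE = "no heartbeats on any network"


def diagnose(ip_heartbeats_ok: Sequence[bool], non_ip_heartbeat_ok: bool) -> Diagnosis:
    """Classify the peer's state from heartbeat results.

    ip_heartbeats_ok holds one flag per IP network (True = heartbeats arriving).
    non_ip_heartbeat_ok is the flag for the single non-IP (e.g. serial) network.
    """
    any_ip_ok = any(ip_heartbeats_ok)
    if any_ip_ok and non_ip_heartbeat_ok:
        return Diagnosis.PEER_ALIVE
    if non_ip_heartbeat_ok:
        # Heartbeats still arrive over the non-IP link, so the peer is
        # alive and only the IP networks have failed.
        return Diagnosis.IP_NETWORK_DOWN
    if any_ip_ok:
        return Diagnosis.NON_IP_NETWORK_DOWN
    # Nothing gets through: the peer is down, or every link has failed.
    return Diagnosis.PEER_UNREACHABLE


# Two IP networks down, non-IP network still fine.
print(diagnose([False, False], True))
```

With both IP networks down and the non-IP link up, the sketch reports that the peer is alive and only the IP networks are gone, which is the case where no takeover should be needed.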

HTH,
Pete's

#5
01-04-2008, 07:54 PM
sasa queer
Re: HACMP and failed network

By default, HACMP takes no action on a network failure event other
than logging it in the HACMP log files. You can, however, customize
HACMP to take a specific action (e.g. run a script), or you can
promote a network failure to a node failure.
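
As a rough illustration of the three behaviours described above (log only, run a custom action, promote to node failure), here is a small Python sketch. It is not HACMP configuration or code: `handle_network_down`, its parameters and the returned strings are hypothetical names chosen for the example.

```python
from typing import Callable, List, Optional


def handle_network_down(
    network: str,
    log: List[str],
    custom_action: Optional[Callable[[str], None]] = None,
    promote_to_node_failure: bool = False,
) -> str:
    """Model the handling of a network failure event (illustrative only)."""
    # Default behaviour: the event is always logged.
    log.append(f"network down: {network}")

    # Optional customization: run a site-specific action, e.g. a script.
    if custom_action is not None:
        custom_action(network)

    # Optional customization: treat the network failure as a node failure.
    if promote_to_node_failure:
        log.append(f"promoting failure of {network} to node failure")
        return "node_failure"

    return "logged_only"


events: List[str] = []
print(handle_network_down("net_ether_01", events))  # default: just log
print(events)
```

The point of the sketch is only that, without customization, the outcome is a log entry and nothing else.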

SASA

tstclaire@hotmail.com (St. Claire) wrote in message news:<8581fcb0.0307230950.6f3ac7d@posting.google.com>...
> I didn't think it should, but I'm really confused as to why it
> happened. We have two TCP/IP networks here and they both went down,
> but there was no power failure. We have one non-IP network, and that
> one was fine. There was no system dump, which tells me that the system
> did not crash; I guess it just hung. My reason for thinking it's HA
> related is that the only systems that 'crashed' were the HA nodes.