This is a discussion on HACMP and failed network within the AIX Operating System forums, part of the Unix Operating Systems category.
Over the weekend the following occurred, and I've been tasked with finding out why. We had a complete network failure and the HA servers crashed. My understanding is that HA is supposed to detect network failures and take action. What could be some of the causes of this?

Thanks for all replies.
On 21 Jul 2003 04:46:59 -0700, tstclaire@hotmail.com (St. Claire) wrote:

>Over the weekend we had the following to occur and I've been tasked to
>find out why. We had a complete network failure and the HA servers
>crashed. Now to my understanding HA is supposed to detect network
>failures and come into action.

Your understanding is wrong. HA can detect and handle the failure of a single adapter or a single network drop and switch to a standby. A complete network failure means there are no working standby adapters, so switching is pointless. You need to work with IBM support to make sure your levels of HA code are up to date. In my experience, HA should be able to survive a network failure.

Darrell Frappier
Detroit MI N3JWJ
darrell at frappier dot us
That makes sense. However, does this warrant a dead man switch condition if the system is still operating normally without a network?
This does not warrant a dead man switch. A dead man switch is triggered when HACMP can't get enough CPU to send heartbeats over the IP and non-IP networks.

I think you're missing something here. If you experienced a total network failure, was it just the IP network? If only the IP network fails, your non-IP network is still available and HACMP sends its heartbeats over that; that way, it knows the servers are still alive and that only the IP network is down. If you experienced a total failure, both IP and non-IP, then did you have a power failure in your computer room, or is the private network between your cluster nodes on the IP network?

HTH,
Pete's
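The distinction drawn in this post can be sketched in a few lines of Python. This is not HACMP code and does not use any HACMP API; the function name `classify_heartbeats` and the returned labels are hypothetical, chosen only to model the reasoning that a node still heard over the non-IP path is alive even when the IP network is gone.

```python
def classify_heartbeats(ip_ok: bool, non_ip_ok: bool) -> str:
    """Illustrative only: models the reasoning in the post, not HACMP internals."""
    if ip_ok and non_ip_ok:
        return "healthy"
    if non_ip_ok:
        # Heartbeats still arrive over the non-IP path, so the peer
        # is alive and only the IP network is down.
        return "ip_network_down"
    if ip_ok:
        # Heartbeats still arrive over IP; only the non-IP path is lost.
        return "non_ip_network_down"
    # No heartbeats on any path: a dead peer and total isolation
    # look the same from here.
    return "total_loss"


if __name__ == "__main__":
    # The situation the original poster later describes:
    # both IP networks down, the non-IP network fine.
    print(classify_heartbeats(ip_ok=False, non_ip_ok=True))
```

Under these assumptions, losing both IP networks while the non-IP network stays up falls in the "ip_network_down" case, which by the reasoning above should not by itself look like a dead node.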
Strictly speaking, by default HACMP takes no action on a network failure event other than logging it in the HACMP log files. However, you can customize HACMP to take a specific action (e.g., run a script), or you can promote a network failure to a node failure. The default behavior is just to log the event.

SASA

tstclaire@hotmail.com (St. Claire) wrote in message news:<8581fcb0.0307230950.6f3ac7d@posting.google.com>...
> I didn't think it should but I'm really confused as to why it
> happened. We have two [tcp/ip] networks here and they both went down
> but there was no power failure. We have one non ip network and that
> one was fine. There was no system dump which tells me that the system
> did not crash; I guess it just hung. My reason for thinking it's HA
> related is because the only systems that 'crashed' were the HA nodes.
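The "log by default, act only if customized" behavior described in this reply can be sketched as follows. This is a generic illustration in Python, not HACMP's event-script mechanism: `handle_event`, the `custom_actions` mapping, the logger name, and the event label `"network_down"` are all assumptions made for the example.

```python
import logging
from typing import Callable, Dict, List, Optional

log = logging.getLogger("cluster_events")  # hypothetical logger name


def handle_event(
    event: str,
    custom_actions: Optional[Dict[str, Callable[[str], None]]] = None,
) -> List[str]:
    """Always log the event; run a custom action only if one is registered."""
    taken = ["logged"]
    log.warning("event received: %s", event)

    action = (custom_actions or {}).get(event)
    if action is not None:
        # Site-specific customization, e.g. a notification script.
        action(event)
        taken.append("custom_action")
    return taken


if __name__ == "__main__":
    # Default behavior: the event is only logged.
    print(handle_event("network_down"))

    # Customized behavior: a registered action also runs.
    notified: List[str] = []
    print(handle_event("network_down", {"network_down": notified.append}))
```

In this sketch, promoting a network failure to a node failure would simply be one more registered action; how that is actually configured in HACMP is outside what the thread states.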