Unix Technical Forum

HACMP failover service IP issue

#1
01-05-2008, 02:40 AM
A. Gordon Lyph

We had an unplanned (ha!) HACMP event yesterday. For whatever reason,
the secondary node couldn't activate the resource group (read: I
believe the admin who configured HACMP didn't set it up right). At
any rate, I restarted the primary node, restarted Cluster services and
expected it to swap the service IP onto the boot adapter.
Unfortunately, it didn't. As this is a mission critical (man, I hate
that term) app, I ifconfig'd the interface back to the service IP so
the users could get back in. Of course, errpt is showing 'local
adapter misconfiguration.'

This morning, I took the secondary node out of the cluster, manually
shut down all of the applications on the primary node, then rebooted
the primary node, thinking that HACMP would do the boot->svc IP swap
on reboot. Not so. I tried synchronizing cluster resources,
activating resource group, etc. (with and without the secondary node
in the cluster) all to no avail. I wound up ifconfig'ing the
interface again so the users could do their work.

Any thoughts on why HACMP would not automagically swap the boot and
svc IPs? I'm definitely not HACMP certified, but from what I've read
it should acquire the resource and set up the topology. Am I missing
something?

#2
01-05-2008, 02:40 AM
Andreas Schulze
Re: HACMP failover service IP issue

Post the output of cllsif.


#3
01-05-2008, 02:41 AM
Rizwan Abbasi
Re: HACMP failover service IP issue

Hi,

Please also tell us the versions of HACMP and AIX (oslevel -r), as well as
the machine model on which you are running HACMP.

Here are the steps you should start with:

1. Check the cluster log files, i.e. hacmp.out, cluster.log, clstrmgr.debug
2. Check the cluster processes, i.e. lssrc -g cluster
3. Check the services, i.e. lssrc -g grpsvcs and lssrc -g emsvcs

Also, as described earlier, we need the output of these two commands:
#/usr/es/sbin/cluster/utilities/cllsif

#/usr/es/sbin/cluster/utilities/cllsnode
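
The checks requested above can be gathered in one pass. What follows is a minimal sketch and not part of the original post: the function names run_section and collect_cluster_info are made up for illustration, and the only commands it calls are the ones already named in this thread. It assumes a POSIX shell and the utilities path quoted above.

```shell
#!/bin/sh
# Sketch only: collect the diagnostics requested in this thread into one report.
# UTIL_DIR is the utilities path quoted in the thread; adjust if yours differs.
UTIL_DIR=/usr/es/sbin/cluster/utilities

# run_section TITLE CMD [ARGS...]
# Prints a header, then runs CMD if it can be found; otherwise notes the skip.
run_section() {
    title=$1
    shift
    printf '===== %s =====\n' "$title"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '(skipped: %s not found)\n' "$1"
    fi
    return 0
}

# collect_cluster_info: hypothetical wrapper name, not an HACMP utility.
collect_cluster_info() {
    run_section "AIX level"        oslevel -r
    run_section "HACMP filesets"   sh -c 'lslpp -l | grep -i hacmp'
    run_section "Cluster daemons"  lssrc -g cluster
    run_section "Group Services"   lssrc -g grpsvcs
    run_section "Event Management" lssrc -g emsvcs
    run_section "cllsif"           "$UTIL_DIR/cllsif"
    run_section "cllsnode"         "$UTIL_DIR/cllsnode"
}
```

Running collect_cluster_info on each node and redirecting the output to a file gives something that can be pasted into a reply in one piece. A command that is missing is reported as skipped rather than stopping the run.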


Riz.

#4
01-05-2008, 02:41 AM
A. Gordon Lyph
Re: HACMP failover service IP issue

"Rizwan Abbasi" <abbasi2@attglobal.net> wrote in message news:<40e25458_3@news2.prserv.net>...

> Please also tell us the versions of HACMP and AIX (oslevel -r), as well as
> the machine model on which you are running HACMP.


Any help is greatly appreciated. I'm not terribly confident in this
box, given the goings on. Here we go...

oslevel -r: 5100-03

lslpp -l | grep -i hacmp:

cluster.msg.en_US.cspoc    4.5.0.2   COMMITTED  HACMP CSPOC Messages - U.S.
rsct.basic.hacmp           2.2.1.30  COMMITTED  RSCT Basic Function (HACMP/ES
rsct.compat.basic.hacmp    2.2.1.30  COMMITTED  RSCT Event Management Basic
                                                Function (HACMP/ES Support)
rsct.compat.clients.hacmp
                                                Function (HACMP/ES Support)

> 1. Check the cluster log files, i.e. hacmp.out, cluster.log, clstrmgr.debug


hacmp.out is empty, not sure what I'm supposed to be looking for in
cluster.log. clstrmgr.debug has been overwritten and rotated such
that the log containing the outage on Monday is gone.

> 2. Check the cluster processes, i.e. lssrc -g cluster


Subsystem     Group     PID     Status
clstrmgrES    cluster   20148   active
clsmuxpdES    cluster   20666   active
clinfoES      cluster   21160   active

> 3. Check the services, i.e. lssrc -g grpsvcs and lssrc -g emsvcs


lssrc -g grpsvcs:

Subsystem     Group     PID     Status
grpsvcs       grpsvcs   18112   active
grpglsm       grpsvcs           inoperative

(hrmm, grpglsm inop?)

lssrc -g emsvcs:

Subsystem     Group     PID     Status
emsvcs        emsvcs    19872   active
emaixos       emsvcs    15918   active

> Also, as described earlier, we need the output of these two commands:
> #/usr/es/sbin/cluster/utilities/cllsif

(forgive the formatting, please)

Adapter      Type     Network  Net Type  Attribute  Node    IP Address  Hardware Address  Interface Name  Global Name  Netmask
lawpriboot   boot     ether1   ether     public     lawpri  10.5.53.19                    en2                          255.255.0.0
lawprisvc    service  ether1   ether     public     lawpri  10.5.31.14                                                 255.255.0.0
lawpristby   standby  ether1   ether     public     lawpri  10.9.53.19                    en1                          255.255.0.0
lawpri-tty1  service  rs232a   rs232     serial     lawpri  /dev/tty1
lawsecsvc    service  ether1   ether     public     lawsec  10.5.31.15                    en2                          255.255.0.0
lawsecstby   standby  ether1   ether     public     lawsec  10.9.53.20                    en1                          255.255.0.0
lawsec-tty1  service  rs232a   rs232     serial     lawsec  /dev/tty1

> #/usr/es/sbin/cluster/utilities/cllsnode


NODE lawpri:
    Interfaces to network ether1
        boot Interface: Name lawpriboot, Attribute public, IP address 10.5.53.19
        service Interface: Name lawprisvc, Attribute public, IP address 10.5.31.14
        standby Interface: Name lawpristby, Attribute public, IP address 10.9.53.19
    Interfaces to network rs232a
        service Interface: Name lawpri-tty1, Attribute serial, IP address /dev/tty1

NODE lawsec:
    Interfaces to network ether1
        service Interface: Name lawsecsvc, Attribute public, IP address 10.5.31.15
        standby Interface: Name lawsecstby, Attribute public, IP address 10.9.53.20
    Interfaces to network rs232a
        service Interface: Name lawsec-tty1, Attribute serial, IP address /dev/tty1

#5
01-05-2008, 02:44 AM
Simon Marchese
Re: HACMP failover service IP issue

A. Gordon Lyph wrote:
>
> hacmp.out is empty, not sure what I'm supposed to be looking for in
> cluster.log. clstrmgr.debug has been overwritten and rotated such
> that the log containing the outage on Monday is gone.
>

What he meant was the hacmp.out file with content for that day. If you run
"ls -alt /tmp/hacmp.out*" you will see that there are rotated copies,
/tmp/hacmp.out.[1-7].
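
To find which of those rotated files covers the outage, one option is to count matching lines per file. This is a rough sketch under stated assumptions: count_in_hacmp_logs is a made-up helper name, the /tmp default comes from the post above, and the search string is left to the reader because the timestamp format inside hacmp.out is not shown anywhere in this thread.

```shell
# count_in_hacmp_logs PATTERN [DIR]
# For hacmp.out and each rotated copy in DIR (default /tmp, as in the post
# above), print "<matching-line-count> <file>".
# Hypothetical helper for illustration, not an HACMP tool.
count_in_hacmp_logs() {
    pattern=$1
    log_dir=${2:-/tmp}
    for f in "$log_dir"/hacmp.out*; do
        # Skip the unexpanded glob when no log files exist.
        [ -f "$f" ] || continue
        # grep -c prints 0 (and exits non-zero) when nothing matches.
        n=$(grep -c -e "$pattern" "$f")
        printf '%s %s\n' "$n" "$f"
    done
    return 0
}
```

Call it with whatever string identifies the day in question, for example a date fragment copied from one of the log files. Files reporting a non-zero count are the ones worth reading in full.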

#6
01-05-2008, 02:45 AM
Mike
Re: HACMP failover service IP issue

I will try to help if I can. First I need to make sure I understand
the steps you took. You say "took the secondary node out of the
cluster". What does that mean?

If you rebooted both nodes, all interfaces should have the "boot"
address active (i.e. the one AIX activates). Now if you start cluster
services on one node, that node should acquire the resource group and
if the service IP address is a member of the resource group, it should
replace the boot address (or be added as an alias if that option is
used when defining the network to HACMP). When you start cluster
services on the other node, it may or may not take over the resource
group depending on the takeover policy.

Verify the service label is part of the resource group. If you
rebooted the primary node and the resource group was acquired on the
other node, it will not move to the primary node unless it has a
cascading policy and the node you rebooted appears first in the
resource group node list.
- Mike
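
The boot versus service distinction described above can be checked directly on the interface after cluster services start. The following is a minimal sketch, assuming ifconfig prints address lines of the form "inet <address> netmask ..."; has_address and report_ip_state are hypothetical helper names, and the interface and addresses in the usage note are the ones from the cllsif output posted earlier in this thread.

```shell
# has_address IP
# Reads `ifconfig <interface>` output on stdin and succeeds (exit 0) only if
# IP appears as an inet address. Hypothetical helper for illustration.
has_address() {
    awk -v ip="$1" '$1 == "inet" && $2 == ip { found = 1 } END { exit !found }'
}

# report_ip_state IFACE SERVICE_IP BOOT_IP
# Says which of the two addresses the interface currently carries.
report_ip_state() {
    iface=$1 svc_ip=$2 boot_ip=$3
    cfg=$(ifconfig "$iface" 2>/dev/null)
    if printf '%s\n' "$cfg" | has_address "$svc_ip"; then
        echo "$iface carries the service address $svc_ip"
    elif printf '%s\n' "$cfg" | has_address "$boot_ip"; then
        echo "$iface still carries the boot address $boot_ip"
    else
        echo "$iface carries neither $svc_ip nor $boot_ip"
    fi
}
```

On lawpri this would be called as: report_ip_state en2 10.5.31.14 10.5.53.19. If the interface still shows the boot address after cluster services are up and the resource group should be online on that node, the service label's membership in the resource group is the next thing to check, as suggested above.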

LinkBack to this Thread: http://www.unixadmintalk.com/aix-operating-system/3043-hacmp-failover-service-ip-issue.html

All times are GMT.