born2veg@yahoo.com (HP_BRS) writes:
> Beyond making sure that you are up-to-date on driver patches, you
> should check the switch/hub/cable/other that is involved in the
> topology. MC/SG is only as good as the underlying architecture. You
> should ensure that you have done things like configuring standby NICs,
Standby NICs will not prevent the LAN switch. I think the "cable
disconnect" is triggered when it should not be. Using a standby NIC will
reduce the probability of two such events being triggered
simultaneously, but will not solve the problem.
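
As a rough illustration of that probability argument, here is a minimal
sketch. It assumes the spurious "cable disconnect" events on the two
NICs are independent and Poisson-distributed, which is only an
assumption; if the cause is shared (for example high load), the real
figure would be higher. The function name and the numbers fed to it are
hypothetical.

```python
import math


def overlap_probability(events_per_day: float, outage_seconds: float) -> float:
    """Chance that a spurious outage on the primary NIC overlaps one on the standby.

    Assumes independent Poisson-distributed events on each NIC
    (an illustrative assumption, not a measured property).
    """
    rate_per_second = events_per_day / 86400.0
    # Two outages of equal length overlap if the standby's outage starts
    # within outage_seconds before or after the primary's outage starts.
    window = 2.0 * outage_seconds
    return 1.0 - math.exp(-rate_per_second * window)


# Hypothetical figures: 2 events in about 60 days, each lasting 30 seconds.
p = overlap_probability(events_per_day=2 / 60, outage_seconds=30)
print(f"per-event overlap probability: {p:.2e}")
```

Under these assumptions the result is small but never zero, so a standby
NIC makes the unwanted failover rarer without removing its cause.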
> running MC/SG heartbeat packets over multiple LANs, multiple sources
> of power, etc. Also, if a LANic outage does not warrant the package
> going down, you can configure the package to stay up.
Both nodes have a three-way power path, dual paths to disks, and two
LANs for heartbeat. However, if the LAN a package uses fails in the way
described, the package is brought down and restarted for nothing. If the
package is a huge database application that serves hundreds of users,
you don't like package "switching for fun" very much.
Anyway, patches are quite up-to-date, and I'll investigate the problem
further.
Thanks for the comment.
Ulrich
>
> Good luck,
>
> HP_BRS
>
>
> Ulrich Windl <Ulrich.Windl@RZ.Uni-Regensburg.DE> wrote in message news:<m31xuno8dn.fsf@pc5234.klinik.uni-regensburg.de>...
> > Hello,
> >
> > at the moment we are running the latest ServiceGuard release with
> > Support Plus patches from June 2003 on a two-node-cluster consisting
> > of two L3000 running HP-UX 11.11. The systems are connected via a 100
> > Base-T LAN and a 1000 Base-T LAN.
> >
> > Since May we have had two network faults reported by cmcld that were not
> > reproducible. SG had stopped the packages depending on the LAN
> > interface, so it was not very good.
> >
> > I've watched /var/adm/nettl*, and all I could see was a "...bad cable
> > connection...", "...going Offline @ [0/0/0/0] [Cable disconnected]".
> >
> > What worries me is that nettl never reports when the LAN interface is
> > up again. According to cmcld the outage lasted exactly for 20 seconds.
> > Last time it lasted exactly for 30 seconds.
> >
> > I suspect the driver is misbehaving, perhaps when the load is high.
> > Can somebody explain exactly what condition triggers the "cable
> > disconnected" message in the driver?
> >
> > Any insights?