Unix Technical Forum

Unable to make outbound connections all of a sudden

#1
04-20-2008, 10:39 PM
steveg15
Unable to make outbound connections all of a sudden

Hi folks

We have a live production installation with 250+ clients connected via
an OpenROAD 4.1 application; the back end is AIX 5.2. There have been no
hardware changes for 2 years, no upgrade to Ingres (II 2.6/0305
(rs4.us5/00) patch 10384), and no CBF changes or PC changes (NODES, that
we've been told about). The OpenROAD application has not changed for
some months. We have load balancing with 5 iigcc's starting (one
interesting point is that only one of them now appears to be clocking up
CPU time in the ps -ef output).
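
A minimal sketch of that per-process comparison, for anyone who wants the CPU figures side by side. It assumes a POSIX `ps` that accepts `-o pid,time,args` and that the comm servers have `iigcc` in the first word of their command line; the function name `iigcc_cpu` is made up for illustration.

```shell
#!/bin/sh
# Sketch: show accumulated CPU time for each Ingres comm server (iigcc).

# Reads "PID TIME ARGS..." lines on stdin and prints "PID TIME" for every
# line whose command (third field) contains "iigcc".
# Hypothetical helper; it reads stdin so it can be tried on saved ps output.
iigcc_cpu() {
    awk '$3 ~ /iigcc/ { printf "%s %s\n", $1, $2 }'
}

# Live usage (left commented out so sourcing this file has no side effects):
# ps -e -o pid,time,args | iigcc_cpu
```

If one PID shows hours of CPU time and the other four show seconds, that matches the single busy comm server described above.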

Two days ago the following error started appearing in the Ingres
errlog :-

Y8707E ::[37528 IIGCC, 0000000d]: Wed Feb 7 16:03:18 2007
E_CLFE05_BS_CONNECT_ERR Unable to make outgoing connection.
Y8707E ::[37528 IIGCC, 0000000d]: System communication error:
Socket is not connected.

After a restart of Ingres everything works OK for an hour or so.
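
To see how quickly the errors build up after a restart, the connect errors can be tallied per hour. This is only a sketch: it assumes that in errlog.log itself the timestamp and the E_CLFE05_BS_CONNECT_ERR code sit on one line, and that the timestamp has the "Wed Feb 7 16:03:18 2007" shape shown above. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count E_CLFE05_BS_CONNECT_ERR entries per hour.

# Reads errlog lines on stdin; prints "COUNT MONTH DAY HH:00" per hour seen.
# Output order is whatever awk's array traversal gives; pipe to sort if needed.
connect_errs_per_hour() {
    awk '/E_CLFE05_BS_CONNECT_ERR/ {
             for (i = 3; i <= NF; i++)
                 if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
                     # the two fields before the time are month and day
                     key = $(i-2) " " $(i-1) " " substr($i, 1, 2) ":00"
                     count[key]++
                     break
                 }
         }
         END { for (k in count) printf "%d %s\n", count[k], k }'
}

# Live usage (commented out; the path is the usual errlog location):
# connect_errs_per_hour < "$II_SYSTEM/ingres/files/errlog.log"
```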

Another company supports the AIX server. We have Ingres access (but not
root), and they are saying this is an Ingres problem. We do also have
access to errpt, and there is an error showing :-

LABEL: GOENT_LINK_DOWN
IDENTIFIER: DED8E752

Date/Time: Thu 8 Feb 08:03:23 2007
Sequence Number: 515851
Machine Id: 005C90DF4C00
Node Id: Y8707E
Class: H
Type: TEMP
Resource Name: ent1
Resource Class: adapter
Resource Type: 14106902
Location: U0.1-P1-I1/E1
VPD:
Product Specific.( ).......10/100/1000 Base-TX PCI-X Adapter
Part Number.................00P4501
FRU Number..................00P4501
EC Level....................H12511
Manufacture ID..............YL1021
Network Address.............000255535899
ROM Level (alterable).......GOL002
Description
ETHERNET DOWN

Probable Causes
CABLE
CSMA/CD ADAPTER

Failure Causes
LINK TIMEOUT

Recommended Actions
CHECK CABLE AND ITS CONNECTIONS

Detail Data
FILE NAME
line: 191 file: goent_limbo.c
PCI ETHERNET STATISTICS
0000 0005 0063 0853 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0001
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000
0000 0000 0000 0000 0000 0000 0000 0002 0000 0001 0000 0001 0000 0000
0000 0000
0000 0000 0000 0000 0000 0000 0000 BB80 00F0 0249 0C00 0000 0000 01A0
0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
DEVICE DRIVER INTERNAL STATE
3333 3333 0000 0000 0000 0000
SOURCE ADDRESS
0002 5553 5899

We are being told this is a physical NIC which is not being used and
has nothing to do with the problem.

Whilst the fault is present you cannot access IPM (except with -s) or
iinamu. Even straight after restarting Ingres, you can start IPM but
cannot access the "Server" option.

I've seen a number of old posts on here on the subject, but little in
the way of responses, so I assume this is either a silly question, too
vague, or not a well-known subject. Funnily enough, those looking after
the server have told us they've requested an engineer to look at the
server (probably 24-48 hours away...). My knowledge of TCP/IP is nil,
but from what I've read this looks like a problem with TCP sockets,
i.e. a problem with the AIX server itself. Any suggestions would be most
welcome as there are 250 users who can't work.

Steve

#2
04-20-2008, 10:39 PM
steveg15
Re: Unable to make outbound connections all of a sudden

On Feb 8, 12:23 pm, "steveg15" <steve....@atosorigin.com> wrote:
> [snip]


Paul

The config.dat parameters have not changed.

Ingres is started in batch (by us) as the ingres user.

Netstat shows a number of users connected (when any of them can
connect), but the bulk of them are using a single port. Out of 194
users, 180 will be on the second port (18065 in our case) and the
rest are split over the other 4.
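
A rough way to get those per-port figures is sketched below. It assumes `netstat -an` puts the local address in the fourth column and the state in the last one, with the port after the final "." (AIX style) or ":" (Linux style). The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count ESTABLISHED TCP connections per local port.

# Reads netstat -an style lines on stdin; prints "COUNT PORT", busiest first.
conns_per_port() {
    awk '$NF == "ESTABLISHED" {
             n = split($4, parts, "[.:]")   # port is the last piece of the local address
             count[parts[n]]++
         }
         END { for (p in count) printf "%d %s\n", count[p], p }' |
    sort -k1,1nr -k2,2n
}

# Live usage (commented out so sourcing this file has no side effects):
# netstat -an | conns_per_port
```

The output covers every local port with an established connection, so the five comm server ports have to be picked out of it by hand.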

One significant factor that has since come to light is that the
licence has been out of date since 01/01/07 (this is not supported by
us). The CA site indicates there is some problem, but what this
results in is somewhat vague...
http://supportconnectw.ca.com/public/ca_common_docs/lic_expire.asp
From what's said there it looks to me as if CA do not know the
implications, but I think this needs to be fixed a.s.a.p. There is an
engineer looking at the AIX server as I write....

Steve

#3
04-20-2008, 10:39 PM
Karl & Betty Schendel
Re: [Info-Ingres] Unable to make outbound connections all of a sudden

At 4:23 AM -0800 2/8/07, steveg15 wrote:
>[snip]
>
>Whilst the fault is present you cannot access IPM (except with -s) or
>iinamu. Even when you first re-start Ingres, although you can start
>IPM you cannot access the "Server" option?

Sounds like the Name Server is getting hosed somehow. Is it
running? Can you connect to the DBMS server directly?
Find out what the port number for the DBMS server is (by
looking in errlog.log if necessary), then run

export II_DBMS_SERVER=portnumber
sql some-database

locally on the AIX box.
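
Those two steps can be wrapped up roughly as follows. This is a sketch under assumptions: the function name is made up, the "\q" fed to the terminal monitor is assumed to make it quit once connected, and nothing is assumed about the exit status of sql, so read its output to judge whether the connection worked.

```shell
#!/bin/sh
# Sketch: try a local connection straight to the DBMS server,
# bypassing the name server.

# $1 = DBMS server listen address (e.g. as found in errlog.log)
# $2 = database name
# The body runs in a subshell so II_DBMS_SERVER does not stay set afterwards.
direct_connect() (
    II_DBMS_SERVER=$1
    export II_DBMS_SERVER
    # Send "\q" on stdin so the terminal monitor exits instead of waiting.
    printf '%s\n' '\q' | sql "$2"
)

# Live usage (placeholders, commented out):
# direct_connect portnumber some-database
```

Running it in a subshell is deliberate: leaving II_DBMS_SERVER exported in a login shell would make every later connection from that shell bypass the name server too.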

If you can connect to the DBMS server directly, it sounds like a
stuck name server. It's not immediately clear to me why
the name server would be hanging, but it is odd that this started
all of a sudden.

Am I right in thinking that existing connections are OK, it's
just that no new ones can be made when this happens?

It does sound as if something changed on at least some of the
clients, such that you're burying one of the comm servers instead
of using them all. I don't see how that would be related to the hang.
(GCC/GCN is not my area of expertise though.)

Karl
#4
04-20-2008, 10:39 PM
Jim Gramling
Re: Unable to make outbound connections all of a sudden

Steve said:

> We has a live production installation with 250+ clients connected via
> an openroad 4.1 application, the b/e is AIX 5.2 .

<snip>
>We have load balancing with 5 iigcc's starting (one interesting point is
> the fact we now only appear to have one of them clocking up CPU time
> when looking at a ps -ef output).

Steve, what does the client netutil look like? Do you have merged
vnode entries on each client (II0, II1, II2, etc.)? Is each client's
netutil configured separately, or do you use some kind of funky, shared
network client installation? I know you say that nothing has changed,
but with a shared installation, one user changing the netutil
configuration might have affected all clients.

If you do have merged netutil entries, can you do a test connection to
each gcc within netutil from a remote client?

Another thing ... this is a long shot, but are you by any chance using
installation passwords? If so, check the following file:
$II_SYSTEM/ingres/files/name/IILTICKET_*. Does it seem really large and
growing? Is the iigcn process consuming an inordinate amount of CPU?
If so, do this:

ingstop -iigcn
rm $II_SYSTEM/ingres/files/name/IILTICKET_*
ingstart -iigcn

Then get back in touch with me. There is a known Ingres bug that can
bring the name server to a screeching halt when you have frequent
connections and use installation passwords: used installation password
"tickets" aren't correctly cleaned up, causing the IILTICKET file to
grow forever. When it reaches "critical mass" (in my case, that was a
couple of hundred MB), iigcn CPU use soars and the GCN can no longer
respond adequately to remote connection requests. I wouldn't think
that 250 users would be enough to cause you problems (actually, I
wouldn't even think that you would need five iigcc's), but anything
is possible.
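
To check the ticket files before stopping anything, something along these lines should do. It is a sketch only: the function name is hypothetical and the 100 MB default threshold is an arbitrary figure, loosely based on the "critical mass" mentioned above.

```shell
#!/bin/sh
# Sketch: report IILTICKET_* files at or above a size threshold.

# $1 = directory to look in
# $2 = threshold in bytes (default 104857600, i.e. 100 MB; arbitrary)
# Prints "SIZE PATH" for each file that reaches the threshold.
big_ticket_files() {
    dir=$1
    limit=${2:-104857600}
    for f in "$dir"/IILTICKET_*; do
        [ -f "$f" ] || continue              # the glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')     # size in bytes, padding stripped
        if [ "$size" -ge "$limit" ]; then
            printf '%s %s\n' "$size" "$f"
        fi
    done
}

# Live usage (commented out so sourcing this file has no side effects):
# big_ticket_files "$II_SYSTEM/ingres/files/name"
```

Running it twice a few minutes apart also shows whether a file is still growing.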

These are the only things that come to mind. If you are desperate and
really adventurous, you can activate GCN tracing, which might shed
some additional light on what's going on.

export II_GCA_LOG="mytracefile.out"
export II_GCN_TRACE=5

ingstop -iigcn
ingstart -iigcn

Good luck!

Jim Gramling
Rio de Janeiro

#5
04-20-2008, 10:39 PM
Jim Gramling
Re: Unable to make outbound connections all of a sudden

Sorry ... I just noticed that your post says "unable to make
outbound connections", so I misunderstood it: is your problem
on the client machine? What kind of machine is it? Is it running AIX
as well? Have you tried rebooting the client machine?

Still could be a ticket issue on the server side if you are using
installation passwords. Try bypassing the name server as Karl
suggested; also, take a look at "show comsvr *" in iinamu to see if
all five gcc's are actually registered.

Regards,

Jim Gramling
Rio de Janeiro
