Is there some restriction (or "kernel" parameter) for the depth of subshells (and functions in ksh) on AIX 5.3 64-bit, after which these switch between correct and incorrect from level to level? (AIX Operating System forums, Unix Operating Systems category)
I am seeing very alarming (not to say "shocking!") behaviour on a customer's AIX (5.3 64-bit; I haven't determined the exact bos level yet): a ksh script called from a deeper level of shell-script calls inside our application fails to handle error levels correctly in two different constructs. It receives the return value of a function wrongly, and later gets the wrong error level after the exit of a subshell, whereas the same script called directly from the command line behaves perfectly.

I'll try to keep the situation simple, but I would be VERY happy for any ideas and can of course go into more detail later. (I name the scripts here with an extension referring to their shell; in real life they are all named *bat or have no extension.)

Called directly from the command line:

A.ksh calls
B.ksh calls
C.ksh calls
D.ksh, which has a function
f1(), which calls
E.csh, which has as its last line
exit 1
The next line inside f1 is
I=$?
and normally I==1; the function goes on and returns (error level irrelevant here).
Life goes on; B.ksh calls another
M.csh, which also calls the same substructure (just one level deeper, in fact some log writer module ...) again:
C.ksh calls
D.ksh, which has a function
f1(), which calls
E.csh, which has as its last line
exit 1
I=$? which also is ==1
B.ksh then later calls its own function func1 with
T=$(func1)
where func1 echoes a text in one of its lines and does return 1 as its last line, so
R=$?
in the next line should be (and is, when calling A.ksh directly) 1 also.

Called from a surrounding application which is started from the command line (consisting of some additional levels of shells [and a Micro Focus COBOL program making a SYSTEM call of a script which calls A.ksh]), the scripts produce the value 1 inside as well; that is, E.csh does exit 1 in both cases and func1 returns 1.
BUT:
the first exit 1 of E.csh is received as 0 in D.ksh, whereas the second (one level deeper [! not one less but 1 level more !]) is received correctly as 1,
and the return 1 of func1 in B.ksh is misinterpreted as 0 (even though it is higher in the call hierarchy than the E.csh exits anyway).

I debugged all relevant scripts (B.ksh, D.ksh and E.csh), including the functions in B and D, to get those values, and I might as well stop programming if I can't believe an exit or return any longer.

I can't reproduce the effect on my own AIX, so I have the real problem of comparing the configuration of a customer's machine with our development machine, which is bad, I know (and worse, I am not root on either), but I'm fishing for ideas.

If it didn't work in either case (direct call of A.ksh and the more complex way), I would think something was badly installed on the customer's machine, but...
Needless to say, the same worked well with both methods on a former release (AIX 5.2.whatsoever...), and everything is reproducible on the customer's machines (old: simple=ok, complex=ok; new: simple=ok, complex=not ok). What's worse: he is already using this new machine in production, so there's no way back and no possibility of delaying the migration. They are working in that environment, and some important parts of the application just fail!

Thank you for your patience; any comments are welcome!
bine
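For anyone who wants to try the two constructs in isolation, here is a minimal sketch. It is not the poster's code: the file name child.sh and the directory variable workdir are made up for illustration, and a plain sh child stands in for E.csh so the example runs where csh is not installed. Only f1, func1, I, T and R are taken from the post.

```shell
#!/bin/sh
# Hypothetical stand-ins for the scripts in the post; names are illustrative only.
workdir=${TMPDIR:-/tmp}/status_repro.$$
mkdir -p "$workdir"

# Stand-in for E.csh: a child script whose last line is "exit 1".
cat > "$workdir/child.sh" <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x "$workdir/child.sh"

# Construct 1: a function runs the child script and saves $? on the very next line.
f1() {
    "$workdir/child.sh"
    I=$?
}

# Construct 2: a function echoes some text and returns 1;
# the caller captures the text with $(...) and the status with $?.
func1() {
    echo "some text"
    return 1
}

f1
T=$(func1)
R=$?    # status of the command substitution, i.e. the return value of func1

echo "T=$T R=$R I=$I"

rm -rf "$workdir"
```

On a healthy shell both I and R should come out as 1. Running the same file once directly and once from inside the deep call chain would show whether the plain constructs are affected or only the real scripts.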
| |||
On Nov 8, 3:03 pm, bine <sabine.hubrig-schaumb...@sungard.de> wrote:
> I have a very alarming (not to say "shocking!") behaviour on a
> customer's AIX (5.3-64 bit, the exact bos I haven't determined yet)
> [...]

Are you sure that the application which is launching the script has the same environment set up as your command line?
| |||
On Nov 9, 9:03 am, bine <sabine.hubrig-schaumb...@sungard.de> wrote:

I found that KSH functions can only be nested 9 deep
| |||
On 8 Nov., 21:36, a...@mail.com wrote:
> are you sure that the application which is launching the script has
> the same environment set up as your command line?

Yes, because in both cases there is a normal user login first and then the call of a menu script to start that stuff. I also received the output of env and set from the customer, because I speculated in that direction myself at first: in real life/production we normally omit the interactive login + menu call and start via cron, which of course CAN BE totally different. But the menu is provided anyway, and the later tests showed that cron (or what this customer prefers: Control-M, which is yet another env possibility) is not the point.

Nevertheless, thanks for asking, as I don't feel that alone anymore ;-)
| |||
On 9 Nov., 02:56, Henry <snogfest_hosebe...@yahoo.com> wrote:
> I found that KSH functions can only be nested 9 deep

I heard something like that and thought in that direction. I wrote a set of mini shell scripts calling each other deeper and deeper, depending on a parameter I gave them (in fact even switching between ksh and csh from time to time, as we unfortunately do have a mixture of both), and had no problem going down to (can't find it now, my guess ...) 30 levels. That was on my machine, so if there is a restriction, is it installation- or parameter-dependent?

My next approach will be to omit that COBOL call in between, to determine whether the more complex calls fail at the customer's site even when they consist purely of shell scripts. (This is simply for testing, of course; there is normally business logic involved as well, so at the moment there is no chance of using that as a workaround in the future.)

thanks
bine
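A depth test of the kind described above could look roughly like the following. This is only a sketch, not bine's actual test scripts: the file name nest_test, the use of plain sh instead of a ksh/csh mixture, and the convention of exiting with 2 where the status gets lost are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: a script that re-invokes itself until the requested depth
# is reached, exits 1 at the bottom, and checks the status seen at every level.
script=${TMPDIR:-/tmp}/nest_test.$$

cat > "$script" <<'EOF'
#!/bin/sh
depth=$1
if [ "$depth" -le 0 ]; then
    exit 1      # innermost level: the status every caller should see
fi
"$0" "$(expr "$depth" - 1)"
rc=$?
if [ "$rc" -ne 1 ]; then
    # Report the level at which the expected status was not received.
    echo "level $depth: expected 1, got $rc" >&2
    exit 2
fi
exit "$rc"
EOF
chmod +x "$script"

"$script" 30
status=$?
echo "status after 30 nested calls: $status"

rm -f "$script"
```

If a level ever receives something other than 1, the message from the deepest complaining level points at the place where the status went missing. Starting the outer call from inside the application's call chain instead of from the command line would add the extra levels mentioned in the thread.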
| |||
On 9 Nov., 12:17, bine <sabine.hubrig-schaumb...@sungard.de> wrote:
> On 9 Nov., 02:56, Henry <snogfest_hosebe...@yahoo.com> wrote:
>
> > I found that KSH functions can only be nested 9 deep
>
> I heard something like that and thought in that direction and wrote a
> set of mini-shellscripts calling each other constantly deeper and
> deeper depending on a parameter I gave them (in fact even switching
> between ksh and csh from time to time as we do [unfortunately] have a
> mixture of both)
> and had no problem going down to (can't find it now, guess ... ) 30
> levels (on my machine, so if there is a restriction is it installation/
> parameter-dependent?)

As I still have no real answer, I want to press this point a bit further: on the "old" AIX 5.2 AND on my own machine with 5.3 the application works perfectly well, so there can't be a global restriction; it has to be some kind of configuration.

In fact we don't have 9 (or more) pure ksh levels but a mixture of csh and ksh (and functions inside of those; how do you count those?). If I count correctly, the interpretation of the exit of a csh script fails at level 10, the same thing works one level deeper, and the return of a function inside a ksh script further out (at level 7, the 3rd ksh in the complete hierarchy) fails as well.

Any ideas?
| |||
....
> any ideas?

Try increasing ncargs and see if it fixes the problem. Each subshell is given the parent environment plus any additional new settings, which might cause the subshell to run out of ENV memory.

From IBM ....

ncargs
Purpose: Specifies the maximum allowable size of the ARG/ENV list (in 4 KB blocks) when running exec() subroutines.
Values: Default: 6; Range: 6 to 1024
Display: lsattr -E -l sys0 -a ncargs
Change: chdev -l sys0 -a ncargs=NewValue
Change takes effect immediately and is preserved over boot.
.....

To determine the correct OS level run:
$ oslevel -s

hth
Hajo
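To get a feeling for whether the environment is anywhere near that limit, one could print its size next to the exec() limit at the interesting levels of the call chain. The sketch below uses only the portable env, wc and getconf commands. It assumes that getconf ARG_MAX reflects the ncargs setting on AIX, which I have not verified, and the variable names env_bytes and arg_max are made up.

```shell
#!/bin/sh
# Approximate size of the current environment in bytes
# (counts the "NAME=value" lines printed by env).
env_bytes=$(env | wc -c | tr -d ' ')

# POSIX limit on the combined size of arguments and environment passed to exec().
arg_max=$(getconf ARG_MAX)

echo "environment: $env_bytes bytes, ARG_MAX: $arg_max bytes"
```

Dropping these lines into B.ksh and D.ksh (as debug output) for both the direct and the nested call would show how much the environment grows from level to level.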
| |||
On Nov 13, 6:35 pm, Hajo Ehlers <serv...@metamodul.com> wrote:
> try increasing ncargs and see if it fixes the problem.
> [...]

Since you are nesting so deep, you may want to also check the number of open files allowed for the user ID under which the application runs. If you hit the limit, that can give unpredictable results.

J.
| |||
On 14 Nov., 00:35, Hajo Ehlers <serv...@metamodul.com> wrote:
> try increasing ncargs and see if it fixes the problem.
> [...]

Thanks Hajo, this sounds great (as long as I get a [hopefully ;-] smaller number back from my customer, whom I have to ask about his value ...).

My oslevel is 5300-06-01-0000. (I have not been "root" for many years now and have forgotten all the details, but this seems to be almost the same information that "instfix -i | grep ML" gave me; I will keep it in mind.) The customer reported only ML5 as the output of this "instfix" call, so I still have to check what IBM fixed with ML6.
| ||||
On 14 Nov., 18:33, jprob...@gmail.com wrote:
> Since you are nesting so deep, you may want to also check the number
> of open files allowed for the user ID under which the application
> runs. If you hit the limit, that can give unpredictable results.
>
> J.

Thanks J. Do you have a hint as to how to determine this value on the command line? I tried smit, in the submenu "Change / Show Characteristics of a User", but for my own user not many values were filled in, so I seem to get everything from defaults. Which file would that be anyway? E.g. "maximum processes per uid" might be CHILD_MAX in /usr/include/sys/limits.h, and there is some OPEN_MAX as well, but according to the comment that is not per user but per process, so that might be the wrong source.

So if it is configured per user, what do I tell my customer to run to give me the information?

thanks in advance
bine
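One way to see the limit that actually applies to a running shell is the ulimit builtin, shown in the sketch below. The variable names soft_nofiles and hard_nofiles are made up. As far as I know, AIX keeps the per-user defaults in /etc/security/limits under the attribute nofiles, but treat that as an assumption to be checked on the machine in question.

```shell
#!/bin/sh
# Limits on open file descriptors for processes started from this shell.
# The soft limit is the one currently enforced; the hard limit is the
# ceiling up to which a non-root user may raise the soft limit.
soft_nofiles=$(ulimit -S -n)
hard_nofiles=$(ulimit -H -n)

echo "open files: soft=$soft_nofiles hard=$hard_nofiles"
```

The values are only meaningful when this runs as the same user and from the same starting point (menu script, cron, scheduler) as the failing application, since each of those may set its own limits.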