This is a discussion on Non reproducible NIS failure within the Linux Operating System forums, part of the Unix Operating Systems category.
I'm having non reproducible problems with NIS, from time to time I have this error when trying to get any user info (uid, homedir, ...) like in the command "chown www-data file":

do_ypcall: clnt_call: RPC: Unable to send; errno = Operation not permitted

and chown fails but the same command works 99% of the time. It seems to happen more often when the servers are loaded, for example during cron.daily.

I'm using a NIS master with ~4000 users and 15 NIS slaves. This error happens on any of them.

I'm thinking of a limit reached under some circumstances like the maximum number of concurrent connections or something like that.

Google only returns 4 pages about this error message on the web and none on usenet, no answer. Do you have any idea where I should look? Where should I ask about that?

Merry Xmas with NIS.
--
Cyril Bouthors
| |||
Cyril Bouthors <cyril@bouthors.org> wrote:
> I'm having non reproducible problems with NIS, from time to time I
> have this error when trying to get any user info (uid, homedir, ...)
> like in the command "chown www-data file":
> do_ypcall: clnt_call: RPC: Unable to send; errno = Operation not permitted

Probably a server failure due to overloading. What's your fanout? I try never to exceed 20:1.

Or it could be local, with ypbind being the part that's overloaded. That can happen with a bad network cable. It needs precise debugging to know.

> and chown fails but the same command works 99% of the time. It seems
> to happen more often when the servers are loaded, for example during
> cron.daily.

Ah. Yes.

> I'm using a NIS master with ~4000 users and 15 NIS slaves. This
> error happens on any of them.

And how fast is the server? Anyway, you mean 15 slave servers? Not 15 clients?

> I'm thinking of a limit reached under some circumstances like the
> maximum number of concurrent connections or something like that.
> Google only returns 4 pages about this error message on the web and
> none on usenet, no answer. Do you have any idea where I should look?
> Where should I ask about that?

You should talk to the author.

Peter
| |||
ptb@oboe.it.uc3m.es (P.T. Breuer) writes:

> Probably a server failure due to overloading.

It should delay the answer but not fail. Is there a way to change the timeout?

> What's your fanout? I try never to exceed 20:1.

I'm not sure I understand the question but my load is around 0.5 most of the time and between 1 and 4 during cron.daily (processing gigabytes of statistics).

> it could be local, and ypbind that's overloaded.

Is there a way to know that? Number of queries, ... ?

> Can happen with bad network cable. Needs precise debugging to know.

Network is not used since the server is a slave. By the way, there are no problems with the network cables or switches.

> And how fast is the server?

Both master and slaves are fast.

> Anyway, you mean 15 slave servers? Not 15 clients?

I used to have 15 clients and 1 master, I've switched to 15 slaves and 1 master for performance reasons.

> You should talk to the author.

I've already asked Thorsten Kukuk.
--
Cyril Bouthors
| |||
Cyril Bouthors <cyril@bouthors.org> wrote:
> ptb@oboe.it.uc3m.es (P.T. Breuer) writes:
> > Probably a server failure due to overloading.

> It should delay the answer but not fail. Is there a way to change the
> timeout?

Oh - it'll fail. You can recompile.

> > What's your fanout? I try never to exceed 20:1.

> I'm not sure I understand the question but my load is around 0.5 most

No, the number of clients accessing each server; "fanout".

> > it could be local, and ypbind that's overloaded.

> Is there a way to know that? Number of queries, ... ?

Debug.

> > Can happen with bad network cable. Needs precise debugging to know.

> Network is not used since the server is a slave. By the way, there are

Are you saying that the server is only used locally? By people on the localhost machine?

> no problems with the network cables or switches.

> > And how fast is the server?

> Both master and slaves are fast.

> > Anyway, you mean 15 slave servers? Not 15 clients?

> I used to have 15 clients and 1 master, I've switched to 15 slaves and
> 1 master for performance reasons.

And how many clients? I think that perhaps you are describing a situation in which every client is also a slave server which is accessed only from localhost?

In that case the (slave) server will go down for a while when you do the ypxfr of the maps, or the flush of the server caches. (There are many different ways of transferring maps, but ypxfr via pull is quite common.) That may cause the client (ypbind) to switch to broadcast mode temporarily.

> > You should talk to the author.

> I've already asked Thorsten Kukuk.

And what did he say?

Peter
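One way to check whether ypbind really does lose or change its binding around the time of the failures is to log the bound server whenever it changes. The following is only a sketch: the helper name `log_changes` and its arguments are made up for illustration, and it assumes `ypwhich` (which prints the NIS server the client is currently bound to) is the command being watched.

```shell
#!/bin/sh
# log_changes COUNT INTERVAL CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, sleeping INTERVAL seconds
# between runs, and prints a timestamped line whenever its output changes.
log_changes() {
    count=$1
    interval=$2
    shift 2
    last='__unset__'
    i=0
    while [ "$i" -lt "$count" ]; do
        # Keep going even if the command fails; record the failure text.
        cur=$("$@" 2>&1) || cur="(command failed) $cur"
        if [ "$cur" != "$last" ]; then
            printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cur"
            last=$cur
        fi
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Example (not run here): poll the binding every 5 seconds for an hour.
# log_changes 720 5 ypwhich >> /tmp/ypwhich.log
```

If the log shows the binding changing, or `ypwhich` failing, at the same times as the "Operation not permitted" errors, that would point at ypbind rather than at ypserv.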
| |||
ptb@oboe.it.uc3m.es (P.T. Breuer) writes:

> No, the number of clients accessing each server; "fanout".

1 client = 1 server

> Are you saying that the server is only used locally? By people on
> the localhost machine?

Exactly.

> you are describing a situation in which every client is also a slave
> server which is accessed only from localhost?

Yes.

> In that case the (slave) server will go down for a while when you do
> the ypxfr

I've already thought about that but ypxfr in this architecture is only run when adding new users and does not coincide with "Operation not permitted" error messages.

> And what did he say?

No answer at this time but it's quite normal during Xmas.
--
Cyril Bouthors
| |||
Cyril Bouthors <cyril@jexiste.fr> wrote:
> ptb@oboe.it.uc3m.es (P.T. Breuer) writes:
> > you are describing a situation in which every client is also a slave
> > server which is accessed only from localhost?

> Yes.

> > In that case the (slave) server will go down for a while when you do
> > the ypxfr

> I've already thought about that but ypxfr in this architecture is
> only run when adding new users and does not coincide with "Operation

You sure? There should be scripts in cron like ypxfr_1perday.

> not permitted" error messages.

Peter
| |||
"Cyril Bouthors" <cyril@jexiste.fr> wrote in message news:878ykzh7kp.fsf@wide.bouthors.org...
> ptb@oboe.it.uc3m.es (P.T. Breuer) writes:
>
> > No, the number of clients accessing each server; "fanout".
>
> 1 client = 1 server
>
> > Are you saying that the server is only used locally? By people on
> > the localhost machine?
>
> Exactly.
>
> > you are describing a situation in which every client is also a slave
> > server which is accessed only from localhost?
>
> Yes.
>
> > In that case the (slave) server will go down for a while when you do
> > the ypxfr
>
> I've already thought about that but ypxfr in this architecture is
> only run when adding new users and does not coincide with "Operation
> not permitted" error messages.

1: Look for cron jobs that tickle the ypxfr. Different OS's have different hourly, daily, etc. cron jobs for NIS.

2: Consider using dual hosts in your /etc/yp.conf: Linux NIS can certainly tolerate this, and it gives you a failover to use if one of them fails at NIS slave service.
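As a rough illustration of the dual-host suggestion, the sketch below prints one `ypserver` line per host, which is the form the Linux ypbind yp.conf accepts for naming servers for the local domain. The helper name `yp_conf_lines` and both hostnames are placeholders and do not come from this thread; check yp.conf(5) on your own system before relying on the exact syntax.

```shell
#!/bin/sh
# yp_conf_lines HOST [HOST...]
# Hypothetical helper: emit one "ypserver" line per host, in the order
# given, suitable for reviewing before it is put into /etc/yp.conf.
yp_conf_lines() {
    for host in "$@"; do
        printf 'ypserver %s\n' "$host"
    done
}

# Example (hostnames are assumptions): the local slave plus the master.
# yp_conf_lines localhost nis-master.example.org
```

With the local slave and a second server both listed, ypbind has somewhere else to bind if the local ypserv stops answering for a moment. ypbind has to be restarted to pick up the change.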
| |||
ptb@oboe.it.uc3m.es (P.T. Breuer) writes:

> You sure? There should be scripts in cron like ypxfr_1perday.

I've already checked:

# find /etc -type f | xargs grep ypxfr
/etc/init.d/nis: echo -n "ypxfrd "
/etc/init.d/nis: --exec ${NET}/rpc.ypxfrd
/etc/init.d/nis: --name rpc.ypxfrd
/etc/rpc:ypxfrd 100069
/etc/ypserv.securenets:# for NIS clients (and slave servers - ypxfrd uses this
--
Cyril Bouthors
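The search above only covers /etc, while per-user crontabs and the ypxfr helper scripts usually live elsewhere. A slightly broader version might look like the sketch below. The function name `find_ypxfr_refs` is made up, the directories in the example are typical locations that vary by distribution, and `grep -r` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# find_ypxfr_refs DIR [DIR...]
# Hypothetical helper: list every file under the given directories that
# mentions ypxfr. Directories that do not exist are skipped silently.
find_ypxfr_refs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -rl -- 'ypxfr' "$dir" 2>/dev/null
    done
    return 0
}

# Example (paths are assumptions; run as root to read user crontabs):
# find_ypxfr_refs /etc /var/spool/cron /usr/lib/yp
```

Running it on the master as well as on each slave would also settle whether a scheduled map transfer exists on any of the machines.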
| |||
"Nico Kadel-Garcia" <nkadel@comcast.net> writes:

> Look for cron jobs that tickle the ypxfr.

I've already checked, look at my other post.

> Consider using dual hosts in your /etc/yp.conf

I think this one is a good idea. I've changed yp.conf but since I can't reproduce the bug easily, I have to wait a few hours/days to see if cron scripts happen to fail again.

Nico, Thank you.
--
Cyril Bouthors
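Rather than waiting for cron to hit the failure, a lookup can be repeated in a loop while the machine is under load, counting how often it fails. This is a minimal sketch: the function name `probe` is hypothetical, and reading the load average from /proc/loadavg assumes Linux.

```shell
#!/bin/sh
# probe ATTEMPTS CMD [ARGS...]
# Hypothetical helper: run CMD ATTEMPTS times, log each failure with a
# timestamp, the 1-minute load average and the command's output to
# stderr, and print the total number of failures on stdout.
probe() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! out=$("$@" 2>&1); then
            failures=$((failures + 1))
            printf '%s FAIL load=%s: %s\n' \
                "$(date '+%Y-%m-%d %H:%M:%S')" \
                "$(cut -d' ' -f1 /proc/loadavg 2>/dev/null)" \
                "$out" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (not run here): 10000 passwd lookups for the www-data user.
# probe 10000 getent passwd www-data
```

`getent passwd www-data` goes through the same name service lookup that chown uses. `ypmatch www-data passwd` could be substituted to query NIS directly.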
| ||||
"Cyril Bouthors" <cyril@jexiste.fr> wrote in message news:87zndefctt.fsf@wide.bouthors.org...
> "Nico Kadel-Garcia" <nkadel@comcast.net> writes:
>
> > Look for cron jobs that tickle the ypxfr.
>
> I've already checked, look at my other post.
>
> > Consider using dual hosts in your /etc/yp.conf
>
> I think this one is a good idea. I've changed yp.conf but since I
> can't reproduce the bug easily, I have to wait a few hours/days to see
> if cron scripts happen to fail again.

It was unclear to me that you checked on both clients and the server, but you're entirely welcome.

Also, are your clients the same OS as your server? I've run into real "fun" with different OSes' ideas about NIS....