This is a discussion on stability problem, disappearing interfaces within the comp.unix.bsd.openbsd.misc forums, part of the OpenBSD category.

Hello,

I encountered a rather strange stability problem. My box runs 24/24 as
a router on a pppoe DSL line. The setup runs smoothly for days and even
weeks, but at some point...

1. All network interfaces suddenly disappear:

   # ifconfig -a
   : no such interface

   # ifconfig tun0
   tun0: no such interface

   The connection is still up (it is impossible, though, to modify
   /etc/pf.conf, since pf ceases to know the interfaces (like tun0)
   referenced therein).

2. All traffic stops, and socket-related operations yield
   "no buffer space" errors:

   Mar 26 05:27:15 l ppp[26918]: tun0: Warning: iface add: ioctl(SIOCAIFADDR, 80.146.107.110 -> 217.5.98.182): No buffer space available
   Mar 26 05:27:15 l ppp[26918]: tun0: Error: ipcp_InterfaceUp: unable to set ip address

Googling around, I found others having similar problems (e.g.,
<20031101004342.GH25238@griffon.lucky.openbsd.misc>,
<47bdb55e.0211300833.23bf2420@posting.google.com>,
<3FBB2396.9050508@bbyrd.net.lucky.openbsd.misc>,
<000001c3e42e$0953c7e0$6500a8c0@shuttle.lucky.openbsd.misc>), but no
solution yet. According to these postings, the problems exist (at
least) on 3.2, 3.3 and 3.4, and also 3.5-current.

Details of my setup:

- the problems appear on two completely different h/w setups
  o desktop system with two PCI-based Realtek 8139 NICs
  o laptop system with two PCMCIA NICs: one 3Com 3C589D (connected to
    the DSL modem), one Realtek-based.

- ppp is restarted every 24 hours

- I'm running pf with altq on tun0

- recovery from the above error messages is only possible by rebooting;
  successively killing all processes does not help

- Kernel is 3.3Rel with the altq/tun patch from benzedrine.cx... looks a
  bit shaky, but the above postings indicate there's little chance
  that the problems disappear even in 3.5 (which I'll try next).

I suspect the problems _might_ be related to either the Realtek NIC
drivers (less likely) or ppp with pppoe, possibly in conjunction with
the support for altq on tunneling devices... but these are just rough
guesses. Any further ideas, or suggestions on how to find more
evidence...?

Thanks,

/alex

ob-dmesg:
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
OpenBSD 3.3 (build) #0: Thu Sep 4 00:52:08 CEST 2003
    arost@localhost:/tmp/build
cpu0: F00F bug workaround installed
cpu0: Intel Pentium/MMX ("GenuineIntel" 586-class) 166 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,MMX
real mem = 66699264 (65136K)
avail mem = 56258560 (54940K)
using 839 buffers containing 3436544 bytes (3356K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(c6) BIOS, date 04/28/99, BIOS32 rev. 0 @ 0xf0400
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0xa22
pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xf09b0/112 (5 entries)
pcibios0: PCI Interrupt Router at 000:07:0 ("Intel 82371FB PCI-ISA" rev 0x00)
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc0000/0x8000
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82439HX" rev 0x03
pcib0 at pci0 dev 7 function 0 "Intel 82371SB PCI-ISA" rev 0x01
pciide0 at pci0 dev 7 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <Maxtor 94098U8>
wd0: 16-sector PIO, LBA, 39082MB, 16383 cyl, 16 head, 63 sec, 80041248 sectors
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2
pciide0: channel 1 disabled (no drives)
rl0 at pci0 dev 11 function 0 "Realtek 8139" rev 0x10: irq 10 address 00:e0:4c:e6:cd:7b
rlphy0 at rl0 phy 0: RTL internal phy
rl1 at pci0 dev 12 function 0 "Realtek 8139" rev 0x10: irq 11 address 00:e0:4c:e7:07:ef
rlphy1 at rl1 phy 0: RTL internal phy
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
vga0 at isa0 port 0x3b0/48 iomem 0xa0000/131072
wsdisplay0 at vga0: console (80x25, vt100 emulation), using wskbd0
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask 4040 netmask 4c40 ttymask 4cc2
pctr: 586-class performance counters and user-level cycle counter enabled
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
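On the question of how to find more evidence: one option is to record the exact moment the interfaces vanish, so it can be lined up with the ppp log. The sketch below is only an illustration. The function name `log_if_failing` and the log path in the usage note are made up for this example, and it assumes that `ifconfig` exits with a non-zero status in the broken state, which the post does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: run a command and, only if it fails, append a
# timestamped record of its output and exit status to a log file.
# Returns the exit status of the command that was run.
log_if_failing() {
    logfile=$1
    shift
    # Capture stdout and stderr; the if/else keeps the exit status even
    # when the calling shell runs with "set -e".
    if output=$("$@" 2>&1); then
        status=0
    else
        status=$?
    fi
    if [ "$status" -ne 0 ]; then
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') FAILED ($status): $* ==="
            echo "$output"
        } >> "$logfile"
    fi
    return "$status"
}
```

Called from cron every few minutes as, say, `log_if_failing /var/log/ifwatch.log ifconfig tun0` (path chosen arbitrarily), this would leave a record of roughly when the failure started.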

Alexander Ost wrote:
> [...]
> - the problems appear on two completely different h/w setups
>   o desktop system with two PCI-based Realtek 8139 NICs
>   o laptop system with two PCMCIA NICs: one 3Com 3C589D (connected to
>     the DSL modem), one Realtek-based.
> [...]

Yep, getting rid of the Realtek NICs would be a start; try to get Intel 8255x cards (fxp driver). Also, try an upgrade to 3.4, or wait a week and go to 3.5.

bards wrote:
> Alexander Ost wrote:
>> - the problems appear on two completely different h/w setups
>>   o desktop system with two PCI-based Realtek 8139 NICs
> Yep, get rid of the realtek NICS would be a start,

this must be one of those YMMV issues. I've been using Realtek 8139 nics for years because they've been cheap, fast and flawless for me. Others say they suck. go figure

beldar wrote:
> bards wrote:
>
>> Alexander Ost wrote:
>
>>> - the problems appear on two completely different h/w setups
>>>   o desktop system with two PCI-based Realtek 8139 NICs
>
>> Yep, get rid of the realtek NICS would be a start,
>
> this must be one of those YMMV issues. I've been using Realtek8139 nics
> for years because they've been cheap, fast and flawless for me. Others
> say they suck. go figure

Indeed, I think they get a mention on the OpenBSD website too. Can't find it, but I'm sure I've read an 'avoid like plague' warning.

On Mon, 26 Apr 2004, bards wrote:

> > this must be one of those YMMV issues. I've been using Realtek8139 nics
> > for years because they've been cheap, fast and flawless for me. Others
> > say they suck. go figure
> >
> > Indeed, I think they get a mention on the openbsd website too. Cant find
> > it but I'm sure I've read an 'avoid like plague' warning.

i know i've shoveled gigabytes and gigabytes through rl nics without any trouble. i can't actually recall anybody with a network problem that was confirmed to be a realtek's fault. i've seen lots of posts where people with rl nics had network trouble, but never a followup to say swapping the nic solved the problem. in short, people have heard they suck, and tell other people they suck, but no one has experienced said suckage.

people key in on the word realtek every time it's in a post, and are always willing to blame the card, all evidence to the contrary, which rarely solves the problem and is just more runaround.

they may not be the best performers (which just means they use more cpu, not that they can't handle 100mbit), but if your connection is pppoe, the realtek is really, really not the bottleneck.

lots of people are fond of quoting the driver comment which says you need "a 400MHz PII or some equally overmuscled CPU to drive it." that was seven years ago. cpus get twice as fast every 1.5 years. you do the math.

for the OP, output of vmstat -m and netstat -m may be informative. it sounds like a memory leak. a lot of these were fixed with 3.4, and another lot in 3.5.
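A single `vmstat -m` or `netstat -m` run says little about a leak; what matters is whether some counter keeps growing over days. A minimal sketch of collecting periodic snapshots for later comparison follows. The function name `snapshot`, the script layout and any file paths are assumptions made for illustration; only the two commands come from the post above.

```shell
#!/bin/sh
# Hypothetical helper: append timestamped output of `vmstat -m`
# (kernel memory statistics) and `netstat -m` (mbuf statistics) to a
# log file, so snapshots taken hours or days apart can be compared.
snapshot() {
    logfile=$1
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        for cmd in "vmstat -m" "netstat -m"; do
            echo "--- $cmd ---"
            # Keep going if a command fails (for example once the box
            # is already in the broken state) and record the failure.
            $cmd 2>&1 || echo "($cmd failed with status $?)"
        done
    } >> "$logfile"
}

# When the script is run with a log file argument, take one snapshot.
if [ "$#" -gt 0 ]; then
    snapshot "$1"
fi
```

Run from cron at a fixed interval, this builds up a history; comparing an early snapshot with one taken shortly before the failure should show which memory type or mbuf count, if any, grows without bound.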

On Sun, 25 Apr 2004 20:59:02 -0700
Ted Unangst <tedu@stanford.edu> wrote:

> i know i've shoveled gigabytes and gigabytes through rl nics without
> any trouble. i can't actually recall anybody with a network problem
> that was confirmed to be a realtek's fault. i've seen lots of posts
> where people with rl nics had network trouble, but never a followup
> to say swapping the nic solved the problem. in short, people have
> heard they suck, and tell other people they suck, but no one has
> experienced said suckage.

I had troubles with 8139s at work, and since they were swapped for 3Com gear these troubles have disappeared. I also know people who had troubles with Realtek cards, so no, it's not a myth.

mips

On Mon, 26 Apr 2004, mips wrote:

> I had troubles with 8139 at work and since they have been swapped with
> 3com gears these troubles disapeared. I also know people that had
> troubles with realtek cards too, so no it's not a myth.

at last, a confirmed report. i'd like to separate out the problems that come from the nic and the ones that don't.

On 2004-04-26, mips <anti@spam.gov> wrote:
> On Sun, 25 Apr 2004 20:59:02 -0700
> Ted Unangst <tedu@stanford.edu> wrote:
>
>> i know i've shoveled gigabytes and gigabytes through rl nics without
>> any trouble. i can't actually recall anybody with a network problem
>> that was confirmed to be a realtek's fault. i've seen lots of posts
>> where people with rl nics had network trouble, but never a followup
>> to say swapping the nic solved the problem.

I've seen enough of those in the comp.unix.bsd.*.misc groups, and I do recall those including followups.

>> in short, people have
>> heard they suck, and tell other people they suck, but no one has
>> experienced said suckage.
>
> I had troubles with 8139 at work and since they have been swapped with
> 3com gears these troubles disapeared. I also know people that had
> troubles with realtek cards too, so no it's not a myth.

Another datapoint: I had problems with realteks[1], like consistently low throughput or random failures. Some cards seem to be ok (when not pounded on), some just fail when you look at them. Add to that recommendations from people running shops large enough to see differences at scale, and some technical discussion with people In The Know about stuff low-level enough to see that realteks have severe technical limitations built in.

If your time is worth nothing, realteks are cheap. If it's not, better look for something that doesn't break as easily. The price difference is offset by less downtime, less wasted time and fewer pissed-off people.[2]

I like NICs that you can trample with data and they still survive. Realteks do the trample thing themselves, but not the surviving.

[1] Not always entirely the hardware's fault, sometimes the software sucks too. That the sucky software sucked somewhat less with different hardware means the hardware is at least partially responsible.

[2] Which is no excuse to overprice quality cards, but I digress.

--
j p d (at) d s b (dot) t u d e l f t (dot) n l .

Ted Unangst wrote:
> On Mon, 26 Apr 2004, bards wrote:
>
>> > this must be one of those YMMV issues. I've been using Realtek8139 nics
>> > for years because they've been cheap, fast and flawless for me. Others
>> > say they suck. go figure
>>
>> Indeed, I think they get a mention on the openbsd website too. Cant find
>> it but I'm sure I've read an 'avoid like plague' warning.
>
> i know i've shoveled gigabytes and gigabytes through rl nics without any
> trouble. i can't actually recall anybody with a network problem that was
> confirmed to be a realtek's fault. i've seen lots of posts where people
> with rl nics had network trouble, but never a followup to say swapping the
> nic solved the problem. in short, people have heard they suck, and tell
> other people they suck, but no one has experienced said suckage.
>

I've had a problem with a Realtek, but it was of the unusual kind. The card wouldn't talk at all on a 100Base-TX network. Realtek's own diagnostics package showed problems too (can't remember exactly what; too much water under the bridge). Once I rebooted the host while the card was connected to a 10Base-T hub, the problem vanished, with 10Base-T and 100Base-TX both working fine. An identical card in an identical host with the same version of the OS had no problems.

It was an early chip, though. The chipset seems to have changed slightly, and it would be interesting to know whether the problems relate to one particular version or not. Perhaps manufacturing issues are the cause and chips from different lines in the fab behave differently.

Ted Unangst <tedu@stanford.edu> writes:

> On Mon, 26 Apr 2004, bards wrote:
>
> > > this must be one of those YMMV issues. I've been using Realtek8139 nics
> > > for years because they've been cheap, fast and flawless for me. Others
> > > say they suck. go figure
> >
> > Indeed, I think they get a mention on the openbsd website too. Cant find
> > it but I'm sure I've read an 'avoid like plague' warning.
>
> i know i've shoveled gigabytes and gigabytes through rl nics without any
> trouble. i can't actually recall anybody with a network problem that was
> confirmed to be a realtek's fault. i've seen lots of posts where people
> with rl nics had network trouble, but never a followup to say swapping the
> nic solved the problem. in short, people have heard they suck, and tell
> other people they suck, but no one has experienced said suckage.

That's my impression as well. As my original problems also appear on a laptop with a 3Com card connected to tun0, I'm reluctant to blame the Realteks. Also, my setups _were_ running very reliably for weeks and months on 3.2, using Realteks (but not using altq then).

> for the OP, output of vmstat -m and netstat -m may be informative. it
> sounds like a memory leak. a lot of these were fixed with 3.4, and
> another lot in 3.5.

Thanks for the hint, I'll check the outputs. And 3.5 is on my list as well.

/alex