This is a discussion on Xeon hyperthreading problem within the Linux Operating System forums, part of the Unix Operating Systems category.
========================================
Distribution: debian woody

Kernel: 2.4.28, vanilla lilo.conf

CPU: dual hyperthreaded Xeon, so cpuinfo reports 4 times this:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 1
cpu MHz         : 2793.080
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor ds_cpl cid
bogomips        : 5570.56

When a process doing heavy local IO runs, just about everything else
freezes for up to minutes (then goes on). We suspect it's related to a
weird report from /proc/interrupts:

           CPU0       CPU1       CPU2       CPU3
  0:        122          0          0  120615284   IO-APIC-edge  timer
  1:          0          0          0        272   IO-APIC-edge  keyboard
  2:          0          0          0          0         XT-PIC  cascade
 14:          1          0          0          4   IO-APIC-edge  ide0
 16:          0          0          0          0  IO-APIC-level  usb-uhci
 18:          0          0          0          0  IO-APIC-level  usb-uhci
 19:          0          0          0          0  IO-APIC-level  usb-uhci
 20:          0          0          0   16915687  IO-APIC-level  aic7xxx
 29:          0          0          0  857063989  IO-APIC-level  eth0
 48:          0          0          0   28268766  IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:  120622421  120622419  120622419  120622418
ERR:          0
MIS:          0

It does not seem normal that all interrupts are directed to the same CPU
regardless of whether it's busy or not. This *could* be a bug in the
counters, a bug in the mobo, a bug in the kernel's setup of the APIC, or
some mumbo-jumbo may be needed in lilo.conf (an append option?).

TIA for advice.
--
Michel Bardiaux
Peaktime Belgium S.A.  Bd. du Souverain, 191
B-1160 Bruxelles
Tel : +32 2 790.29.41
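The pattern described above, with nearly every count landing in the CPU3 column, can be checked mechanically. Below is a minimal sketch in Python, assuming the text layout shown (a CPU header row, then one row per IRQ); `parse_interrupts` and `skewed_irqs` are hypothetical helper names, and the 99% threshold is an arbitrary choice.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (counts, description)}.

    The first non-blank line is taken as the CPU header row. Rows that do
    not carry one count per CPU (e.g. "ERR: 0") are skipped.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header: CPU0 CPU1 CPU2 CPU3
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < ncpu:
            continue  # summary row with a single total
        try:
            counts = [int(f) for f in fields[:ncpu]]
        except ValueError:
            continue  # not a per-CPU counter row
        table[label.strip()] = (counts, " ".join(fields[ncpu:]))
    return table


def skewed_irqs(table, threshold=0.99):
    """List (irq, busiest_cpu, total) for IRQs where a single CPU took at
    least `threshold` of all interrupts; rows with no interrupts are ignored."""
    result = []
    for irq, (counts, _description) in table.items():
        total = sum(counts)
        if total and max(counts) / total >= threshold:
            result.append((irq, counts.index(max(counts)), total))
    return result
```

On a live system the input would be `open("/proc/interrupts").read()`. Applied to the table above, this flags IRQs 0, 1, 20, 29 and 48, all on CPU3, while IRQ 14 (4 of 5 interrupts) stays under the threshold.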
========================================
Michel Bardiaux wrote:
> Distribution: debian woody
>
> Kernel: 2.4.28, vanilla lilo.conf
>
> CPU: dual hyperthreaded Xeon, so cpuinfo reports 4 times this:
> [snip]

Kernel: 2.4.21-27.0.2.ELsmp

My /proc/cpuinfo has 4 like this:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 3.06GHz
stepping        : 5
cpu MHz         : 3056.561
cache size      : 512 KB
physical id     : 0
siblings        : 2
runqueue        : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 6094.84

> When a process doing heavy local IO runs, just about everything else
> freezes for up to minutes (then goes on). We suspect it's related to a
> weird report from /proc/interrupts:
> [snip]

My /proc/interrupts looks like this:

           CPU0       CPU1       CPU2       CPU3
  0:   15306480   15299292   15304294   15304290   IO-APIC-edge  timer
  1:      28051      30884      31101      29039   IO-APIC-edge  keyboard
  2:          0          0          0          0         XT-PIC  cascade
  3:    1589033     427115    1592438     424021   IO-APIC-edge  serial
  8:          1          0          0          0   IO-APIC-edge  rtc
 12:    6154990    1277805    6101628    1285460   IO-APIC-edge  PS/2 Mouse
 14:       4410    9012265       3441    9078354   IO-APIC-edge  ide0
 15:         82          0        129         24   IO-APIC-edge  ide1
 16:          0          0          0          0  IO-APIC-level  usb-uhci
 18:          0          0          0          0  IO-APIC-level  usb-uhci
 19:          0          0          0          0  IO-APIC-level  usb-uhci
 28:     524737          0          0          0  IO-APIC-level  eth0
 29:          4          0     306041          0  IO-APIC-level  eth1
 76:     991888    1021284    1091264    1040122  IO-APIC-level  aic79xx
 77:     487917     850703     539028     882214  IO-APIC-level  aic79xx
100:     174398     113755     178353     114161  IO-APIC-level  EMU10K1
104:    4907319    1607055    5314933    1101314  IO-APIC-level  serial
NMI:          0          0          0          0
LOC:   61214842   61214841   61214840   61214840
ERR:          0
MIS:          0

> It does not seem normal that all interrupts are directed to the same CPU
> regardless of whether it's busy or not. This *could* be a bug in the
> counters, a bug in the mobo, a bug in the kernel's setup of the APIC, or
> some mumbo-jumbo may be needed in lilo.conf (an append option?).
>
> TIA for advice.

You seem to have configured your system to direct all interrupts to CPU3.
Notice it did not start up that way, as you have a few timer interrupts to
CPU0. I thought this was set by affinity, but if so, I do not understand the
manual page.
--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 08:40:00 up 7 days, 1:56, 4 users, load average: 4.14, 4.03, 3.58
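Jean-David's cpuinfo carries `physical id` and `siblings` fields that the listing in the first post lacks. Where a kernel reports them, they show which logical CPUs are hyperthread siblings in the same package. A minimal sketch, assuming the `key : value` layout shown above; `parse_cpuinfo` and `count_packages` are hypothetical helper names.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor.

    A new record starts at each "processor" line; every other "key : value"
    line is stored, stripped, on the current record.
    """
    procs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between records
        key, value = key.strip(), value.strip()
        if key == "processor":
            procs.append({})
        if procs:
            procs[-1][key] = value
    return procs


def count_packages(procs):
    """Return (logical CPUs, physical packages), with packages = None when
    the kernel does not report a "physical id" field."""
    ids = {p["physical id"] for p in procs if "physical id" in p}
    return len(procs), (len(ids) if ids else None)
```

A sample with four processor records spread over two distinct `physical id` values gives `(4, 2)`; a listing without that field, like the one in the first post, gives `None` for the package count.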
========================================
Jean-David Beyer wrote:
> Michel Bardiaux wrote:
>
>>Distribution: debian woody
>>
>>Kernel: 2.4.28, vanilla lilo.conf
>>
>>CPU: dual hyperthreaded Xeon, so cpuinfo reports 4 times this:
>>[snip]
>
> Kernel: 2.4.21-27.0.2.ELsmp
>
> My /proc/cpuinfo has 4 like this:
> [snip]
>
>>When a process doing heavy local IO runs, just about everything else
>>freezes for up to minutes (then goes on). We suspect it's related to a
>>weird report from /proc/interrupts:
>>[snip]

Additional info: we have restarted the system with HT disabled in the BIOS.
Now we have just 2 CPUs, but all interrupts still go to CPU1, and the
'congealing' (not as severe as a freeze-up) still occurs.

> My /proc/interrupts looks like this:
> [snip]
> Which is as I thought it should be.
>
>>It does not seem normal that all interrupts are directed to the same CPU
>>regardless of whether it's busy or not. This *could* be a bug in the
>>counters, a bug in the mobo, a bug in the kernel's setup of the APIC, or
>>some mumbo-jumbo may be needed in lilo.conf (an append option?).
>>
>>TIA for advice.
>
> You seem to have configured your system to direct all interrupts to CPU3.
> Notice it did not start up that way, as you have a few timer interrupts to
> CPU0.

Clear to me too.

> I thought this was set by affinity, but if so, I do not understand the
> manual page.

I don't have any command named 'affinity'; what package is it in?

HaND,
--
Michel Bardiaux
========================================
Michel Bardiaux wrote:
[snip]
>
> Additional info: we have restarted the system with HT disabled in the BIOS.
> Now we have just 2 CPUs, but all interrupts still go to CPU1, and the
> 'congealing' (not as severe as a freeze-up) still occurs.
>
[snip]
>>
>> You seem to have configured your system to direct all interrupts to CPU3.
>> Notice it did not start up that way, as you have a few timer
>> interrupts to CPU0.
>
> Clear to me too.
>
>> I thought this was set by affinity, but if so, I do not understand the
>> manual page.
>>
> I don't have any command named 'affinity'; what package is it in?
>
> HaND,

I have done a bit of web searching; it seems interrupt/CPU affinity is
controlled by the files /proc/irq/*/smp_affinity. On my system they are
all ffffffff, and so is /proc/irq/prof_cpu_mask. Could it be that the APIC
is not initialized correctly by the kernel? Or is there some kernel config
option that has to be turned on during 'make xconfig'?
--
Michel Bardiaux
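Each smp_affinity file holds a hexadecimal bitmask in which bit n stands for CPU n, so ffffffff permits every CPU. A small Python sketch of decoding and encoding such masks; the function names are hypothetical, and the comma handling is only a precaution for kernels that may print wide masks as comma-separated 32-bit groups.

```python
def mask_to_cpus(mask_hex, ncpus=None):
    """Decode an smp_affinity mask such as "00000002" into CPU numbers.

    Bit n set means CPU n may service the IRQ. If ncpus is given, bits for
    CPUs beyond that count are ignored. Commas are stripped in case a kernel
    prints a wide mask as comma-separated 32-bit groups.
    """
    value = int(mask_hex.replace(",", "").strip(), 16)
    cpus = []
    cpu = 0
    while value:
        if value & 1 and (ncpus is None or cpu < ncpus):
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus


def cpus_to_mask(cpus, width=8):
    """Encode CPU numbers as a zero-padded hex mask, e.g. [3] -> "00000008"."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "0{}x".format(width))
```

So `mask_to_cpus("00000002")` gives `[1]`, and `mask_to_cpus("ffffffff", ncpus=4)` gives `[0, 1, 2, 3]`. On kernels that support it, root can restrict an IRQ by writing such a mask into /proc/irq/N/smp_affinity.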
========================================
Michel Bardiaux wrote:
> Michel Bardiaux wrote:
> [snip]
>
> I have done a bit of web searching; it seems interrupt/CPU affinity is
> controlled by the files /proc/irq/*/smp_affinity. On my system they are
> all ffffffff, and so is /proc/irq/prof_cpu_mask. Could it be that the APIC
> is not initialized correctly by the kernel? Or is there some kernel config
> option that has to be turned on during 'make xconfig'?
>

Well, I know I did not have to do anything after I installed my system.

# cat /proc/irq/prof_cpu_mask
ffffffff

# cat /proc/irq/*/smp_affinity
00000001
00000001
00000004
ffffffff
ffffffff
00000004
ffffffff
00000008
00000004
00000008
00000008
00000001
00000002
00000001
00000004
ffffffff
00000002
ffffffff
ffffffff
ffffffff
00000002
00000001
ffffffff
00000008
ffffffff

I do not know what these numbers mean, but they might help someone who does.
--
Jean-David Beyer
========================================
Jean-David Beyer wrote:
[snip]
>
> Well, I know I did not have to do anything after I installed my system.
>
> # cat /proc/irq/prof_cpu_mask
> ffffffff
>
> # cat /proc/irq/*/smp_affinity
> 00000001
> [snip]
>
> I do not know what these numbers mean, but they might help someone who does.
>

What distro do you use?

What does fgrep -i irq /etc/inetd.d/* reveal?

And the values would be a little easier to interpret if you could (pretty
please) post 'grep . /proc/irq/*/smp_affinity' so I would know the
filenames too.

And going back over the thread, I see another difference in
/proc/interrupts: yours go up to #104, mine only to #48. Could you post
your lspci output?

TIA,
--
Michel Bardiaux
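The grep request above exists to keep each mask paired with its IRQ number. The same pairing can be collected programmatically; a minimal sketch, assuming the /proc/irq/<N>/smp_affinity layout discussed in this thread. `read_affinities` is a hypothetical helper, and `proc_root` is a parameter only so it can be pointed at a copy of the tree.

```python
import glob
import os


def read_affinities(proc_root="/proc"):
    """Return {irq: mask} from <proc_root>/irq/<N>/smp_affinity, sorted by
    IRQ number, so every mask stays paired with the IRQ it belongs to."""
    result = {}
    pattern = os.path.join(proc_root, "irq", "*", "smp_affinity")
    for path in glob.glob(pattern):
        irq = os.path.basename(os.path.dirname(path))
        if not irq.isdigit():
            continue  # only numbered IRQ directories are of interest
        try:
            with open(path) as f:
                result[int(irq)] = f.read().strip()
        except OSError:
            continue  # unreadable entry; skip it rather than abort
    return dict(sorted(result.items()))
```

Printing each `irq, mask` pair gives the same information as `grep . /proc/irq/*/smp_affinity`, but sorted numerically by IRQ rather than as text.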
========================================
Michel Bardiaux wrote:
> Jean-David Beyer wrote:
> [snip]
>
>> Well, I know I did not have to do anything after I installed my system.
>> [snip]
>> I do not know what these numbers mean, but they might help someone who
>> does.
>>
> What distro do you use?

Red Hat Enterprise Linux 3 ES

> What does fgrep -i irq /etc/inetd.d/* reveal?

$ fgrep -i irq /etc/inetd.d/*
fgrep: /etc/inetd.d/*: No such file or directory

Do you mean

$ fgrep -i irq /etc/xinetd.d/*
$

> And the values would be a little easier to interpret if you could
> (pretty please) post 'grep . /proc/irq/*/smp_affinity' so I would know
> the filenames too.

# grep . /proc/irq/*/smp_affinity
/proc/irq/0/smp_affinity:00000002
/proc/irq/100/smp_affinity:00000002
/proc/irq/104/smp_affinity:00000008
/proc/irq/10/smp_affinity:ffffffff
/proc/irq/11/smp_affinity:ffffffff
/proc/irq/12/smp_affinity:00000004
/proc/irq/13/smp_affinity:ffffffff
/proc/irq/14/smp_affinity:00000008
/proc/irq/15/smp_affinity:00000004
/proc/irq/16/smp_affinity:00000001
/proc/irq/18/smp_affinity:00000004
/proc/irq/19/smp_affinity:00000002
/proc/irq/1/smp_affinity:00000008
/proc/irq/28/smp_affinity:00000001
/proc/irq/29/smp_affinity:00000004
/proc/irq/2/smp_affinity:ffffffff
/proc/irq/3/smp_affinity:00000001
/proc/irq/4/smp_affinity:ffffffff
/proc/irq/5/smp_affinity:ffffffff
/proc/irq/6/smp_affinity:ffffffff
/proc/irq/76/smp_affinity:00000008
/proc/irq/77/smp_affinity:00000004
/proc/irq/7/smp_affinity:ffffffff
/proc/irq/8/smp_affinity:00000001
/proc/irq/9/smp_affinity:ffffffff

> And going back over the thread, I see another difference in
> /proc/interrupts: yours go up to #104, mine only to #48. Could you post
> your lspci output?

# /sbin/lspci
00:00.0 Host bridge: Intel Corp. E7501 Memory Controller Hub (rev 01)
00:00.1 Class ff00: Intel Corp. E7500/E7501 Host RASUM Controller (rev 01)
00:02.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
00:03.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01)
00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corp. 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
03:02.0 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:02.1 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
04:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
04:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
04:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
04:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
05:02.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
05:02.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
05:03.0 Serial controller: 3Com Corp, Modem Division (formerly US Robotics) 56K FaxModem Model 5610 (rev 01)
06:02.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
06:02.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
07:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

> TIA,
--
Jean-David Beyer
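To count the I/O APIC entries in an lspci listing like this one, a rough sketch follows. It assumes, as holds for both listings quoted in this thread, that such devices mention "APIC" in their description; `io_apic_slots` is a hypothetical helper name.

```python
def io_apic_slots(lspci_text):
    """Return the PCI slots of lspci lines that mention an APIC.

    Assumption: I/O APICs show up with "APIC" in their description, as in
    both listings quoted in this thread ("... I/OxAPIC ...").
    """
    slots = []
    for line in lspci_text.splitlines():
        slot, _, description = line.strip().partition(" ")
        if "APIC" in description:
            slots.append(slot)
    return slots
```

Run on the listing above, it returns the four slots 01:1c.0, 01:1e.0, 04:1c.0 and 04:1e.0.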
========================================
Jean-David Beyer wrote:
[snip]
>>What distro do you use?
>
> Red Hat Enterprise Linux 3 ES
>
>>What does fgrep -i irq /etc/inetd.d/* reveal?
>
> $ fgrep -i irq /etc/inetd.d/*
> fgrep: /etc/inetd.d/*: No such file or directory
>
> Do you mean
>
> $ fgrep -i irq /etc/xinetd.d/*

Sorry, my mistake: /etc/init.d/*

> $
>
>>And the values would be a little easier to interpret if you could
>>(pretty please) post 'grep . /proc/irq/*/smp_affinity' so I would know
>>the filenames too.
>
> # grep . /proc/irq/*/smp_affinity
> /proc/irq/0/smp_affinity:00000002

Now I'm totally confused. With this value, IRQ#0 on your system should go
only to CPU#1, but according to the /proc/interrupts you posted earlier it
goes equally to all 4!

> /proc/irq/100/smp_affinity:00000002
> [snip]
>
>>And going back over the thread, I see another difference in
>>/proc/interrupts: yours go up to #104, mine only to #48. Could you post
>>your lspci output?
>
> # /sbin/lspci
> [snip]
>
>>TIA,

So, you have 4 "Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)", while I have
2 "Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A (rev 09)". My
problem could be a buggy chipset, or missing kernel support for this
particular chipset (we have other SMP systems, all showing balanced IRQ
load, but lspci shows *no* APICs on those).

One last thing you could do to help me: post (or e-mail me as a tgz) your
/usr/src/linux-whatever/include/linux/config.h file (if there is one; we
use Debian with a rebuilt 2.4.28 kernel, so the setup files may differ from
RH's).

TIA,
--
Michel Bardiaux
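The check done by hand above for IRQ 0, comparing an IRQ's current mask with the CPUs that actually logged interrupts for it, can be written as a short function. One caveat: the /proc/interrupts counters accumulate from boot, while smp_affinity shows only the current mask, so a non-empty result may simply mean the mask was changed while the system was running. `cpus_outside_mask` is a hypothetical helper name.

```python
def cpus_outside_mask(counts, mask_hex):
    """Return CPUs that logged interrupts although the mask excludes them.

    counts:   per-CPU totals from one /proc/interrupts row (cumulative since boot)
    mask_hex: that IRQ's current smp_affinity value (bit n = CPU n)
    """
    allowed = int(mask_hex.strip(), 16)
    return [cpu for cpu, n in enumerate(counts)
            if n > 0 and ((allowed >> cpu) & 1) == 0]
```

With Jean-David's figures, IRQ 0 (mask 00000002) gives [0, 2, 3], and eth1 on IRQ 29 (mask 00000004) gives [0] because of its 4 interrupts on CPU0.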
========================================
Jean-David Beyer wrote:
> Michel Bardiaux wrote:
>
>>I've compared autoconf.h and I think this is the significant difference;
>>yours:
>>
>>autoconf.h:#define CONFIG_X86_GOOD_APIC 1
>>autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
>>autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
>>autoconf.h:#define CONFIG_X86_IO_APIC 1
>>autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
>>autoconf.h:#define CONFIG_X86_NUMA 1
>>
>>
>>Mine:
>>
>>autoconf.h:#define CONFIG_X86_GOOD_APIC 1
>>autoconf.h:#define CONFIG_X86_IO_APIC 1
>>autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
>>autoconf.h:#undef CONFIG_X86_NUMA

We have built a new 2.4.28 with NUMA support selected, so that autoconf.h
matches yours, but we still have the same problem: all interrupts go to the
same CPU. It seems to me now that the critical line in your dmesg output is
this one:

> xAPIC support is present

I don't have it; moreover, this string appears nowhere in the stock 2.4.28
sources! From web searches it seems it might be part of a Red Hat patch. So
my best hope now is to compare my kernel sources with yours. Where did you
get your kernel 2.4.21-27.0.2.ELsmp? Did you apply any additional patches?

And hope is what I need now. Another machine has failed, so the Dell
PowerEdge 1800 in question had to take over some production tasks, and
because of these interrupt storms we have to run it without APIC, which
means without SMP, which means without HT; in other words, *slow*!
--
Michel Bardiaux
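Comparing the CONFIG_X86_* settings of two kernels, as done above with grep output from autoconf.h, can be automated. A minimal sketch; the helper names and the CONFIG_X86_ prefix filter are assumptions, and an option that is #undef'd is treated the same as one that is absent.

```python
import re

# Matches anywhere in the line, so "autoconf.h:#define ..." grep output works too.
_DEFINE = re.compile(r"#define\s+(CONFIG_\w+)\s+(.*)")
_UNDEF = re.compile(r"#undef\s+(CONFIG_\w+)")


def parse_autoconf(text):
    """Map CONFIG_* names to their value string, or None for #undef."""
    config = {}
    for line in text.splitlines():
        m = _DEFINE.search(line)
        if m:
            config[m.group(1)] = m.group(2).strip()
            continue
        m = _UNDEF.search(line)
        if m:
            config[m.group(1)] = None
    return config


def diff_configs(a, b, prefix="CONFIG_X86_"):
    """Return {name: (value_in_a, value_in_b)} for options that differ.

    Missing and #undef'd options both read as None.
    """
    names = sorted(n for n in set(a) | set(b) if n.startswith(prefix))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

Fed the two grep listings quoted above, `diff_configs` reports CONFIG_X86_CLUSTERED_APIC and CONFIG_X86_NUMA as set to 1 on Jean-David's side and unset on Michel's.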
========================================
Michel Bardiaux wrote:
> Jean-David Beyer wrote:
> [snip]
> It seems to me now that the critical line in your dmesg output is this
> one:
>
> > xAPIC support is present
>
> I don't have it; moreover, this string appears nowhere in the stock 2.4.28
> sources! From web searches it seems it might be part of a Red Hat patch.
> So my best hope now is to compare my kernel sources with yours. Where did
> you get your kernel 2.4.21-27.0.2.ELsmp? Did you apply any additional
> patches?
>
> And hope is what I need now. Another machine has failed, so the Dell
> PowerEdge 1800 in question had to take over some production tasks, and
> because of these interrupt storms we have to run it without APIC, which
> means without SMP, which means without HT; in other words, *slow*!

This is a complete shot in the dark; you sound like you're nearing the
desperate edge.

[q]
I have also given in to the peer pressure and enabled SMP with this release.
The debian-dell-2.4.29.iso (md5sum: dcd375d887f18159dd01ba47f513db23)
should support the following Dell products:

PowerEdge 400SC
PowerEdge 420SC
PowerEdge 600SC (thanks Nat)
PowerEdge 650
PowerEdge 700 (thanks Eric Busick)
PowerEdge 750 (thanks Carsten Buchenau)
PowerEdge 800 (thanks Jean-Christophe Montigny)
PowerEdge 1300 (thanks Lukasz Andersson)
PowerEdge 1400 (thanks Matt Griffin)
PowerEdge 1550
PowerEdge 1650
PowerEdge 1600SC (thanks Stephane)
PowerEdge 1655C
PowerEdge 1750
PowerEdge 1800 (thanks weasel)
PowerEdge 1850 (thanks James L. Morton)
(and many more ...)
[eq]
http://wiki.osuosl.org/display/LNX/D...=true#comments

You may already be aware of it.

RH-based kernels (kernel 2.4.21-27.0.2.ELsmp) here, or somewhere near?
ftp://ftp.linux.ncsu.edu/pub/centos/...86/RedHat/RPMS
ftp://ftp.linux.ncsu.edu/pub/centos/...64/RedHat/RPMS

good luck,
prg