Unix Technical Forum

Xeon hyperthreading problem

  #1 (permalink)  
Old 01-18-2008, 08:26 AM
Michel Bardiaux
 
Posts: n/a
Default Xeon hyperthreading problem

Distribution: debian woody

Kernel: 2.4.28, vanilla lilo.conf

CPU: dual hyperthreaded Xeon, so cpuinfo reports 4 times this:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 1
cpu MHz : 2793.080
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
pni monitor ds_cpl cid
bogomips : 5570.56

When a process doing heavy local I/O runs, just about everything else
freezes for minutes at a time (then resumes). We suspect it's related to a
weird report from /proc/interrupts:

CPU0 CPU1 CPU2 CPU3
0: 122 0 0 120615284 IO-APIC-edge timer
1: 0 0 0 272 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
14: 1 0 0 4 IO-APIC-edge ide0
16: 0 0 0 0 IO-APIC-level usb-uhci
18: 0 0 0 0 IO-APIC-level usb-uhci
19: 0 0 0 0 IO-APIC-level usb-uhci
20: 0 0 0 16915687 IO-APIC-level aic7xxx
29: 0 0 0 857063989 IO-APIC-level eth0
48: 0 0 0 28268766 IO-APIC-level aacraid
NMI: 0 0 0 0
LOC: 120622421 120622419 120622419 120622418
ERR: 0
MIS: 0

It does not seem normal that all interrupts are directed to the same CPU
regardless of whether it's busy or not. This *could* be a bug in the
counters, a bug in the mobo, a bug in the kernel's setup of the APIC,
or some mumbo-jumbo is needed in lilo.conf (an append option?).
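
One way to quantify the imbalance is to total the device interrupt counts per CPU. The following is a minimal Python sketch, not something posted in the thread: the function name per_cpu_totals is made up for illustration, and SAMPLE holds only a few rows copied from the table above.

```python
def per_cpu_totals(text):
    """Sum numbered-IRQ counts per CPU from /proc/interrupts-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        if not label.strip().isdigit():     # skip NMI/LOC/ERR/MIS summary rows
            continue
        for cpu, value in enumerate(rest.split()[:ncpu]):
            totals[cpu] += int(value)
    return totals


# A few rows from the table above (illustrative subset, not the full file).
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
0: 122 0 0 120615284 IO-APIC-edge timer
20: 0 0 0 16915687 IO-APIC-level aic7xxx
29: 0 0 0 857063989 IO-APIC-level eth0
48: 0 0 0 28268766 IO-APIC-level aacraid
LOC: 120622421 120622419 120622419 120622418
"""

totals = per_cpu_totals(SAMPLE)
shares = [t / sum(totals) for t in totals]
print(totals)
print(["%.4f" % s for s in shares])
```

On a live system the same function could be fed the contents of /proc/interrupts; a share close to 1.0 for a single CPU corresponds to the situation described here.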

TIA for advice.

--
Michel Bardiaux
Peaktime Belgium S.A. Bd. du Souverain, 191 B-1160 Bruxelles
Tel : +32 2 790.29.41
  #2 (permalink)  
Old 01-18-2008, 08:26 AM
Jean-David Beyer
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Michel Bardiaux wrote:
> Distribution: debian woody
>
> Kernel: 2.4.28, vanilla lilo.conf
>
> CPU: dual hyperthreaded Xeon, so cpuinfo reports 4 times this:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 4
> model name : Intel(R) Xeon(TM) CPU 2.80GHz
> stepping : 1
> cpu MHz : 2793.080
> cache size : 1024 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
> pni monitor ds_cpl cid
> bogomips : 5570.56


Kernel: 2.4.21-27.0.2.ELsmp

My /proc/cpuinfo has 4 like this:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 5
cpu MHz : 3056.561
cache size : 512 KB
physical id : 0
siblings : 2
runqueue : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 6094.84
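
The "physical id" and "siblings" fields in this output are what distinguish hyperthreaded logical CPUs from physical packages. As a rough sketch (the function name cpu_topology and the SAMPLE values for processors 1 to 3 are assumptions made for illustration; only the processor 0 entry appears in the thread), the two counts could be derived like this:

```python
def cpu_topology(text):
    """Return (logical_cpus, physical_packages) from /proc/cpuinfo-style text.

    physical_packages is None when the kernel does not report "physical id",
    as in the vanilla 2.4.28 output quoted above.
    """
    logical = 0
    packages = set()
    for ln in text.splitlines():
        key, sep, value = ln.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value)
    return logical, (len(packages) or None)


# Made-up example: four logical CPUs spread over two packages.
SAMPLE = """\
processor : 0
physical id : 0
siblings : 2
processor : 1
physical id : 0
siblings : 2
processor : 2
physical id : 3
siblings : 2
processor : 3
physical id : 3
siblings : 2
"""

print(cpu_topology(SAMPLE))
```

Assuming single-core packages, more logical CPUs than packages would indicate that hyperthreading is active.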

>
> When a process doing heavy local I/O runs, just about everything else
> freezes for minutes at a time (then resumes). We suspect it's related to a
> weird report from /proc/interrupts:
>
> CPU0 CPU1 CPU2 CPU3
> 0: 122 0 0 120615284 IO-APIC-edge timer
> 1: 0 0 0 272 IO-APIC-edge keyboard
> 2: 0 0 0 0 XT-PIC cascade
> 14: 1 0 0 4 IO-APIC-edge ide0
> 16: 0 0 0 0 IO-APIC-level usb-uhci
> 18: 0 0 0 0 IO-APIC-level usb-uhci
> 19: 0 0 0 0 IO-APIC-level usb-uhci
> 20: 0 0 0 16915687 IO-APIC-level aic7xxx
> 29: 0 0 0 857063989 IO-APIC-level eth0
> 48: 0 0 0 28268766 IO-APIC-level aacraid
> NMI: 0 0 0 0
> LOC: 120622421 120622419 120622419 120622418
> ERR: 0
> MIS: 0
>

My /proc/interrupts looks like this:

CPU0 CPU1 CPU2 CPU3
0: 15306480 15299292 15304294 15304290 IO-APIC-edge timer
1: 28051 30884 31101 29039 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
3: 1589033 427115 1592438 424021 IO-APIC-edge serial
8: 1 0 0 0 IO-APIC-edge rtc
12: 6154990 1277805 6101628 1285460 IO-APIC-edge PS/2 Mouse
14: 4410 9012265 3441 9078354 IO-APIC-edge ide0
15: 82 0 129 24 IO-APIC-edge ide1
16: 0 0 0 0 IO-APIC-level usb-uhci
18: 0 0 0 0 IO-APIC-level usb-uhci
19: 0 0 0 0 IO-APIC-level usb-uhci
28: 524737 0 0 0 IO-APIC-level eth0
29: 4 0 306041 0 IO-APIC-level eth1
76: 991888 1021284 1091264 1040122 IO-APIC-level aic79xx
77: 487917 850703 539028 882214 IO-APIC-level aic79xx
100: 174398 113755 178353 114161 IO-APIC-level EMU10K1
104: 4907319 1607055 5314933 1101314 IO-APIC-level serial
NMI: 0 0 0 0
LOC: 61214842 61214841 61214840 61214840
ERR: 0
MIS: 0


> It does not seem normal that all interrupts are directed to the same CPU
> regardless of whether it's busy or not. This *could* be a bug in the
> counters, a bug in the mobo, a bug in the kernel's setup of the APIC,
> or some mumbo-jumbo is needed in lilo.conf (an append option?).
>
> TIA for advice.
>

You seem to have configured your system to direct all interrupts to CPU3.
Notice it did not start up that way, as you have a few timer interrupts to
CPU0. I thought this was set by affinity, but if so, I do not understand the
manual page.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 08:40:00 up 7 days, 1:56, 4 users, load average: 4.14, 4.03, 3.58
  #3 (permalink)  
Old 01-18-2008, 08:27 AM
Michel Bardiaux
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Jean-David Beyer wrote:
> Michel Bardiaux wrote:
>
>>Distribution: debian woody
>>
>>Kernel: 2.4.28, vanilla lilo.conf
>>
>>CPU: dual hyperthreaded Xeon, so cpuinfo reports 4 times this:
>>
>>processor : 0
>>vendor_id : GenuineIntel
>>cpu family : 15
>>model : 4
>>model name : Intel(R) Xeon(TM) CPU 2.80GHz
>>stepping : 1
>>cpu MHz : 2793.080
>>cache size : 1024 KB
>>fdiv_bug : no
>>hlt_bug : no
>>f00f_bug : no
>>coma_bug : no
>>fpu : yes
>>fpu_exception : yes
>>cpuid level : 5
>>wp : yes
>>flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm
>>pni monitor ds_cpl cid
>>bogomips : 5570.56

>
>
> Kernel: 2.4.21-27.0.2.ELsmp
>
> My /proc/cpuinfo has 4 like this:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) Xeon(TM) CPU 3.06GHz
> stepping : 5
> cpu MHz : 3056.561
> cache size : 512 KB
> physical id : 0
> siblings : 2
> runqueue : 0
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 6094.84
>
>
>>When a process doing heavy local I/O runs, just about everything else
>>freezes for minutes at a time (then resumes). We suspect it's related to a
>>weird report from /proc/interrupts:
>>
>> CPU0 CPU1 CPU2 CPU3
>> 0: 122 0 0 120615284 IO-APIC-edge timer
>> 1: 0 0 0 272 IO-APIC-edge keyboard
>> 2: 0 0 0 0 XT-PIC cascade
>> 14: 1 0 0 4 IO-APIC-edge ide0
>> 16: 0 0 0 0 IO-APIC-level usb-uhci
>> 18: 0 0 0 0 IO-APIC-level usb-uhci
>> 19: 0 0 0 0 IO-APIC-level usb-uhci
>> 20: 0 0 0 16915687 IO-APIC-level aic7xxx
>> 29: 0 0 0 857063989 IO-APIC-level eth0
>> 48: 0 0 0 28268766 IO-APIC-level aacraid
>>NMI: 0 0 0 0
>>LOC: 120622421 120622419 120622419 120622418
>>ERR: 0
>>MIS: 0
>>


Additional info: we have restarted the system with HT disabled in the BIOS.
Now we have just 2 CPUs, but all interrupts still go to CPU1, and the
'congealing' (not as strong as a freeze-up) still occurs.

>
> My /proc/interrupts looks like this:
>
> CPU0 CPU1 CPU2 CPU3
> 0: 15306480 15299292 15304294 15304290 IO-APIC-edge timer
> 1: 28051 30884 31101 29039 IO-APIC-edge keyboard
> 2: 0 0 0 0 XT-PIC cascade
> 3: 1589033 427115 1592438 424021 IO-APIC-edge serial
> 8: 1 0 0 0 IO-APIC-edge rtc
> 12: 6154990 1277805 6101628 1285460 IO-APIC-edge PS/2 Mouse
> 14: 4410 9012265 3441 9078354 IO-APIC-edge ide0
> 15: 82 0 129 24 IO-APIC-edge ide1
> 16: 0 0 0 0 IO-APIC-level usb-uhci
> 18: 0 0 0 0 IO-APIC-level usb-uhci
> 19: 0 0 0 0 IO-APIC-level usb-uhci
> 28: 524737 0 0 0 IO-APIC-level eth0
> 29: 4 0 306041 0 IO-APIC-level eth1
> 76: 991888 1021284 1091264 1040122 IO-APIC-level aic79xx
> 77: 487917 850703 539028 882214 IO-APIC-level aic79xx
> 100: 174398 113755 178353 114161 IO-APIC-level EMU10K1
> 104: 4907319 1607055 5314933 1101314 IO-APIC-level serial
> NMI: 0 0 0 0
> LOC: 61214842 61214841 61214840 61214840
> ERR: 0
> MIS: 0
>


Which is as I thought it should be.

>
>
>>It does not seem normal that all interrupts are directed to the same CPU
>>regardless of whether it's busy or not. This *could* be a bug in the
>>counters, a bug in the mobo, a bug in the kernel's setup of the APIC,
>>or some mumbo-jumbo is needed in lilo.conf (an append option?).
>>
>>TIA for advice.
>>

>
> You seem to have configured your system to direct all interrupts to CPU3.
> Notice it did not start up that way, as you have a few timer interrupts to
> CPU0.


Clear to me too.

> I thought this was set by affinity, but if so, I do not understand the
> manual page.
>

I don't have any command named 'affinity'; which package is it in?

HaND,
--
Michel Bardiaux
Peaktime Belgium S.A. Bd. du Souverain, 191 B-1160 Bruxelles
Tel : +32 2 790.29.41
  #4 (permalink)  
Old 01-18-2008, 08:27 AM
Michel Bardiaux
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Michel Bardiaux wrote:
[snip]
>
> Additional info: we have restarted the system with HT disabled in BIOS.
> Now we have just 2 CPUs, but still all interrupts go to CPU1, and the
> 'congealing' (not as strong as a freeze-up) still occurs.
>

[snip]

>>
>> You seem to have configured your system to direct all interrupts to CPU3.
>> Notice it did not start up that way, as you have a few timer
>> interrupts to
>> CPU0.

>
>
> Clear to me too.
>
>> I thought this was set by affinity, but if so, I do not understand the
>> manual page.
>>

> I don't have any command named 'affinity'; which package is it in?
>
> HaND,


I have done a bit of web searching; it seems interrupt/CPU affinity is
controlled by the files /proc/irq/*/smp_affinity. On my system they are
all ffffffff, and so is /proc/irq/prof_cpu_mask. Could it be that the APIC
is not initialized correctly by the kernel? Or is there some kernel config
parameter that has to be turned on during 'make xconfig'?
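
For reference, each smp_affinity value is a hexadecimal bitmask in which bit n stands for CPU n, so ffffffff allows every CPU and does not by itself pin an IRQ anywhere. A minimal Python sketch of the decoding follows; the helper name cpus_in_mask is an assumption made for illustration, not an existing tool.

```python
def cpus_in_mask(hex_mask, ncpus):
    """Return the CPU numbers (0-based) whose bits are set in an smp_affinity mask."""
    mask = int(hex_mask, 16)
    return [cpu for cpu in range(ncpus) if mask & (1 << cpu)]


print(cpus_in_mask("ffffffff", 4))   # every CPU on a 4-CPU box is allowed
print(cpus_in_mask("00000008", 4))   # only the CPU for bit 3
```

Read this way, a mask of ffffffff on every IRQ suggests the one-CPU pattern is not caused by a restrictive affinity setting, although that alone does not say where the cause lies.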

--
Michel Bardiaux
Peaktime Belgium S.A. Bd. du Souverain, 191 B-1160 Bruxelles
Tel : +32 2 790.29.41
  #5 (permalink)  
Old 01-18-2008, 08:27 AM
Jean-David Beyer
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Michel Bardiaux wrote:
> Michel Bardiaux wrote:
> [snip]
>
>>
>> Additional info: we have restarted the system with HT disabled in
>> BIOS. Now we have just 2 CPUs, but still all interrupts go to CPU1,
>> and the 'congealing' (not as strong as a freeze-up) still occurs.
>>

> [snip]
>
>>>
>>> You seem to have configured your system to direct all interrupts to
>>> CPU3.
>>> Notice it did not start up that way, as you have a few timer
>>> interrupts to
>>> CPU0.

>>
>>
>>
>> Clear to me too.
>>
>>> I thought this was set by affinity, but if so, I do not understand the
>>> manual page.
>>>

>> I don't have any command named 'affinity'; which package is it in?
>>
>> HaND,

>
>
> I have done a bit of web searching; it seems interrupt/CPU affinity is
> controlled by the files /proc/irq/*/smp_affinity. On my system they are
> all ffffffff, and so is /proc/irq/prof_cpu_mask. Could it be that the APIC
> is not initialized correctly by the kernel? Or is there some kernel config
> parameter that has to be turned on during 'make xconfig'?
>

Well, I know I did not have to do anything after I installed my system.

# cat /proc/irq/prof_cpu_mask
ffffffff

# cat /proc/irq/*/smp_affinity
00000001
00000001
00000004
ffffffff
ffffffff
00000004
ffffffff
00000008
00000004
00000008
00000008
00000001
00000002
00000001
00000004
ffffffff
00000002
ffffffff
ffffffff
ffffffff
00000002
00000001
ffffffff
00000008
ffffffff

I do not know what these numbers mean, but they might help someone who does.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 10:10:00 up 8 days, 3:26, 3 users, load average: 4.26, 4.05, 3.65
  #6 (permalink)  
Old 01-18-2008, 08:27 AM
Michel Bardiaux
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Jean-David Beyer wrote:
[snip]

>
> Well, I know I did not have to do anything after I installed my system.
>
> # cat /proc/irq/prof_cpu_mask
> ffffffff
>
> # cat /proc/irq/*/smp_affinity
> 00000001
> 00000001
> 00000004
> ffffffff
> ffffffff
> 00000004
> ffffffff
> 00000008
> 00000004
> 00000008
> 00000008
> 00000001
> 00000002
> 00000001
> 00000004
> ffffffff
> 00000002
> ffffffff
> ffffffff
> ffffffff
> 00000002
> 00000001
> ffffffff
> 00000008
> ffffffff
>
> I do not know what these numbers mean, but they might help someone who does.
>

What distro do you use? What does fgrep -i irq /etc/inetd.d/* reveal?

And the values would be a little easier to interpret if you could
(pretty please) post 'grep . /proc/irq/*/smp_affinity' so I would know
the filenames too.

And going back over the thread, I see another difference in
/proc/interrupts: yours go up to #104, mine only to #48. Could you post
your lspci output?

TIA,
--
Michel Bardiaux
Peaktime Belgium S.A. Bd. du Souverain, 191 B-1160 Bruxelles
Tel : +32 2 790.29.41
  #7 (permalink)  
Old 01-18-2008, 08:27 AM
Jean-David Beyer
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Michel Bardiaux wrote:
> Jean-David Beyer wrote:
> [snip]
>
>>
>> Well, I know I did not have to do anything after I installed my system.
>>
>> # cat /proc/irq/prof_cpu_mask
>> ffffffff
>>
>> # cat /proc/irq/*/smp_affinity
>> 00000001
>> 00000001
>> 00000004
>> ffffffff
>> ffffffff
>> 00000004
>> ffffffff
>> 00000008
>> 00000004
>> 00000008
>> 00000008
>> 00000001
>> 00000002
>> 00000001
>> 00000004
>> ffffffff
>> 00000002
>> ffffffff
>> ffffffff
>> ffffffff
>> 00000002
>> 00000001
>> ffffffff
>> 00000008
>> ffffffff
>>
>> I do not know what these numbers mean, but they might help someone who
>> does.
>>

> What distro do you use?


Red Hat Enterprise Linux 3 ES

> What does fgrep -i irq /etc/inetd.d/* reveal?


$ fgrep -i irq /etc/inetd.d/*
fgrep: /etc/inetd.d/*: No such file or directory

Do you mean

$ fgrep -i irq /etc/xinetd.d/*
$

>
> And the values would be a little easier to interpret if you could
> (pretty please) post 'grep . /proc/irq/*/smp_affinity' so I would know
> the filenames too.


# grep . /proc/irq/*/smp_affinity
/proc/irq/0/smp_affinity:00000002
/proc/irq/100/smp_affinity:00000002
/proc/irq/104/smp_affinity:00000008
/proc/irq/10/smp_affinity:ffffffff
/proc/irq/11/smp_affinity:ffffffff
/proc/irq/12/smp_affinity:00000004
/proc/irq/13/smp_affinity:ffffffff
/proc/irq/14/smp_affinity:00000008
/proc/irq/15/smp_affinity:00000004
/proc/irq/16/smp_affinity:00000001
/proc/irq/18/smp_affinity:00000004
/proc/irq/19/smp_affinity:00000002
/proc/irq/1/smp_affinity:00000008
/proc/irq/28/smp_affinity:00000001
/proc/irq/29/smp_affinity:00000004
/proc/irq/2/smp_affinity:ffffffff
/proc/irq/3/smp_affinity:00000001
/proc/irq/4/smp_affinity:ffffffff
/proc/irq/5/smp_affinity:ffffffff
/proc/irq/6/smp_affinity:ffffffff
/proc/irq/76/smp_affinity:00000008
/proc/irq/77/smp_affinity:00000004
/proc/irq/7/smp_affinity:ffffffff
/proc/irq/8/smp_affinity:00000001
/proc/irq/9/smp_affinity:ffffffff
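
Because the shell glob sorts the paths as text (0, 100, 104, 10, ...), the listing is easier to read once keyed by IRQ number and decoded to CPU lists. The sketch below assumes the `grep .` output format shown above and a 4-CPU machine; parse_affinity is a hypothetical helper name, and SAMPLE contains only four of the lines.

```python
import re

# Matches lines such as "/proc/irq/14/smp_affinity:00000008".
LINE = re.compile(r"^/proc/irq/(\d+)/smp_affinity:([0-9a-fA-F]+)$")


def parse_affinity(grep_output, ncpus=4):
    """Map IRQ number -> list of allowed CPUs, ordered by IRQ number."""
    result = {}
    for ln in grep_output.splitlines():
        m = LINE.match(ln.strip())
        if not m:
            continue                      # ignore anything that is not a mask line
        mask = int(m.group(2), 16)
        result[int(m.group(1))] = [c for c in range(ncpus) if (mask >> c) & 1]
    return dict(sorted(result.items()))


SAMPLE = """\
/proc/irq/0/smp_affinity:00000002
/proc/irq/10/smp_affinity:ffffffff
/proc/irq/14/smp_affinity:00000008
/proc/irq/28/smp_affinity:00000001
"""

for irq, cpus in parse_affinity(SAMPLE).items():
    print(irq, cpus)
```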

>
> And going back over the thread, I see another difference in
> /proc/interrupts: yours go up to #104, mine only to #48. Could you post
> your lspci output?


# /sbin/lspci
00:00.0 Host bridge: Intel Corp. E7501 Memory Controller Hub (rev 01)
00:00.1 Class ff00: Intel Corp. E7500/E7501 Host RASUM Controller (rev 01)
00:02.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface B PCI-to-PCI
Bridge (rev 01)
00:03.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface C PCI-to-PCI
Bridge (rev 01)
00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corp. 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
03:02.0 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller
(Copper) (rev 01)
03:02.1 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller
(Copper) (rev 01)
04:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
04:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
04:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
04:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
05:02.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
05:02.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
05:03.0 Serial controller: 3Com Corp, Modem Division (formerly US Robotics)
56K FaxModem Model 5610 (rev 01)
06:02.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
06:02.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
07:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

>
> TIA,



--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 13:10:00 up 8 days, 6:26, 3 users, load average: 4.51, 4.26, 4.12
  #8 (permalink)  
Old 01-18-2008, 08:27 AM
Michel Bardiaux
 
Posts: n/a
Default Re: Xeon hyperthreading problem

Jean-David Beyer wrote:
[snip]

>>>

>>
>>What distro do you use?

>
>
> Red Hat Enterprise Linux 3 ES
>
>
>>What does fgrep -i irq /etc/inetd.d/* reveal?

>
>
> $ fgrep -i irq /etc/inetd.d/*
> fgrep: /etc/inetd.d/*: No such file or directory
>
> Do you mean
>
> $ fgrep -i irq /etc/xinetd.d/*


Sorry, my mistake, /etc/init.d/*

> $
>
>
>>And the values would be a little easier to interpret if you could
>>(pretty please) post 'grep . /proc/irq/*/smp_affinity' so I would know
>>the filenames too.

>
>
> # grep . /proc/irq/*/smp_affinity
> /proc/irq/0/smp_affinity:00000002


Now I'm totally confused. With this value, on your system IRQ#0 should
go only to CPU#1, but it goes equally to all 4 according to the
/proc/interrupts previously posted!
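
The mismatch can be checked mechanically. Note that /proc/interrupts counters are cumulative since boot while smp_affinity shows only the current mask, so a mismatch could simply mean the mask was different at some earlier time; whether anything on that system rewrites the masks is not established in this thread. A small sketch (cpus_outside_mask is an illustrative name) using the IRQ 0 figures posted earlier:

```python
def cpus_outside_mask(hex_mask, counts):
    """Return CPUs that have a non-zero count but are not allowed by the mask."""
    mask = int(hex_mask, 16)
    return [cpu for cpu, n in enumerate(counts)
            if n > 0 and not (mask >> cpu) & 1]


# IRQ 0 on the Red Hat box: current mask 00000002, timer counts from /proc/interrupts.
timer_counts = [15306480, 15299292, 15304294, 15304290]
print(cpus_outside_mask("00000002", timer_counts))
```

A non-empty result only shows that the counters and the current mask disagree; it does not say which of the two is misleading.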

> /proc/irq/100/smp_affinity:00000002
> /proc/irq/104/smp_affinity:00000008
> /proc/irq/10/smp_affinity:ffffffff
> /proc/irq/11/smp_affinity:ffffffff
> /proc/irq/12/smp_affinity:00000004
> /proc/irq/13/smp_affinity:ffffffff
> /proc/irq/14/smp_affinity:00000008
> /proc/irq/15/smp_affinity:00000004
> /proc/irq/16/smp_affinity:00000001
> /proc/irq/18/smp_affinity:00000004
> /proc/irq/19/smp_affinity:00000002
> /proc/irq/1/smp_affinity:00000008
> /proc/irq/28/smp_affinity:00000001
> /proc/irq/29/smp_affinity:00000004
> /proc/irq/2/smp_affinity:ffffffff
> /proc/irq/3/smp_affinity:00000001
> /proc/irq/4/smp_affinity:ffffffff
> /proc/irq/5/smp_affinity:ffffffff
> /proc/irq/6/smp_affinity:ffffffff
> /proc/irq/76/smp_affinity:00000008
> /proc/irq/77/smp_affinity:00000004
> /proc/irq/7/smp_affinity:ffffffff
> /proc/irq/8/smp_affinity:00000001
> /proc/irq/9/smp_affinity:ffffffff
>
>
>>And going back over the thread, I see another difference in
>>/proc/interrupts: yours go up to #104, mine only to #48. Could you post
>>your lspci output?

>
>
> # /sbin/lspci
> 00:00.0 Host bridge: Intel Corp. E7501 Memory Controller Hub (rev 01)
> 00:00.1 Class ff00: Intel Corp. E7500/E7501 Host RASUM Controller (rev 01)
> 00:02.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface B PCI-to-PCI
> Bridge (rev 01)
> 00:03.0 PCI bridge: Intel Corp. E7500/E7501 Hub Interface C PCI-to-PCI
> Bridge (rev 01)
> 00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
> 00:1d.1 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #2) (rev 02)
> 00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
> 00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 42)
> 00:1f.0 ISA bridge: Intel Corp. 82801CA LPC Interface Controller (rev 02)
> 00:1f.1 IDE interface: Intel Corp. 82801CA Ultra ATA Storage Controller (rev 02)
> 00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus Controller (rev 02)
> 01:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
> 01:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
> 01:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
> 01:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
> 03:02.0 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller
> (Copper) (rev 01)
> 03:02.1 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller
> (Copper) (rev 01)
> 04:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
> 04:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
> 04:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
> 04:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
> 05:02.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
> 05:02.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
> 05:03.0 Serial controller: 3Com Corp, Modem Division (formerly US Robotics)
> 56K FaxModem Model 5610 (rev 01)
> 06:02.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
> 06:02.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
> 07:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
>
>
>>TIA,


So, you have 4 "Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)", while I
have 2 "Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A (rev
09)". My problem could be buggy or missing support for this particular
chipset (we have other SMP systems, all showing balanced IRQ load, but
lspci shows *no* APICs on these).

One last thing you could do to help me, post (or e-mail me as a tgz)
your /usr/src/linux-whatever/include/linux/config.h file (if there is
one; we use debian with a re-made 2.4.28 kernel, there may be
differences in setup files with RH)

TIA,
--
Michel Bardiaux
Peaktime Belgium S.A. Bd. du Souverain, 191 B-1160 Bruxelles
Tel : +32 2 790.29.41
  #9 (permalink)  
Old 01-18-2008, 08:30 AM
Michel Bardiaux
 
Posts: n/a
Default Re: Xeon hyperthreading problem LONG

Jean-David Beyer wrote:
> Hash: SHA1
>
> Michel Bardiaux wrote:
>
>>I've compared autoconf.h and I think this is the significant difference;
>>yours:
>>
>>autoconf.h:#define CONFIG_X86_GOOD_APIC 1
>>autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
>>autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
>>autoconf.h:#define CONFIG_X86_IO_APIC 1
>>autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
>>autoconf.h:#define CONFIG_X86_NUMA 1
>>
>>
>>Mine:
>>
>>autoconf.h:#define CONFIG_X86_GOOD_APIC 1
>>autoconf.h:#define CONFIG_X86_IO_APIC 1
>>autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
>>autoconf.h:#undef CONFIG_X86_NUMA
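
The comparison quoted above can be reproduced by extracting the `#define CONFIG_...` lines from each autoconf.h and taking set differences. This is only a sketch under the assumption that `#undef` lines count as "not enabled"; the names enabled_options and config_diff are made up, and the two strings hold just the lines quoted in this post.

```python
import re

# Captures the option name and value from "#define CONFIG_FOO 1".
DEFINE = re.compile(r"#define\s+(CONFIG_\w+)\s+(\S+)")


def enabled_options(text):
    """Return {option: value} for every #define CONFIG_* found in the text."""
    return {m.group(1): m.group(2) for m in DEFINE.finditer(text)}


def config_diff(theirs, mine):
    """Return (options only in theirs, options only in mine), each sorted."""
    a, b = enabled_options(theirs), enabled_options(mine)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))


THEIRS = """\
autoconf.h:#define CONFIG_X86_GOOD_APIC 1
autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
autoconf.h:#define CONFIG_X86_IO_APIC 1
autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
autoconf.h:#define CONFIG_X86_NUMA 1
"""

MINE = """\
autoconf.h:#define CONFIG_X86_GOOD_APIC 1
autoconf.h:#define CONFIG_X86_IO_APIC 1
autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
autoconf.h:#undef CONFIG_X86_NUMA
"""

only_theirs, only_mine = config_diff(THEIRS, MINE)
print("only in theirs:", only_theirs)
print("only in mine:", only_mine)
```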


We have built a new 2.4.28 with NUMA support selected so as to have the same
autoconf.h, but we still have the same problem of all interrupts going to the
same CPU. It seems to me now that the critical line in your dmesg output is
this one:

> xAPIC support is present


I don't have it; moreover, this string appears nowhere in the stock 2.4.28
sources! From web searches it seems this might be part of a Red Hat
patch. So my best hope now is to compare my kernel sources with yours.
Where did you get your kernel 2.4.21-27.0.2.ELsmp? Did you apply any
additional patches?
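
Checking whether a message string exists in a source tree amounts to a recursive search, much like `grep -r`. A minimal Python sketch follows; files_containing is a hypothetical helper, and the tree location mentioned in the comment is only a placeholder.

```python
import os


def files_containing(root, needle):
    """Return sorted paths (relative to root) of files whose bytes contain needle."""
    target = needle.encode()
    hits = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if target in fh.read():
                        hits.append(os.path.relpath(path, root))
            except OSError:
                continue                  # unreadable file: skip it
    return sorted(hits)


# Example call (the path is a placeholder for wherever the kernel tree lives):
# files_containing("/usr/src/linux", "xAPIC support is present")
```

Running it against both the stock and the vendor tree would show which files, if any, carry the message in each.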

And hope is what I need now. Another machine having failed, the Dell
PowerEdge 1800 in question had to take over some production tasks, and
because of these interrupt storms, we have to run without APIC, which
means without SMP, which means without HT; in other words, *slow*!

--
Michel Bardiaux
Peaktime Belgium S.A. Bd. du Souverain, 191 B-1160 Bruxelles
Tel : +32 2 790.29.41
  #10 (permalink)  
Old 01-18-2008, 08:30 AM
prg
 
Posts: n/a
Default Re: Xeon hyperthreading problem LONG


Michel Bardiaux wrote:
> Jean-David Beyer wrote:
> > Hash: SHA1
> >
> > Michel Bardiaux wrote:
> >
> >>I've compared autoconf.h and I think this is the significant difference;
> >>yours:
> >>
> >>autoconf.h:#define CONFIG_X86_GOOD_APIC 1
> >>autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
> >>autoconf.h:#define CONFIG_X86_CLUSTERED_APIC 1
> >>autoconf.h:#define CONFIG_X86_IO_APIC 1
> >>autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
> >>autoconf.h:#define CONFIG_X86_NUMA 1
> >>
> >>
> >>Mine:
> >>
> >>autoconf.h:#define CONFIG_X86_GOOD_APIC 1
> >>autoconf.h:#define CONFIG_X86_IO_APIC 1
> >>autoconf.h:#define CONFIG_X86_LOCAL_APIC 1
> >>autoconf.h:#undef CONFIG_X86_NUMA
>
> We have built a new 2.4.28 with NUMA support selected so as to have the same
> autoconf.h, but we still have the same problem of all interrupts going to the
> same CPU. It seems to me now that the critical line in your dmesg output is
> this one:
>
> > xAPIC support is present
>
> I don't have it; moreover, this string appears nowhere in the stock 2.4.28
> sources! From web searches it seems this might be part of a Red Hat
> patch. So my best hope now is to compare my kernel sources with yours.
> Where did you get your kernel 2.4.21-27.0.2.ELsmp? Did you apply any
> additional patches?
>
> And hope is what I need now. Another machine having failed, the Dell
> PowerEdge 1800 in question had to take over some production tasks, and
> because of these interrupt storms, we have to run without APIC, which
> means without SMP, which means without HT; in other words, *slow*!


This is a complete shot in the dark -- you sound like you're nearing
the desperate edge.

[q]
I have also given into the peer pressure and enabled SMP with this
release.

The debian-dell-2.4.29.iso (md5sum: dcd375d887f18159dd01ba47f513db23)
should support the following Dell products:

PowerEdge 400SC
PowerEdge 420SC
PowerEdge 600SC (thanks Nat)
PowerEdge 650
PowerEdge 700 (thanks Eric Busick)
PowerEdge 750 (thanks Carsten Buchenau)
PowerEdge 800 (thanks Jean-Christophe Montigny)
PowerEdge 1300 (thanks Lukasz Andersson)
PowerEdge 1400 (thanks Matt Griffin)
PowerEdge 1550
PowerEdge 1650
PowerEdge 1600SC (thanks Stephane)
PowerEdge 1655C
PowerEdge 1750
PowerEdge 1800 (thanks weasel)
PowerEdge 1850 (thanks James L. Morton)
(and many more ...)
[eq]

http://wiki.osuosl.org/display/LNX/D...=true#comments

You may already be aware of it.

RH-based kernels (kernel 2.4.21-27.0.2.ELsmp) should be here or nearby:
ftp://ftp.linux.ncsu.edu/pub/centos/...86/RedHat/RPMS
ftp://ftp.linux.ncsu.edu/pub/centos/...64/RedHat/RPMS

good luck,
prg
