public inbox for linux-kernel@vger.kernel.org
* P4 SMP load balancing
@ 2001-10-12  9:28 Sean Cavanaugh
  2001-10-12 17:59 ` Martin J. Bligh
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Cavanaugh @ 2001-10-12  9:28 UTC
  To: linux-kernel

I posted this a while back in linux-smp (which seems like a dead list?)

I have several P4 Xeon SMP systems (Supermicro P4DCE, Intel i860
chipset).  On this one, every IO-APIC interrupt is being handled by CPU0:

ovendev:~# cat /proc/interrupts 
           CPU0       CPU1       
  0:    6348212          0    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
  9:          0          0    IO-APIC-edge  acpi
 16:      92620          0   IO-APIC-level  eth0
 18:       5085          0   IO-APIC-level  aic7xxx, aic7xxx
NMI:          0          0 
LOC:    6348388    6348427 
ERR:          0
MIS:          0


	How much of a problem is this really?  The programs I am
running on these systems (I have 9 of them) seem to do OK right now.
Currently the jobs running on them are heavily CPU bound and don't do
any I/O, but this is going to change when I link them up over a private
network so they can work together on some distributable jobs.  I am
running 2.4.10 on most of them, and 2.4.10-ac10 on my developer system
in the farm.  The only difference this newer kernel seems to have made
from older ones is that there is only one 'WARNING: unexpected IO-APIC'
message in my startup instead of two.
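
To put a number on the imbalance shown in the /proc/interrupts listing
above, a small script can total the numbered IRQ lines per CPU. This is
only a sketch: the function names parse_interrupts and device_irq_share
are made up for illustration, and the parser assumes the 2.4-style layout
shown in this message (one count column per CPU, then controller and
device names).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into (cpu_names, {label: [counts]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue                         # not an "NN: ..." line
        counts = []
        for tok in rest.split()[:len(cpus)]:
            if not tok.isdigit():
                break                        # reached controller/device columns
            counts.append(int(tok))
        table[label.strip()] = counts
    return cpus, table


def device_irq_share(text):
    """Fraction of numbered (device) interrupts counted on each CPU."""
    cpus, table = parse_interrupts(text)
    totals = [0] * len(cpus)
    for label, counts in table.items():
        # Only numbered lines are device IRQs; NMI/LOC/ERR/MIS are skipped.
        if label.isdigit() and len(counts) == len(cpus):
            for i, c in enumerate(counts):
                totals[i] += c
    grand = sum(totals)
    return [t / grand if grand else 0.0 for t in totals]


SAMPLE = """\
           CPU0       CPU1
  0:    6348212          0    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
  9:          0          0    IO-APIC-edge  acpi
 16:      92620          0   IO-APIC-level  eth0
 18:       5085          0   IO-APIC-level  aic7xxx, aic7xxx
NMI:          0          0
LOC:    6348388    6348427
ERR:          0
MIS:          0
"""

print(device_irq_share(SAMPLE))   # [1.0, 0.0] for the listing above
```

The LOC line (local APIC timer) is left out on purpose: it is per-CPU by
design and is already even on this machine.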


Snippet from dmesg:

CPU1: Intel(R) Xeon(TM) CPU 1700MHz stepping 0a
Total of 2 processors activated (6723.99 BogoMIPS).
ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-5, 2-10, 2-11, 2-12, 2-17, 2-20, 2-21, 2-22 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 18.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
 WARNING: unexpected IO-APIC, please mail
          to linux-smp@vger.kernel.org
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table: 
<snip>



	- Sean
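
The register values in the dmesg snippet above can be split into the
fields the kernel prints next to them. The sketch below assumes the usual
IO-APIC layout (ID in bits 24-27 of register #00; version in bits 0-7, PRQ
in bit 15 and the highest redirection entry index in bits 16-23 of register
#01). The function names are hypothetical, not kernel symbols.

```python
def decode_ioapic_reg00(value):
    """Return the physical APIC ID held in bits 24-27 of register #00."""
    return (value >> 24) & 0x0F


def decode_ioapic_reg01(value):
    """Split register #01 into the three fields shown in the boot log."""
    return {
        "version": value & 0xFF,                          # bits 0-7
        "prq": (value >> 15) & 0x1,                       # bit 15
        "max_redirection_entries": (value >> 16) & 0xFF,  # bits 16-23
    }


fields = decode_ioapic_reg01(0x00178020)
print({name: hex(v) for name, v in fields.items()})
```

For 00178020 this gives version 0x20, PRQ 1 and a highest entry index of
0x17, matching the log; 0x17 + 1 = 24 agrees with the "number of IO-APIC #2
registers: 24" line.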



* Re: P4 SMP load balancing
  2001-10-12  9:28 Sean Cavanaugh
@ 2001-10-12 17:59 ` Martin J. Bligh
  2001-10-14  9:07   ` Sean Cavanaugh
  0 siblings, 1 reply; 5+ messages in thread
From: Martin J. Bligh @ 2001-10-12 17:59 UTC
  To: Sean Cavanaugh; +Cc: linux-kernel

> ovendev:~# cat /proc/interrupts 
>            CPU0       CPU1       
>   0:    6348212          0    IO-APIC-edge  timer
>   1:          2          0    IO-APIC-edge  keyboard
>   2:          0          0          XT-PIC  cascade
>   8:          1          0    IO-APIC-edge  rtc
>   9:          0          0    IO-APIC-edge  acpi
>  16:      92620          0   IO-APIC-level  eth0
>  18:       5085          0   IO-APIC-level  aic7xxx, aic7xxx
> NMI:          0          0 
> LOC:    6348388    6348427 
> ERR:          0
> MIS:          0

I don't think this should happen. In the event of both procs having equal 
priority (linux never changes them, so they always do), we should fall back 
to the arbitration priority of the lapic. Whether you have 1 or 2 I/O apics
working shouldn't make a difference. 

The arb priority of the local apic should change whenever a message
is sent (see the Intel docs on developer.intel.com), so we effectively
get round robin. For instance, a 4 way looks like this:

           CPU0       CPU1       CPU2       CPU3
  0:    1608606    1595657    2168078    1575546    IO-APIC-edge  timer
  1:          0          0          0          2    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  4:         76         52         62         48    IO-APIC-edge  serial
 23:       7983       8263       8286       8306   IO-APIC-level  qlogicisp
 39:          0          0          0          0   IO-APIC-level  eth1
 40:       6247       6216       6894       6325   IO-APIC-level  eth0
NMI:          0          0          0          0
LOC:    6947876    6947859    6947873    6947874
ERR:          0
MIS:          0

Which isn't perfectly balanced, but it looks a damned sight better than
yours does ;-) Do you have something in the log that looks like this?

Oct 11 15:35:04 elm3b76 kernel: IO APIC #13...... 
Oct 11 15:35:04 elm3b76 kernel: .... register #00: 0D000000 
Oct 11 15:35:04 elm3b76 kernel: .......    : physical APIC id: 0D 
Oct 11 15:35:04 elm3b76 kernel: .... register #01: 00170011 
Oct 11 15:35:04 elm3b76 kernel: .......     : max redirection entries: 0017 
Oct 11 15:35:04 elm3b76 kernel: .......     : IO APIC version: 0011 
Oct 11 15:35:04 elm3b76 kernel: .... register #02: 00000000 
Oct 11 15:35:04 elm3b76 kernel: .......     : arbitration: 00 
Oct 11 15:35:04 elm3b76 kernel: .... IRQ redirection table: 
Oct 11 15:35:04 elm3b76 kernel:  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:    
Oct 11 15:35:04 elm3b76 kernel:  00 000 00  1    0    0   0   0    0    0    00 
Oct 11 15:35:05 elm3b76 kernel:  01 00F 0F  0    0    0   0   0    0    1    39 
Oct 11 15:35:05 elm3b76 kernel:  02 00F 0F  0    0    0   0   0    0    1    31 
Oct 11 15:35:05 elm3b76 kernel:  03 00F 0F  0    0    0   0   0    0    1    41 
Oct 11 15:35:05 elm3b76 kernel:  04 00F 0F  0    0    0   0   0    0    1    49 
Oct 11 15:35:05 elm3b76 kernel:  05 00F 0F  0    0    0   0   0    0    1    51 
Oct 11 15:35:05 elm3b76 kernel:  06 00F 0F  0    0    0   0   0    0    1    59 
Oct 11 15:35:05 elm3b76 kernel:  07 00F 0F  1    1    0   1   0    0    1    61 
Oct 11 15:35:05 elm3b76 kernel:  08 00F 0F  1    1    0   0   0    0    1    69 
Oct 11 15:35:05 elm3b76 kernel:  09 00F 0F  0    0    0   0   0    0    1    71 
Oct 11 15:35:05 elm3b76 kernel:  0a 00F 0F  0    0    0   0   0    0    1    79 
Oct 11 15:35:05 elm3b76 kernel:  0b 00F 0F  1    1    0   1   0    0    1    81 
Oct 11 15:35:05 elm3b76 kernel:  0c 00F 0F  0    0    0   0   0    0    1    89 
Oct 11 15:35:05 elm3b76 kernel:  0d 00F 0F  1    1    0   1   0    0    1    91 
Oct 11 15:35:05 elm3b76 kernel:  0e 00F 0F  0    0    0   0   0    0    1    99 
Oct 11 15:35:05 elm3b76 kernel:  0f 00F 0F  1    1    0   1   0    0    1    A1 
Oct 11 15:35:05 elm3b76 kernel:  10 00F 0F  1    1    0   1   0    0    1    A9 
Oct 11 15:35:05 elm3b76 kernel:  11 00F 0F  1    1    0   1   0    0    1    B1 
Oct 11 15:35:05 elm3b76 kernel:  12 00F 0F  1    1    0   1   0    0    1    B9 
Oct 11 15:35:05 elm3b76 kernel:  13 00F 0F  1    1    0   1   0    0    1    C1 
Oct 11 15:35:05 elm3b76 kernel:  14 00F 0F  1    1    0   1   0    0    1    C9 
Oct 11 15:35:05 elm3b76 kernel:  15 00F 0F  1    1    0   1   0    0    1    D1 
Oct 11 15:35:05 elm3b76 kernel:  16 00F 0F  1    1    0   1   0    0    1    D9 
Oct 11 15:35:05 elm3b76 kernel:  17 00F 0F  1    1    0   1   0    0    1    E1 

You might have to tweak syslog.conf to log the debug messages. And
possibly increase LOG_BUF_LEN in kernel/printk.c to something sensible
(65536?)

M.




* Re: P4 SMP load balancing
@ 2001-10-12 18:38 Manfred Spraul
  2001-10-12 21:38 ` Martin J. Bligh
  0 siblings, 1 reply; 5+ messages in thread
From: Manfred Spraul @ 2001-10-12 18:38 UTC
  To: Martin J. Bligh, linux-kernel, Sean Cavanaugh

 

> > ovendev:~# cat /proc/interrupts 
> >            CPU0       CPU1       
> >   0:    6348212          0    IO-APIC-edge  timer
> >   1:          2          0    IO-APIC-edge  keyboard
> >   2:          0          0          XT-PIC  cascade
> >   8:          1          0    IO-APIC-edge  rtc
> >   9:          0          0    IO-APIC-edge  acpi
> >  16:      92620          0   IO-APIC-level  eth0
> >  18:       5085          0   IO-APIC-level  aic7xxx, aic7xxx
> > NMI:          0          0 
> > LOC:    6348388    6348427 
> > ERR:          0
> > MIS:          0
> 
> I don't think this should happen. In the event of both procs having equal 
> priority (linux never changes them, so they always do), we should fall back 
> to the arbitration priority of the lapic. Whether you have 1 or 2 I/O apics
> working shouldn't make a difference. 

The P4 has a new APIC, and lowest priority delivery doesn't work
anymore.

<<<<<<< Chapter 7.6.10 of 24547202.pdf
In operating systems that use the lowest priority interrupt delivery
mode but do not update the TPR, the TPR information saved in the
chipset will potentially cause the interrupt to be always delivered to
the same processor from the logical set. This behavior is functionally
backward compatible with the P6 family processor but may result in
unexpected performance implications.
<<<<<<< (search for 245472 on google for the pdf file)


--
	Manfred
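
The difference between the two behaviours discussed in this thread can be
illustrated with a toy model. It is not a description of the real bus
arbitration or chipset logic: it only assumes that an interrupt goes to a
CPU with the lowest TPR value, and that ties are either rotated (the
round-robin effect described for the P6) or always resolved to the same
CPU (the worst case in the manual excerpt above). The function deliver and
its parameters are invented for this sketch.

```python
def deliver(n_irqs, n_cpus, tpr=None, rotate_on_tie=True):
    """Count interrupts per CPU under a toy lowest-priority delivery model."""
    tpr = list(tpr) if tpr is not None else [0] * n_cpus  # static, never updated
    counts = [0] * n_cpus
    next_start = 0                        # where the rotating tie-break begins
    for _ in range(n_irqs):
        lowest = min(tpr)
        candidates = [c for c in range(n_cpus) if tpr[c] == lowest]
        if rotate_on_tie:
            # Ties rotate: pick the first candidate at or after next_start.
            target = min(candidates, key=lambda c: (c - next_start) % n_cpus)
            next_start = (target + 1) % n_cpus
        else:
            # Ties always resolve to the same (lowest-numbered) CPU.
            target = candidates[0]
        counts[target] += 1
    return counts


print(deliver(1000, 2, rotate_on_tie=True))    # [500, 500]
print(deliver(1000, 2, rotate_on_tie=False))   # [1000, 0]
```

With equal, never-updated TPR values the second call reproduces the "all
on CPU0" pattern; in this model the only way to move interrupts without a
rotating tie-break is to change the TPR values themselves.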


* Re: P4 SMP load balancing
  2001-10-12 18:38 P4 SMP load balancing Manfred Spraul
@ 2001-10-12 21:38 ` Martin J. Bligh
  0 siblings, 0 replies; 5+ messages in thread
From: Martin J. Bligh @ 2001-10-12 21:38 UTC
  To: Manfred Spraul, linux-kernel, Sean Cavanaugh

>> > ovendev:~# cat /proc/interrupts 
>> >            CPU0       CPU1       
>> >   0:    6348212          0    IO-APIC-edge  timer
>> >   1:          2          0    IO-APIC-edge  keyboard
>> >   2:          0          0          XT-PIC  cascade
>> >   8:          1          0    IO-APIC-edge  rtc
>> >   9:          0          0    IO-APIC-edge  acpi
>> >  16:      92620          0   IO-APIC-level  eth0
>> >  18:       5085          0   IO-APIC-level  aic7xxx, aic7xxx
>> > NMI:          0          0 
>> > LOC:    6348388    6348427 
>> > ERR:          0
>> > MIS:          0
>> 
>> I don't think this should happen. In the event of both procs having equal 
>> priority (linux never changes them, so they always do), we should fall back 
>> to the arbitration priority of the lapic. Whether you have 1 or 2 I/O apics
>> working shouldn't make a difference. 
> 
> The P 4 has a new apic, and lowest priority delivery doesn't work
> anymore.
> 
> <<<<<<< Chapter 7.6.10 of 24547202.pdf
> In operating systems that use the lowest priority interrupt delivery
> mode
> but do not update the TPR, the TPR information saved in the chipset will
> potentially cause the interrupt to be always delivered to the same
> processor from the logical set. This behavior is functionally backward
> compatible with the P6 family processor but may result in unexpected
> performance implications.
> <<<<<<< (search for 245472 on google for the pdf file)

Ick.  Thanks for pointing this out ... will go read the P4 docs closer.

Someone here has patches to set the TPR properly, but they weren't
giving the performance gain we'd hoped for. In light of this, they'd
probably help out much more on the P4. I'll see if I can persuade them 
to publish ...

M.



* RE: P4 SMP load balancing
  2001-10-12 17:59 ` Martin J. Bligh
@ 2001-10-14  9:07   ` Sean Cavanaugh
  0 siblings, 0 replies; 5+ messages in thread
From: Sean Cavanaugh @ 2001-10-14  9:07 UTC
  To: 'Martin J. Bligh'; +Cc: linux-kernel

> From: Martin J. Bligh [mailto:Martin.Bligh@us.ibm.com] 
> Sent: Friday, October 12, 2001 1:00 PM
> To: Sean Cavanaugh
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: P4 SMP load balancing
> 
> 
> Which isn't perfectly balanced, but it looks a damned sight better
> than yours does ;-) Do you have something in the log that looks like
> this?
> 
> Oct 11 15:35:04 elm3b76 kernel: IO APIC #13......
> Oct 11 15:35:04 elm3b76 kernel: .... register #00: 0D000000
> Oct 11 15:35:04 elm3b76 kernel: .......    : physical APIC id: 0D
> Oct 11 15:35:04 elm3b76 kernel: .... register #01: 00170011
> Oct 11 15:35:04 elm3b76 kernel: .......     : max redirection entries: 0017
> Oct 11 15:35:04 elm3b76 kernel: .......     : IO APIC version: 0011
> Oct 11 15:35:04 elm3b76 kernel: .... register #02: 00000000
> Oct 11 15:35:04 elm3b76 kernel: .......     : arbitration: 00
> Oct 11 15:35:04 elm3b76 kernel: .... IRQ redirection table:
> Oct 11 15:35:04 elm3b76 kernel:  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> Oct 11 15:35:04 elm3b76 kernel:  00 000 00  1    0    0   0   0    0    0    00
> Oct 11 15:35:05 elm3b76 kernel:  01 00F 0F  0    0    0   0   0    0    1    39

<snip>



Relevant:  APIC info from dmesg (with some lead-in/lead-out):


Calibrating delay loop... 3368.55 BogoMIPS
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 256K
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check reporting enabled on CPU#1.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU1: Intel(R) Xeon(TM) CPU 1700MHz stepping 0a
Total of 2 processors activated (6723.99 BogoMIPS).
ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-5, 2-10, 2-11, 2-12, 2-17, 2-20, 2-21, 2-22 not connected.
..TIMER: vector=0x31 pin1=2 pin2=0
number of MP IRQ sources: 18.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
 WARNING: unexpected IO-APIC, please mail
          to linux-smp@vger.kernel.org
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 000 00  1    0    0   0   0    0    0    00
 01 003 03  0    0    0   0   0    1    1    39
 02 003 03  0    0    0   0   0    1    1    31
 03 003 03  0    0    0   0   0    1    1    41
 04 003 03  0    0    0   0   0    1    1    49
 05 000 00  1    0    0   0   0    0    0    00
 06 003 03  0    0    0   0   0    1    1    51
 07 003 03  0    0    0   0   0    1    1    59
 08 003 03  0    0    0   0   0    1    1    61
 09 003 03  0    0    0   0   0    1    1    69
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 000 00  1    0    0   0   0    0    0    00
 0d 003 03  0    0    0   0   0    1    1    71
 0e 003 03  0    0    0   0   0    1    1    79
 0f 003 03  0    0    0   0   0    1    1    81
 10 003 03  1    1    0   1   0    1    1    89
 11 000 00  1    0    0   0   0    0    0    00
 12 003 03  1    1    0   1   0    1    1    91
 13 003 03  1    1    0   1   0    1    1    99
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 003 03  1    1    0   1   0    1    1    A1
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ23 -> 0:23
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1685.2574 MHz.
..... host bus clock speed is 99.1326 MHz.
cpu: 0, clocks: 991326, slice: 330442
CPU0<T0:991312,T1:660864,D:6,S:330442,C:991326>
cpu: 1, clocks: 991326, slice: 330442
CPU1<T0:991312,T1:330416,D:12,S:330442,C:991326>
checking TSC synchronization across CPUs: passed.
Waiting on wait_init_idle (map = 0x2)
All processors have done init_idle
PCI: PCI BIOS revision 2.10 entry at 0xfb3e0, last bus=4
PCI: Using configuration type 1
PCI: Probing PCI hardware
Unknown bridge resource 0: assuming transparent
Unknown bridge resource 1: assuming transparent
Unknown bridge resource 2: assuming transparent
Unknown bridge resource 2: assuming transparent
Unknown bridge resource 2: assuming transparent
Unknown bridge resource 2: assuming transparent
PCI: Using IRQ router PIIX [8086/2440] at 00:1f.0
PCI->APIC IRQ transform: (B0,I31,P3) -> 19
PCI->APIC IRQ transform: (B0,I31,P1) -> 19
PCI->APIC IRQ transform: (B0,I31,P2) -> 23
PCI->APIC IRQ transform: (B3,I4,P0) -> 18
PCI->APIC IRQ transform: (B3,I4,P1) -> 18
PCI->APIC IRQ transform: (B4,I4,P0) -> 16
PCI->APIC IRQ transform: (B4,I7,P0) -> 16
isapnp: Scanning for PnP cards...
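
The IRQ redirection table in the dump above can be read mechanically. The
sketch below takes the column names from the table's own header line and
treats every field as hexadecimal. Two interpretations in it are
assumptions rather than anything stated in this thread: that Deli = 1 is
the IO-APIC "lowest priority" delivery mode, and that the Log column is a
bitmask with one bit per CPU (flat logical mode). The function names are
made up for illustration.

```python
COLUMNS = ["nr", "log", "phy", "mask", "trig", "irr",
           "pol", "stat", "dest", "deli", "vect"]


def parse_redirection_rows(text):
    """Return one dict per table row; non-row lines are ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < len(COLUMNS):
            continue
        fields = fields[-len(COLUMNS):]       # drop any syslog prefix
        try:
            values = [int(f, 16) for f in fields]
        except ValueError:
            continue                          # header or unrelated log line
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def unmasked_targets(rows):
    """Map pin number -> CPU bit positions set in Log, for unmasked pins."""
    targets = {}
    for r in rows:
        if r["mask"]:
            continue                          # pin is masked, nothing delivered
        # Assumption: each set bit in the logical destination is one CPU.
        targets[r["nr"]] = [b for b in range(8) if (r["log"] >> b) & 1]
    return targets


SAMPLE_ROWS = """\
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 003 03  0    0    0   0   0    1    1    39
 10 003 03  1    1    0   1   0    1    1    89
"""

rows = parse_redirection_rows(SAMPLE_ROWS)
print(unmasked_targets(rows))                 # {1: [0, 1]}
```

Under those assumptions, pin 01 is addressed to both CPUs (Log = 003) in
lowest priority mode, so the routing table itself does not pin interrupts
to CPU0.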



	- Sean

