netdev.vger.kernel.org archive mirror
* Network card IRQ balancing with Intel 5000 series chipsets
@ 2006-12-24  9:34 Robert Iakobashvili
  2006-12-25  9:35 ` Arjan van de Ven
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Iakobashvili @ 2006-12-24  9:34 UTC (permalink / raw)
  To: netdev

Sorry for repeating, now in text mode.

Is there a way to balance IRQs from a network card among Intel CPU cores
with Intel 5000 series chipset?

We tried the Broadcom network card (lspci is below) both in MSI and
io-apic mode, but found that the card interrupt may be moved to
another logical CPU, but not balanced among CPUs/cores.

Is that a policy of the Intel chipset that Linux cannot override? Can it
be configured somewhere, and with which tools?

Any clues and directions would be very much appreciated.

--------------------------------------------------------------------------------
CONFIG_IRQ_BALANCE=y is set, and with the same (patched 2.6.9) kernel,
IRQ balancing works properly with older Intel and with AMD hardware.
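To be concrete, this is the kind of thing we are trying to do, using the
standard /proc interface (a minimal sketch; IRQ 169 and the eth names are
from this particular box and only illustrative):

  grep -E 'CPU|eth' /proc/interrupts    # which CPU services the NIC IRQ
  cat /proc/irq/169/smp_affinity        # current affinity mask
  echo f > /proc/irq/169/smp_affinity   # ask for all four cores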

#lspci -v
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller
Hub (rev 92)
        Subsystem: Intel Corporation: Unknown device 8086
        Flags: bus master, fast devsel, latency 0, IRQ 169
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 2 (rev 92) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=1a, subordinate=25, sec-latency=0
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 3 (rev 92) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 00005000-00005fff
        Memory behind bridge: c8000000-c9ffffff
        Prefetchable memory behind bridge: 00000000c7f00000-00000000c7f00000
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x8 Port 4-5 (rev 92) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=10, subordinate=10, sec-latency=0
        I/O behind bridge: 00006000-0000ffff
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 5 (rev 92) (prog-if 00 [Normal decode])
        Flags: fast devsel
        Bus: primary=00, secondary=45, subordinate=45, sec-latency=0
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x8 Port 6-7 (rev 92) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express
x4 Port 7 (rev 92) (prog-if 00 [Normal decode])
        Flags: fast devsel
        Bus: primary=00, secondary=44, subordinate=44, sec-latency=0
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/1 Enable-
        Capabilities: [6c] Express Root Port (Slot-) IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA
Engine (rev 92)
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, fast devsel, latency 0, IRQ 169
        Memory at fe700000 (64-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit-
Queue=0/0 Enable-
        Capabilities: [6c] Express Unknown type IRQ 0

00:10.0 Host bridge: Intel Corporation 5000 Series Chipset Error
Reporting Registers (rev 92)
        Subsystem: IBM: Unknown device 02dd
        Flags: fast devsel

00:10.1 Host bridge: Intel Corporation 5000 Series Chipset Error
Reporting Registers (rev 92)
        Subsystem: Intel Corporation: Unknown device 8086
        Flags: fast devsel

00:10.2 Host bridge: Intel Corporation 5000 Series Chipset Error
Reporting Registers (rev 92)
        Subsystem: Intel Corporation: Unknown device 8086
        Flags: fast devsel

00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved
Registers (rev 92)
        Subsystem: Intel Corporation: Unknown device 8086
        Flags: fast devsel

00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved
Registers (rev 92)
        Subsystem: Intel Corporation: Unknown device 8086
        Flags: fast devsel

00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD
Registers (rev 92)
        Subsystem: IBM: Unknown device 02dd
        Flags: fast devsel

00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD
Registers (rev 92)
        Subsystem: IBM: Unknown device 02dd
        Flags: fast devsel

00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI
Express Root Port 1 (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
        Memory behind bridge: cd000000-cfffffff
        Capabilities: [40] Express Root Port (Slot-) IRQ 0
        Capabilities: [80] Message Signalled Interrupts: 64bit-
Queue=0/0 Enable-
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
        Capabilities: [100] Virtual Channel
        Capabilities: [180] Unknown (5)

00:1c.1 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI
Express Root Port 2 (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=05, subordinate=06, sec-latency=0
        Memory behind bridge: ca000000-ccffffff
        Capabilities: [40] Express Root Port (Slot-) IRQ 0
        Capabilities: [80] Message Signalled Interrupts: 64bit-
Queue=0/0 Enable-
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
        Capabilities: [100] Virtual Channel
        Capabilities: [180] Unknown (5)

00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset
UHCI USB Controller #1 (rev 09) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, medium devsel, latency 0, IRQ 193
        I/O ports at 2200 [size=32]

00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset
UHCI USB Controller #2 (rev 09) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, medium devsel, latency 0, IRQ 201
        I/O ports at 2600 [size=32]

00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset
UHCI USB Controller #3 (rev 09) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, medium devsel, latency 0, IRQ 193
        I/O ports at 2a00 [size=32]

00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset
UHCI USB Controller #4 (rev 09) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, medium devsel, latency 0, IRQ 201
        I/O ports at 2e00 [size=32]

00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset
EHCI USB2 Controller (rev 09) (prog-if 20 [EHCI])
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, medium devsel, latency 0, IRQ 193
        Memory at f9000000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
(prog-if 01 [Subtractive decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00003000-00004fff
        Memory behind bridge: de000000-dfffffff
        Prefetchable memory behind bridge: 00000000d0000000-00000000ddf00000
        Capabilities: [50] #0d [0000]

00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC
Interface Controller (rev 09)
        Flags: bus master, medium devsel, latency 0

00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset
SATA Storage Controller IDE (rev 09) (prog-if 80 [Master])
        Subsystem: IBM: Unknown device 02dd
        Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 185
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at <ignored>
        I/O ports at 0480 [size=16]
        Capabilities: [70] Power Management version 2

00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus
Controller (rev 09)
        Subsystem: IBM: Unknown device 02dd
        Flags: medium devsel, IRQ 185
        I/O ports at 0440 [size=32]

01:06.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev
02) (prog-if 00 [VGA])
        Subsystem: IBM: Unknown device 0305
        Flags: bus master, stepping, medium devsel, latency 64, IRQ 201
        Memory at d0000000 (32-bit, prefetchable) [size=128M]
        I/O ports at 4000 [size=256]
        Memory at dfff0000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 2

02:00.0 PCI bridge: Broadcom: Unknown device 0103 (rev c2) (prog-if 00
[Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        Memory behind bridge: cd000000-cfffffff
        Capabilities: [60] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [90] PCI-X bridge device.
        Capabilities: [b0] Power Management version 2
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [14c] Power Budgeting
        Capabilities: [160] Device Serial Number 23-ba-18-fe-ff-5e-14-00

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708
Gigabit Ethernet (rev 11)
        Subsystem: IBM: Unknown device 0342
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
        Memory at ce000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-

04:00.0 RAID bus controller: Adaptec AAC-RAID (Rocket) (rev 02)
        Subsystem: IBM ServeRAID 8k/8k-l8
        Flags: bus master, fast devsel, latency 0, IRQ 209
        Memory at c9e00000 (64-bit, non-prefetchable) [size=2M]
        Memory at c7fe0000 (64-bit, prefetchable) [size=128K]
        I/O ports at 5000 [size=256]
        Capabilities: [40] Express Endpoint IRQ 0
        Capabilities: [e0] Message Signalled Interrupts: 64bit+
Queue=0/2 Enable-
        Capabilities: [100] Advanced Error Reporting

05:00.0 PCI bridge: Broadcom: Unknown device 0103 (rev c2) (prog-if 00
[Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
        Memory behind bridge: ca000000-ccffffff
        Capabilities: [60] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [90] PCI-X bridge device.
        Capabilities: [b0] Power Management version 2
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [14c] Power Budgeting
        Capabilities: [160] Device Serial Number 25-ba-18-fe-ff-5e-14-00

06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708
Gigabit Ethernet (rev 11)
        Subsystem: IBM: Unknown device 0342
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 209
        Memory at ca000000 (64-bit, non-prefetchable) [size=32M]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-

1a:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express
Upstream Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=1a, secondary=1b, subordinate=24, sec-latency=0
        Capabilities: [44] Express Upstream Port IRQ 0
        Capabilities: [70] Power Management version 2
        Capabilities: [80] #0d [0000]
        Capabilities: [100] Advanced Error Reporting

1a:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to
PCI-X Bridge (rev 01) (prog-if 00 [Normal decode])
        Flags: fast devsel
        Bus: primary=1a, secondary=25, subordinate=25, sec-latency=64
        Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [6c] Power Management version 2
        Capabilities: [80] #0d [0000]
        Capabilities: [d8] PCI-X bridge device.
        Capabilities: [100] Advanced Error Reporting

1b:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express
Downstream Port E1 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=1b, secondary=1c, subordinate=1c, sec-latency=0
        Capabilities: [44] Express Downstream Port (Slot-) IRQ 0
        Capabilities: [60] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
        Capabilities: [70] Power Management version 2
        Capabilities: [80] #0d [0000]
        Capabilities: [100] Advanced Error Reporting

1b:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express
Downstream Port E2 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=1b, secondary=24, subordinate=24, sec-latency=0
        Capabilities: [44] Express Downstream Port (Slot-) IRQ 0
        Capabilities: [60] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
        Capabilities: [70] Power Management version 2
        Capabilities: [80] #0d [0000]
        Capabilities: [100] Advanced Error Reporting

-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://sourceforge.net/projects/curl-loader


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-24  9:34 Network card IRQ balancing with Intel 5000 series chipsets Robert Iakobashvili
@ 2006-12-25  9:35 ` Arjan van de Ven
  2006-12-25 11:26   ` Robert Iakobashvili
  0 siblings, 1 reply; 21+ messages in thread
From: Arjan van de Ven @ 2006-12-25  9:35 UTC (permalink / raw)
  To: Robert Iakobashvili; +Cc: netdev

On Sun, 2006-12-24 at 11:34 +0200, Robert Iakobashvili wrote:
> Sorry for repeating, now in text mode.
> 
> Is there a way to balance IRQs from a network card among Intel CPU cores
> with Intel 5000 series chipset?
> 
> We tried the Broadcom network card (lspci is below) both in MSI and
> io-apic mode, but found that the card interrupt may be moved to
> another logical CPU, but not balanced among CPUs/cores.
> 
> Is that a policy of Intel chipset, that linux cannot overwrite? Can it
> be configured
> somewhere and by which tools?

first of all please don't use the in-kernel irqbalancer, use the
userspace one from www.irqbalance.org instead... 

Am I understanding you correctly that you want to spread the load of the
networking IRQ roughly equally over 2 cpus (or cores or ..)?
If so, that is very very suboptimal, especially for networking (since
suddenly a lot of packet processing gets to deal with out of order
receives and cross-cpu reassembly).

As for the chipset capability; the behavior of the chipset you have is
to prefer the first cpu of the programmed affinity mask. There are some
ways to play with that but doing it on the granularity you seem to want
is both not practical and too expensive anyway....
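(If you want to observe that yourself, a rough sketch - the IRQ number is
illustrative - is to program a wide mask and then watch which per-CPU
counter actually moves; on this chipset only the lowest-numbered CPU in
the mask is expected to keep incrementing:)

  echo f > /proc/irq/169/smp_affinity
  watch -n1 "grep -E 'CPU|169:' /proc/interrupts"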

Greetings,
    Arjan van de Ven

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-25  9:35 ` Arjan van de Ven
@ 2006-12-25 11:26   ` Robert Iakobashvili
  2006-12-25 11:34     ` Arjan van de Ven
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Iakobashvili @ 2006-12-25 11:26 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: netdev

Hi Arjan,

On 12/25/06, Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 2006-12-24 at 11:34 +0200, Robert Iakobashvili wrote:
> > Sorry for repeating, now in text mode.
> >
> > Is there a way to balance IRQs from a network card among Intel CPU cores
> > with Intel 5000 series chipset?
> >
> > We tried the Broadcom network card (lspci is below) both in MSI and
> > io-apic mode, but found that the card interrupt may be moved to
> > another logical CPU, but not balanced among CPUs/cores.
> >
> > Is that a policy of Intel chipset, that linux cannot overwrite? Can it
> > be configured
> > somewhere and by which tools?
>
> first of all please don't use the in-kernel irqbalancer, use the
> userspace one from www.irqbalance.org instead...

Thanks; that was also attempted, but the result is not much different,
because the problem seems to be in the chipset.

The kernel explicitly disables interrupt affinity for such Intel chipsets
in drivers/pci/quirks.c, unless the BIOS enables that feature.
The question is not so much about Linux, but rather about hardware,
namely tuning the Intel 5000 series chipset for networking.


> Am I understanding you correctly that you want to spread the load of the
> networking IRQ roughly equally over 2 cpus (or cores or ..)?

Yes, 4 cores.

> If so, that is very very suboptimal, especially for networking (since
> suddenly a lot of packet processing gets to deal with out of order
> receives and cross-cpu reassembly).

Agreed. Unfortunately, we have a flow of small RTP packets with heavy
processing, and both the Rx and Tx components are on a single network card.
The application is not very sensitive to out-of-order delivery, etc.
Thus, 3 of the cores are actually doing nothing, whereas CPU0
is overloaded, which prevents the system from scaling across CPUs.
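(The imbalance itself is easy to see with standard tools, e.g. mpstat
from the sysstat package, or top with the per-CPU view:)

  mpstat -P ALL 1     # per-CPU utilization, 1-second samples
  top                 # then press '1' for a per-CPU breakdown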

>
> As for the chipset capability; the behavior of the chipset you have is
> to prefer the first cpu of the programmed affinity mask. There are some
> ways to play with that but doing it on the granularity you seem to want
> is both not practical and too expensive anyway....

Agreed. In particular, on AMD NUMA boxes I have used CPU affinity to tie
each single card to a single CPU. Unfortunately, our case now is a single
network card with a huge load of small RTP packets, both Rx and Tx.

Agreed that providing CPU affinity for a network interrupt is a rather
reasonable default.
However, should a chipset manufacturer take away from us the very freedom
of tuning, the freedom of choice?

According to the paper below, there should be some option to balance the
IRQ among several CPUs, which I have failed to find:
http://download.intel.com/design/chipsets/applnots/31433702.pdf

> if you want to mail me at work (you don't), use arjan (at) linux.intel.com
> Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

Thanks. I will look into this site.


-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-25 11:26   ` Robert Iakobashvili
@ 2006-12-25 11:34     ` Arjan van de Ven
  2006-12-25 12:54       ` Robert Iakobashvili
  0 siblings, 1 reply; 21+ messages in thread
From: Arjan van de Ven @ 2006-12-25 11:34 UTC (permalink / raw)
  To: Robert Iakobashvili; +Cc: netdev

On Mon, 2006-12-25 at 13:26 +0200, Robert Iakobashvili wrote:
> 
> > Am I understanding you correctly that you want to spread the load of the
> > networking IRQ roughly equally over 2 cpus (or cores or ..)?
> 
> Yes, 4 cores.
> 
> > If so, that is very very suboptimal, especially for networking (since
> > suddenly a lot of packet processing gets to deal with out of order
> > receives and cross-cpu reassembly).
> 
> Agree. Unfortunately, we have a flow of small RTP packets with heavy
> processing and both Rx and Tx component on a single network card.
> The application is not too much sensitive to the out of order, etc.
> Thus, there 3 cores are actually doing nothing, whereas the CPU0
> is overloaded, preventing system CPU scaling.

in principle the actual work should still be spread over the cores;
unless you do everything in kernel space that is..

> Agree, that providing CPU affinity for a network interrupt is a rather
> reasonable default.
> However, should a chipset manufacture take from us the very freedom of
> tuning, freedom of choice?

it can still be done using the TPR (Task Priority Register) of the
APIC. It's just... not there in Linux (other OSes do use this).

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-25 11:34     ` Arjan van de Ven
@ 2006-12-25 12:54       ` Robert Iakobashvili
  2006-12-26 18:44         ` jamal
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Iakobashvili @ 2006-12-25 12:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: netdev

Arjan,

On 12/25/06, Arjan van de Ven <arjan@infradead.org> wrote:
> On Mon, 2006-12-25 at 13:26 +0200, Robert Iakobashvili wrote:
> >
> > > Am I understanding you correctly that you want to spread the load of the
> > > networking IRQ roughly equally over 2 cpus (or cores or ..)?
> >
> > Yes, 4 cores.
> >
> > > If so, that is very very suboptimal, especially for networking (since
> > > suddenly a lot of packet processing gets to deal with out of order
> > > receives and cross-cpu reassembly).
> >
> > Agree. Unfortunately, we have a flow of small RTP packets with heavy
> > processing and both Rx and Tx component on a single network card.
> > The application is not too much sensitive to the out of order, etc.
> > Thus, there 3 cores are actually doing nothing, whereas the CPU0
> > is overloaded, preventing system CPU scaling.
>
> in principle the actual work should still be spread over the cores;
> unless you do everything in kernel space that is..

This is the case; the processing is in the kernel.

> > Agree, that providing CPU affinity for a network interrupt is a rather
> > reasonable default.
> > However, should a chipset manufacture take from us the very freedom of
> > tuning, freedom of choice?
>
> it can still be done using the TPR (Thread Priority Register) of the
> APIC. It's just... not there in Linux (other OSes do use this).

Interesting.
Have you any specific pointers for doing it (beyond Internet search)?
Your input would be very much appreciated.
Thank you.


-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-25 12:54       ` Robert Iakobashvili
@ 2006-12-26 18:44         ` jamal
  2006-12-26 19:51           ` Robert Iakobashvili
  2006-12-26 22:06           ` Arjan van de Ven
  0 siblings, 2 replies; 21+ messages in thread
From: jamal @ 2006-12-26 18:44 UTC (permalink / raw)
  To: Robert Iakobashvili; +Cc: Arjan van de Ven, netdev


If you compile in PCI-E support you should have more control of the
MSI-X, no? I would tie the MSI to a specific processor statically; my
past experiences with any form of interrupt balancing with network loads
have been horrible.
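Roughly along these lines; the IRQ number is illustrative, so check your
own /proc/interrupts, and I am assuming a build with CONFIG_PCI_MSI=y and
CONFIG_PCIEPORTBUS=y:

  killall irqbalance 2>/dev/null        # stop any balancer first
  echo 4 > /proc/irq/169/smp_affinity   # pin the vector to CPU2 only
  cat /proc/irq/169/smp_affinity        # verify the mask stuck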

cheers,
jamal

On Mon, 2006-25-12 at 14:54 +0200, Robert Iakobashvili wrote:
> Arjan,
> 
> On 12/25/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > On Mon, 2006-12-25 at 13:26 +0200, Robert Iakobashvili wrote:
> > >
> >
> > it can still be done using the TPR (Thread Priority Register) of the
> > APIC. It's just... not there in Linux (other OSes do use this).
> 
> Interesting.
> Have you any specific pointers for doing it (beyond Internet search)?
> Your input would be very much appreciated.
> Thank you.
> 
> 



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-26 18:44         ` jamal
@ 2006-12-26 19:51           ` Robert Iakobashvili
  2006-12-26 22:11             ` jamal
  2006-12-26 22:06           ` Arjan van de Ven
  1 sibling, 1 reply; 21+ messages in thread
From: Robert Iakobashvili @ 2006-12-26 19:51 UTC (permalink / raw)
  To: hadi; +Cc: Arjan van de Ven, netdev

On 12/26/06, jamal <hadi@cyberus.ca> wrote:
>
> If you compile in PCI-E support you should have more control of the
> MSI-X, no? I would tie the MSI to a specific processor statically; my
> past experiences with any form of interupt balancing with network loads
> has been horrible.
>
> cheers,
> jamal

Thanks for the direction.

In the meantime I have removed all userland processes from CPU0,
which handles the network card interrupts and all (kernel-space) packet
processing.
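(Concretely, something like the following; the CPU list is specific to
our box and the binary name is just a placeholder:)

  taskset -c 1-3 ./our_userland_daemon   # keep new processes off CPU0
  taskset -p -c 1-3 1234                 # or move a running PID (1234 illustrative)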

Still, there should be some way of CPU scaling, even in the case of a
single network card.


> > >
> > > it can still be done using the TPR (Thread Priority Register) of the
> > > APIC. It's just... not there in Linux (other OSes do use this).
> >
> > Have you any specific pointers for doing it (beyond Internet search)?
> > Your input would be very much appreciated.



-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-26 18:44         ` jamal
  2006-12-26 19:51           ` Robert Iakobashvili
@ 2006-12-26 22:06           ` Arjan van de Ven
  2006-12-26 22:46             ` jamal
  1 sibling, 1 reply; 21+ messages in thread
From: Arjan van de Ven @ 2006-12-26 22:06 UTC (permalink / raw)
  To: hadi; +Cc: Robert Iakobashvili, netdev

On Tue, 2006-12-26 at 13:44 -0500, jamal wrote:
> If you compile in PCI-E support you should have more control of the
> MSI-X, no? I would tie the MSI to a specific processor statically; my
> past experiences with any form of interupt balancing with network loads
> has been horrible.


it is; that's why irqbalance tries really hard (with a few very rare
exceptions) to keep networking irqs to the same cpu all the time...

but if your use case is kernel-level packet processing of < MTU packets
then I can see why you would at some point run out of cpu
power ... esp. on multicore, where you share the cache between cores, you
can probably do a little better for that very specific use case.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-26 19:51           ` Robert Iakobashvili
@ 2006-12-26 22:11             ` jamal
  2007-01-02 17:56               ` Rick Jones
  0 siblings, 1 reply; 21+ messages in thread
From: jamal @ 2006-12-26 22:11 UTC (permalink / raw)
  To: Robert Iakobashvili; +Cc: Arjan van de Ven, netdev

On Tue, 2006-26-12 at 21:51 +0200, Robert Iakobashvili wrote:

BTW, turn PCI-E support on in the kernel build and do cat /proc/interrupts
to see what I mean.

> In meanwhile I have removed all userland processes from CPU0,
> that handles network card interrupts and all packet-processing (kernel-space).
> 
> Still, it should be some way of CPU-scaling; even for the case of the
> only network card.

The best way to achieve such balancing is to have the network card help,
essentially by being able to select the CPU to notify while at the same
time:
a) avoiding any packet reordering - which restricts a flow to being
processed on a single CPU, at least within a timeframe
b) being per-CPU-load-aware - which means busying out only the CPUs which
are less utilized

Various such schemes have been discussed here, but no vendor is making
such nics today (search Dave's blog - he did discuss this at one point or
another).


cheers,
jamal



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-26 22:06           ` Arjan van de Ven
@ 2006-12-26 22:46             ` jamal
  2006-12-27  0:28               ` Arjan van de Ven
  0 siblings, 1 reply; 21+ messages in thread
From: jamal @ 2006-12-26 22:46 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Robert Iakobashvili, netdev

On Tue, 2006-26-12 at 23:06 +0100, Arjan van de Ven wrote:

> it is; that's why irqbalance tries really hard (with a few very rare
> exceptions) to keep networking irqs to the same cpu all the time...
> 

The problem with irqbalance, when I last used it, is that it doesn't take
CPU utilization into consideration.
With NAPI, if I have only a few interrupts it likely implies I have a huge
network load (and therefore CPU use), and I would be much happier if
you didn't start moving more interrupt load onto that already loaded CPU....
So if you start considering CPU load sampled over a period of time, you
could make some progress.

> but if your use case is kernel level packet processing of < MTU packets
> then I can see why you would at some point would run out of cpu
> power ... 

Of course, otherwise there would not be much value in "balancing" ..

Note that < MTU-sized packets are not unusual for firewall/router middle
boxen, and there's plenty of those out there. The same goes these days for
VOIP endpoints (RTP and SIP), which may process such packets in user space
(and handle thousands of such flows).
Additional note: the average packet size on the internet today (and for
many years) has been way below your standard ethernet MTU of 1500 bytes.
 
> esp on multicore where you share the cache between cores you
> probably can do a little better for that very specific use case.

Indeed - that's why I proposed to tie the IRQs statically. Modern
machines have much larger caches, so a static config is less of a
nuisance.

cheers,
jamal



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-26 22:46             ` jamal
@ 2006-12-27  0:28               ` Arjan van de Ven
  2006-12-27  3:47                 ` jamal
  2007-01-02 17:57                 ` Rick Jones
  0 siblings, 2 replies; 21+ messages in thread
From: Arjan van de Ven @ 2006-12-27  0:28 UTC (permalink / raw)
  To: hadi; +Cc: Robert Iakobashvili, netdev

On Tue, 2006-12-26 at 17:46 -0500, jamal wrote:
> On Tue, 2006-26-12 at 23:06 +0100, Arjan van de Ven wrote:
> 
> > it is; that's why irqbalance tries really hard (with a few very rare
> > exceptions) to keep networking irqs to the same cpu all the time...
> > 
> 
> The problem with irqbalance when i last used it is it doesnt take into
> consideration CPU utilization. 

then you used an ancient version....

> With NAPI, if i have a few interupts it likely implies i have a huge
> network load (and therefore CPU use) and would be much more happier if
> you didnt start moving more interupt load to that already loaded CPU....

current irqbalance accounts for napi by using the number of packets as
indicator for load, not the number of interrupts. (for network
interrupts obviously)
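(exactly which files a given irqbalance version reads is an
implementation detail, but the per-interface packet counters themselves
are exported in the standard places, e.g.:)

  cat /proc/net/dev                              # per-interface counters
  cat /sys/class/net/eth0/statistics/rx_packets  # same via sysfs (eth0 illustrative)
  cat /sys/class/net/eth0/statistics/tx_packets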


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27  0:28               ` Arjan van de Ven
@ 2006-12-27  3:47                 ` jamal
  2006-12-27  7:09                   ` Robert Iakobashvili
  2006-12-27 13:08                   ` Arjan van de Ven
  2007-01-02 17:57                 ` Rick Jones
  1 sibling, 2 replies; 21+ messages in thread
From: jamal @ 2006-12-27  3:47 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Robert Iakobashvili, netdev

On Wed, 2006-27-12 at 01:28 +0100, Arjan van de Ven wrote:

> current irqbalance accounts for napi by using the number of packets as
> indicator for load, not the number of interrupts. (for network
> interrupts obviously)
> 

Sounds a lot more promising, although still insufficient in certain cases.
Not all flows are equal; as an example, an IPsec flow with 1000 packets
bound to one CPU will likely utilize more cycles than 5000 packets that
are being plainly forwarded on another CPU.

cheers,
jamal



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27  3:47                 ` jamal
@ 2006-12-27  7:09                   ` Robert Iakobashvili
  2006-12-27 14:31                     ` jamal
  2006-12-27 13:08                   ` Arjan van de Ven
  1 sibling, 1 reply; 21+ messages in thread
From: Robert Iakobashvili @ 2006-12-27  7:09 UTC (permalink / raw)
  To: hadi; +Cc: Arjan van de Ven, netdev

On 12/27/06, jamal <hadi@cyberus.ca> wrote:
> On Wed, 2006-27-12 at 01:28 +0100, Arjan van de Ven wrote:
>
> > current irqbalance accounts for napi by using the number of packets as
> > indicator for load, not the number of interrupts. (for network
> > interrupts obviously)
> >
>
> Sounds a lot more promising.
> Although still insufficient in certain cases. All flows are not equal; as an
> example, an IPSEC flow with 1000 packets bound to one CPU  will likely
> utilize more cycles than 5000 packets that are being plain forwarded on
> another CPU.

I do agree with Jamal that there is a problem here.

My scenario is treatment of RTP packets in kernel space with a single
network card (both Rx and Tx). The default of the Intel 5000 series
chipset is affinity of each network card to a certain CPU. Currently,
neither with irqbalance nor with kernel irq balancing (MSI and io-apic
attempted) have I found a way to balance that IRQ.

Keeping a static CPU affinity for a network card interrupt is a good
design in general.
However, what I have is that CPU0 is idle less than 10% of the time,
whereas the 3 other cores (2 dual-core Intel CPUs) are doing almost
nothing. There is a real problem of CPU scaling with such a design.
Some day we may wish to add a 10Gbps network card and 16 cores/CPUs,
but that will not help us scale.

Probably some cards have separate Rx and Tx interrupts. Still,
scaling is an issue.

I will look into the PCI-E option; thanks, Jamal.


-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27  3:47                 ` jamal
  2006-12-27  7:09                   ` Robert Iakobashvili
@ 2006-12-27 13:08                   ` Arjan van de Ven
  2006-12-27 14:44                     ` jamal
  1 sibling, 1 reply; 21+ messages in thread
From: Arjan van de Ven @ 2006-12-27 13:08 UTC (permalink / raw)
  To: hadi; +Cc: Robert Iakobashvili, netdev


> Although still insufficient in certain cases. All flows are not equal; as an
> example, an IPSEC flow with 1000 packets bound to one CPU  will likely
> utilize more cycles than 5000 packets that are being plain forwarded on
> another CPU.

sure; however the kernel doesn't provide more accurate information
currently (and I doubt it could even, it's not so easy to figure out
which interface triggered the softirq if 2 interfaces share the cpu, and
then, how much work came from which etc).

also the "amount of work" estimate doesn't need to be accurate to 5
digits to be honest... just number of packets seems to be a quite
reasonable approximation already. (if the kernel starts exporting more
accurate data, irqbalance can easily use it of course)
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27  7:09                   ` Robert Iakobashvili
@ 2006-12-27 14:31                     ` jamal
  2006-12-29  2:04                       ` Krzysztof Oledzki
  0 siblings, 1 reply; 21+ messages in thread
From: jamal @ 2006-12-27 14:31 UTC (permalink / raw)
  To: Robert Iakobashvili; +Cc: Arjan van de Ven, netdev

On Wed, 2006-27-12 at 09:09 +0200, Robert Iakobashvili wrote:

> 
> My scenario is treatment of RTP packets in kernel space with a single network
> card (both Rx and Tx). The default of the Intel 5000 series chipset is
> affinity of each
> network card to a certain CPU. Currently, neither with irqbalance nor
> with kernel
> irq-balancing (MSI and io-apic attempted) I do not find a way to
> balance that irq.

In the near future, when the NIC vendors wake up[1] - because CPU vendors,
including big bad Intel, are going to be putting out a large number of
hardware threads - you should be able to do more clever things with such
a setup. At the moment, just tie it to a single CPU and have your related
processes running/bound on the other cores so you can utilize them. OTOH,
you say you are only using 30% of the one CPU, so it may not be a big
deal to tie your single nic to one cpu.

cheers,
jamal

[1] If you are able to change the NIC in your setup, try looking at
Neterion; email Leonid.Grossman@netiron.com - they have a much cleverer
nic than the e1000. It has multiple DMA receive rings which are selectable
via a little classifier (for example, you could have RTP going to CPU0 and
the rest going to CPU1). The DMA rings can be tied to different
interrupts/MSIs and with a little work could be made to appear like
several interfaces.



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27 13:08                   ` Arjan van de Ven
@ 2006-12-27 14:44                     ` jamal
  2006-12-27 15:06                       ` Arjan van de Ven
  0 siblings, 1 reply; 21+ messages in thread
From: jamal @ 2006-12-27 14:44 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Robert Iakobashvili, netdev

On Wed, 2006-27-12 at 14:08 +0100, Arjan van de Ven wrote:

> sure; however the kernel doesn't provide more accurate information
> currently (and I doubt it could even, it's not so easy to figure out
> which interface triggered the softirq if 2 interfaces share the cpu, and
> then, how much work came from which etc).
> 

If you sample CPU use, and in between two samples you are able to know
which nic is tied to which CPU, how many cycles that cpu consumed in
user vs kernel, and how many packets were seen on that nic, then you
should have the info necessary to make a decision, no? Yes, I know it is
a handwave on my part and it is complex, but by the same token I would
suspect each kind of IO-derived work (which results in interrupts) will
have more inputs that could help you make a proper decision than a mere
glance at the interrupts. I understand, for example, that the SCSI
subsystem these days behaves very much like NAPI.
I think one of the failures of the APIC load balancing is a direct
result of not being able to factor in such environmental factors.
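(Hand-waving in shell, this is the kind of sampling I have in mind; the
interval and the interface pattern are arbitrary:)

  awk '/^cpu[0-9]/' /proc/stat > cpu.before ; grep eth: /proc/net/dev > net.before
  sleep 5
  awk '/^cpu[0-9]/' /proc/stat > cpu.after  ; grep eth: /proc/net/dev > net.after
  diff cpu.before cpu.after    # per-CPU jiffies burned over the interval
  diff net.before net.after    # packets seen per nic over the same interval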

> also the "amount of work" estimate doesn't need to be accurate to 5
> digits to be honest... just number of packets seems to be a quite
> reasonable approximation already. (if the kernel starts exporting more
> accurate data, irqbalance can easily use it of course)

It is certainly much more promising now than before. Most people will
probably have symmetrical types of apps, so it should work for them.
Someone like myself will still not use it, because I typically don't
have symmetrical loads.

cheers,
jamal
 



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27 14:44                     ` jamal
@ 2006-12-27 15:06                       ` Arjan van de Ven
  0 siblings, 0 replies; 21+ messages in thread
From: Arjan van de Ven @ 2006-12-27 15:06 UTC (permalink / raw)
  To: hadi; +Cc: Robert Iakobashvili, netdev

On Wed, 2006-12-27 at 09:44 -0500, jamal wrote:
> On Wed, 2006-27-12 at 14:08 +0100, Arjan van de Ven wrote:
> 
> > sure; however the kernel doesn't provide more accurate information
> > currently (and I doubt it could even, it's not so easy to figure out
> > which interface triggered the softirq if 2 interfaces share the cpu, and
> > then, how much work came from which etc).
> > 
> 
> If you sample CPU use and in between two samples you are able to know
> which nic is tied to which CPU, how much cycles such cpu consumed in
> user vs kernel, and how many packets were seen on such nic; then you
> should have the info necessary to make a decision, no?

Note that getting softirq time itself isn't a problem, that is available
actually. (it's not very accurate but that's another kettle of fish
entirely)

But... no, that isn't better than packet counts.
Cases where it simply breaks:
1) you have more nics than cpus, so you HAVE to have sharing
2) other loads are going on besides pure networking (storage, but also
timers and .. and ..)

And neither case is even remotely artificial.

> Yes, I know it is
> a handwave on my part and it is complex but by the same token, I would
> suspect each kind of IO derived work (which results in interupts) will
> have more inputs that could help you make a proper decision than a mere
> glance of the interupts. I understand for example the SCSI subsystem
> these days behaves very much like NAPI.

the difference between scsi and networking is that the work scsi does
per "sector" is orders and orders of magnitude less than what networking
does. SCSI does its work mostly per "transfer", not per sector, and if
you're busy you tend to get larger transfers as well (megabytes is not
special). SCSI also doesn't look at the payload at all, unlike
networking (where there are those pesky headers every 1500 bytes or less
that the kernel needs to look at :)


> It is certainly much more promising now than before. Most people will
> probably have symettrical type of apps, so it should work for them.
> For someone like myself i will still not use it because i typically dont
> have symettrical loads.

unless you have more nics than you have cpus, irqbalance will do the
right thing anyway (it'll tend to not share or move networking
interrupts). And once you have more nics than you have cpus.... see
above.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27 14:31                     ` jamal
@ 2006-12-29  2:04                       ` Krzysztof Oledzki
  2006-12-29 17:36                         ` Robert Iakobashvili
  0 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Oledzki @ 2006-12-29  2:04 UTC (permalink / raw)
  To: jamal; +Cc: Robert Iakobashvili, Arjan van de Ven, netdev

On Wed, 27 Dec 2006, jamal wrote:

> On Wed, 2006-27-12 at 09:09 +0200, Robert Iakobashvili wrote:
>
>>
>> My scenario is treatment of RTP packets in kernel space with a single network
>> card (both Rx and Tx). The default of the Intel 5000 series chipset is
>> affinity of each
>> network card to a certain CPU. Currently, neither with irqbalance nor
>> with kernel
>> irq-balancing (MSI and io-apic attempted) I do not find a way to
>> balance that irq.
>
> In the near future, when the NIC vendors wake up[1] because CPU vendors
> - including big bad Intel -  are going to be putting out a large number
> of hardware threads, you should be able to do more clever things with
> such a setup. At the moment, just tie it to a single CPU and have your
> other processes that are related running/bound on the other cores so you
> can utilize them. OTOH, you say you are only using 30% of the one CPU,
> so it may not be a big deal to tie your single nic to on cpu.

Anyway, it seems that with more advanced firewalls/routers the kernel
spends most of its time in IPsec/crypto code, netfilter conntrack and
iptables rules/extensions, routing lookups, etc., and not in the hardware
IRQ handler.
So it would be nice if this part could be done by all CPUs.
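(One crude way to see how concentrated that work is today: on 2.6
kernels the 8th field of each cpuN line in /proc/stat is softirq time,
so a per-CPU view is just:)

  awk '/^cpu[0-9]/ { print $1, "softirq:", $8 }' /proc/stat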

Best regards,


 			Krzysztof Olędzki


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-29  2:04                       ` Krzysztof Oledzki
@ 2006-12-29 17:36                         ` Robert Iakobashvili
  0 siblings, 0 replies; 21+ messages in thread
From: Robert Iakobashvili @ 2006-12-29 17:36 UTC (permalink / raw)
  To: Krzysztof Oledzki; +Cc: jamal, Arjan van de Ven, netdev

Hi Krzysztof,

On 12/29/06, Krzysztof Oledzki <olel@ans.pl> wrote:
>
>
> On Wed, 27 Dec 2006, jamal wrote:
>
> > On Wed, 2006-27-12 at 09:09 +0200, Robert Iakobashvili wrote:
> >
> >>
> >> My scenario is treatment of RTP packets in kernel space with a single network
> >> card (both Rx and Tx). The default of the Intel 5000 series chipset is
> >> affinity of each
> >> network card to a certain CPU. Currently, neither with irqbalance nor
> >> with kernel
> >> irq-balancing (MSI and io-apic attempted) I do not find a way to
> >> balance that irq.
> >
> > In the near future, when the NIC vendors wake up[1] because CPU vendors
> > - including big bad Intel -  are going to be putting out a large number
> > of hardware threads, you should be able to do more clever things with
> > such a setup. At the moment, just tie it to a single CPU and have your
> > other processes that are related running/bound on the other cores so you
> > can utilize them. OTOH, you say you are only using 30% of the one CPU,
> > so it may not be a big deal to tie your single nic to on cpu.
>
> Anyway, it seems that with more advanced firewalls/routers kernel spends
> most of a time in IPSec/crypto code, netfilter conntrack and iptables
> rules/extensions, routing lookups, etc and not in hardware IRQ handler.
> So, it would be nice if this part coulde done by all CPUs.


Do you mean that there should be an option to configure BH/softirq handling
to make packet processing balanced among several CPUs?
There is an issue of packet ordering then, isn't there?


-- 
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-26 22:11             ` jamal
@ 2007-01-02 17:56               ` Rick Jones
  0 siblings, 0 replies; 21+ messages in thread
From: Rick Jones @ 2007-01-02 17:56 UTC (permalink / raw)
  To: hadi; +Cc: Robert Iakobashvili, Arjan van de Ven, netdev

> The best way to achieve such balancing is to have the network card help
> and essentially be able to select the CPU to notify while at the same
> time considering:
> a) avoiding any packet reordering - which restricts a flow to be
> processed to a single CPU at least within a timeframe
> b) be per-CPU-load-aware - which means to busy out only CPUs which are
> less utilized
> 
> Various such schemes have been discussed here but no vendor is making
> such nics today (search Daves Blog - he did discuss this at one point or
> other).

I thought that Neterion were doing something along those lines with 
their Xframe II NICs - perhaps not CPU loading aware, but doing stuff to 
spread the work of different connections across the CPUs.

I would add a:

c) some knowledge of the CPU on which the thread accessing the socket 
for that "connection" will run.  This could be as simple as the CPU on 
which the socket was last accessed.  Having a _NIC_ know this sort of 
thing is somewhat difficult and expensive (perhaps too much so).  If a 
NIC simply hashes the connection identifiers you then have the issue of
different connections, each "owned/accessed" by one thread, taking 
different paths through the system.  No issues about reordering, but 
perhaps some on cache lines going hither and yon.

The question boils down to - Should the application (via the scheduler) 
dictate where its connections are processed, or should the connections 
dictate where the application runs?

rick jones



* Re: Network card IRQ balancing with Intel 5000 series chipsets
  2006-12-27  0:28               ` Arjan van de Ven
  2006-12-27  3:47                 ` jamal
@ 2007-01-02 17:57                 ` Rick Jones
  1 sibling, 0 replies; 21+ messages in thread
From: Rick Jones @ 2007-01-02 17:57 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: hadi, Robert Iakobashvili, netdev

>>With NAPI, if i have a few interupts it likely implies i have a huge
>>network load (and therefore CPU use) and would be much more happier if
>>you didnt start moving more interupt load to that already loaded CPU....
> 
> 
> current irqbalance accounts for napi by using the number of packets as
> indicator for load, not the number of interrupts. (for network
> interrupts obviously)

And hopefully some knowledge of NUMA so it doesn't "balance" the 
interrupts of a NIC to some far-off (topology-wise) CPU...

rick jones



Thread overview: 21+ messages
2006-12-24  9:34 Network card IRQ balancing with Intel 5000 series chipsets Robert Iakobashvili
2006-12-25  9:35 ` Arjan van de Ven
2006-12-25 11:26   ` Robert Iakobashvili
2006-12-25 11:34     ` Arjan van de Ven
2006-12-25 12:54       ` Robert Iakobashvili
2006-12-26 18:44         ` jamal
2006-12-26 19:51           ` Robert Iakobashvili
2006-12-26 22:11             ` jamal
2007-01-02 17:56               ` Rick Jones
2006-12-26 22:06           ` Arjan van de Ven
2006-12-26 22:46             ` jamal
2006-12-27  0:28               ` Arjan van de Ven
2006-12-27  3:47                 ` jamal
2006-12-27  7:09                   ` Robert Iakobashvili
2006-12-27 14:31                     ` jamal
2006-12-29  2:04                       ` Krzysztof Oledzki
2006-12-29 17:36                         ` Robert Iakobashvili
2006-12-27 13:08                   ` Arjan van de Ven
2006-12-27 14:44                     ` jamal
2006-12-27 15:06                       ` Arjan van de Ven
2007-01-02 17:57                 ` Rick Jones
