Bad network performance over 2Gbps

* Bad network performance over 2Gbps
@ 2008-04-15 18:06 Anton Titov
  2008-04-15 20:14 ` Willy Tarreau
  2008-04-15 20:15 ` H. Willstrand
  0 siblings, 2 replies; 18+ messages in thread
From: Anton Titov @ 2008-04-15 18:06 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I use Linux for serving a huge amount of static web content on a few servers. When
network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
always exactly one) starts using exactly 100% CPU time, and
packet loss starts preventing traffic from going up. When the network
traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.

The uplink is 6 gigabit Intel cards bonded together using 802.3ad
with xmit_hash_policy set to layer3+4. On the other side is a Cisco 2960
switch. The machine has two quad-core Intel Xeons @2.33GHz.
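
As an illustration of what xmit_hash_policy=layer3+4 does, here is a minimal Python sketch of the hash as the bonding documentation of that kernel generation describes it for unfragmented TCP/UDP over IPv4 (newer kernels hash differently). The function name and the sample addresses are made up for the example; it is not the driver's code.

```python
import ipaddress


def layer3_4_slave(src_ip, src_port, dst_ip, dst_port, slave_count):
    """Pick a transmit slave index the way the old bonding docs describe layer3+4:
    ((src port XOR dst port) XOR ((src IP XOR dst IP) AND 0xffff)) mod slave count.
    Hypothetical helper for illustration only."""
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % slave_count


if __name__ == "__main__":
    # Example flow (made-up addresses): one client connection to port 80, 6 bonded NICs.
    print(layer3_4_slave("10.0.0.1", 40000, "10.0.0.2", 80, 6))
```

All packets of one TCP connection hash to the same slave, so the spread over the six NICs depends on having many distinct flows. This only covers what the server transmits; how incoming frames are distributed is decided by the switch.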

Here is a screenshot of the "top" command. The described behavior
has nothing to do with the 13% io-wait. It happens even at 0%
io-wait.
http://www.titov.net/misc/top-snap.png

kernel configuration:
http://www.titov.net/misc/config.gz

/proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
uname -a:
http://www.titov.net/misc/misc.txt.gz

Is it a Linux bug or some hardware limitation?

Regards,
Anton Titov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Bad network performance over 2Gbps
  2008-04-15 18:06 Bad network performance over 2Gbps Anton Titov
@ 2008-04-15 20:14 ` Willy Tarreau
  2008-04-15 20:40   ` Kok, Auke
  2008-04-15 20:15 ` H. Willstrand
  1 sibling, 1 reply; 18+ messages in thread
From: Willy Tarreau @ 2008-04-15 20:14 UTC (permalink / raw)
  To: Anton Titov; +Cc: Linux Kernel Mailing List, linux-net

On Tue, Apr 15, 2008 at 09:06:44PM +0300, Anton Titov wrote:
> I use Linux for serving a huge amount of static web content on a few servers. When
> network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
> always exactly one) starts using exactly 100% CPU time, and
> packet loss starts preventing traffic from going up. When the network
> traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.
> 
> Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm
> with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960
> switch. Machine is with two quad core Intel Xeons @2.33GHz.
> 
> Here is a screenshot of the "top" command. The described behavior
> has nothing to do with the 13% io-wait. It happens even at 0%
> io-wait.
> http://www.titov.net/misc/top-snap.png
> 
> kernel configuration:
> http://www.titov.net/misc/config.gz
> 
> /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
> uname -a:
> http://www.titov.net/misc/misc.txt.gz
> 
> Is it a Linux bug or some hardware limitation?

Possibly some missing parameters when loading your e1000 drivers.
e1000 NICs support interrupt rate limiting, which proves very
effective in cases such as yours. I usually limit them to about
5k ints/s. Do a "modinfo e1000" to get the parameter name; I don't
remember it exactly.
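
To get a feel for what a limit of about 5k ints/s means, here is a rough back-of-the-envelope sketch in Python. It assumes full-size 1500-byte frames and traffic spread evenly over the NICs, neither of which is stated in the thread; the function name is made up for the example.

```python
def packets_per_interrupt(bits_per_sec, frame_bytes, nics, ints_per_sec_per_nic):
    """Average number of packets handled per interrupt when each NIC is
    capped at a fixed interrupt rate (rough estimate, even spread assumed)."""
    packets_per_sec = bits_per_sec / (frame_bytes * 8)
    return packets_per_sec / (nics * ints_per_sec_per_nic)


if __name__ == "__main__":
    # 2 Gbit/s over 6 NICs, each limited to 5000 interrupts per second.
    print(round(packets_per_interrupt(2e9, 1500, 6, 5000), 2))
```

Under these assumptions 2 Gbit/s is roughly 167k packets per second, so each interrupt would cover about five or six packets instead of one. Smaller packets (for example the ACKs a web server receives) push the packet rate, and therefore the benefit of limiting, higher.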

Also, I've CCed linux-net.

Regards,
Willy


* Re: Bad network performance over 2Gbps
  2008-04-15 18:06 Bad network performance over 2Gbps Anton Titov
  2008-04-15 20:14 ` Willy Tarreau
@ 2008-04-15 20:15 ` H. Willstrand
  2008-04-15 20:34   ` Kok, Auke
  1 sibling, 1 reply; 18+ messages in thread
From: H. Willstrand @ 2008-04-15 20:15 UTC (permalink / raw)
  To: Anton Titov, netdev

[Changed mail list]

On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote:
> I use Linux for serving a huge amount of static web content on a few servers. When
>  network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
>  always exactly one) starts using exactly 100% CPU time, and
>  packet loss starts preventing traffic from going up. When the network
>  traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.
>
>  Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm
>  with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960
>  switch. Machine is with two quad core Intel Xeons @2.33GHz.
>
>  Here is a screenshot of the "top" command. The described behavior
>  has nothing to do with the 13% io-wait. It happens even at 0%
>  io-wait.
>  http://www.titov.net/misc/top-snap.png
>
>  kernel configuration:
>  http://www.titov.net/misc/config.gz
>
>  /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>  uname -a:
>  http://www.titov.net/misc/misc.txt.gz
>
>  Is it a Linux bug or some hardware limitation?
>
>  Regards,
>  Anton Titov
>

* Re: Bad network performance over 2Gbps
  2008-04-15 20:15 ` H. Willstrand
@ 2008-04-15 20:34   ` Kok, Auke
  2008-04-15 20:59     ` Chris Snook
  0 siblings, 1 reply; 18+ messages in thread
From: Kok, Auke @ 2008-04-15 20:34 UTC (permalink / raw)
  To: H. Willstrand; +Cc: Anton Titov, netdev, Jesse Brandeburg

H. Willstrand wrote:
> [Changed mail list]
> 
> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote:
>> I use Linux for serving a huge amount of static web content on a few servers. When
>>  network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
>>  always exactly one) starts using exactly 100% CPU time, and
>>  packet loss starts preventing traffic from going up. When the network
>>  traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.
>>
>>  Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm
>>  with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960
>>  switch. Machine is with two quad core Intel Xeons @2.33GHz.
>>
>>  Here is a screenshot of the "top" command. The described behavior
>>  has nothing to do with the 13% io-wait. It happens even at 0%
>>  io-wait.
>>  http://www.titov.net/misc/top-snap.png
>>
>>  kernel configuration:
>>  http://www.titov.net/misc/config.gz
>>
>>  /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>>  uname -a:
>>  http://www.titov.net/misc/misc.txt.gz
>>
>>  Is it a Linux bug or some hardware limitation?

I'm wondering if this is not a classic demonstration of the NAPI-irq trap, where
all the interrupts from the various cards get migrated to a single CPU and,
because of NAPI, once they're busy polling they won't ever migrate away from
that CPU again.

Have you looked at `cat /proc/interrupts` before and after this happens?

My guess is that your specific situation can benefit from setting smp_affinity and
pinning the NIC irqs so that you're at least spreading the load over multiple
CPUs (but preferably ones that share the same cache!).

Alternatively, you might even see an improvement by disabling NAPI. Depending on
the driver that you're using, this might be possible.

I actually don't know much about bonding and how this affects everything, but my
guess is that that's a less important factor in this issue.

Cheers,

Auke

* Re: Bad network performance over 2Gbps
  2008-04-15 20:14 ` Willy Tarreau
@ 2008-04-15 20:40   ` Kok, Auke
  2008-04-15 22:36     ` Anton Titov
  0 siblings, 1 reply; 18+ messages in thread
From: Kok, Auke @ 2008-04-15 20:40 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Anton Titov, Linux Kernel Mailing List, linux-net,
	Jesse Brandeburg

Willy Tarreau wrote:
> On Tue, Apr 15, 2008 at 09:06:44PM +0300, Anton Titov wrote:
>> I use Linux for serving a huge amount of static web content on a few servers. When
>> network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
>> always exactly one) starts using exactly 100% CPU time, and
>> packet loss starts preventing traffic from going up. When the network
>> traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.
>>
>> Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm
>> with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960
>> switch. Machine is with two quad core Intel Xeons @2.33GHz.
>>
>> Here is a screenshot of the "top" command. The described behavior
>> has nothing to do with the 13% io-wait. It happens even at 0%
>> io-wait.
>> http://www.titov.net/misc/top-snap.png
>>
>> kernel configuration:
>> http://www.titov.net/misc/config.gz
>>
>> /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>> uname -a:
>> http://www.titov.net/misc/misc.txt.gz
>>
>> Is it a Linux bug or some hardware limitation?
> 
> Possibly some missing parameters when loading your e1000 drivers.
> e1000 NICs support interrupt rate limiting, which proves very
> effective in cases such as yours. I usually limit them to about
> 5k ints/s. Do a "modinfo e1000" to get the parameter name; I don't
> remember it exactly.
> 
> Also, I've CCed linux-net.

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:        342        261        258        278        271        253        264        283   IO-APIC-edge      timer
  1:          0          0          1          0          1          0          0          0   IO-APIC-edge      i8042
  6:          0          1          0          1          0          0          1          0   IO-APIC-edge      floppy
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          1          1          0          0          0          1          1          0   IO-APIC-edge      i8042
 17:        180        190        178        183        182        186        186        188   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb4
 18:     843504     842514     843653     842033     842416     842742     841903     842960   IO-APIC-fasteoi   3w-9xxx, uhci_hcd:usb3
 19:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
498:  534642903  534635899  534726883  534732377  534701710  534708588  534730550  534742730   PCI-MSI-edge      eth5
499:  531832274  531846609  531917849  531942676  531855140  531850692  531885565  531863468   PCI-MSI-edge      eth4
500:  487251627  487279206  487248030  487220044  487239637  487231454  487281672  487227202   PCI-MSI-edge      eth3
501:  486083953  486062203  486109925  486075793  486036977  486035152  486097551  486117164   PCI-MSI-edge      eth2
502:  528889380  528863624  528760188  528798619  528891886  528890760  528807939  528822746   PCI-MSI-edge      eth1
503:  529043135  529056706  528980250  528975209  529018995  529027386  528941583  528970472   PCI-MSI-edge      eth0
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:   62893699   62809502   62744208   62746035   62708815   62709055   62739182   62620363   Local timer interrupts
RES:   15454866   15827970   16235695   15386970   15761053   16097167   16190851   16159843   Rescheduling interrupts
CAL:         85         98         85         84         98         93         94         91   function call interrupts
TLB:    3565361    3561798    3570271    3566272    3556996    3555866    3578257    3564557   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts


Yikes! all wrong!

the network irq's are being ping-ponged around all the cores! bad!

1) turn the in-kernel IRQBALANCE option off !
2) use either the userspace `irqbalance` daemon or
3) set smp_affinity manually

Auke
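
The pattern pointed out in this message (each ethX interrupt counted almost equally on every CPU) can also be checked mechanically. Below is a minimal Python sketch, assuming /proc/interrupts-style text with one IRQ per line, as the file appears on a live system. The function names and the 5% tolerance are arbitrary choices for illustration, not an established tool.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into
    {irq_label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")   # split at the first colon only
        fields = rest.split()
        # Rows such as ERR:/MIS: carry a single counter; skip them.
        if len(fields) < ncpus or not all(f.isdigit() for f in fields[:ncpus]):
            continue
        counts = [int(f) for f in fields[:ncpus]]
        table[label.strip()] = (counts, " ".join(fields[ncpus:]))
    return table


def evenly_spread(counts, tolerance=0.05):
    """True if every CPU's count is within `tolerance` of the mean,
    i.e. the IRQ has been rotated over all CPUs rather than pinned."""
    total = sum(counts)
    if total == 0:
        return False
    mean = total / len(counts)
    return all(abs(c - mean) <= tolerance * mean for c in counts)


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for irq, (counts, desc) in parse_interrupts(f.read()).items():
            if "eth" in desc and evenly_spread(counts):
                print(f"IRQ {irq} ({desc}) is spread over all CPUs")
```

The counters are cumulative since boot, so a single snapshot shows the long-term distribution; comparing two snapshots taken a few seconds apart would show where the interrupts are landing right now.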


* Re: Bad network performance over 2Gbps
  2008-04-15 20:34   ` Kok, Auke
@ 2008-04-15 20:59     ` Chris Snook
  2008-04-15 21:05       ` Kok, Auke
  2008-04-17 10:02       ` Anton Titov
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Snook @ 2008-04-15 20:59 UTC (permalink / raw)
  To: Kok, Auke; +Cc: H. Willstrand, Anton Titov, netdev, Jesse Brandeburg

Kok, Auke wrote:
> H. Willstrand wrote:
>> [Changed mail list]
>>
>> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote:
>>> I use Linux for serving a huge amount of static web content on a few servers. When
>>>  network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
>>>  always exactly one) starts using exactly 100% CPU time, and
>>>  packet loss starts preventing traffic from going up. When the network
>>>  traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.
>>>
>>>  Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm
>>>  with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960
>>>  switch. Machine is with two quad core Intel Xeons @2.33GHz.
>>>
>>>  Here is a screenshot of the "top" command. The described behavior
>>>  has nothing to do with the 13% io-wait. It happens even at 0%
>>>  io-wait.
>>>  http://www.titov.net/misc/top-snap.png
>>>
>>>  kernel configuration:
>>>  http://www.titov.net/misc/config.gz
>>>
>>>  /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>>>  uname -a:
>>>  http://www.titov.net/misc/misc.txt.gz
>>>
>>>  Is it a Linux bug or some hardware limitation?
> 
> I'm wondering if this is not a classic demonstration of the NAPI-irq trap, where
> all the interrupts from the various cards get migrated to a single CPU and,
> because of NAPI, once they're busy polling they won't ever migrate away from
> that CPU again.
> 
> Have you looked at `cat /proc/interrupts` before and after this happens?
> 
> My guess is that your specific situation can benefit from setting smp_affinity and
> pinning the NIC irqs so that you're at least spreading the load over multiple
> CPUs (but preferably ones that share the same cache!).
> 
> Alternatively, you might even see an improvement by disabling NAPI. Depending on
> the driver that you're using, this might be possible.
> 
> I actually don't know much about bonding and how this affects everything, but my
> guess is that that's a less important factor in this issue.
> 
> Cheers,
> 
> Auke

I'm not sure that spreading IRQs out completely is necessarily a good 
idea, due to cache line ping-pong.  I suspect you'll get optimal 
performance by assigning the six IRQs to two cores that share an L2 cache.

Still, I think you're on to something here.  Disabling NAPI and instead 
tuning the cards' interrupt coalescing settings might allow irqbalance 
to do a better job than it is currently.

-- Chris

* Re: Bad network performance over 2Gbps
  2008-04-15 20:59     ` Chris Snook
@ 2008-04-15 21:05       ` Kok, Auke
  2008-04-17 10:02       ` Anton Titov
  1 sibling, 0 replies; 18+ messages in thread
From: Kok, Auke @ 2008-04-15 21:05 UTC (permalink / raw)
  To: Chris Snook; +Cc: H. Willstrand, Anton Titov, netdev, Jesse Brandeburg

Chris Snook wrote:
> Kok, Auke wrote:
>> H. Willstrand wrote:
>>> [Changed mail list]
>>>
>>> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote:
>>>> I use Linux for serving a huge amount of static web content on a few servers. When
>>>>  network traffic goes above 2Gbit/sec, ksoftirqd/5 (not always 5, but
>>>>  always exactly one) starts using exactly 100% CPU time, and
>>>>  packet loss starts preventing traffic from going up. When the network
>>>>  traffic is lower than 1.9Gbit/sec, the ksoftirqds use 0% CPU according to top.
>>>>
>>>>  Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm
>>>>  with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960
>>>>  switch. Machine is with two quad core Intel Xeons @2.33GHz.
>>>>
>>>>  Here is a screenshot of the "top" command. The described behavior
>>>>  has nothing to do with the 13% io-wait. It happens even at 0%
>>>>  io-wait.
>>>>  http://www.titov.net/misc/top-snap.png
>>>>
>>>>  kernel configuration:
>>>>  http://www.titov.net/misc/config.gz
>>>>
>>>>  /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>>>>  uname -a:
>>>>  http://www.titov.net/misc/misc.txt.gz
>>>>
>>>>  Is it a Linux bug or some hardware limitation?
>>
>> I'm wondering if this is not a classic demonstration of the NAPI-irq trap, where
>> all the interrupts from the various cards get migrated to a single CPU and,
>> because of NAPI, once they're busy polling they won't ever migrate away from
>> that CPU again.
>>
>> Have you looked at `cat /proc/interrupts` before and after this happens?
>>
>> My guess is that your specific situation can benefit from setting smp_affinity and
>> pinning the NIC irqs so that you're at least spreading the load over multiple
>> CPUs (but preferably ones that share the same cache!).
>>
>> Alternatively, you might even see an improvement by disabling NAPI. Depending on
>> the driver that you're using, this might be possible.
>>
>> I actually don't know much about bonding and how this affects everything, but my
>> guess is that that's a less important factor in this issue.
>>
>> Cheers,
>>
>> Auke
> 
> I'm not sure that spreading IRQs out completely is necessarily a good
> idea, due to cache line ping-pong.  I suspect you'll get optimal
> performance by assigning the six IRQs to two cores that share an L2 cache.
> 
> Still, I think you're on to something here.  Disabling NAPI and instead
> tuning the cards' interrupt coalescing settings might allow irqbalance
> to do a better job than it is currently.

Well, I posted another reply to him after I looked at the debug output he posted,
and it appears that the in-kernel irqbalance is the culprit. The 100% softirqd
is because his interrupts are being balanced across all cores, which pretty
much guarantees cache misses on every single receive, not to mention
unneeded migration of tasks.

I definitely think we should disable the in-kernel irqbalance option by default :)

What the best solution is with 6 adapters is not clear, as they're all on the
same bridge, but hard-setting the affinity to two cores that share an L2 seems to
give the best results quickly.

Cheers,

Auke

* Re: Bad network performance over 2Gbps
  2008-04-15 20:40   ` Kok, Auke
@ 2008-04-15 22:36     ` Anton Titov
  2008-04-16  4:27       ` Willy Tarreau
  0 siblings, 1 reply; 18+ messages in thread
From: Anton Titov @ 2008-04-15 22:36 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Willy Tarreau, Linux Kernel Mailing List, linux-net,
	Jesse Brandeburg

On Tue, 2008-04-15 at 13:40 -0700, Kok, Auke wrote:
> 1) turn the in-kernel IRQBALANCE option off !
Actually it may be already removed. I remember it being under "Processor
type and features" and I currently cannot find it there for x86_64

> 2) use either the userspace `irqbalance` daemon or
> 3) set smp_affinity manually

I tried echoing 3 (assuming that CPU0 and CPU1 will share their cache,
as advised in other mails) into smp_affinity of all ethX interrupts and
no positive result was observed.
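
For reference, the value written to smp_affinity is a hexadecimal bitmask with bit N set for CPU N, so "3" means CPU0 and CPU1. A minimal Python sketch of building and writing such a mask follows; the helper names are hypothetical, writing requires root, and the plain single-word form is only meant for small machines like the 8-CPU one discussed here (systems with more than 32 CPUs use comma-separated 32-bit groups).

```python
import os


def smp_affinity_mask(cpus):
    """Return the hex bitmask string for a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_irq_root="/proc/irq"):
    """Write the mask for `cpus` to /proc/irq/<irq>/smp_affinity (needs root)."""
    path = os.path.join(proc_irq_root, str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(smp_affinity_mask(cpus))


if __name__ == "__main__":
    print(smp_affinity_mask([0, 1]))   # the mask used in this message
    # Example only: pin the six NIC IRQs from the posted /proc/interrupts.
    # for irq in range(498, 504):
    #     set_irq_affinity(irq, [0, 1])
```

A balancer that is still active (in-kernel or userspace) may overwrite the setting again, so it is worth re-reading the file afterwards to confirm the mask stuck.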

I will try disabling NAPI and limiting e1000 interrupts tomorrow.



* Re: Bad network performance over 2Gbps
  2008-04-15 22:36     ` Anton Titov
@ 2008-04-16  4:27       ` Willy Tarreau
  0 siblings, 0 replies; 18+ messages in thread
From: Willy Tarreau @ 2008-04-16  4:27 UTC (permalink / raw)
  To: Anton Titov
  Cc: Kok, Auke, Linux Kernel Mailing List, linux-net, Jesse Brandeburg

On Wed, Apr 16, 2008 at 01:36:36AM +0300, Anton Titov wrote:
> On Tue, 2008-04-15 at 13:40 -0700, Kok, Auke wrote:
> > 1) turn the in-kernel IRQBALANCE option off !
> Actually it may be already removed. I remember it being under "Processor
> type and features" and I currently cannot find it there for x86_64
> 
> > 2) use either the userspace `irqbalance` daemon or
> > 3) set smp_affinity manually
> 
> I tried echoing 3 (assuming that CPU0 and CPU1 will share their cache,
> as advised in other mails) into smp_affinity of all ethX interrupts and
> no positive result was observed.

But have you disabled irqbalance before doing this? (You must reboot
and pass "noirqbalance" on the command line for this.)

Also, if you are running on quad-core intel CPUs, I'm told that they're
simply two standard dual-core CPUs in the same case, so there is no
shared cache between any core. You should try to assign all irqs to
CPU0 for a test. It *must* make a difference, in either direction.
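
Which cores actually share a cache depends on the CPU model, and on kernels that expose cache topology in sysfs it can be read rather than guessed. The Python sketch below assumes the /sys/devices/system/cpu/cpuN/cache/indexM/{level,shared_cpu_list} layout is available (older kernels may only provide shared_cpu_map, or nothing at all); the function names are made up for the example.

```python
import os


def parse_cpulist(text):
    """Turn a kernel cpulist string such as "0-1,4" into [0, 1, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l2_sharing_groups(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return the sorted, de-duplicated groups of CPUs that share an L2 cache."""
    groups = set()
    for entry in os.listdir(sysfs_cpu_root):
        if not (entry.startswith("cpu") and entry[3:].isdigit()):
            continue
        cache_dir = os.path.join(sysfs_cpu_root, entry, "cache")
        if not os.path.isdir(cache_dir):
            continue
        for index in os.listdir(cache_dir):
            base = os.path.join(cache_dir, index)
            try:
                with open(os.path.join(base, "level")) as f:
                    level = f.read().strip()
                with open(os.path.join(base, "shared_cpu_list")) as f:
                    shared = f.read()
            except OSError:
                continue        # not a cache index directory, or file missing
            if level == "2":
                groups.add(tuple(parse_cpulist(shared)))
    return sorted(groups)


if __name__ == "__main__":
    for group in l2_sharing_groups():
        print("L2 shared by CPUs:", list(group))
```

A group with more than one CPU in it is a candidate pair for the "two cores that share an L2" setup suggested earlier in the thread; if every group has a single CPU, no L2 is shared.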

> I will try disabling NAPI and limiting e1000 interrupts tomorrow.

I found the parameter name I was talking about: InterruptThrottleRate.
Beware it's an array with one entry per NIC, so you have to set as many
values as you have NICs. I have always observed huge performance boosts
when using the tunables the driver provides.
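
Since the parameter takes one value per NIC, the option string for six ports needs six comma-separated entries. A small Python sketch that builds the corresponding modprobe "options" line is shown here; the 5000 figure is the roughly 5k ints/s mentioned earlier in the thread, the function name is hypothetical, and where the line goes (typically a file under /etc/modprobe.d/) depends on the distribution.

```python
def e1000_options_line(num_nics, ints_per_sec, module="e1000"):
    """Build a modprobe options line with one InterruptThrottleRate
    value per NIC handled by the driver."""
    values = ",".join(str(ints_per_sec) for _ in range(num_nics))
    return f"options {module} InterruptThrottleRate={values}"


if __name__ == "__main__":
    print(e1000_options_line(6, 5000))
```

Module parameters are read when the driver is loaded, so the driver has to be reloaded (or the machine rebooted) for a changed line to take effect; "modinfo e1000" remains the authoritative source for the accepted values.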

Willy

* Re: Bad network performance over 2Gbps
  2008-04-15 20:59     ` Chris Snook
  2008-04-15 21:05       ` Kok, Auke
@ 2008-04-17 10:02       ` Anton Titov
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
  1 sibling, 1 reply; 18+ messages in thread
From: Anton Titov @ 2008-04-17 10:02 UTC (permalink / raw)
  To: Chris Snook; +Cc: Kok, Auke, H. Willstrand, netdev, Jesse Brandeburg

On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote:
> Still, I think you're on to something here.  Disabling NAPI and instead 
> tuning the cards' interrupt coalescing settings might allow irqbalance 
> to do a better job than it is currently.

Disabling NAPI allowed me to push as much as 3.5Gbit out of the same
server with ~ 20% of time CPUs doing software interrupts.

Regards,
Anton Titov


* [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 10:02       ` Anton Titov
@ 2008-04-17 17:37         ` Kok, Auke
  2008-04-20 12:08           ` Denys Fedoryshchenko
                             ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Kok, Auke @ 2008-04-17 17:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List
  Cc: Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton

Anton Titov wrote:
> On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote:
>> Still, I think you're on to something here.  Disabling NAPI and instead 
>> tuning the cards' interrupt coalescing settings might allow irqbalance 
>> to do a better job than it is currently.
> 
> Disabling NAPI allowed me to push as much as 3.5Gbit out of the same
> server with ~ 20% of time CPUs doing software interrupts.

Yes, I really don't see this as such an amazing discovery - the in-kernel
irqbalance code is totally wrong for network interrupts (and probably for most
interrupts).

On your system with 6 network interrupts it blows chunks, and it's not NAPI that is
the issue - NAPI will work just fine on its own. By disabling NAPI and reverting
to the in-driver irq moderation code you've effectively put the in-kernel
irqbalance code on the sidelines, and this is what makes it work again.

It's not the right solution.

We keep seeing this exact issue pop up everywhere - especially with e1000(e)
datacenter users - this code _has_ to go or be fixed. Since there is a perfectly
viable solution, I strongly suggest disabling it.

This is not the first time I've sent this patch out in some form...

Auke

---
[X86] IRQBALANCE: Mark as BROKEN and disable by default

The IRQBALANCE option causes interrupts to bounce all around on SMP systems
quickly burying the CPU in migration cost and cache misses. Mainly affected are
network interrupts and this results in one CPU pegged in softirqd completely.

Disable this option and provide documentation to a better solution (userspace
irqbalance daemon does overall the best job to begin with and only manual setting
of smp_affinity will beat it).

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

---

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6c70fed..956aa22 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1026,13 +1026,17 @@ config EFI
   	platforms.

 config IRQBALANCE
-	def_bool y
+	def_bool n
 	prompt "Enable kernel irq balancing"
-	depends on X86_32 && SMP && X86_IO_APIC
+	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
 	help
 	  The default yes will allow the kernel to do irq load balancing.
 	  Saying no will keep the kernel from doing irq load balancing.

+	  This option is known to cause performance issues on SMP
+	  systems. The preferred method is to use the userspace
+	  'irqbalance' daemon instead. See http://irqbalance.org/.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
@ 2008-04-20 12:08           ` Denys Fedoryshchenko
  2008-04-21 13:19           ` Pavel Machek
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Denys Fedoryshchenko @ 2008-04-20 12:08 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Andrew Morton

Even without IRQBALANCE enabled in the kernel, the APIC (or something else) distributes interrupts over the processors by default.
There is no irqbalance daemon or anything similar running.

For example:
Router-KARAM ~ # cat /proc/interrupts
           CPU0       CPU1
  0:   87956938 1403052485   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  9:          0          0   IO-APIC-fasteoi   acpi
 19:        140       5714   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2
 24:  675673280 1186506694   IO-APIC-fasteoi   eth2
 26:  717865662 2201633562   IO-APIC-fasteoi   eth0
 27:    1869190   23075556   IO-APIC-fasteoi   eth1
NMI:          0          0   Non-maskable interrupts
LOC: 1403052485   87956683   Local timer interrupts
RES:      75059      25408   Rescheduling interrupts
CAL:      99542         83   function call interrupts
TLB:        616        200   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

sunfire-1 ~ # cat config|grep -i irq
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
# CONFIG_IRQBALANCE is not set
CONFIG_HT_IRQ=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_DEBUG_SHIRQ is not set
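
The same check as the grep above can be scripted. The Python sketch below reads kernel-config text and reports the state of one option; the function name is made up for the example. A result of None means the option does not appear at all, which is what one would expect on x86_64, since the Kconfig entry in the patch depends on X86_32.

```python
def kconfig_option_state(config_text, option):
    """Return the option's value ("y", "m", ...), "n" if it is listed as
    '# <option> is not set', or None if the option is not present at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    sample = """\
CONFIG_GENERIC_HARDIRQS=y
# CONFIG_IRQBALANCE is not set
CONFIG_HT_IRQ=y
"""
    print(kconfig_option_state(sample, "CONFIG_IRQBALANCE"))
```

For a gzip-compressed config such as the one linked in the first message, the text would need to be decompressed first (for example with Python's gzip module) before being passed in.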

Is it harmful too?

-- 
------
Technical Manager
Virtual ISP S.A.L.
Lebanon

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
  2008-04-20 12:08           ` Denys Fedoryshchenko
@ 2008-04-21 13:19           ` Pavel Machek
  2008-04-21 16:38             ` Kok, Auke
  2008-04-21 15:28           ` Ingo Molnar
  2008-04-22  5:07           ` Bill Fink
  3 siblings, 1 reply; 18+ messages in thread
From: Pavel Machek @ 2008-04-21 13:19 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Hi!

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
> 
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
> 
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
> 
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> 
> ---
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>    	platforms.
> 
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n

ACK.
>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN

This is wrong. irqbalance works, there's nothing wrong with it; but it
has nasty side effects.

>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.
> 
> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
  2008-04-20 12:08           ` Denys Fedoryshchenko
  2008-04-21 13:19           ` Pavel Machek
@ 2008-04-21 15:28           ` Ingo Molnar
  2008-04-21 16:58             ` Kok, Auke
  2008-04-22  5:07           ` Bill Fink
  3 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-04-21 15:28 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List,
	Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton


* Kok, Auke <auke-jan.h.kok@intel.com> wrote:

> We keep seeing this exact issue pop up everywhere - especially with 
> e1000(e) datacenter users - this code _has_ to go or be fixed. Since 
> there is a perfectly viable solution, I strongly suggest disabling it.

strongly agreed. Thanks Auke, applied.

	Ingo

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 13:19           ` Pavel Machek
@ 2008-04-21 16:38             ` Kok, Auke
  0 siblings, 0 replies; 18+ messages in thread
From: Kok, Auke @ 2008-04-21 16:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Pavel Machek wrote:
> Hi!
> 
>> [X86] IRQBALANCE: Mark as BROKEN and disable by default
>>
>> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
>> quickly burying the CPU in migration cost and cache misses. Mainly affected are
>> network interrupts and this results in one CPU pegged in softirqd completely.
>>
>> Disable this option and provide documentation to a better solution (userspace
>> irqbalance daemon does overall the best job to begin with and only manual setting
>> of smp_affinity will beat it).
>>
>> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>>
>> ---
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 6c70fed..956aa22 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1026,13 +1026,17 @@ config EFI
>>    	platforms.
>>
>>  config IRQBALANCE
>> -	def_bool y
>> +	def_bool n
> 
> ACK.
>>  	prompt "Enable kernel irq balancing"
>> -	depends on X86_32 && SMP && X86_IO_APIC
>> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
> 
> This is wrong. irqbalance works, there's nothing wrong with it; but it
> has nasty side effects.

ok, I'm fine with taking that part out of the patch.

Ingo, want me to send an updated patch?


> 
>>  	help
>>  	  The default yes will allow the kernel to do irq load balancing.
>>  	  Saying no will keep the kernel from doing irq load balancing.
>>
>> +	  This option is known to cause performance issues on SMP
>> +	  systems. The preferred method is to use the userspace
>> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
>> +
> 
> ACK.
> 


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 15:28           ` Ingo Molnar
@ 2008-04-21 16:58             ` Kok, Auke
  2008-04-21 18:35               ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Kok, Auke @ 2008-04-21 16:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List,
	Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton

Ingo Molnar wrote:
> * Kok, Auke <auke-jan.h.kok@intel.com> wrote:
> 
>> We keep seeing this exact issue pop up everywhere - especially with 
>> e1000(e) datacenter users - this code _has_ to go or be fixed. Since 
>> there is a perfectly viable solution, I strongly suggest disabling it.
> 
> strongly agreed. Thanks Auke, applied.
> 
> 	Ingo


excellent, ignore my other reply to Pavel - I didn't see this reply yet :)

Thanks Ingo


Auke


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 16:58             ` Kok, Auke
@ 2008-04-21 18:35               ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2008-04-21 18:35 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

"Kok, Auke" <auke-jan.h.kok@intel.com> writes:

> Ingo Molnar wrote:
>> * Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>> 
>>> We keep seeing this exact issue pop up everywhere - especially with 
>>> e1000(e) datacenter users - this code _has_ to go or be fixed. Since 
>>> there is a perfectly viable solution, I strongly suggest disabling it.
>> 
>> strongly agreed. Thanks Auke, applied.
>> 
>> 	Ingo
>
>
> excellent, ignore my other reply to Pavel - I didn't see this reply yet :)

Shouldn't you also add it to the FeatureRemoval list and then remove it
quickly? There's no need to keep disabled, known-to-be-wrong code around.

-Andi

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
                             ` (2 preceding siblings ...)
  2008-04-21 15:28           ` Ingo Molnar
@ 2008-04-22  5:07           ` Bill Fink
  3 siblings, 0 replies; 18+ messages in thread
From: Bill Fink @ 2008-04-22  5:07 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

On Thu, 17 Apr 2008, Kok, Auke wrote:

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
> 
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
> 
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
> 
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> 
> ---
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>    	platforms.
> 
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n
>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.

Since you're changing the default setting, shouldn't the above be
changed to:

 	  Saying yes will allow the kernel to do irq load balancing.
 	  The default no will keep the kernel from doing irq load balancing.

> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +
>  config SECCOMP
>  	def_bool y
>  	prompt "Enable seccomp to safely compute untrusted bytecode"

						-Bill
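
For readers following along: the patch description quoted above says that
only a manual setting of smp_affinity beats the userspace irqbalance
daemon. A minimal Python sketch of what such a manual setting involves is
below. It assumes the standard /proc/irq/<N>/smp_affinity interface, which
takes a hexadecimal CPU bitmask. The function names, the proc_root
parameter and the CPU number in the demo are made up for illustration and
do not come from this thread. The sketch only handles CPUs 0-31, which
covers the 8-core machine discussed here.

```python
import os


def affinity_mask(cpus):
    """Return the hex bitmask string for the given CPU numbers (0-31 only)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            # Larger systems need comma-separated 32-bit groups; not handled here.
            raise ValueError(f"CPU {cpu} is outside the 0-31 range of this sketch")
        mask |= 1 << cpu  # bit N set means CPU N may service the interrupt
    return format(mask, "x")


def pin_irq(irq, cpus, proc_root="/proc"):
    """Write the CPU mask for one IRQ (hypothetical helper, needs root on /proc)."""
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(affinity_mask(cpus))


if __name__ == "__main__":
    # Only compute and show a mask here; nothing is written to /proc.
    print(affinity_mask([5]))
```

Calling pin_irq() against the real /proc requires root, and the IRQ numbers
to use would come from /proc/interrupts on the machine in question. A
running irqbalance daemon may later overwrite a mask set this way, so the
two approaches are probably best not mixed for the same interrupt.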

end of thread, 2008-04-22  5:08 UTC

Thread overview: 18+ messages
2008-04-15 18:06 Bad network performance over 2Gbps Anton Titov
2008-04-15 20:14 ` Willy Tarreau
2008-04-15 20:40   ` Kok, Auke
2008-04-15 22:36     ` Anton Titov
2008-04-16  4:27       ` Willy Tarreau
2008-04-15 20:15 ` H. Willstrand
2008-04-15 20:34   ` Kok, Auke
2008-04-15 20:59     ` Chris Snook
2008-04-15 21:05       ` Kok, Auke
2008-04-17 10:02       ` Anton Titov
2008-04-17 17:37         ` [PATCH] " Kok, Auke
2008-04-20 12:08           ` Denys Fedoryshchenko
2008-04-21 13:19           ` Pavel Machek
2008-04-21 16:38             ` Kok, Auke
2008-04-21 15:28           ` Ingo Molnar
2008-04-21 16:58             ` Kok, Auke
2008-04-21 18:35               ` Andi Kleen
2008-04-22  5:07           ` Bill Fink
