* Re: [PATCH] Re: Bad network performance over 2Gbps [not found] ` <ajGfA-7rt-7@gated-at.bofh.it> @ 2008-04-19 15:05 ` Bodo Eggert [not found] ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org> 1 sibling, 0 replies; 17+ messages in thread From: Bodo Eggert @ 2008-04-19 15:05 UTC (permalink / raw) To: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List < Kok, Auke <auke-jan.h.kok@intel.com> wrote: > [X86] IRQBALANCE: Mark as BROKEN and disable by default > > The IRQBALANCE option causes interrupts to bounce all around on SMP systems > quickly burying the CPU in migration cost and cache misses. Mainly affected > are network interrupts and this results in one CPU pegged in softirqd > completely. If this is the problem, maybe it would help to only balance the IRQs each e.g. ten seconds? Unfortunately I have no SMP system to try it out. ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org>]
* Re: [PATCH] Re: Bad network performance over 2Gbps [not found] ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org> @ 2008-04-19 19:23 ` Stephen Hemminger 2008-04-21 16:42 ` Rick Jones 1 sibling, 0 replies; 17+ messages in thread From: Stephen Hemminger @ 2008-04-19 19:23 UTC (permalink / raw) To: 7eggert Cc: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton Bodo Eggert wrote: > Kok, Auke <auke-jan.h.kok@intel.com> wrote: > > >> [X86] IRQBALANCE: Mark as BROKEN and disable by default >> >> The IRQBALANCE option causes interrupts to bounce all around on SMP systems >> quickly burying the CPU in migration cost and cache misses. Mainly affected >> are network interrupts and this results in one CPU pegged in softirqd >> completely. >> > > If this is the problem, maybe it would help to only balance the IRQs each > e.g. ten seconds? Unfortunately I have no SMP system to try it out. > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > The kernel level IRQBALANCE is useless. The userlevel irqbalance does the right thing, it handles multi-core, and network devices, and all the other special cases. *Don't use kernel level irqbalance* ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps [not found] ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org> 2008-04-19 19:23 ` Stephen Hemminger @ 2008-04-21 16:42 ` Rick Jones 2008-04-21 19:52 ` Bodo Eggert 1 sibling, 1 reply; 17+ messages in thread From: Rick Jones @ 2008-04-21 16:42 UTC (permalink / raw) To: 7eggert Cc: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton Bodo Eggert wrote: > Kok, Auke <auke-jan.h.kok@intel.com> wrote: > > >>[X86] IRQBALANCE: Mark as BROKEN and disable by default >> >>The IRQBALANCE option causes interrupts to bounce all around on SMP systems >>quickly burying the CPU in migration cost and cache misses. Mainly affected >>are network interrupts and this results in one CPU pegged in softirqd >>completely. > > > If this is the problem, maybe it would help to only balance the IRQs each > e.g. ten seconds? Unfortunately I have no SMP system to try it out. Be it kernel or user space, for consistent benchmark results it needs to be able to be turned-off without turning the code. That leaves me in agreement with Stephen that if it must exist, the user space one would be preferable. It can be easily terminated with extreme prejudice. rick jones ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps 2008-04-21 16:42 ` Rick Jones @ 2008-04-21 19:52 ` Bodo Eggert 2008-04-21 20:02 ` Rick Jones 0 siblings, 1 reply; 17+ messages in thread From: Bodo Eggert @ 2008-04-21 19:52 UTC (permalink / raw) To: Rick Jones Cc: 7eggert, Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton On Mon, 21 Apr 2008, Rick Jones wrote: > Bodo Eggert wrote: > > Kok, Auke <auke-jan.h.kok@intel.com> wrote: > > > [X86] IRQBALANCE: Mark as BROKEN and disable by default > > > > > > The IRQBALANCE option causes interrupts to bounce all around on SMP > > > systems > > > quickly burying the CPU in migration cost and cache misses. Mainly > > > affected > > > are network interrupts and this results in one CPU pegged in softirqd > > > completely. > > > > > > If this is the problem, maybe it would help to only balance the IRQs each > > e.g. ten seconds? Unfortunately I have no SMP system to try it out. > > Be it kernel or user space, for consistent benchmark results it needs to be > able to be turned-off without turning the code. That leaves me in agreement > with Stephen that if it must exist, the user space one would be preferable. > It can be easily terminated with extreme prejudice. I agree that having a full-featured userspace balancer daemon with lots of intelligence will be theoretically better, but if you can have a simple daemon doing OK on many machines for less than the userspace daemon's kernel stack, why not? -- Funny quotes: 31. Why do "overlook" and "oversee" mean opposite things? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 19:52 ` Bodo Eggert
@ 2008-04-21 20:02 ` Rick Jones
  2008-04-21 21:08 ` Bodo Eggert
  0 siblings, 1 reply; 17+ messages in thread
From: Rick Jones @ 2008-04-21 20:02 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
      Linux Kernel Mailing List, Anton Titov, Chris Snook,
      H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
      Andrew Morton

Bodo Eggert wrote:
> On Mon, 21 Apr 2008, Rick Jones wrote:
>>Be it kernel or user space, for consistent benchmark results it needs to be
>>able to be turned-off without turning the code. That leaves me in agreement
>>with Stephen that if it must exist, the user space one would be preferable.
>>It can be easily terminated with extreme prejudice.
>
>
> I agree that having a full-featured userspace balancer daemon with lots of
> intelligence will be theoretically better, but if you can have a simple
> daemon doing OK on many machines for less than the userspace daemon's
> kernel stack, why not?

Perhaps my judgement is too colored by benchmark(et)ing, and desires to have
repeatable results on things like netperf, but I very much like to know where
my interrupts are going and don't like them moving around. That is why I am
not particularly fond of either flavor of irq balancing.

That being the case, whatever is out there ought to be able to be disabled on
a running system without having to roll bits or reboot.

rick jones

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps 2008-04-21 20:02 ` Rick Jones @ 2008-04-21 21:08 ` Bodo Eggert 2008-04-21 21:30 ` Chris Snook 0 siblings, 1 reply; 17+ messages in thread From: Bodo Eggert @ 2008-04-21 21:08 UTC (permalink / raw) To: Rick Jones Cc: Bodo Eggert, Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton On Mon, 21 Apr 2008, Rick Jones wrote: > Bodo Eggert wrote: > > On Mon, 21 Apr 2008, Rick Jones wrote: > > > Be it kernel or user space, for consistent benchmark results it needs to > > > be > > > able to be turned-off without turning the code. That leaves me in > > > agreement > > > with Stephen that if it must exist, the user space one would be > > > preferable. > > > It can be easily terminated with extreme prejudice. > > > > > > I agree that having a full-featured userspace balancer daemon with lots of > > intelligence will be theoretically better, but if you can have a simple > > daemon doing OK on many machines for less than the userspace daemon's > > kernel stack, why not? > > Perhaps my judgement is too colored by benchmark(et)ing, and desires to have > repeatable results on things like neperf, but I very much like to know where > my interrupts are going and don't like them moving around. That is why I am > not particularly fond of either flavor of irq balancing. > > That being the case, whatever is out there aught to be able to be disabled on > a running system without having to roll bits or reboot. Adding a "module" parameter to disable it should be cheap, isn't it? -- Top 100 things you don't want the sysadmin to say: 34. The network's down, but we're working on it. Come back after diner. (Usually said at 2200 the night before thesis deadline... ) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps 2008-04-21 21:08 ` Bodo Eggert @ 2008-04-21 21:30 ` Chris Snook 2008-04-22 7:36 ` Bodo Eggert 0 siblings, 1 reply; 17+ messages in thread From: Chris Snook @ 2008-04-21 21:30 UTC (permalink / raw) To: Bodo Eggert Cc: Rick Jones, Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton Bodo Eggert wrote: > On Mon, 21 Apr 2008, Rick Jones wrote: >> Bodo Eggert wrote: >>> On Mon, 21 Apr 2008, Rick Jones wrote: > >>>> Be it kernel or user space, for consistent benchmark results it needs to >>>> be >>>> able to be turned-off without turning the code. That leaves me in >>>> agreement >>>> with Stephen that if it must exist, the user space one would be >>>> preferable. >>>> It can be easily terminated with extreme prejudice. >>> >>> I agree that having a full-featured userspace balancer daemon with lots of >>> intelligence will be theoretically better, but if you can have a simple >>> daemon doing OK on many machines for less than the userspace daemon's >>> kernel stack, why not? >> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have >> repeatable results on things like neperf, but I very much like to know where >> my interrupts are going and don't like them moving around. That is why I am >> not particularly fond of either flavor of irq balancing. >> >> That being the case, whatever is out there aught to be able to be disabled on >> a running system without having to roll bits or reboot. > > Adding a "module" parameter to disable it should be cheap, isn't it? Except the irq balancing is system-wide. Adding per-device exemptions to an obsolete feature seems like the wrong way to go. -- Chris ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps 2008-04-21 21:30 ` Chris Snook @ 2008-04-22 7:36 ` Bodo Eggert 2008-04-22 17:46 ` Kok, Auke 0 siblings, 1 reply; 17+ messages in thread From: Bodo Eggert @ 2008-04-22 7:36 UTC (permalink / raw) To: Chris Snook Cc: Bodo Eggert, Rick Jones, Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton On Mon, 21 Apr 2008, Chris Snook wrote: > Bodo Eggert wrote: > > On Mon, 21 Apr 2008, Rick Jones wrote: > >> Bodo Eggert wrote: > >>> On Mon, 21 Apr 2008, Rick Jones wrote: > >>>> Be it kernel or user space, for consistent benchmark results it needs to > >>>> be > >>>> able to be turned-off without turning the code. That leaves me in > >>>> agreement > >>>> with Stephen that if it must exist, the user space one would be > >>>> preferable. > >>>> It can be easily terminated with extreme prejudice. > >>> > >>> I agree that having a full-featured userspace balancer daemon with lots of > >>> intelligence will be theoretically better, but if you can have a simple > >>> daemon doing OK on many machines for less than the userspace daemon's > >>> kernel stack, why not? > >> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have > >> repeatable results on things like neperf, but I very much like to know where > >> my interrupts are going and don't like them moving around. That is why I am > >> not particularly fond of either flavor of irq balancing. > >> > >> That being the case, whatever is out there aught to be able to be disabled on > >> a running system without having to roll bits or reboot. > > > > Adding a "module" parameter to disable it should be cheap, isn't it? > > Except the irq balancing is system-wide. Adding per-device exemptions to an > obsolete feature seems like the wrong way to go. No, not a per-device-exemption. 
My reasoning was: If the IRQ balancer bounces the IRQ too often, doing it less often seems to be the correct solution. One cache miss each ten seconds sounds like it should be OK. As said before, I can't verify this theory. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps 2008-04-22 7:36 ` Bodo Eggert @ 2008-04-22 17:46 ` Kok, Auke 0 siblings, 0 replies; 17+ messages in thread From: Kok, Auke @ 2008-04-22 17:46 UTC (permalink / raw) To: Bodo Eggert Cc: Chris Snook, Rick Jones, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List, Anton Titov, H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton Bodo Eggert wrote: > On Mon, 21 Apr 2008, Chris Snook wrote: >> Bodo Eggert wrote: >>> On Mon, 21 Apr 2008, Rick Jones wrote: >>>> Bodo Eggert wrote: >>>>> On Mon, 21 Apr 2008, Rick Jones wrote: > >>>>>> Be it kernel or user space, for consistent benchmark results it needs to >>>>>> be >>>>>> able to be turned-off without turning the code. That leaves me in >>>>>> agreement >>>>>> with Stephen that if it must exist, the user space one would be >>>>>> preferable. >>>>>> It can be easily terminated with extreme prejudice. >>>>> I agree that having a full-featured userspace balancer daemon with lots of >>>>> intelligence will be theoretically better, but if you can have a simple >>>>> daemon doing OK on many machines for less than the userspace daemon's >>>>> kernel stack, why not? >>>> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have >>>> repeatable results on things like neperf, but I very much like to know where >>>> my interrupts are going and don't like them moving around. That is why I am >>>> not particularly fond of either flavor of irq balancing. >>>> >>>> That being the case, whatever is out there aught to be able to be disabled on >>>> a running system without having to roll bits or reboot. >>> Adding a "module" parameter to disable it should be cheap, isn't it? >> Except the irq balancing is system-wide. Adding per-device exemptions to an >> obsolete feature seems like the wrong way to go. > > No, not a per-device-exemption. 
My reasoning was: If the IRQ balancer
> bounces the IRQ too often, doing it less often seems to be the correct
> solution. One cache miss each ten seconds sounds like it should be OK.
> As said before, I can't verify this theory.

This is exactly what the userspace irqbalance does, and it's even optimized to
not do those migrations once every 10 seconds if things look OK.

From that perspective, it's definitely more mature, and it's maintained as well.

Auke

^ permalink raw reply [flat|nested] 17+ messages in thread
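The behaviour Auke describes (look at the placement roughly every 10 seconds, and skip the migration when things look OK) can be illustrated with a small sketch. This is not the irqbalance daemon's actual algorithm; the class name `RebalanceGate`, the 25% imbalance threshold, and the use of per-CPU interrupt counts as the load metric are all assumptions made for illustration.

```python
class RebalanceGate:
    """Hypothetical helper: decide whether an IRQ migration is worth doing.

    Assumptions (not taken from irqbalance): load is a list of per-CPU
    interrupt counts, and "things look OK" means the busiest CPU is no more
    than `threshold` above the mean.
    """

    def __init__(self, interval=10.0, threshold=0.25):
        self.interval = interval      # seconds between placement checks
        self.threshold = threshold    # tolerated relative imbalance
        self.last_check = None        # time of the previous check

    def should_migrate(self, now, per_cpu_load):
        # Rate limit: never reconsider placement more than once per interval.
        if self.last_check is not None and now - self.last_check < self.interval:
            return False
        self.last_check = now

        total = sum(per_cpu_load)
        if total == 0:
            return False              # idle system, nothing to balance

        mean = total / len(per_cpu_load)
        # Migrate only if the busiest CPU is clearly above the average.
        return (max(per_cpu_load) - mean) / mean > self.threshold


if __name__ == "__main__":
    gate = RebalanceGate()
    for t, load in [(0.0, [100, 100]), (5.0, [1000, 0]), (10.0, [1000, 0])]:
        print(t, load, gate.should_migrate(t, load))
```

The point of the gate is that the cost Bodo worries about (a cache miss per migration) is paid at most once per interval, and not at all while the load is already spread evenly.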
[parent not found: <1208282804.23631.27.camel@localhost>]
* Re: Bad network performance over 2Gbps [not found] <1208282804.23631.27.camel@localhost> @ 2008-04-15 20:15 ` H. Willstrand 2008-04-15 20:34 ` Kok, Auke 0 siblings, 1 reply; 17+ messages in thread From: H. Willstrand @ 2008-04-15 20:15 UTC (permalink / raw) To: Anton Titov, netdev [Changed mail list] On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote: > I use Linux for serving a huge amount of static web on few servers. When > network traffic goes above 2Gbit/sec ksoftirqd/5 (not every time 5, but > every time just one) starts using exactly 100% CPU time and packet > packet loss starts preventing traffic from going up. When the network > traffic is lower than 1.9Gbit ksoftirqds use 0% CPU according to top. > > Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm > with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960 > switch. Machine is with two quad core Intel Xeons @2.33GHz. > > Here goes a screen snapshot of "top" command. The described behavior > have nothing to do with 13% io-wait. It happens even if it is 0% > io-wait. > http://www.titov.net/misc/top-snap.png > > kernel configuration: > http://www.titov.net/misc/config.gz > > /proc/interrupts, lspci, dmesg (nothing intresting there), ifconfig, > uname -a: > http://www.titov.net/misc/misc.txt.gz > > Is it a Linux bug or some hardware limitation? > > Regards, > Anton Titov > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Bad network performance over 2Gbps 2008-04-15 20:15 ` H. Willstrand @ 2008-04-15 20:34 ` Kok, Auke 2008-04-15 20:59 ` Chris Snook 0 siblings, 1 reply; 17+ messages in thread From: Kok, Auke @ 2008-04-15 20:34 UTC (permalink / raw) To: H. Willstrand; +Cc: Anton Titov, netdev, Jesse Brandeburg H. Willstrand wrote: > [Changed mail list] > > On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote: >> I use Linux for serving a huge amount of static web on few servers. When >> network traffic goes above 2Gbit/sec ksoftirqd/5 (not every time 5, but >> every time just one) starts using exactly 100% CPU time and packet >> packet loss starts preventing traffic from going up. When the network >> traffic is lower than 1.9Gbit ksoftirqds use 0% CPU according to top. >> >> Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm >> with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960 >> switch. Machine is with two quad core Intel Xeons @2.33GHz. >> >> Here goes a screen snapshot of "top" command. The described behavior >> have nothing to do with 13% io-wait. It happens even if it is 0% >> io-wait. >> http://www.titov.net/misc/top-snap.png >> >> kernel configuration: >> http://www.titov.net/misc/config.gz >> >> /proc/interrupts, lspci, dmesg (nothing intresting there), ifconfig, >> uname -a: >> http://www.titov.net/misc/misc.txt.gz >> >> Is it a Linux bug or some hardware limitation? I'm wondering if this is not a classical demonstration of the NAPI-irq trap where after migration all the interrupts from the various cards are migrated to a single CPU, and because of NAPI once they're busy polling won't ever migrate away from that CPU again. Have you looked at `cat /proc/interrupts` before and after this happens? My guess is that your specific situation can benefit from setting smp_affinity and forcing the NIC irq's so that you're at least occupying the load over multiple CPU's (but preferably ones that use the same cache!) 
will help relieve the situation. alternatively you might even see an improvement by disabling NAPI. depending on the driver that you're using this might be possible. I actually don't know much about bonding and how this affects everything, but my guess is that that's a less important factor in this issue. Cheers, Auke ^ permalink raw reply [flat|nested] 17+ messages in thread
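As a rough illustration of the smp_affinity suggestion above: the kernel exposes a per-IRQ CPU bitmask at /proc/irq/<irq>/smp_affinity, written as a hexadecimal string. The sketch below builds such a mask from a list of CPU numbers. The function names are hypothetical, the helper only covers machines with up to 32 CPUs (larger systems use comma-separated 32-bit groups), and writing the file requires root.

```python
from pathlib import Path


def cpus_to_affinity_mask(cpus):
    """Return the hex bitmask string for a collection of CPU numbers.

    Bit N set means CPU N may service the interrupt.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError(f"CPU number out of range for this sketch: {cpu}")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus):
    """Write the mask for `irq` (needs root; not called in this example)."""
    Path(f"/proc/irq/{irq}/smp_affinity").write_text(cpus_to_affinity_mask(cpus))


if __name__ == "__main__":
    # Only compute and print masks here; nothing is written to /proc.
    print(cpus_to_affinity_mask([0, 1]))
    print(cpus_to_affinity_mask([4, 5]))
```

Comparing `cat /proc/interrupts` before and after a change like this shows whether the per-CPU counters for the NIC interrupts actually start growing on the intended CPUs.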
* Re: Bad network performance over 2Gbps 2008-04-15 20:34 ` Kok, Auke @ 2008-04-15 20:59 ` Chris Snook 2008-04-17 10:02 ` Anton Titov 0 siblings, 1 reply; 17+ messages in thread From: Chris Snook @ 2008-04-15 20:59 UTC (permalink / raw) To: Kok, Auke; +Cc: H. Willstrand, Anton Titov, netdev, Jesse Brandeburg Kok, Auke wrote: > H. Willstrand wrote: >> [Changed mail list] >> >> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote: >>> I use Linux for serving a huge amount of static web on few servers. When >>> network traffic goes above 2Gbit/sec ksoftirqd/5 (not every time 5, but >>> every time just one) starts using exactly 100% CPU time and packet >>> packet loss starts preventing traffic from going up. When the network >>> traffic is lower than 1.9Gbit ksoftirqds use 0% CPU according to top. >>> >>> Uplink is 6 gigabit Intel cards bonded together using 802.3ad algorithm >>> with xmit_hash_policy set to layer3+4. On the other side is Cisco 2960 >>> switch. Machine is with two quad core Intel Xeons @2.33GHz. >>> >>> Here goes a screen snapshot of "top" command. The described behavior >>> have nothing to do with 13% io-wait. It happens even if it is 0% >>> io-wait. >>> http://www.titov.net/misc/top-snap.png >>> >>> kernel configuration: >>> http://www.titov.net/misc/config.gz >>> >>> /proc/interrupts, lspci, dmesg (nothing intresting there), ifconfig, >>> uname -a: >>> http://www.titov.net/misc/misc.txt.gz >>> >>> Is it a Linux bug or some hardware limitation? > > I'm wondering if this is not a classical demonstration of the NAPI-irq trap where > after migration all the interrupts from the various cards are migrated to a single > CPU, and because of NAPI once they're busy polling won't ever migrate away from > that CPU again. > > Have you looked at `cat /proc/interrupts` before and after this happens? 
> > My guess is that your specific situation can benefit from setting smp_affinity and > forcing the NIC irq's so that you're at least occupying the load over multiple > CPU's (but preferably ones that use the same cache!) will help relieve the situation. > > alternatively you might even see an improvement by disabling NAPI. depending on > the driver that you're using this might be possible. > > I actually don't know much about bonding and how this affects everything, but my > guess is that that's a less important factor in this issue. > > Cheers, > > Auke I'm not sure that spreading IRQs out completely is necessarily a good idea, due to cache line ping-pong. I suspect you'll get optimal performance by assigning the six IRQs to two cores that share an L2 cache. Still, I think you're on to something here. Disabling NAPI and instead tuning the cards' interrupt coalescing settings might allow irqbalance to do a better job than it is currently. -- Chris ^ permalink raw reply [flat|nested] 17+ messages in thread
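Chris's suggestion of pinning the six NIC interrupts to two cores that share an L2 cache can be sketched as a simple round-robin plan. The IRQ numbers (48 to 53) and the CPU pair (4 and 5) below are made up for illustration, since the original /proc/interrupts output is only linked and not quoted in the thread; which cores actually share a cache has to be checked on the machine itself.

```python
def plan_irq_affinity(irqs, cpus):
    """Map each IRQ to a single-CPU hex mask, cycling through `cpus`.

    Returns a dict {irq: mask_string} suitable for writing to
    /proc/irq/<irq>/smp_affinity. Nothing is written here.
    """
    if not cpus:
        raise ValueError("need at least one CPU to assign IRQs to")
    plan = {}
    for i, irq in enumerate(irqs):
        cpu = cpus[i % len(cpus)]          # round-robin over the chosen cores
        plan[irq] = format(1 << cpu, "x")  # one bit set: pin to exactly one CPU
    return plan


if __name__ == "__main__":
    # Hypothetical IRQ numbers and a hypothetical cache-sharing CPU pair.
    for irq, mask in plan_irq_affinity([48, 49, 50, 51, 52, 53], [4, 5]).items():
        print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

Each core ends up with three of the six interrupts, which keeps the load on two CPUs while avoiding cache line ping-pong between unrelated cores.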
* Re: Bad network performance over 2Gbps 2008-04-15 20:59 ` Chris Snook @ 2008-04-17 10:02 ` Anton Titov 2008-04-17 17:37 ` [PATCH] " Kok, Auke 0 siblings, 1 reply; 17+ messages in thread From: Anton Titov @ 2008-04-17 10:02 UTC (permalink / raw) To: Chris Snook; +Cc: Kok, Auke, H. Willstrand, netdev, Jesse Brandeburg On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote: > Still, I think you're on to something here. Disabling NAPI and instead > tuning the cards' interrupt coalescing settings might allow irqbalance > to do a better job than it is currently. Disabling NAPI allowed me to push as much as 3.5Gbit out of the same server with ~ 20% of time CPUs doing software interrupts. Regards, Anton Titov ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 10:02 ` Anton Titov
@ 2008-04-17 17:37 ` Kok, Auke
  2008-04-20 12:08 ` Denys Fedoryshchenko
  ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Kok, Auke @ 2008-04-17 17:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List
  Cc: Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
      Linus Torvalds, Andrew Morton

Anton Titov wrote:
> On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote:
>> Still, I think you're on to something here. Disabling NAPI and instead
>> tuning the cards' interrupt coalescing settings might allow irqbalance
>> to do a better job than it is currently.
>
> Disabling NAPI allowed me to push as much as 3.5Gbit out of the same
> server with ~ 20% of time CPUs doing software interrupts.

Yes, I really don't see this as such an amazing discovery - the in-kernel
irqbalance code is totally wrong for network interrupts (and probably for most
interrupts).

On your system with 6 network interrupts it blows chunks, and it's not NAPI that
is the issue - NAPI will work just fine on its own. By disabling NAPI and
reverting to the in-driver irq moderation code you've effectively put the
in-kernel irqbalance code to the sideline, and this is what makes it work again.

It's not the right solution.

We keep seeing this exact issue pop up everywhere - especially with e1000(e)
datacenter users - this code _has_ to go or be fixed. Since there is a perfectly
viable solution, I strongly suggest disabling it.

This is not the first time I've sent this patch out in some form...

Auke

---
[X86] IRQBALANCE: Mark as BROKEN and disable by default

The IRQBALANCE option causes interrupts to bounce all around on SMP systems
quickly burying the CPU in migration cost and cache misses. Mainly affected are
network interrupts and this results in one CPU pegged in softirqd completely.

Disable this option and provide documentation to a better solution (userspace
irqbalance daemon does overall the best job to begin with and only manual setting
of smp_affinity will beat it).

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

---

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6c70fed..956aa22 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1026,13 +1026,17 @@ config EFI
 	  platforms.
 
 config IRQBALANCE
-	def_bool y
+	def_bool n
 	prompt "Enable kernel irq balancing"
-	depends on X86_32 && SMP && X86_IO_APIC
+	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
 	help
 	  The default yes will allow the kernel to do irq load balancing.
 	  Saying no will keep the kernel from doing irq load balancing.
 
+	  This option is known to cause performance issues on SMP
+	  systems. The preferred method is to use the userspace
+	  'irqbalance' daemon instead. See http://irqbalance.org/.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"

^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37 ` [PATCH] " Kok, Auke
@ 2008-04-20 12:08 ` Denys Fedoryshchenko
  2008-04-21 13:19 ` Pavel Machek
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Denys Fedoryshchenko @ 2008-04-20 12:08 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
      Linux Kernel Mailing List, Anton Titov, Chris Snook,
      H. Willstrand, netdev, Jesse Brandeburg, Andrew Morton

Even without IRQBALANCE enabled in the kernel, the APIC or something else
distributes interrupts over the processors by default. No irqbalance daemon or
anything similar is running.

For example:

Router-KARAM ~ # cat /proc/interrupts
           CPU0       CPU1
  0:   87956938 1403052485   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  9:          0          0   IO-APIC-fasteoi   acpi
 19:        140       5714   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2
 24:  675673280 1186506694   IO-APIC-fasteoi   eth2
 26:  717865662 2201633562   IO-APIC-fasteoi   eth0
 27:    1869190   23075556   IO-APIC-fasteoi   eth1
NMI:          0          0   Non-maskable interrupts
LOC: 1403052485   87956683   Local timer interrupts
RES:      75059      25408   Rescheduling interrupts
CAL:      99542         83   function call interrupts
TLB:        616        200   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

sunfire-1 ~ # cat config|grep -i irq
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
# CONFIG_IRQBALANCE is not set
CONFIG_HT_IRQ=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_DEBUG_SHIRQ is not set

Is it harmful too?

On Thursday 17 April 2008 20:37, Kok, Auke wrote:
> Anton Titov wrote:
> > On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote:
> >> Still, I think you're on to something here. Disabling NAPI and instead
> >> tuning the cards' interrupt coalescing settings might allow irqbalance
> >> to do a better job than it is currently.
> > > > Disabling NAPI allowed me to push as much as 3.5Gbit out of the same > > server with ~ 20% of time CPUs doing software interrupts. > > yes, I really don't see this is such an amazing discovery - the in-kernel > irqbalance code is totally wrong for network interrupts (and probably for most > interrupts). > > on your system with 6 network interrupts it blows chunks and it's not NAPI that is > the issue - NAPI will work just fine on it's own. By disabling NAPI and reverting > to the in-driver irq moderation code you've effectively put the in-kernel > irqbalance code to the sideline and this is what makes it work again. > > It's not the right solution. > > We keep seing this exact issue pop up everywhere - especially with e1000(e) > datacenter users - this code _has_ to go or be fixed. Since there is a perfectly > viable solution, I strongly suggest disabling it. > > This is not the first time I've sent this patch out in some form... > > Auke > > > --- > [X86] IRQBALANCE: Mark as BROKEN and disable by default > > The IRQBALANCE option causes interrupts to bounce all around on SMP systems > quickly burying the CPU in migration cost and cache misses. Mainly affected are > network interrupts and this results in one CPU pegged in softirqd completely. > > Disable this option and provide documentation to a better solution (userspace > irqbalance daemon does overall the best job to begin with and only manual setting > of smp_affinity will beat it). > > Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> > > --- > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 6c70fed..956aa22 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1026,13 +1026,17 @@ config EFI > platforms. > > config IRQBALANCE > - def_bool y > + def_bool n > prompt "Enable kernel irq balancing" > - depends on X86_32 && SMP && X86_IO_APIC > + depends on X86_32 && SMP && X86_IO_APIC && BROKEN > help > The default yes will allow the kernel to do irq load balancing. 
> Saying no will keep the kernel from doing irq load balancing. > > + This option is known to cause performance issues on SMP > + systems. The preferred method is to use the userspace > + 'irqbalance' daemon instead. See http://irqbalance.org/. > + > config SECCOMP > def_bool y > prompt "Enable seccomp to safely compute untrusted bytecode" > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- ------ Technical Manager Virtual ISP S.A.L. Lebanon ^ permalink raw reply [flat|nested] 17+ messages in thread
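To make the question in Denys's message easier to judge, the per-CPU split of each interrupt can be computed from the /proc/interrupts text. The sketch below is a minimal parser under some assumptions: the function names are hypothetical, the sample string reuses three device rows and the ERR row from the listing in the message above, and rows that do not carry one counter per CPU (such as ERR and MIS) are skipped.

```python
SAMPLE = """
           CPU0       CPU1
 24:  675673280 1186506694   IO-APIC-fasteoi   eth2
 26:  717865662 2201633562   IO-APIC-fasteoi   eth0
 27:    1869190   23075556   IO-APIC-fasteoi   eth1
ERR:          0
"""


def parse_interrupts(text):
    """Return {label: ([count per CPU], description)} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: one column per CPU
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:ncpus]
        # Skip rows without a full set of numeric per-CPU counters.
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        table[label.strip()] = ([int(c) for c in counts], " ".join(fields[ncpus:]))
    return table


def cpu_share(counts):
    """Fraction of an interrupt's total handled by each CPU."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    for label, (counts, desc) in parse_interrupts(SAMPLE).items():
        shares = ", ".join(f"{s:.0%}" for s in cpu_share(counts))
        print(f"{label:>4} {desc:<24} {shares}")
```

Running the same computation on two snapshots taken a few seconds apart, and comparing the differences, shows where interrupts are landing now instead of where they landed since boot.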
* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37 ` [PATCH] " Kok, Auke
  2008-04-20 12:08   ` Denys Fedoryshchenko
@ 2008-04-21 13:19   ` Pavel Machek
  2008-04-21 16:38     ` Kok, Auke
  2008-04-21 15:28   ` Ingo Molnar
  2008-04-22  5:07   ` Bill Fink
  3 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2008-04-21 13:19 UTC
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
      Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand,
      netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton

Hi!

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
>
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
>
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
>
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>
> ---
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>  	  platforms.
>
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n

ACK.

>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN

This is wrong. irqbalance works, there's nothing wrong with it; but it
has nasty side effects.

>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.
>
> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 13:19 ` Pavel Machek
@ 2008-04-21 16:38   ` Kok, Auke
  0 siblings, 0 replies; 17+ messages in thread
From: Kok, Auke @ 2008-04-21 16:38 UTC
  To: Pavel Machek
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
      Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand,
      netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton

Pavel Machek wrote:
> Hi!
>
>> [X86] IRQBALANCE: Mark as BROKEN and disable by default
>>
>> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
>> quickly burying the CPU in migration cost and cache misses. Mainly affected are
>> network interrupts and this results in one CPU pegged in softirqd completely.
>>
>> Disable this option and provide documentation to a better solution (userspace
>> irqbalance daemon does overall the best job to begin with and only manual setting
>> of smp_affinity will beat it).
>>
>> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>>
>> ---
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 6c70fed..956aa22 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1026,13 +1026,17 @@ config EFI
>>  	  platforms.
>>
>>  config IRQBALANCE
>> -	def_bool y
>> +	def_bool n
>
> ACK.
>
>>  	prompt "Enable kernel irq balancing"
>> -	depends on X86_32 && SMP && X86_IO_APIC
>> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
>
> This is wrong. irqbalance works, there's nothing wrong with it; but it
> has nasty side effects.

ok, I'm fine with taking that part out of the patch.

Ingo, want me to send an updated patch?

>
>>  	help
>>  	  The default yes will allow the kernel to do irq load balancing.
>>  	  Saying no will keep the kernel from doing irq load balancing.
>>
>> +	  This option is known to cause performance issues on SMP
>> +	  systems. The preferred method is to use the userspace
>> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
>> +
>
> ACK.
>

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37 ` [PATCH] " Kok, Auke
  2008-04-20 12:08   ` Denys Fedoryshchenko
  2008-04-21 13:19   ` Pavel Machek
@ 2008-04-21 15:28   ` Ingo Molnar
  2008-04-21 16:58     ` Kok, Auke
  2008-04-22  5:07   ` Bill Fink
  3 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2008-04-21 15:28 UTC
  To: Kok, Auke
  Cc: Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List,
      Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
      Linus Torvalds, Andrew Morton

* Kok, Auke <auke-jan.h.kok@intel.com> wrote:

> We keep seeing this exact issue pop up everywhere - especially with
> e1000(e) datacenter users - this code _has_ to go or be fixed. Since
> there is a perfectly viable solution, I strongly suggest disabling it.

strongly agreed. Thanks Auke, applied.

	Ingo

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 15:28 ` Ingo Molnar
@ 2008-04-21 16:58   ` Kok, Auke
  2008-04-21 18:35     ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Kok, Auke @ 2008-04-21 16:58 UTC
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List,
      Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
      Linus Torvalds, Andrew Morton

Ingo Molnar wrote:
> * Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>
>> We keep seeing this exact issue pop up everywhere - especially with
>> e1000(e) datacenter users - this code _has_ to go or be fixed. Since
>> there is a perfectly viable solution, I strongly suggest disabling it.
>
> strongly agreed. Thanks Auke, applied.
>
> 	Ingo

excellent, ignore my other reply to Pavel - I didn't see this reply yet :)

Thanks Ingo

Auke

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 16:58 ` Kok, Auke
@ 2008-04-21 18:35   ` Andi Kleen
  0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2008-04-21 18:35 UTC
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
      Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand,
      netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton

"Kok, Auke" <auke-jan.h.kok@intel.com> writes:

> Ingo Molnar wrote:
>> * Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>>
>>> We keep seeing this exact issue pop up everywhere - especially with
>>> e1000(e) datacenter users - this code _has_ to go or be fixed. Since
>>> there is a perfectly viable solution, I strongly suggest disabling it.
>>
>> strongly agreed. Thanks Auke, applied.
>>
>> 	Ingo
>
>
> excellent, ignore my other reply to Pavel - I didn't see this reply yet :)

Shouldn't you just add it to the FeatureRemoval list too and remove it
then quickly? No need to keep disabled and known to be wrong code around.

-Andi

* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37 ` [PATCH] " Kok, Auke
                     ` (2 preceding siblings ...)
  2008-04-21 15:28   ` Ingo Molnar
@ 2008-04-22  5:07   ` Bill Fink
  3 siblings, 0 replies; 17+ messages in thread
From: Bill Fink @ 2008-04-22 5:07 UTC
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
      Linux Kernel Mailing List, Anton Titov, Chris Snook, H. Willstrand,
      netdev, Jesse Brandeburg, Linus Torvalds, Andrew Morton

On Thu, 17 Apr 2008, Kok, Auke wrote:

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
>
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
>
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
>
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>
> ---
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>  	  platforms.
>
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n
>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.

Since you're changing the default setting, shouldn't the above
be changed to:

	  Saying yes will allow the kernel to do irq load balancing.
	  The default no will keep the kernel from doing irq load balancing.

> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +
>  config SECCOMP
>  	def_bool y
>  	prompt "Enable seccomp to safely compute untrusted bytecode"

						-Bill

end of thread, other threads:[~2008-04-22 17:46 UTC | newest]
Thread overview: 17+ messages
[not found] <aiXVe-Yn-7@gated-at.bofh.it>
[not found] ` <ajGfA-7rt-9@gated-at.bofh.it>
[not found] ` <ajGfA-7rt-11@gated-at.bofh.it>
[not found] ` <ajGfA-7rt-13@gated-at.bofh.it>
[not found] ` <ajGfA-7rt-15@gated-at.bofh.it>
[not found] ` <ajGfA-7rt-7@gated-at.bofh.it>
2008-04-19 15:05 ` [PATCH] Re: Bad network performance over 2Gbps Bodo Eggert
[not found] ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org>
2008-04-19 19:23 ` Stephen Hemminger
2008-04-21 16:42 ` Rick Jones
2008-04-21 19:52 ` Bodo Eggert
2008-04-21 20:02 ` Rick Jones
2008-04-21 21:08 ` Bodo Eggert
2008-04-21 21:30 ` Chris Snook
2008-04-22 7:36 ` Bodo Eggert
2008-04-22 17:46 ` Kok, Auke
[not found] <1208282804.23631.27.camel@localhost>
2008-04-15 20:15 ` H. Willstrand
2008-04-15 20:34 ` Kok, Auke
2008-04-15 20:59 ` Chris Snook
2008-04-17 10:02 ` Anton Titov
2008-04-17 17:37 ` [PATCH] " Kok, Auke
2008-04-20 12:08 ` Denys Fedoryshchenko
2008-04-21 13:19 ` Pavel Machek
2008-04-21 16:38 ` Kok, Auke
2008-04-21 15:28 ` Ingo Molnar
2008-04-21 16:58 ` Kok, Auke
2008-04-21 18:35 ` Andi Kleen
2008-04-22 5:07 ` Bill Fink