netdev.vger.kernel.org archive mirror
* [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 10:02       ` Anton Titov
@ 2008-04-17 17:37         ` Kok, Auke
  2008-04-20 12:08           ` Denys Fedoryshchenko
                             ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Kok, Auke @ 2008-04-17 17:37 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List
  Cc: Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton

Anton Titov wrote:
> On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote:
>> Still, I think you're on to something here.  Disabling NAPI and instead 
>> tuning the cards' interrupt coalescing settings might allow irqbalance 
>> to do a better job than it is currently.
> 
> Disabling NAPI allowed me to push as much as 3.5Gbit out of the same
> server with ~ 20% of time CPUs doing software interrupts.

Yes, I really don't see this as such an amazing discovery - the in-kernel
irqbalance code is totally wrong for network interrupts (and probably for most
interrupts).

On your system with 6 network interrupts it blows chunks, and it's not NAPI that is
the issue - NAPI will work just fine on its own. By disabling NAPI and reverting
to the in-driver irq moderation code you've effectively sidelined the in-kernel
irqbalance code, and this is what makes it work again.

It's not the right solution.

We keep seeing this exact issue pop up everywhere - especially with e1000(e)
datacenter users - this code _has_ to go or be fixed. Since there is a perfectly
viable solution, I strongly suggest disabling it.

This is not the first time I've sent this patch out in some form...

Auke


---
[X86] IRQBALANCE: Mark as BROKEN and disable by default

The IRQBALANCE option causes interrupts to bounce all around on SMP systems
quickly burying the CPU in migration cost and cache misses. Mainly affected are
network interrupts and this results in one CPU pegged in softirqd completely.

Disable this option and provide documentation to a better solution (userspace
irqbalance daemon does overall the best job to begin with and only manual setting
of smp_affinity will beat it).

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

---

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6c70fed..956aa22 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1026,13 +1026,17 @@ config EFI
   	platforms.

 config IRQBALANCE
-	def_bool y
+	def_bool n
 	prompt "Enable kernel irq balancing"
-	depends on X86_32 && SMP && X86_IO_APIC
+	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
 	help
 	  The default yes will allow the kernel to do irq load balancing.
 	  Saying no will keep the kernel from doing irq load balancing.

+	  This option is known to cause performance issues on SMP
+	  systems. The preferred method is to use the userspace
+	  'irqbalance' daemon instead. See http://irqbalance.org/.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"


* Re: [PATCH] Re: Bad network performance over 2Gbps
       [not found]         ` <ajGfA-7rt-7@gated-at.bofh.it>
@ 2008-04-19 15:05           ` Bodo Eggert
       [not found]           ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org>
  1 sibling, 0 replies; 17+ messages in thread
From: Bodo Eggert @ 2008-04-19 15:05 UTC (permalink / raw)
  To: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List

Kok, Auke <auke-jan.h.kok@intel.com> wrote:

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
> 
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected
> are network interrupts and this results in one CPU pegged in softirqd
> completely.

If this is the problem, maybe it would help to balance the IRQs only every
ten seconds or so? Unfortunately I have no SMP system to try it out.




* Re: [PATCH] Re: Bad network performance over 2Gbps
       [not found]           ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org>
@ 2008-04-19 19:23             ` Stephen Hemminger
  2008-04-21 16:42             ` Rick Jones
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2008-04-19 19:23 UTC (permalink / raw)
  To: 7eggert
  Cc: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Bodo Eggert wrote:
> Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>
>   
>> [X86] IRQBALANCE: Mark as BROKEN and disable by default
>>
>> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
>> quickly burying the CPU in migration cost and cache misses. Mainly affected
>> are network interrupts and this results in one CPU pegged in softirqd
>> completely.
>>     
>
> If this is the problem, maybe it would help to balance the IRQs only every
> ten seconds or so? Unfortunately I have no SMP system to try it out.
>
>
>   
The kernel level IRQBALANCE is useless. The userlevel irqbalance does the
right thing: it handles multi-core, network devices, and all the other
special cases.
*Don't use kernel level irqbalance*


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
@ 2008-04-20 12:08           ` Denys Fedoryshchenko
  2008-04-21 13:19           ` Pavel Machek
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Denys Fedoryshchenko @ 2008-04-20 12:08 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Andrew Morton

Even without IRQBALANCE enabled in the kernel, the APIC (or something else) also distributes interrupts across processors by default.
There is no irqbalance daemon or anything similar running.

For example:
Router-KARAM ~ # cat /proc/interrupts
           CPU0       CPU1
  0:   87956938 1403052485   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  9:          0          0   IO-APIC-fasteoi   acpi
 19:        140       5714   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2
 24:  675673280 1186506694   IO-APIC-fasteoi   eth2
 26:  717865662 2201633562   IO-APIC-fasteoi   eth0
 27:    1869190   23075556   IO-APIC-fasteoi   eth1
NMI:          0          0   Non-maskable interrupts
LOC: 1403052485   87956683   Local timer interrupts
RES:      75059      25408   Rescheduling interrupts
CAL:      99542         83   function call interrupts
TLB:        616        200   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

sunfire-1 ~ # cat config|grep -i irq
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
# CONFIG_IRQBALANCE is not set
CONFIG_HT_IRQ=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_DEBUG_SHIRQ is not set

Is it harmful too?
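
One way to put numbers on how the counters above are spread is to compute each CPU's share per IRQ. The following is a minimal Python sketch, assuming the /proc/interrupts layout shown above (a header row of CPU names, then "label: count count ... description"). The function names are made up for illustration, and the sample rows are copied from the output above.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [count_per_cpu, ...]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        counts = []
        for tok in rest.split()[:ncpus]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        if len(counts) == ncpus:           # skip rows like ERR/MIS that carry one total
            table[label.strip()] = counts
    return table


def cpu_shares(counts):
    """Return each CPU's fraction of the interrupts counted for one IRQ."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Rows taken from the /proc/interrupts output quoted in this message.
SAMPLE = """\
           CPU0       CPU1
 24:  675673280 1186506694   IO-APIC-fasteoi   eth2
 26:  717865662 2201633562   IO-APIC-fasteoi   eth0
 27:    1869190   23075556   IO-APIC-fasteoi   eth1
ERR:          0
"""

if __name__ == "__main__":
    for irq, counts in parse_interrupts(SAMPLE).items():
        shares = ", ".join(f"{s:.0%}" for s in cpu_shares(counts))
        print(f"IRQ {irq}: {shares}")
```

Note that these are cumulative counts since boot, so they show that interrupts have landed on both CPUs but say nothing about how often an IRQ moved between them.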

On Thursday 17 April 2008 20:37, Kok, Auke wrote:
> Anton Titov wrote:
> > On Tue, 2008-04-15 at 16:59 -0400, Chris Snook wrote:
> >> Still, I think you're on to something here.  Disabling NAPI and instead 
> >> tuning the cards' interrupt coalescing settings might allow irqbalance 
> >> to do a better job than it is currently.
> > 
> > Disabling NAPI allowed me to push as much as 3.5Gbit out of the same
> > server with ~ 20% of time CPUs doing software interrupts.
> 
> Yes, I really don't see this as such an amazing discovery - the in-kernel
> irqbalance code is totally wrong for network interrupts (and probably for most
> interrupts).
> 
> On your system with 6 network interrupts it blows chunks, and it's not NAPI that is
> the issue - NAPI will work just fine on its own. By disabling NAPI and reverting
> to the in-driver irq moderation code you've effectively sidelined the in-kernel
> irqbalance code, and this is what makes it work again.
> 
> It's not the right solution.
> 
> We keep seeing this exact issue pop up everywhere - especially with e1000(e)
> datacenter users - this code _has_ to go or be fixed. Since there is a perfectly
> viable solution, I strongly suggest disabling it.
> 
> This is not the first time I've sent this patch out in some form...
> 
> Auke
> 
> 
> ---
> [X86] IRQBALANCE: Mark as BROKEN and disable by default
> 
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
> 
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
> 
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> 
> ---
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>    	platforms.
> 
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n
>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.
> 
> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +
>  config SECCOMP
>  	def_bool y
>  	prompt "Enable seccomp to safely compute untrusted bytecode"
> 

-- 
------
Technical Manager
Virtual ISP S.A.L.
Lebanon


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
  2008-04-20 12:08           ` Denys Fedoryshchenko
@ 2008-04-21 13:19           ` Pavel Machek
  2008-04-21 16:38             ` Kok, Auke
  2008-04-21 15:28           ` Ingo Molnar
  2008-04-22  5:07           ` Bill Fink
  3 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2008-04-21 13:19 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Hi!

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
> 
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
> 
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
> 
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> 
> ---
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>    	platforms.
> 
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n

ACK.
>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN

This is wrong. irqbalance works, there's nothing wrong with it; but it
has nasty side effects.

>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.
> 
> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +

ACK.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
  2008-04-20 12:08           ` Denys Fedoryshchenko
  2008-04-21 13:19           ` Pavel Machek
@ 2008-04-21 15:28           ` Ingo Molnar
  2008-04-21 16:58             ` Kok, Auke
  2008-04-22  5:07           ` Bill Fink
  3 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2008-04-21 15:28 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List,
	Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton


* Kok, Auke <auke-jan.h.kok@intel.com> wrote:

> We keep seeing this exact issue pop up everywhere - especially with
> e1000(e) datacenter users - this code _has_ to go or be fixed. Since 
> there is a perfectly viable solution, I strongly suggest disabling it.

strongly agreed. Thanks Auke, applied.

	Ingo


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 13:19           ` Pavel Machek
@ 2008-04-21 16:38             ` Kok, Auke
  0 siblings, 0 replies; 17+ messages in thread
From: Kok, Auke @ 2008-04-21 16:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Pavel Machek wrote:
> Hi!
> 
>> [X86] IRQBALANCE: Mark as BROKEN and disable by default
>>
>> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
>> quickly burying the CPU in migration cost and cache misses. Mainly affected are
>> network interrupts and this results in one CPU pegged in softirqd completely.
>>
>> Disable this option and provide documentation to a better solution (userspace
>> irqbalance daemon does overall the best job to begin with and only manual setting
>> of smp_affinity will beat it).
>>
>> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>>
>> ---
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 6c70fed..956aa22 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1026,13 +1026,17 @@ config EFI
>>    	platforms.
>>
>>  config IRQBALANCE
>> -	def_bool y
>> +	def_bool n
> 
> ACK.
>>  	prompt "Enable kernel irq balancing"
>> -	depends on X86_32 && SMP && X86_IO_APIC
>> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
> 
> This is wrong. irqbalance works, there's nothing wrong with it; but it
> has nasty side effects.

ok, I'm fine with taking that part out of the patch.

Ingo, want me to send an updated patch?


> 
>>  	help
>>  	  The default yes will allow the kernel to do irq load balancing.
>>  	  Saying no will keep the kernel from doing irq load balancing.
>>
>> +	  This option is known to cause performance issues on SMP
>> +	  systems. The preferred method is to use the userspace
>> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
>> +
> 
> ACK.
> 



* Re: [PATCH] Re: Bad network performance over 2Gbps
       [not found]           ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org>
  2008-04-19 19:23             ` Stephen Hemminger
@ 2008-04-21 16:42             ` Rick Jones
  2008-04-21 19:52               ` Bodo Eggert
  1 sibling, 1 reply; 17+ messages in thread
From: Rick Jones @ 2008-04-21 16:42 UTC (permalink / raw)
  To: 7eggert
  Cc: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Bodo Eggert wrote:
> Kok, Auke <auke-jan.h.kok@intel.com> wrote:
> 
> 
>>[X86] IRQBALANCE: Mark as BROKEN and disable by default
>>
>>The IRQBALANCE option causes interrupts to bounce all around on SMP systems
>>quickly burying the CPU in migration cost and cache misses. Mainly affected
>>are network interrupts and this results in one CPU pegged in softirqd
>>completely.
> 
> 
> If this is the problem, maybe it would help to balance the IRQs only every
> ten seconds or so? Unfortunately I have no SMP system to try it out.

Be it kernel or user space, for consistent benchmark results it needs to 
be able to be turned-off without turning the code.  That leaves me in 
agreement with Stephen that if it must exist, the user space one would 
be preferable.  It can be easily terminated with extreme prejudice.

rick jones


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 15:28           ` Ingo Molnar
@ 2008-04-21 16:58             ` Kok, Auke
  2008-04-21 18:35               ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Kok, Auke @ 2008-04-21 16:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Linux Kernel Mailing List,
	Anton Titov, Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton

Ingo Molnar wrote:
> * Kok, Auke <auke-jan.h.kok@intel.com> wrote:
> 
>> We keep seeing this exact issue pop up everywhere - especially with
>> e1000(e) datacenter users - this code _has_ to go or be fixed. Since 
>> there is a perfectly viable solution, I strongly suggest disabling it.
> 
> strongly agreed. Thanks Auke, applied.
> 
> 	Ingo


excellent, ignore my other reply to Pavel - I didn't see this reply yet :)

Thanks Ingo


Auke



* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 16:58             ` Kok, Auke
@ 2008-04-21 18:35               ` Andi Kleen
  0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2008-04-21 18:35 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

"Kok, Auke" <auke-jan.h.kok@intel.com> writes:

> Ingo Molnar wrote:
>> * Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>> 
>>> We keep seeing this exact issue pop up everywhere - especially with
>>> e1000(e) datacenter users - this code _has_ to go or be fixed. Since 
>>> there is a perfectly viable solution, I strongly suggest disabling it.
>> 
>> strongly agreed. Thanks Auke, applied.
>> 
>> 	Ingo
>
>
> excellent, ignore my other reply to Pavel - I didn't see this reply yet :)

Shouldn't you just add it to the FeatureRemoval list too and remove it 
then quickly? No need to keep disabled and known to be wrong code around.

-Andi


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 16:42             ` Rick Jones
@ 2008-04-21 19:52               ` Bodo Eggert
  2008-04-21 20:02                 ` Rick Jones
  0 siblings, 1 reply; 17+ messages in thread
From: Bodo Eggert @ 2008-04-21 19:52 UTC (permalink / raw)
  To: Rick Jones
  Cc: 7eggert, Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

On Mon, 21 Apr 2008, Rick Jones wrote:
> Bodo Eggert wrote:
> > Kok, Auke <auke-jan.h.kok@intel.com> wrote:

> > > [X86] IRQBALANCE: Mark as BROKEN and disable by default
> > > 
> > > The IRQBALANCE option causes interrupts to bounce all around on SMP
> > > systems
> > > quickly burying the CPU in migration cost and cache misses. Mainly
> > > affected
> > > are network interrupts and this results in one CPU pegged in softirqd
> > > completely.
> > 
> > 
> > If this is the problem, maybe it would help to balance the IRQs only every
> > ten seconds or so? Unfortunately I have no SMP system to try it out.
> 
> Be it kernel or user space, for consistent benchmark results it needs to be
> able to be turned-off without turning the code.  That leaves me in agreement
> with Stephen that if it must exist, the user space one would be preferable.
> It can be easily terminated with extreme prejudice.

I agree that having a full-featured userspace balancer daemon with lots of 
intelligence will be theoretically better, but if you can have a simple
daemon doing OK on many machines for less than the userspace daemon's
kernel stack, why not?
-- 
Funny quotes:
31. Why do "overlook" and "oversee" mean opposite things?


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 19:52               ` Bodo Eggert
@ 2008-04-21 20:02                 ` Rick Jones
  2008-04-21 21:08                   ` Bodo Eggert
  0 siblings, 1 reply; 17+ messages in thread
From: Rick Jones @ 2008-04-21 20:02 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: Kok, Auke, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Bodo Eggert wrote:
> On Mon, 21 Apr 2008, Rick Jones wrote:
>>Be it kernel or user space, for consistent benchmark results it needs to be
>>able to be turned-off without turning the code.  That leaves me in agreement
>>with Stephen that if it must exist, the user space one would be preferable.
>>It can be easily terminated with extreme prejudice.
> 
> 
> I agree that having a full-featured userspace balancer daemon with lots of 
> intelligence will be theoretically better, but if you can have a simple
> daemon doing OK on many machines for less than the userspace daemon's
> kernel stack, why not?

Perhaps my judgement is too colored by benchmark(et)ing, and by the desire to
have repeatable results on things like netperf, but I very much like to
know where my interrupts are going and don't like them moving around.
That is why I am not particularly fond of either flavor of irq balancing.

That being the case, whatever is out there ought to be able to be
disabled on a running system without having to roll bits or reboot.

rick jones
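
For reference, pinning an interrupt by hand (the "manual setting of smp_affinity" mentioned in the patch description) comes down to writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity. Below is a minimal Python sketch of that step. The helper names and the proc_root parameter are made up for illustration, and the sketch is deliberately limited to CPUs 0-31 so that the mask fits in a single 32-bit word.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string that selects the given CPU numbers (0-31)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0-31")
        mask |= 1 << cpu                   # bit N set means CPU N may service the IRQ
    return format(mask, "x")


def pin_irq(irq, cpus, proc_root="/proc"):
    """Write the mask for `cpus` to <proc_root>/irq/<irq>/smp_affinity.

    Needs root on a real system; proc_root is a hypothetical hook so the
    function can be tried out against a scratch directory.
    """
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(affinity_mask(cpus))
    return path
```

For example, affinity_mask([1]) gives "2", so pin_irq(24, [1]) would ask for IRQ 24 to be serviced on CPU1 only.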


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 20:02                 ` Rick Jones
@ 2008-04-21 21:08                   ` Bodo Eggert
  2008-04-21 21:30                     ` Chris Snook
  0 siblings, 1 reply; 17+ messages in thread
From: Bodo Eggert @ 2008-04-21 21:08 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bodo Eggert, Kok, Auke, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Linux Kernel Mailing List, Anton Titov,
	Chris Snook, H. Willstrand, netdev, Jesse Brandeburg,
	Linus Torvalds, Andrew Morton

On Mon, 21 Apr 2008, Rick Jones wrote:
> Bodo Eggert wrote:
> > On Mon, 21 Apr 2008, Rick Jones wrote:

> > > Be it kernel or user space, for consistent benchmark results it needs to
> > > be
> > > able to be turned-off without turning the code.  That leaves me in
> > > agreement
> > > with Stephen that if it must exist, the user space one would be
> > > preferable.
> > > It can be easily terminated with extreme prejudice.
> > 
> > 
> > I agree that having a full-featured userspace balancer daemon with lots of
> > intelligence will be theoretically better, but if you can have a simple
> > daemon doing OK on many machines for less than the userspace daemon's
> > kernel stack, why not?
> 
> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have
> repeatable results on things like netperf, but I very much like to know where
> my interrupts are going and don't like them moving around. That is why I am
> not particularly fond of either flavor of irq balancing.
> 
> That being the case, whatever is out there ought to be able to be disabled on
> a running system without having to roll bits or reboot.

Adding a "module" parameter to disable it should be cheap, shouldn't it?
-- 
Top 100 things you don't want the sysadmin to say:
34. The network's down, but we're working on it. Come back after diner.
    (Usually said at 2200 the night before thesis deadline... )


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 21:08                   ` Bodo Eggert
@ 2008-04-21 21:30                     ` Chris Snook
  2008-04-22  7:36                       ` Bodo Eggert
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Snook @ 2008-04-21 21:30 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: Rick Jones, Kok, Auke, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Linux Kernel Mailing List, Anton Titov,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Bodo Eggert wrote:
> On Mon, 21 Apr 2008, Rick Jones wrote:
>> Bodo Eggert wrote:
>>> On Mon, 21 Apr 2008, Rick Jones wrote:
> 
>>>> Be it kernel or user space, for consistent benchmark results it needs to
>>>> be
>>>> able to be turned-off without turning the code.  That leaves me in
>>>> agreement
>>>> with Stephen that if it must exist, the user space one would be
>>>> preferable.
>>>> It can be easily terminated with extreme prejudice.
>>>
>>> I agree that having a full-featured userspace balancer daemon with lots of
>>> intelligence will be theoretically better, but if you can have a simple
>>> daemon doing OK on many machines for less than the userspace daemon's
>>> kernel stack, why not?
>> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have
>> repeatable results on things like netperf, but I very much like to know where
>> my interrupts are going and don't like them moving around. That is why I am
>> not particularly fond of either flavor of irq balancing.
>>
>> That being the case, whatever is out there ought to be able to be disabled on
>> a running system without having to roll bits or reboot.
> 
> Adding a "module" parameter to disable it should be cheap, shouldn't it?

Except the irq balancing is system-wide.  Adding per-device exemptions to an 
obsolete feature seems like the wrong way to go.

-- Chris


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-17 17:37         ` [PATCH] " Kok, Auke
                             ` (2 preceding siblings ...)
  2008-04-21 15:28           ` Ingo Molnar
@ 2008-04-22  5:07           ` Bill Fink
  3 siblings, 0 replies; 17+ messages in thread
From: Bill Fink @ 2008-04-22  5:07 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Linux Kernel Mailing List, Anton Titov, Chris Snook,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

On Thu, 17 Apr 2008, Kok, Auke wrote:

> [X86] IRQBALANCE: Mark as BROKEN and disable by default
> 
> The IRQBALANCE option causes interrupts to bounce all around on SMP systems
> quickly burying the CPU in migration cost and cache misses. Mainly affected are
> network interrupts and this results in one CPU pegged in softirqd completely.
> 
> Disable this option and provide documentation to a better solution (userspace
> irqbalance daemon does overall the best job to begin with and only manual setting
> of smp_affinity will beat it).
> 
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> 
> ---
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6c70fed..956aa22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1026,13 +1026,17 @@ config EFI
>    	platforms.
> 
>  config IRQBALANCE
> -	def_bool y
> +	def_bool n
>  	prompt "Enable kernel irq balancing"
> -	depends on X86_32 && SMP && X86_IO_APIC
> +	depends on X86_32 && SMP && X86_IO_APIC && BROKEN
>  	help
>  	  The default yes will allow the kernel to do irq load balancing.
>  	  Saying no will keep the kernel from doing irq load balancing.

Since you're changing the default setting, shouldn't the above be
changed to:

 	  Saying yes will allow the kernel to do irq load balancing.
 	  The default no will keep the kernel from doing irq load balancing.

> +	  This option is known to cause performance issues on SMP
> +	  systems. The preferred method is to use the userspace
> +	  'irqbalance' daemon instead. See http://irqbalance.org/.
> +
>  config SECCOMP
>  	def_bool y
>  	prompt "Enable seccomp to safely compute untrusted bytecode"

						-Bill


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-21 21:30                     ` Chris Snook
@ 2008-04-22  7:36                       ` Bodo Eggert
  2008-04-22 17:46                         ` Kok, Auke
  0 siblings, 1 reply; 17+ messages in thread
From: Bodo Eggert @ 2008-04-22  7:36 UTC (permalink / raw)
  To: Chris Snook
  Cc: Bodo Eggert, Rick Jones, Kok, Auke, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Linux Kernel Mailing List, Anton Titov,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

On Mon, 21 Apr 2008, Chris Snook wrote:
> Bodo Eggert wrote:
> > On Mon, 21 Apr 2008, Rick Jones wrote:
> >> Bodo Eggert wrote:
> >>> On Mon, 21 Apr 2008, Rick Jones wrote:

> >>>> Be it kernel or user space, for consistent benchmark results it needs to
> >>>> be
> >>>> able to be turned-off without turning the code.  That leaves me in
> >>>> agreement
> >>>> with Stephen that if it must exist, the user space one would be
> >>>> preferable.
> >>>> It can be easily terminated with extreme prejudice.
> >>>
> >>> I agree that having a full-featured userspace balancer daemon with lots of
> >>> intelligence will be theoretically better, but if you can have a simple
> >>> daemon doing OK on many machines for less than the userspace daemon's
> >>> kernel stack, why not?
> >> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have
> >> repeatable results on things like netperf, but I very much like to know where
> >> my interrupts are going and don't like them moving around. That is why I am
> >> not particularly fond of either flavor of irq balancing.
> >>
> >> That being the case, whatever is out there ought to be able to be disabled on
> >> a running system without having to roll bits or reboot.
> > 
> > > Adding a "module" parameter to disable it should be cheap, shouldn't it?
> 
> Except the irq balancing is system-wide.  Adding per-device exemptions to an 
> obsolete feature seems like the wrong way to go.

No, not a per-device-exemption. My reasoning was: If the IRQ balancer 
bounces the IRQ too often, doing it less often seems to be the correct 
solution. One cache miss every ten seconds sounds like it should be OK.
As said before, I can't verify this theory.


* Re: [PATCH] Re: Bad network performance over 2Gbps
  2008-04-22  7:36                       ` Bodo Eggert
@ 2008-04-22 17:46                         ` Kok, Auke
  0 siblings, 0 replies; 17+ messages in thread
From: Kok, Auke @ 2008-04-22 17:46 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: Chris Snook, Rick Jones, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Linux Kernel Mailing List, Anton Titov,
	H. Willstrand, netdev, Jesse Brandeburg, Linus Torvalds,
	Andrew Morton

Bodo Eggert wrote:
> On Mon, 21 Apr 2008, Chris Snook wrote:
>> Bodo Eggert wrote:
>>> On Mon, 21 Apr 2008, Rick Jones wrote:
>>>> Bodo Eggert wrote:
>>>>> On Mon, 21 Apr 2008, Rick Jones wrote:
> 
>>>>>> Be it kernel or user space, for consistent benchmark results it needs to
>>>>>> be
>>>>>> able to be turned-off without turning the code.  That leaves me in
>>>>>> agreement
>>>>>> with Stephen that if it must exist, the user space one would be
>>>>>> preferable.
>>>>>> It can be easily terminated with extreme prejudice.
>>>>> I agree that having a full-featured userspace balancer daemon with lots of
>>>>> intelligence will be theoretically better, but if you can have a simple
>>>>> daemon doing OK on many machines for less than the userspace daemon's
>>>>> kernel stack, why not?
>>>> Perhaps my judgement is too colored by benchmark(et)ing, and desires to have
>>>> repeatable results on things like netperf, but I very much like to know where
>>>> my interrupts are going and don't like them moving around. That is why I am
>>>> not particularly fond of either flavor of irq balancing.
>>>>
>>>> That being the case, whatever is out there ought to be able to be disabled on
>>>> a running system without having to roll bits or reboot.
>>> Adding a "module" parameter to disable it should be cheap, shouldn't it?
>> Except the irq balancing is system-wide.  Adding per-device exemptions to an 
>> obsolete feature seems like the wrong way to go.
> 
> No, not a per-device-exemption. My reasoning was: If the IRQ balancer 
> bounces the IRQ too often, doing it less often seems to be the correct 
> solution. One cache miss every ten seconds sounds like it should be OK.
> As said before, I can't verify this theory.

This is exactly what the userspace irqbalance does, and it's even optimized to not
do those migrations once every 10 seconds if things look OK. From that
perspective, it's definitely more mature, and it's maintained as well.

Auke
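
The policy described here (revisit IRQ placement at most once per interval, and skip the move when the load already looks balanced) can be sketched roughly as follows. This only illustrates the idea and is not the irqbalance daemon's actual algorithm. The function name, the 10-second default and the 25% imbalance threshold are all assumptions made for the example.

```python
def should_migrate(now, last_migration, loads, interval=10.0, imbalance=0.25):
    """Decide whether an IRQ rebalance is worth doing right now.

    now, last_migration: timestamps in seconds.
    loads: per-CPU interrupt load over the last interval (any consistent unit).
    interval, imbalance: hypothetical tuning knobs, not real irqbalance options.
    """
    if now - last_migration < interval:
        return False                       # rate limit: too soon since the last move
    total = sum(loads)
    if total == 0:
        return False                       # nothing to balance
    mean = total / len(loads)
    # Migrate only if the busiest CPU exceeds the mean by more than the threshold.
    return (max(loads) - mean) / mean > imbalance
```

With these defaults, loads of [55, 45] are left alone no matter how much time has passed, while [100, 0] triggers a move once ten seconds have elapsed since the previous one.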


Thread overview: 17+ messages
     [not found] <aiXVe-Yn-7@gated-at.bofh.it>
     [not found] ` <ajGfA-7rt-9@gated-at.bofh.it>
     [not found]   ` <ajGfA-7rt-11@gated-at.bofh.it>
     [not found]     ` <ajGfA-7rt-13@gated-at.bofh.it>
     [not found]       ` <ajGfA-7rt-15@gated-at.bofh.it>
     [not found]         ` <ajGfA-7rt-7@gated-at.bofh.it>
2008-04-19 15:05           ` [PATCH] Re: Bad network performance over 2Gbps Bodo Eggert
     [not found]           ` <E1JnEcl-0000xc-D9@be1.7eggert.dyndns.org>
2008-04-19 19:23             ` Stephen Hemminger
2008-04-21 16:42             ` Rick Jones
2008-04-21 19:52               ` Bodo Eggert
2008-04-21 20:02                 ` Rick Jones
2008-04-21 21:08                   ` Bodo Eggert
2008-04-21 21:30                     ` Chris Snook
2008-04-22  7:36                       ` Bodo Eggert
2008-04-22 17:46                         ` Kok, Auke
     [not found] <1208282804.23631.27.camel@localhost>
2008-04-15 20:15 ` H. Willstrand
2008-04-15 20:34   ` Kok, Auke
2008-04-15 20:59     ` Chris Snook
2008-04-17 10:02       ` Anton Titov
2008-04-17 17:37         ` [PATCH] " Kok, Auke
2008-04-20 12:08           ` Denys Fedoryshchenko
2008-04-21 13:19           ` Pavel Machek
2008-04-21 16:38             ` Kok, Auke
2008-04-21 15:28           ` Ingo Molnar
2008-04-21 16:58             ` Kok, Auke
2008-04-21 18:35               ` Andi Kleen
2008-04-22  5:07           ` Bill Fink
