public inbox for linux-kernel@vger.kernel.org
* RE: CONFIG_IRQBALANCE for AMD64?
@ 2004-05-28 18:20 Nakajima, Jun
  2004-05-28 18:33 ` Martin J. Bligh
  0 siblings, 1 reply; 34+ messages in thread
From: Nakajima, Jun @ 2004-05-28 18:20 UTC (permalink / raw)
  To: Arjan van de Ven, Martin J. Bligh
  Cc: Jeff Garzik, Andrew Morton, Anton Blanchard, linux-kernel

>From: Arjan van de Ven [mailto:arjanv@redhat.com]
>Sent: Friday, May 28, 2004 10:57 AM
>To: Martin J. Bligh
>Cc: Jeff Garzik; Nakajima, Jun; Andrew Morton; Anton Blanchard; linux-
>kernel@vger.kernel.org
>Subject: Re: CONFIG_IRQBALANCE for AMD64?
>
>On Fri, May 28, 2004 at 10:46:18AM -0700, Martin J. Bligh wrote:
>>
>> Personally, I find the argument that it's hardware-specific control code
>> a much better reason for it to belong in the kernel.
>
>Is it really hardware specific ??

I think the automatic IRQ binding business belongs at user level; it can
use generic statistics plus application or platform configuration
knowledge.

The kernel level should have some simple chipset model, such as lowest
priority delivery mode with finer granularity of control. At this point
kirqd is doing automatic IRQ binding as well, although it does not
literally bind the IRQs. So I think we need to remove that part of the
code from kirqd.

Jun


^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: CONFIG_IRQBALANCE for AMD64?
@ 2004-05-28 23:37 Nakajima, Jun
  0 siblings, 0 replies; 34+ messages in thread
From: Nakajima, Jun @ 2004-05-28 23:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Martin J. Bligh, linux-kernel

>From: Andi Kleen [mailto:ak@muc.de]
>Sent: Friday, May 28, 2004 3:54 PM
>To: Nakajima, Jun
>Cc: Andi Kleen; Martin J. Bligh; linux-kernel@vger.kernel.org
>Subject: Re: CONFIG_IRQBALANCE for AMD64?
>
>On Fri, May 28, 2004 at 03:05:48PM -0700, Nakajima, Jun wrote:
>> >At least the AMD chipsets found in most Opteron boxes need software
>> >balancing too.
>>
>> Actually lowest priority delivery works on P4 and AMD (I did not test
>> it on AMD, though), if we _update_ TPR. But I don't recommend that,
>
>True. I remember there was even a patch for that from James C. long ago.
>I considered at one point to add it to x86-64, but ended up
>not doing anything and just recommending irqbalanced.
>
>[I didn't test it either, so no guarantee TPR really works on AMD]
>
>> instead we should implement similar or optimized behavior in
>> software because "soft TPR" can be more efficient and scalable. This is
>> something I have in mind, and I think the kernel should do it.
>
>I'm not convinced of that. At least the current i386 implementation
>is basically a kernel thread that wakes up regularly, reads some
>statistics and then updates the APIC.

That's true, and the kirqd code basically fixed the initial solution
from Ingo, i.e. round-robin. However, during development we found that
such simple schemes (e.g. round-robin) worked better in _some cases_ than
the statistics-based IRQ balancing. I think SuSE has some improved
round-robin for x86.

The problems with the initial round-robin were basically:
1. Too frequent -- it hurts cache, as you know
2. Interrupts throng to idle CPUs -- everybody jumps to idle CPU(s), if
   any
3. IRQs move together in sync -- they bomb the same CPU together
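
As a toy illustration of problem 3 only: the sketch below contrasts a
round-robin where every IRQ follows the same rotating target with one
where each IRQ is offset by its position in the list. This is not the
kirqd code or the original round-robin patch; the function names and the
"tick" model are assumptions made purely for illustration.

```python
def naive_round_robin(irqs, ncpus, tick):
    """Every IRQ uses the same rotating target, so all land on one CPU."""
    return {irq: tick % ncpus for irq in irqs}


def staggered_round_robin(irqs, ncpus, tick):
    """Offset each IRQ by its list position so targets spread over CPUs."""
    return {irq: (tick + i) % ncpus for i, irq in enumerate(irqs)}


def busiest_cpu_load(assignment):
    """Return how many IRQs the most heavily loaded CPU receives."""
    counts = {}
    for cpu in assignment.values():
        counts[cpu] = counts.get(cpu, 0) + 1
    return max(counts.values(), default=0)
```

With four IRQs on four CPUs, the naive version puts all four on one CPU
at every tick, while the staggered version gives each CPU one IRQ. It
does nothing about problems 1 and 2.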

Instead of digging into those further, we implemented the current code
in the kernel simply because we had that idea. Based on that experience,
there are a couple of things we should explore further, and "soft TPR"
is one of them. In any case, we need to show code and data to convince
people.

Jun

>
>There is not much reason this cannot be done in user space, and
>user space has the advantage that more advanced heuristics (which
>I'm sure will be there) can be more easily implemented.
>
>And handling all interrupts at CPU #0 during early boot up is
>not really an issue.
>
>A kernel implementation may make sense when you're doing something
>really dynamic: e.g. not just a timer, but dynamically redirecting
>network interrupts to the CPU the process who will read from the
>socket runs on. Obviously it would need kernel support for that, since
>user space could not keep up with such a high sampling rate. But that's
>future research work (if it can be even done generically at all)
>and I don't see it on the radar screen anytime soon. We first need to solve
>the NUMA scheduling problem, which is already hard enough ;-)
>
>And for the simpler heuristics that don't need real time updates
>user space is probably better.
>
>-Andi
>


^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: CONFIG_IRQBALANCE for AMD64?
@ 2004-05-28 22:05 Nakajima, Jun
  2004-05-28 22:54 ` Andi Kleen
  0 siblings, 1 reply; 34+ messages in thread
From: Nakajima, Jun @ 2004-05-28 22:05 UTC (permalink / raw)
  To: Andi Kleen, Martin J. Bligh; +Cc: linux-kernel

>From: Andi Kleen [mailto:ak@muc.de]
>Sent: Friday, May 28, 2004 2:45 PM
>To: Martin J. Bligh
>Cc: linux-kernel@vger.kernel.org; Nakajima, Jun
>Subject: Re: CONFIG_IRQBALANCE for AMD64?
>
>"Martin J. Bligh" <mbligh@aracnet.com> writes:
>
>> Whatever we do ... all arches are going to need to provide a way to direct
>> interrupts to a certain CPU, or group thereof. Can they all do that already?
>> I'll confess to not having looked at non-i386 arches. And are others as
>> brain damaged as the P4? or do they do something round-robin by default?
>
>I wouldn't really blame the P4, it's the IO-APICs in the chipsets
>that balance or not balance.
>
>At least the AMD chipsets found in most Opteron boxes need software
>balancing too.

Actually lowest priority delivery works on P4 and AMD (I did not test
it on AMD, though), if we _update_ TPR. But I don't recommend that;
instead we should implement similar or optimized behavior in
software because "soft TPR" can be more efficient and scalable. This is
something I have in mind, and I think the kernel should do it.

Jun

>
>-Andi
>


^ permalink raw reply	[flat|nested] 34+ messages in thread
[parent not found: <20Uhn-7bP-11@gated-at.bofh.it>]
* RE: CONFIG_IRQBALANCE for AMD64?
@ 2004-05-28 17:09 Nakajima, Jun
  2004-05-28 17:40 ` Jeff Garzik
  0 siblings, 1 reply; 34+ messages in thread
From: Nakajima, Jun @ 2004-05-28 17:09 UTC (permalink / raw)
  To: Chris Wedgwood, Arjan van de Ven, Anton Blanchard
  Cc: Thomas Zehetbauer, linux-kernel

>From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>owner@vger.kernel.org] On Behalf Of Chris Wedgwood
>Sent: Thursday, May 27, 2004 2:38 PM
>To: Arjan van de Ven; Anton Blanchard
>Cc: Thomas Zehetbauer; 'linux-kernel@vger.kernel.org'
>Subject: Re: CONFIG_IRQBALANCE for AMD64?
>
>On Thu, May 27, 2004 at 06:50:25PM +0200, Arjan van de Ven wrote:
>
>> irqbalanced has NOT been obsoleted by CONFIG_IRQBALANCE.
>
>On Fri, May 28, 2004 at 03:03:34AM +1000, Anton Blanchard wrote:
>
>> > Seems to work, just like the i386 irqbalanced before it has been
>> > obsoleted by CONFIG_IRQBALANCE
>>
>> No, CONFIG_IRQBALANCE is an x86 specific hack.
>
The issue is an xAPIC thing, and both the kernel-level and user-level
approaches are applicable to x86_64 as well.

The kernel does the default IRQ balancing, without assuming a user-level
IRQ balancer is present (because that's a distribution issue). If user
level has better knowledge, it just writes to /proc/irq/N/smp_affinity
to bind that IRQ to a particular CPU, as Arjan's program does. In other
words, the kernel level does _not_ move the IRQs bound by user level.
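
As a rough illustration of the user-level side, here is a minimal
sketch, assuming smp_affinity takes a hexadecimal CPU bitmask (written
as comma-separated 32-bit groups when more than 32 CPUs are involved).
The helper names cpu_mask_hex and set_irq_affinity and the proc_root
parameter are hypothetical; this is not taken from Arjan's program.

```python
import os


def cpu_mask_hex(cpus):
    """Return the hex bitmask string for an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    # Split into 32-bit groups, most significant group first.
    groups = []
    while mask:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
    groups.reverse()
    head = format(groups[0], "x")
    tail = [format(g, "08x") for g in groups[1:]]
    return ",".join([head] + tail)


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask for `cpus` to <proc_root>/irq/<irq>/smp_affinity."""
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(cpu_mask_hex(cpus))
    return path
```

For example, set_irq_affinity(24, [2]) would write "4" to
/proc/irq/24/smp_affinity (IRQ 24 is just an example number). Writing
there normally requires root, and the kernel may reject masks the
hardware cannot honor.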

>
>
>Why do we have CONFIG_IRQBALANCE at all then?
>
Today Linux is used in various configurations, including ones that
substantially limit the set of user commands, libraries, etc. So we want
to keep CONFIG_IRQBALANCE.

Jun

>
>
>  --cw


^ permalink raw reply	[flat|nested] 34+ messages in thread
* CONFIG_IRQBALANCE for AMD64?
@ 2004-05-27  3:48 Thomas Zehetbauer
  2004-05-27  5:13 ` Jeff Garzik
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Zehetbauer @ 2004-05-27  3:48 UTC (permalink / raw)
  To: 'linux-kernel@vger.kernel.org'


I wonder why there is no in-kernel irq balancing for the AMD64
architecture yet. I guess this shouldn't be much different to the i386
code. Someone willing to explain/provide a patch?

Tom
-- 
  T h o m a s   Z e h e t b a u e r   ( TZ251 )
  PGP encrypted mail preferred - KeyID 96FFCB89
      finger thomasz@hostmaster.org for key

Windows 98 supports real multitasking - it can boot and crash simultaneously.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2004-05-29 11:18 UTC | newest]

Thread overview: 34+ messages
2004-05-28 18:20 CONFIG_IRQBALANCE for AMD64? Nakajima, Jun
2004-05-28 18:33 ` Martin J. Bligh
2004-05-28 18:44   ` Arjan van de Ven
2004-05-28 18:57     ` Martin J. Bligh
2004-05-28 19:01       ` Arjan van de Ven
2004-05-29  8:38     ` michael
2004-05-29  8:41     ` michael
2004-05-29  8:45       ` Arjan van de Ven
  -- strict thread matches above, loose matches on Subject: below --
2004-05-28 23:37 Nakajima, Jun
2004-05-28 22:05 Nakajima, Jun
2004-05-28 22:54 ` Andi Kleen
2004-05-29  1:27   ` Nick Piggin
2004-05-29 10:06     ` Andi Kleen
2004-05-29 10:10       ` Nick Piggin
2004-05-29 11:18         ` Andi Kleen
     [not found] <20Uhn-7bP-11@gated-at.bofh.it>
     [not found] ` <20UqZ-7i7-5@gated-at.bofh.it>
2004-05-28 21:45   ` Andi Kleen
2004-05-28 17:09 Nakajima, Jun
2004-05-28 17:40 ` Jeff Garzik
2004-05-28 17:45   ` Arjan van de Ven
2004-05-28 17:46   ` Martin J. Bligh
2004-05-28 17:57     ` Arjan van de Ven
2004-05-28 19:51       ` Nivedita Singhvi
2004-05-28 19:58         ` Jeff Garzik
2004-05-28 20:14           ` Nivedita Singhvi
2004-05-28 21:45             ` Jeff Garzik
2004-05-28 20:03         ` Arjan van de Ven
2004-05-27  3:48 Thomas Zehetbauer
2004-05-27  5:13 ` Jeff Garzik
2004-05-27 16:36   ` Thomas Zehetbauer
2004-05-27 16:50     ` Arjan van de Ven
2004-05-27 21:37       ` Chris Wedgwood
2004-05-27 17:03     ` Anton Blanchard
2004-05-27 22:36       ` Thomas Zehetbauer
2004-05-28  5:57         ` Arjan van de Ven
