public inbox for linux-kernel@vger.kernel.org
From: Dave Hansen <haveblue@us.ibm.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrew Theurer <habanero@us.ibm.com>
Subject: Re: [PATCH] NUMA-Q disable irqbalance
Date: Mon, 19 Aug 2002 17:49:48 -0700
Message-ID: <3D61922C.90607@us.ibm.com>
In-Reply-To: Pine.LNX.4.33.0208131421190.3110-100000@penguin.transmeta.com

[-- Attachment #1: Type: text/plain, Size: 2078 bytes --]

Linus Torvalds wrote:
> On Tue, 13 Aug 2002, Martin J. Bligh wrote:
> 
>>Was that before or after you changed HZ to 1000? I *think* that increased
>>the frequency of IO-APIC reprogramming by a factor of 10, though I might
>>be misreading the code. If it does depend on HZ, I think that's bad.
> 
> The 1000Hz thing came much later, and I never noticed any impact of that 
> on my machines.
> 
> (Note that this is all entirely subjective. I was very disappointed in the
> feel of the first HT P4 machine I had for the first few weeks, but apart
> from running lmbench - which looked ok even though it shows that P4's are
> bad at system calls - I've not actually put numbers on it. But my feeling
> was that the irq thing made a noticeable difference. Caveat emptor -
> subjective feelings are not good).
> 
>>People in our benchmarking group (Andrew, cc'ed) have told me that
>>reducing the frequency of IO-APIC reprogramming by a factor of 20 or
>>so improves performance greatly - don't know what HZ that was at, but
>>the whole thing seems a little overenthusiastic to me.
> 
> The rebalancing was certainly done with a 100Hz clock, so yes, it might 
> have become much worse lately.

Here's a patch from Andrea's tree that uses IRQ_BALANCE_INTERVAL to 
define how often interrupts are balanced, so the rebalancing rate no 
longer depends on HZ.  It also makes sure that there _is_ a change to 
the configuration before it actually writes it to the IO-APIC.  It 
reminds me of the mod_timer optimization.
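
To make the two ideas concrete, here is a rough userspace sketch of the 
logic, not the kernel code itself.  The struct, the write counter, and 
balance() are made-up names for illustration, HZ is assumed to be 1000, 
and time_after() is only modelled on the kernel macro of the same name:

```c
#define HZ 1000                         /* assumed tick rate for this sketch */
#define IRQ_BALANCE_INTERVAL (HZ / 50)  /* 20 ms worth of ticks, whatever HZ is */

/* Wrap-safe "a is later than b", modelled on the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical stand-in for the per-IRQ balancing state. */
struct irq_entry {
	unsigned long timestamp;  /* tick of the last balancing decision */
	unsigned int cpu;         /* CPU the IRQ is currently routed to */
};

/* Counts simulated IO-APIC writes; stands in for set_ioapic_affinity(). */
static unsigned int ioapic_writes;

/*
 * Consider moving the IRQ to 'candidate' at tick 'now'.
 * Returns 1 if the (simulated) hardware was written, 0 otherwise.
 */
static int balance(struct irq_entry *e, unsigned long now,
		   unsigned int candidate)
{
	/* Rate limit: do nothing until a full interval has passed. */
	if (!time_after(now, e->timestamp + IRQ_BALANCE_INTERVAL))
		return 0;

	e->timestamp = now;

	/* Same CPU as before: skip the expensive hardware write. */
	if (candidate == e->cpu)
		return 0;

	e->cpu = candidate;
	ioapic_writes++;
	return 1;
}
```

Starting from timestamp 0 and CPU 0, a call at tick 10 is ignored as too 
early, a call at tick 21 that picks CPU 0 again updates the timestamp 
but writes nothing, and only a later call that picks a different CPU 
bumps ioapic_writes.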

While observing the effect that this has on /proc/interrupts, I 
noticed that timer interrupts aren't very balanced across my 8 CPUs.

stock 2.5.30 kernel:
         CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
  0:   213935    63894    32373     1322     1900     1779     2895   181441  timer

2.5.31 w/attached patch:
         CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
  0:    19322     8778    35805    36771     9219    14091  1091741  1092294  timer

The imbalance shows up on the stock kernel too, so my patch didn't 
cause this situation.  I don't know if it is normal, but the other 
IRQs were much more balanced than the timer one.
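
Since the two runs had different uptimes, the raw counts aren't directly 
comparable, but each CPU's share of the total is.  Here is a quick 
sketch of that arithmetic; max_share_percent() and the array names are 
made up for illustration, and the numbers are just the two timer rows 
above:

```c
#include <stddef.h>

/* Timer interrupt counts per CPU, copied from the two rows above. */
static const unsigned long stock_2_5_30[8] = {
	213935, 63894, 32373, 1322, 1900, 1779, 2895, 181441
};
static const unsigned long patched_2_5_31[8] = {
	19322, 8778, 35805, 36771, 9219, 14091, 1091741, 1092294
};

/*
 * Largest single-CPU share of the total count, in whole percent
 * (rounded down).  Returns 0 if the total is 0.
 */
static unsigned int max_share_percent(const unsigned long *counts, size_t n)
{
	unsigned long total = 0, max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += counts[i];
		if (counts[i] > max)
			max = counts[i];
	}
	return total ? (unsigned int)(max * 100 / total) : 0;
}
```

A perfectly even spread over 8 CPUs would give 12 here (12.5% rounded 
down).  Both rows come out far above that, at roughly 42% for the stock 
kernel and 47% for the patched one.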
-- 
Dave Hansen
haveblue@us.ibm.com

[-- Attachment #2: irq-balance-tune-2.5.31+bk-0.patch --]
[-- Type: text/plain, Size: 1730 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.489   -> 1.490  
#	arch/i386/kernel/io_apic.c	1.26    -> 1.27   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/08/19	haveblue@elm3b96.(none)	1.490
# Reduce irqbalance's dependence on HZ.  Only write to io_apic when there is actually
# a change to write into it.  
# --------------------------------------------
#
diff -Nru a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
--- a/arch/i386/kernel/io_apic.c	Mon Aug 19 14:49:03 2002
+++ b/arch/i386/kernel/io_apic.c	Mon Aug 19 14:49:03 2002
@@ -220,6 +220,9 @@
 		((1 << cpu) & (allowed_mask))
 
 #if CONFIG_SMP
+
+#define IRQ_BALANCE_INTERVAL (HZ/50)
+	
 static unsigned long move(int curr_cpu, unsigned long allowed_mask, unsigned long now, int direction)
 {
 	int search_idle = 1;
@@ -254,8 +257,9 @@
 	if (clustered_apic_mode)
 		return;
 
-	if (entry->timestamp != now) {
+	if (unlikely(time_after(now, entry->timestamp + IRQ_BALANCE_INTERVAL))) {
 		unsigned long allowed_mask;
+		unsigned int new_cpu;
 		int random_number;
 
 		rdtscl(random_number);
@@ -263,8 +267,11 @@
 
 		allowed_mask = cpu_online_map & irq_affinity[irq];
 		entry->timestamp = now;
-		entry->cpu = move(entry->cpu, allowed_mask, now, random_number);
-		set_ioapic_affinity(irq, 1 << entry->cpu);
+		new_cpu = move(entry->cpu, allowed_mask, now, random_number);
+		if (entry->cpu != new_cpu) {
+			entry->cpu = new_cpu;
+			set_ioapic_affinity(irq, 1 << new_cpu);
+		}
 	}
 }
 #else /* !SMP */
