From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751037Ab0I0EIH (ORCPT );
	Mon, 27 Sep 2010 00:08:07 -0400
Received: from relay3.sgi.com ([192.48.152.1]:46506 "EHLO relay.sgi.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1750905Ab0I0EIE (ORCPT );
	Mon, 27 Sep 2010 00:08:04 -0400
Date: Sun, 26 Sep 2010 21:08:03 -0700
From: Arthur Kepner
To: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner
Subject: [RFC/PATCH] x86/irq: round-robin distribution of irqs to cpus w/in node
Message-ID: <20100927040803.GJ20474@sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.19 (2009-01-05)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

SGI has encountered situations where particular CPUs run out of interrupt
vectors on systems with many (several hundred or more) CPUs. This happens
because some drivers (particularly the mlx4_core driver) select the number
of interrupts they allocate based on the number of CPUs, and because of how
the default irq affinity mask is applied.

Do pseudo-round-robin distribution of irqs to CPUs within a node to avoid
(or at least delay) running out of vectors on any particular CPU.
Signed-off-by: Arthur Kepner
---
 arch/x86/kernel/apic/io_apic.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f1efeba..ad540a9 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3254,6 +3254,8 @@ unsigned int create_irq_nr(unsigned int irq_want, int node)
 	raw_spin_lock_irqsave(&vector_lock, flags);
 	for (new = irq_want; new < nr_irqs; new++) {
+		cpumask_var_t tmp_mask;
+
 		desc_new = irq_to_desc_alloc_node(new, node);
 		if (!desc_new) {
 			printk(KERN_INFO "can not get irq_desc for %d\n", new);
@@ -3267,8 +3269,30 @@ unsigned int create_irq_nr(unsigned int irq_want, int node)
 		desc_new = move_irq_desc(desc_new, node);
 		cfg_new = desc_new->chip_data;
 
-		if (__assign_irq_vector(new, cfg_new, apic->target_cpus()) == 0)
-			irq = new;
+		if ((node != -1) && alloc_cpumask_var(&tmp_mask, GFP_ATOMIC)) {
+
+			static int cpu;
+
+			/* try to place irq on a cpu in the node in
+			 * pseudo-round-robin order */
+
+			cpu = __next_cpu_nr(cpu, cpumask_of_node(node));
+			if (cpu >= nr_cpu_ids)
+				cpu = 0;
+
+			cpumask_set_cpu(cpu, tmp_mask);
+
+			if (cpumask_test_cpu(cpu, apic->target_cpus()) &&
+			    __assign_irq_vector(new, cfg_new, tmp_mask) == 0)
+				irq = new;
+
+			free_cpumask_var(tmp_mask);
+		}
+
+		if (irq == 0)
+			if (__assign_irq_vector(new, cfg_new,
+						apic->target_cpus()) == 0)
+				irq = new;
 		break;
 	}
 	raw_spin_unlock_irqrestore(&vector_lock, flags);