From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759562Ab0JFUwU (ORCPT ); Wed, 6 Oct 2010 16:52:20 -0400
Received: from e33.co.us.ibm.com ([32.97.110.151]:59739 "EHLO e33.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751539Ab0JFUwT (ORCPT ); Wed, 6 Oct 2010 16:52:19 -0400
Date: Wed, 6 Oct 2010 15:55:19 -0500
From: Sonny Rao
To: Thomas Gleixner
Cc: Nishanth Aravamudan, miltonm@bga.com, Ian Campbell, Peter Zijlstra,
	Peter P Waskiewicz Jr, linux-kernel@vger.kernel.org,
	sonnyrao@linux.vnet.ibm.com
Subject: Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than
	online_mask in setup_affinity
Message-ID: <20101006205519.GP13726@us.ibm.com>
References: <1285968378-12805-1-git-send-email-nacc@us.ibm.com>
	<1285968378-12805-2-git-send-email-nacc@us.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 02, 2010 at 12:57:10PM +0200, Thomas Gleixner wrote:
> On Fri, 1 Oct 2010, Nishanth Aravamudan wrote:
> 
> > The use of online_mask requires architecture code to be hotplug-aware to
> > account for IRQ round-robin'ing. With user-driven dynamic SMT, this
> > could commonly occur even without physical hotplug. Without this change
> > and "pseries/xics: use cpu_possible_mask rather than cpu_all_mask", IRQs
> > are all routed to CPU0 on power machines with XICS not running
> > irqbalance.
> > 
> > Signed-off-by: Nishanth Aravamudan
> > ---
> > I have boot-tested this on ppc64, but not yet on x86/x86_64. This is
> > generic code, and perhaps an audit of all .set_affinity functions should
> > occur before upstream acceptance?
> > ---
> >  kernel/irq/manage.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index c3003e9..ef85b95 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -175,7 +175,7 @@ static int setup_affinity(unsigned int irq, struct irq_desc *desc)
> >  		desc->status &= ~IRQ_AFFINITY_SET;
> >  	}
> > 
> > -	cpumask_and(desc->affinity, cpu_online_mask, irq_default_affinity);
> > +	cpumask_and(desc->affinity, cpu_possible_mask, irq_default_affinity);
> 
> Hmm, that looks dangerous. And auditing everything is rather horrible
> especially when we need to add cpumask_and(..., cpu_online_mask, ..)
> all over the place.

I'm not sure it's that dangerous, because we already have a code path
today which should force the lower level arch code to sanity check vs
online_mask -- namely when a user sets the affinity from /proc/

irq_affinity_proc_write() -> irq_set_affinity() -> desc->chip->set_affinity(irq, cpumask)

The proc code does check to see that there is at least some intersection
with online_cpus -- but it's not strong enough to tell us whether there
are any offline cpus in the mask.

It will obviously depend on the specific interrupt controller
architectures -- I guess it's possible that some would allow offline
cpus to be in the mask and intelligently skip over them. So that makes
me think the arch code has to check this _anyway_, since the generic
code can't make assumptions about how interrupt controllers operate.

But, I started wondering if we're already broken here..
and I did a bit of reviewing of the arch code which implements that
set_affinity function, just to see what the various architectures are
doing.

x86 has:

kernel/apic/io_apic.c:	.set_affinity = set_ioapic_affinity_irq,
kernel/apic/io_apic.c:	.set_affinity = set_ir_ioapic_affinity_irq,
kernel/apic/io_apic.c:	.set_affinity = set_msi_irq_affinity,
kernel/apic/io_apic.c:	.set_affinity = ir_set_msi_irq_affinity,
kernel/apic/io_apic.c:	.set_affinity = dmar_msi_set_affinity,
kernel/apic/io_apic.c:	.set_affinity = ir_set_msi_irq_affinity,
kernel/apic/io_apic.c:	.set_affinity = hpet_msi_set_affinity,
kernel/apic/io_apic.c:	.set_affinity = set_ht_irq_affinity,
kernel/uv_irq.c:	.set_affinity = uv_set_irq_affinity,

All of these end up checking vs online by calling set_desc_affinity(),
with the exception of set_ir_ioapic_affinity_irq, which does the check
by calling migrate_ioapic_irq_desc().

powerpc has:

platforms/pseries/xics.c:	.set_affinity = xics_set_affinity
platforms/pseries/xics.c:	.set_affinity = xics_set_affinity
sysdev/mpic_u3msi.c:	.set_affinity = mpic_set_affinity,
sysdev/mpic_pasemi_msi.c:	.set_affinity = mpic_set_affinity,
sysdev/mpic.c:	mpic->hc_irq.set_affinity = mpic_set_affinity;
sysdev/mpic.c:	mpic->hc_ht_irq.set_affinity = mpic_set_affinity;

xics_set_affinity and mpic_set_affinity both AND the mask parameter
with cpu_online_mask.

in mips:

cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu0_set_affinity_v2,
cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu0_set_affinity,
cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu1_set_affinity_v2,
cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu1_set_affinity,
kernel/irq-gic.c:	.set_affinity = gic_set_affinity,
kernel/i8259.c:	.set_affinity = plat_set_irq_affinity,
sibyte/bcm1480/irq.c:	.set_affinity = bcm1480_set_affinity
sibyte/sb1250/irq.c:	.set_affinity = sb1250_set_affinity

The octeon functions iterate over online cpus, and gic_set_affinity is
ANDing with cpu_online_mask.
plat_set_irq_affinity checks for online; bcm1480_set_affinity and
sb1250_set_affinity are _not_ checking vs online explicitly.

in ia64:

hp/sim/hpsim_irq.c:	.set_affinity = hpsim_set_affinity_noop,
kernel/iosapic.c:	.set_affinity = iosapic_set_affinity
kernel/iosapic.c:	.set_affinity = iosapic_set_affinity
kernel/msi_ia64.c:	.set_affinity = ia64_set_msi_irq_affinity,
kernel/msi_ia64.c:	.set_affinity = dmar_msi_set_affinity,
sn/kernel/msi_sn.c:	.set_affinity = sn_set_msi_irq_affinity,
sn/kernel/irq.c:	.set_affinity = sn_set_affinity_irq

iosapic_set_affinity() does a check by ANDing the mask with
cpu_online_mask. ia64_set_msi_irq_affinity and dmar_msi_set_affinity
both only check that the first cpu in the mask is online (not sure why,
maybe that's the only requirement for their interrupt controller?).
sn_set_msi_irq_affinity is _not_ doing explicit checking vs the online
mask (maybe their firmware will tell them if they're doing something
illegal?)

in sparc:

kernel/irq_64.c:	.set_affinity = sun4u_set_affinity,
kernel/irq_64.c:	.set_affinity = sun4v_set_affinity,
kernel/irq_64.c:	.set_affinity = sun4v_virt_set_affinity,

These do no explicit checks of online cpus, but they seem to be making
firmware calls which can return an error.

Other architectures, including arm, alpha, blackfin, and cris, do _not_
appear to be checking the online mask. I'm not sure these architectures
support CPU hotplug either, though, so perhaps it's not an issue...
does hotplug because of power management make it a broader problem?

So, my basic point is that many of the low level arch specific
functions are checking, some are not, and that *might* be a problem for
people trying to set affinity via proc. Also, the generic code is
making an assumption that cpu_online_mask is the correct mask -- which
I believe may not be correct for everybody -- especially not for some
powerpc platforms.

Sonny