* [RESEND PATCH 0/2] Fix IRQ round-robin w/o irqbalance on pseries

From: Nishanth Aravamudan @ 2010-10-01 21:26 UTC
To: nacc
Cc: miltonm, linuxppc-dev, linux-kernel

We have received reports of power systems not running irqbalance where
all interrupts are being routed to CPU0 rather than being interleaved
by default across the system. Current firmware only allows either
sending interrupts to all CPUs or sending them to one CPU. The
following two patches address this issue by fixing the mask used in
generic code and by fixing the check for the "all" setting in the
pseries code.

Nishanth Aravamudan (2):
  IRQ: use cpu_possible_mask rather than online_mask in setup_affinity
  pseries/xics: use cpu_possible_mask rather than cpu_all_mask

 arch/powerpc/platforms/pseries/xics.c |    2 +-
 kernel/irq/manage.c                   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
* [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Nishanth Aravamudan @ 2010-10-01 21:26 UTC
To: nacc
Cc: miltonm, Thomas Gleixner, Ian Campbell, Peter Zijlstra,
    Peter P Waskiewicz Jr, linux-kernel

The use of online_mask requires architecture code to be hotplug-aware
to account for IRQ round-robin'ing. With user-driven dynamic SMT, this
could commonly occur even without physical hotplug. Without this change
and "pseries/xics: use cpu_possible_mask rather than cpu_all_mask",
IRQs are all routed to CPU0 on power machines with XICS not running
irqbalance.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
I have boot-tested this on ppc64, but not yet on x86/x86_64. This is
generic code, and perhaps an audit of all .set_affinity functions
should occur before upstream acceptance?
---
 kernel/irq/manage.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c3003e9..ef85b95 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -175,7 +175,7 @@ static int setup_affinity(unsigned int irq, struct irq_desc *desc)
 		desc->status &= ~IRQ_AFFINITY_SET;
 	}
 
-	cpumask_and(desc->affinity, cpu_online_mask, irq_default_affinity);
+	cpumask_and(desc->affinity, cpu_possible_mask, irq_default_affinity);
 
 set_affinity:
 	desc->chip->set_affinity(irq, desc->affinity);
-- 
1.7.0.4
* Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Thomas Gleixner @ 2010-10-02 10:57 UTC
To: Nishanth Aravamudan
Cc: miltonm, Ian Campbell, Peter Zijlstra, Peter P Waskiewicz Jr,
    linux-kernel

On Fri, 1 Oct 2010, Nishanth Aravamudan wrote:

> The use of online_mask requires architecture code to be hotplug-aware to
> account for IRQ round-robin'ing. With user-driven dynamic SMT, this
> could commonly occur even without physical hotplug. Without this change
> and "pseries/xics: use cpu_possible_mask rather than cpu_all_mask", IRQs
> are all routed to CPU0 on power machines with XICS not running
> irqbalance.
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
[...]
> -	cpumask_and(desc->affinity, cpu_online_mask, irq_default_affinity);
> +	cpumask_and(desc->affinity, cpu_possible_mask, irq_default_affinity);

Hmm, that looks dangerous. And auditing everything is rather horrible,
especially when we need to add cpumask_and(..., cpu_online_mask, ..)
all over the place.

We should rather have something like:

	cpumask_var_t *cpumask_restrict_to = &cpu_online_mask;

+	cpumask_and(desc->affinity, *cpumask_restrict_to, irq_default_affinity);

So an arch can override it in arch_early_irq_init().

Thanks,

	tglx
* Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Sonny Rao @ 2010-10-06 20:55 UTC
To: Thomas Gleixner
Cc: Nishanth Aravamudan, miltonm, Ian Campbell, Peter Zijlstra,
    Peter P Waskiewicz Jr, linux-kernel, sonnyrao

On Sat, Oct 02, 2010 at 12:57:10PM +0200, Thomas Gleixner wrote:
> On Fri, 1 Oct 2010, Nishanth Aravamudan wrote:
> > The use of online_mask requires architecture code to be hotplug-aware to
> > account for IRQ round-robin'ing. With user-driven dynamic SMT, this
> > could commonly occur even without physical hotplug. Without this change
> > and "pseries/xics: use cpu_possible_mask rather than cpu_all_mask", IRQs
> > are all routed to CPU0 on power machines with XICS not running
> > irqbalance.
[...]
> > -	cpumask_and(desc->affinity, cpu_online_mask, irq_default_affinity);
> > +	cpumask_and(desc->affinity, cpu_possible_mask, irq_default_affinity);
>
> Hmm, that looks dangerous. And auditing everything is rather horrible
> especially when we need to add cpumask_and(..., cpu_online_mask, ..)
> all over the place.

I'm not sure it's that dangerous, because we already have a code path
today which should force the lower-level arch code to sanity-check vs
online_mask -- namely, when a user sets the affinity from /proc:

	irq_affinity_proc_write() -> irq_set_affinity() ->
		desc->chip->set_affinity(irq, cpumask)

The proc code does check that there is at least some intersection with
online cpus -- but that's not strong enough to tell us whether there
are any offline cpus in the mask. It will obviously depend on the
specific interrupt controller architectures -- I guess it's possible
that some would allow offline cpus to be in the mask and intelligently
skip over them. So that makes me think the arch code has to check this
_anyway_, since the generic code can't make assumptions about how
interrupt controllers operate.

But I started wondering if we're already broken here, so I did a bit
of review of the arch code implementing that set_affinity function to
see what the various architectures are doing.

x86 has:

	kernel/apic/io_apic.c:	.set_affinity = set_ioapic_affinity_irq,
	kernel/apic/io_apic.c:	.set_affinity = set_ir_ioapic_affinity_irq,
	kernel/apic/io_apic.c:	.set_affinity = set_msi_irq_affinity,
	kernel/apic/io_apic.c:	.set_affinity = ir_set_msi_irq_affinity,
	kernel/apic/io_apic.c:	.set_affinity = dmar_msi_set_affinity,
	kernel/apic/io_apic.c:	.set_affinity = ir_set_msi_irq_affinity,
	kernel/apic/io_apic.c:	.set_affinity = hpet_msi_set_affinity,
	kernel/apic/io_apic.c:	.set_affinity = set_ht_irq_affinity,
	kernel/uv_irq.c:	.set_affinity = uv_set_irq_affinity,

All of these end up checking vs online by calling set_desc_affinity(),
with the exception of set_ir_ioapic_affinity_irq, which does the check
by calling migrate_ioapic_irq_desc().

powerpc has:

	platforms/pseries/xics.c:	.set_affinity = xics_set_affinity
	platforms/pseries/xics.c:	.set_affinity = xics_set_affinity
	sysdev/mpic_u3msi.c:		.set_affinity = mpic_set_affinity,
	sysdev/mpic_pasemi_msi.c:	.set_affinity = mpic_set_affinity,
	sysdev/mpic.c:	mpic->hc_irq.set_affinity = mpic_set_affinity;
	sysdev/mpic.c:	mpic->hc_ht_irq.set_affinity = mpic_set_affinity;

Both xics_set_affinity and mpic_set_affinity AND the mask parameter
with cpu_online_mask.

In mips:

	cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu0_set_affinity_v2,
	cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu0_set_affinity,
	cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu1_set_affinity_v2,
	cavium-octeon/octeon-irq.c:	.set_affinity = octeon_irq_ciu1_set_affinity,
	kernel/irq-gic.c:	.set_affinity = gic_set_affinity,
	kernel/i8259.c:		.set_affinity = plat_set_irq_affinity,
	sibyte/bcm1480/irq.c:	.set_affinity = bcm1480_set_affinity
	sibyte/sb1250/irq.c:	.set_affinity = sb1250_set_affinity

The octeon functions iterate over online cpus, gic_set_affinity ANDs
with cpu_online_mask, and plat_set_irq_affinity checks for online, but
bcm1480_set_affinity and sb1250_set_affinity are _not_ checking vs
online explicitly.

In ia64:

	hp/sim/hpsim_irq.c:	.set_affinity = hpsim_set_affinity_noop,
	kernel/iosapic.c:	.set_affinity = iosapic_set_affinity
	kernel/iosapic.c:	.set_affinity = iosapic_set_affinity
	kernel/msi_ia64.c:	.set_affinity = ia64_set_msi_irq_affinity,
	kernel/msi_ia64.c:	.set_affinity = dmar_msi_set_affinity,
	sn/kernel/msi_sn.c:	.set_affinity = sn_set_msi_irq_affinity,
	sn/kernel/irq.c:	.set_affinity = sn_set_affinity_irq

iosapic_set_affinity() does a check by ANDing the mask with
cpu_online_mask. ia64_set_msi_irq_affinity and dmar_msi_set_affinity
both only check that the first cpu in the mask is online (not sure
why -- maybe that's the only requirement for their interrupt
controller?). sn_set_msi_irq_affinity is _not_ doing explicit checking
vs the online mask (maybe their firmware will tell them if they're
doing something illegal?).

In sparc:

	kernel/irq_64.c:	.set_affinity = sun4u_set_affinity,
	kernel/irq_64.c:	.set_affinity = sun4v_set_affinity,
	kernel/irq_64.c:	.set_affinity = sun4v_virt_set_affinity,

These do no explicit checks of online cpus, but they seem to be making
firmware calls which can return an error.

Other architectures, including arm, alpha, blackfin, and cris, do
_not_ appear to be checking the online mask. I'm not sure those
architectures support CPU hotplug either, though, so perhaps it's not
an issue... does hotplug because of power management make it a broader
problem?

So, my basic point is that many of the low-level arch-specific
functions are checking, some are not, and that *might* be a problem
for people trying to set affinity via proc. Also, the generic code is
making an assumption that cpu_online_mask is the correct mask -- which
I believe may not be correct for everybody, especially not for some
powerpc platforms.

Sonny
* Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Peter Zijlstra @ 2010-10-02 11:01 UTC
To: Nishanth Aravamudan
Cc: miltonm, Thomas Gleixner, Ian Campbell, Peter P Waskiewicz Jr,
    linux-kernel

On Fri, 2010-10-01 at 14:26 -0700, Nishanth Aravamudan wrote:
> The use of online_mask requires architecture code to be hotplug-aware to
> account for IRQ round-robin'ing.

Architectures that support hotplug should be hotplug aware, that's not
too much to ask imho.

> With user-driven dynamic SMT,

What's that?
* Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Sonny Rao @ 2010-10-06 21:02 UTC
To: Peter Zijlstra
Cc: Nishanth Aravamudan, miltonm, Thomas Gleixner, Ian Campbell,
    Peter P Waskiewicz Jr, linux-kernel, sonnyrao

On Sat, Oct 02, 2010 at 01:01:02PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-10-01 at 14:26 -0700, Nishanth Aravamudan wrote:
> > The use of online_mask requires architecture code to be hotplug-aware to
> > account for IRQ round-robin'ing.
>
> Architectures that support hotplug should be hotplug aware, that's not
> too much to ask imho.

It seems like most architectures support HOTPLUG_CPU. A quick grep for
HOTPLUG_CPU in arch shows:

	arm blackfin ia64 m32r mips mn10300 parisc powerpc s390 sh sparc x86

Also see my reply to Thomas -- it appears that many of the interrupt
controller implementations enforce affinitizing only to online cpus.

The other point is that, as this is generic code, it's making an
assumption that online cpus is the right mask to test against, and we
know of at least one case where this isn't quite correct.

> > With user-driven dynamic SMT,
>
> What's that?

Well, that is basically a feature where we can use CPU hotplug to
force a particular mode on an SMT (hardware multithreaded) processor.

The point here was really that on such multi-threaded processors --
which are becoming more common -- cpu hotplug can potentially be used
fairly often.

Sonny
* Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Peter Zijlstra @ 2010-10-11 18:48 UTC
To: Sonny Rao
Cc: Nishanth Aravamudan, miltonm, Thomas Gleixner, Ian Campbell,
    Peter P Waskiewicz Jr, linux-kernel, sonnyrao

On Wed, 2010-10-06 at 16:02 -0500, Sonny Rao wrote:
> > > With user-driven dynamic SMT,
> >
> > What's that?
>
> Well, that is basically a feature where we can use CPU hotplug to
> force a particular mode on an SMT (hardware multithreaded) processor.
>
> The point here was really that on such multi-threaded processors --
> which are becoming more common -- cpu hotplug can potentially be used
> fairly often.

I guess you're talking about the trainwreck called power7? Where you
want to force the thing into SMT1/2 mode instead of letting it degrade
into SMT4 mode?

Why would you want to change that often? Do realize that cpu-hotplug
is a very heavy, very expensive operation; doing it more than a few
times an hour counts as excessive in my book.

For RT I've been thinking of extending cpusets with a feature that
allows you to disable things like SMT and MC on a set, ensuring you
get no pipeline/cache interference. But this is something you set up
once and then run your workload; it's not something you'll change
often (in fact, hotplugging cpus will utterly wreck your RT workload).

Also, I don't see why you would want to have interrupts with affinity
to offline cpus only; that sounds plain wrong, as an offlined cpu is
not able to deal with interrupts.

I wish people would stop abusing hotplug as:
 - resource management
 - power management
 - other crazy ass things

It's not suitable for those..
* Re: [RESEND PATCH 1/2] IRQ: use cpu_possible_mask rather than online_mask in setup_affinity

From: Sonny Rao @ 2010-10-11 19:52 UTC
To: Peter Zijlstra
Cc: Nishanth Aravamudan, miltonm, Thomas Gleixner, Ian Campbell,
    Peter P Waskiewicz Jr, linux-kernel, sonnyrao

On Mon, Oct 11, 2010 at 08:48:25PM +0200, Peter Zijlstra wrote:
> I guess you're talking about the trainwreck called power7? Where you
> want to force the thing into SMT1/2 mode instead of letting it degrade
> into SMT4 mode?

This is not really Power7-specific; people could do this on *any*
multi-threaded processor just to ensure all core resources get
dedicated to one thread. I've seen HPC users in particular do this
before.

> Why would you want to change that often? Do realize that cpu-hotplug is
> a very heavy, very expensive operation, doing it more than a few times
> an hour counts as excessive in my book.
>
> But this is something you setup once and then run your workload, its not
> something you'll change often (in fact, hotplugging cpus will utterly
> wreck your RT workload).

This is exactly what I'm talking about -- there are people who have
multi-threaded processors, want to run long-running batch jobs, and
sometimes want to change the SMT mode of the processor without
rebooting. CPU hotplug has been used in this manner for some time. So
it's not something that happens often, but it does happen.

> Also, I don't see why you would want to have interrupts with affinity to
> offline cpus only, that sounds plain wrong, an offline'd cpu is not able
> to deal with interrupts.
>
> I wish people would stop abusing hotplug as:
>  - resource management
>  - power management
>  - other crazy ass things
>
> Its not suitable for those..

"Resource management" is a rather broad term -- I think one of the
major use cases of cpu hotplug is in fact "resource management", in
the sense of a partition or guest VM needing more or fewer compute
resources. Multi-threaded processors are also doing a form of resource
management in that they're dividing (or not dividing) the resources
available to a core -- cache, pipelines, functional units, etc. If
it's not suitable for "resource management", then why does it exist?

Sonny
* [RESEND PATCH 2/2] pseries/xics: use cpu_possible_mask rather than cpu_all_mask

From: Nishanth Aravamudan @ 2010-10-01 21:26 UTC
To: nacc
Cc: miltonm, Benjamin Herrenschmidt, Paul Mackerras, Anton Blanchard,
    Mark Nelson, Thomas Gleixner, Michael Ellerman, linuxppc-dev,
    linux-kernel

Current firmware only allows us to send IRQs to the first processor or
to all processors. We currently check whether the passed-in mask is
equal to the all_mask, but the firmware is only considering whether
the request is for the equivalent of the possible_mask. Thus, we think
the request is for some subset of CPUs and only assign IRQs to the
first CPU (on systems without irqbalance running), as evidenced by
/proc/interrupts. By using possible_mask instead, we account for this
and proper interleaving of interrupts occurs.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/xics.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/xics.c b/arch/powerpc/platforms/pseries/xics.c
index 93834b0..7c1e342 100644
--- a/arch/powerpc/platforms/pseries/xics.c
+++ b/arch/powerpc/platforms/pseries/xics.c
@@ -178,7 +178,7 @@ static int get_irq_server(unsigned int virq, const struct cpumask *cpumask,
 	if (!distribute_irqs)
 		return default_server;
 
-	if (!cpumask_equal(cpumask, cpu_all_mask)) {
+	if (!cpumask_subset(cpu_possible_mask, cpumask)) {
 		int server = cpumask_first_and(cpu_online_mask, cpumask);
 
 		if (server < nr_cpu_ids)
-- 
1.7.0.4