* [RFC] export irq_set/get_affinity() for multiqueue network drivers
@ 2008-08-28 20:21 Brice Goglin
2008-08-28 20:56 ` David Miller
2008-08-29 12:50 ` Arjan van de Ven
0 siblings, 2 replies; 7+ messages in thread
From: Brice Goglin @ 2008-08-28 20:21 UTC (permalink / raw)
To: LKML; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 655 bytes --]
Hello,
Is there any way to set up IRQ masks from within a driver? myri10ge
currently relies on an external script (writing to
/proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
processor. By default, Linux will either:
* round-robin the interrupts (killing the benefit of DCA for instance)
* put all IRQs on the same CPU (killing much of the benefit of multislices)
With more and more drivers using multiqueues, I think we need a nice way
to bind MSI-X from within the drivers. I am not sure what's best; the
attached (untested) patch would just export the existing
irq_set_affinity() and add irq_get_affinity(). Comments?
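For reference, the external script essentially boils down to this (a minimal
user-space sketch; the IRQ number and mask below are placeholders, the real
script discovers them at run time):

#include <stdio.h>

/* Write a hex CPU mask into /proc/irq/<irq>/smp_affinity. */
static int set_irq_affinity(unsigned int irq, unsigned long mask)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%lx\n", mask);	/* e.g. 0x2 binds to CPU 1 */
	return fclose(f);
}

int main(void)
{
	/* Placeholder: bind queue 0's vector (say IRQ 50) to CPU 1. */
	return set_irq_affinity(50, 0x2) ? 1 : 0;
}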
thanks,
Brice
[-- Attachment #2: export_irq_affinity.patch --]
[-- Type: text/x-patch, Size: 1804 bytes --]
[PATCH] export irq_set/get_affinity() to modules
Export irq_set_affinity() and add/export irq_get_affinity() so that
network drivers can manage MSI-X interrupt affinity masks and bind their
queues to different CPUs. Otherwise Linux will either:
* round-robin the interrupts (killing the benefit of DCA, for instance)
* put all IRQs on the same CPU (killing much of the benefit of multislices)
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
---
include/linux/interrupt.h | 3 +++
kernel/irq/manage.c | 15 +++++++++++++++
2 files changed, 18 insertions(+)
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -107,6 +107,7 @@ extern void enable_irq(unsigned int irq);
extern cpumask_t irq_default_affinity;
extern int irq_set_affinity(unsigned int irq, cpumask_t cpumask);
+extern int irq_get_affinity(unsigned int irq, cpumask_t *cpumask);
extern int irq_can_set_affinity(unsigned int irq);
extern int irq_select_affinity(unsigned int irq);
@@ -117,6 +118,8 @@ static inline int irq_set_affinity(unsigned int irq, cpumask_t cpumask)
return -EINVAL;
}
+static inline int irq_get_affinity(unsigned int irq, cpumask_t *cpumask)
+{
+ return -EINVAL;
+}
+
static inline int irq_can_set_affinity(unsigned int irq)
{
return 0;
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -96,6 +96,21 @@ int irq_set_affinity(unsigned int irq, cpumask_t cpumask)
#endif
return 0;
}
+EXPORT_SYMBOL(irq_set_affinity);
+
+int irq_get_affinity(unsigned int irq, cpumask_t *cpumask)
+{
+ struct irq_desc *desc = irq_desc + irq;
+ cpumask_t *mask = &desc->affinity;
+
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+ if (desc->status & IRQ_MOVE_PENDING)
+ mask = &desc->pending_mask;
+#endif
+ memcpy(cpumask, mask, sizeof(*mask));
+ return 0;
+}
+EXPORT_SYMBOL(irq_get_affinity);
#ifndef CONFIG_AUTO_IRQ_AFFINITY
/*
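For illustration, a driver's MSI-X setup path could then use the export
roughly like this (an untested sketch against this era's cpumask_t API;
the structure and field names are hypothetical, not myri10ge's real ones):

#include <linux/interrupt.h>
#include <linux/pci.h>
#include <linux/cpumask.h>

/* Hypothetical per-device state; not myri10ge's actual structures. */
struct my_mq_priv {
	int num_slices;
	struct msix_entry *msix_entries;
};

/* Spread the slices' MSI-X vectors over the online CPUs, one per slice. */
static void my_bind_msix_vectors(struct my_mq_priv *priv)
{
	int i, cpu = first_cpu(cpu_online_map);

	for (i = 0; i < priv->num_slices; i++) {
		irq_set_affinity(priv->msix_entries[i].vector,
				 cpumask_of_cpu(cpu));
		cpu = next_cpu(cpu, cpu_online_map);
		if (cpu >= NR_CPUS)	/* wrap around */
			cpu = first_cpu(cpu_online_map);
	}
}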
* Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers
2008-08-28 20:21 [RFC] export irq_set/get_affinity() for multiqueue network drivers Brice Goglin
@ 2008-08-28 20:56 ` David Miller
2008-08-29 7:08 ` Brice Goglin
2008-08-29 12:50 ` Arjan van de Ven
1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2008-08-28 20:56 UTC (permalink / raw)
To: Brice.Goglin; +Cc: linux-kernel, netdev
From: Brice Goglin <Brice.Goglin@inria.fr>
Date: Thu, 28 Aug 2008 22:21:53 +0200
> With more and more drivers using multiqueues, I think we need a nice way
> to bind MSI-X from within the drivers. I am not sure what's best; the
> attached (untested) patch would just export the existing
> irq_set_affinity() and add irq_get_affinity(). Comments?
I think we should rather have some kind of generic thing in the
IRQ layer that allows specifying the usage model of the device's
interrupts, so that the IRQ layer can choose default affinities.
I never notice any of this complete insanity on sparc64 because
we flat-out spread all of the interrupts across the machine.
What we don't want is drivers choosing IRQ affinity settings;
they have no idea about NUMA topology, what NUMA node the
PCI controller sits behind, what CPUs are there, etc., and
without that kind of knowledge you cannot possibly make
affinity decisions properly.
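As a purely hypothetical sketch of that direction (no such interface exists
here), the driver would only declare a usage model and the IRQ core, which
can see the topology, would compute the masks:

#include <linux/interrupt.h>
#include <linux/topology.h>
#include <linux/cpumask.h>

/* Hypothetical usage-model hints a driver could hand to the IRQ core. */
enum irq_usage_model {
	IRQ_USAGE_SINGLE,	/* one slow interrupt, place it anywhere */
	IRQ_USAGE_PER_QUEUE,	/* one MSI-X vector per queue */
};

/*
 * Hypothetical helper: the IRQ core, not the driver, turns the hint plus
 * the device's NUMA node into affinity masks.  This toy version only
 * confines the IRQ to the node's CPUs; a real per-queue policy would
 * also rotate vectors within that set.
 */
static int irq_apply_usage_model(unsigned int irq,
				 enum irq_usage_model model, int node)
{
	cpumask_t cpus = node_to_cpumask(node);

	if (cpus_empty(cpus))
		cpus = cpu_online_map;
	return irq_set_affinity(irq, cpus);
}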
* Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers
2008-08-28 20:56 ` David Miller
@ 2008-08-29 7:08 ` Brice Goglin
0 siblings, 0 replies; 7+ messages in thread
From: Brice Goglin @ 2008-08-29 7:08 UTC (permalink / raw)
To: David Miller; +Cc: linux-kernel, netdev
David Miller wrote:
> I think we should rather have some kind of generic thing in the
> IRQ layer that allows specifying the usage model of the device's
> interrupts, so that the IRQ layer can choose default affinities.
>
> I never notice any of this complete insanity on sparc64 because
> we flat-out spread all of the interrupts across the machine.
>
> What we don't want is drivers choosing IRQ affinity settings;
> they have no idea about NUMA topology, what NUMA node the
> PCI controller sits behind, what CPUs are there, etc., and
> without that kind of knowledge you cannot possibly make
> affinity decisions properly.
As long as we get something better than the current behavior, I am fine
with it :)
Brice
* Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers
2008-08-28 20:21 [RFC] export irq_set/get_affinity() for multiqueue network drivers Brice Goglin
2008-08-28 20:56 ` David Miller
@ 2008-08-29 12:50 ` Arjan van de Ven
2008-08-29 16:48 ` Andi Kleen
1 sibling, 1 reply; 7+ messages in thread
From: Arjan van de Ven @ 2008-08-29 12:50 UTC (permalink / raw)
To: Brice Goglin; +Cc: LKML, netdev
On Thu, 28 Aug 2008 22:21:53 +0200
Brice Goglin <Brice.Goglin@inria.fr> wrote:
> Hello,
>
> Is there any way to set up IRQ masks from within a driver? myri10ge
> currently relies on an external script (writing to
> /proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
> processor. By default, Linux will either:
> * round-robin the interrupts (killing the benefit of DCA for instance)
> * put all IRQs on the same CPU (killing much of the benefit of multislices)
* do the right thing with the userspace irq balancer
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers
2008-08-29 12:50 ` Arjan van de Ven
@ 2008-08-29 16:48 ` Andi Kleen
2008-08-29 16:52 ` Arjan van de Ven
2008-08-29 17:14 ` Rick Jones
0 siblings, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2008-08-29 16:48 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Brice Goglin, LKML, netdev
Arjan van de Ven <arjan@infradead.org> writes:
> On Thu, 28 Aug 2008 22:21:53 +0200
> Brice Goglin <Brice.Goglin@inria.fr> wrote:
>
>> Hello,
>>
>> Is there any way to set up IRQ masks from within a driver? myri10ge
>> currently relies on an external script (writing to
>> /proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
>> processor. By default, Linux will either:
>> * round-robin the interrupts (killing the benefit of DCA for instance)
>> * put all IRQs on the same CPU (killing much of the benefit of multislices)
>
> * do the right thing with the userspace irq balancer
It probably also needs to be hooked up to sched_mc_power_savings.
When that switch is on, the interrupts shouldn't be spread out over
that many sockets.
Does it need callbacks to change the interrupts when that variable
changes?
Also I suspect handling SMT explicitly is a good idea, e.g. I would
always set the affinity to all thread siblings in a core, not
just a single one, because context switching is very cheap between them.
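A minimal sketch of that sibling-mask idea, assuming this era's x86
topology_thread_siblings() helper and the cpumask_t-by-value
irq_set_affinity() from the patch earlier in the thread:

#include <linux/interrupt.h>
#include <linux/topology.h>
#include <linux/cpumask.h>

/*
 * Illustrative only: bind an interrupt to every SMT sibling of @cpu
 * rather than to a single logical CPU.
 */
static int bind_irq_to_core_of(unsigned int irq, int cpu)
{
	cpumask_t siblings = topology_thread_siblings(cpu);

	return irq_set_affinity(irq, siblings);
}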
-Andi
--
ak@linux.intel.com
* Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers
2008-08-29 16:48 ` Andi Kleen
@ 2008-08-29 16:52 ` Arjan van de Ven
2008-08-29 17:14 ` Rick Jones
1 sibling, 0 replies; 7+ messages in thread
From: Arjan van de Ven @ 2008-08-29 16:52 UTC (permalink / raw)
To: Andi Kleen; +Cc: Brice Goglin, LKML, netdev
On Fri, 29 Aug 2008 18:48:12 +0200
Andi Kleen <andi@firstfloor.org> wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
>
> > On Thu, 28 Aug 2008 22:21:53 +0200
> > Brice Goglin <Brice.Goglin@inria.fr> wrote:
> >
> >> Hello,
> >>
> >> Is there any way to set up IRQ masks from within a driver? myri10ge
> >> currently relies on an external script (writing to
> >> /proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
> >> processor. By default, Linux will either:
> >> * round-robin the interrupts (killing the benefit of DCA for instance)
> >> * put all IRQs on the same CPU (killing much of the benefit of multislices)
> >
> > * do the right thing with the userspace irq balancer
>
> It probably also needs to be hooked up to sched_mc_power_savings.
> When that switch is on, the interrupts shouldn't be spread out over
> that many sockets.
That's what irqbalance already does today.
>
> Also I suspect handling SMT explicitly is a good idea, e.g. I would
> always set the affinity to all thread siblings in a core, not
> just a single one, because context switching is very cheap between them.
That is what irqbalance already does today, at least for what it
considers somewhat slower IRQs.
For networking it still sucks because the packet reordering logic is
per logical CPU, so you still don't want to receive packets from the
same "stream" over multiple logical CPUs.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers
2008-08-29 16:48 ` Andi Kleen
2008-08-29 16:52 ` Arjan van de Ven
@ 2008-08-29 17:14 ` Rick Jones
1 sibling, 0 replies; 7+ messages in thread
From: Rick Jones @ 2008-08-29 17:14 UTC (permalink / raw)
To: Andi Kleen; +Cc: Arjan van de Ven, Brice Goglin, LKML, netdev
> Also I suspect handling SMT explicitly is a good idea, e.g. I would
> always set the affinity to all thread siblings in a core, not
> just a single one, because context switching is very cheap between them.
That is true, but don't they also "compete" for pipeline resources?
rick jones