netdev.vger.kernel.org archive mirror
* multiqueue interrupts...
@ 2008-09-19  2:38 David Miller
  2008-09-19 11:38 ` Ben Hutchings
       [not found] ` <41b516cb0809191050t6c9783dele8926f697854bb1@mail.gmail.com>
  0 siblings, 2 replies; 13+ messages in thread
From: David Miller @ 2008-09-19  2:38 UTC (permalink / raw)
  To: netdev


During kernel summit I was speaking with Arjan van de Ven
about irqbalanced and networking card multiqueue interrupts.

In order for irqbalanced to make smart decisions, what needs to
happen in drivers is that the individual interrupts need to be
named in such a way that he can tell by looking at /proc/interrupts
output that these interrupts are related.

I would suggest that people use something like:

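	char buf[IFNAMSIZ + 6];

	/* "-rx-"/"-tx-" uses 4 of the 6 extra bytes, which leaves room
	 * for a two-digit queue index; snprintf() truncates beyond that.
	 */
	snprintf(buf, sizeof(buf), "%s-%s-%d",
		 netdev->name,
		 (RX_INTERRUPT ? "rx" : "tx"),
		 queue->index);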

So on a multiqueue card with 2 RX queues and 2 TX queues we'd
have names like:

	eth0-rx-0
	eth0-rx-1
	eth0-tx-0
	eth0-tx-1

So let's make an effort to get this done right in 2.6.28 and meanwhile
Arjan can add the irqbalanced code.
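
On the consumer side, a tool reading /proc/interrupts could split such names back into their parts. The following is a minimal user-space sketch under the naming scheme above; `parse_queue_irq_name` is a hypothetical helper, not code from irqbalance, and it assumes the interface name itself contains no dash.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from irqbalance: split a name of the
 * form "<dev>-<rx|tx>-<index>" into its parts.  Returns 1 on a match,
 * 0 otherwise.  dev must have room for 16 bytes (IFNAMSIZ).
 * Assumes the device name itself contains no '-'.
 */
static int parse_queue_irq_name(const char *name, char *dev,
				int *is_rx, int *index)
{
	char dir[3];
	int consumed = 0;

	/* %n records how many characters were consumed; it is not counted
	 * in the return value, so a full match returns 3.
	 */
	if (sscanf(name, "%15[^-]-%2[rtx]-%d%n", dev, dir, index, &consumed) != 3)
		return 0;
	if (name[consumed] != '\0' || *index < 0)
		return 0;
	if (strcmp(dir, "rx") == 0)
		*is_rx = 1;
	else if (strcmp(dir, "tx") == 0)
		*is_rx = 0;
	else
		return 0;
	return 1;
}
```

Two interrupts whose names parse to the same device string can then be treated as related.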


* Re: multiqueue interrupts...
  2008-09-19  2:38 multiqueue interrupts David Miller
@ 2008-09-19 11:38 ` Ben Hutchings
  2008-09-19 12:29   ` Brice Goglin
  2008-09-19 20:12   ` David Miller
       [not found] ` <41b516cb0809191050t6c9783dele8926f697854bb1@mail.gmail.com>
  1 sibling, 2 replies; 13+ messages in thread
From: Ben Hutchings @ 2008-09-19 11:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Thu, 2008-09-18 at 19:38 -0700, David Miller wrote:
[...]
> So on a multiqueue card with 2 RX queues and 2 TX queues we'd
> have names like:
> 
> 	eth0-rx-0
> 	eth0-rx-1
> 	eth0-tx-0
> 	eth0-tx-1
> 
> So let's make an effort to get this done right in 2.6.28 and meanwhile
> Arjan can add the irqbalanced code.

What about the case where an interrupt is shared between RX and TX
completions?  Our hardware is very flexible in this regard, but based on
performance testing prior to the introduction of TX multiqueue we
currently allocate multiple interrupts for RX completions and share the
first with TX completions.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: multiqueue interrupts...
  2008-09-19 11:38 ` Ben Hutchings
@ 2008-09-19 12:29   ` Brice Goglin
  2008-09-19 20:12     ` David Miller
  2008-09-19 20:12   ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Brice Goglin @ 2008-09-19 12:29 UTC (permalink / raw)
  To: David Miller; +Cc: Ben Hutchings, netdev

Ben Hutchings wrote:
> On Thu, 2008-09-18 at 19:38 -0700, David Miller wrote:
> [...]
>   
>> So on a multiqueue card with 2 RX queues and 2 TX queues we'd
>> have names like:
>>
>> 	eth0-rx-0
>> 	eth0-rx-1
>> 	eth0-tx-0
>> 	eth0-tx-1
>>
>> So let's make an effort to get this done right in 2.6.28 and meanwhile
>> Arjan can add the irqbalanced code.
>>     
>
> What about the case where an interrupt is shared between RX and TX
> completions?  Our hardware is very flexible in this regard, but based on
> performance testing prior to the introduction of TX multiqueue we
> currently allocate multiple interrupts for RX completions and share the
> first with TX completions.
>   

myri10ge uses the same interrupts for TX and RX. The current name is
eth%d:slice-%d but we could change it if there's a consensus.

Brice



* Re: multiqueue interrupts...
       [not found] ` <41b516cb0809191050t6c9783dele8926f697854bb1@mail.gmail.com>
@ 2008-09-19 18:18   ` Matthew Wilcox
  2008-09-19 20:14     ` David Miller
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Matthew Wilcox @ 2008-09-19 18:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Arjan van de Ven

On Thu, Sep 18, 2008 at 7:38 PM, David Miller wrote:
> During kernel summit I was speaking with Arjan van de Ven
> about irqbalanced and networking card multiqueue interrupts.
> 
> In order for irqbalanced to make smart decisions, what needs to
> happen in drivers is that the individual interrupts need to be
> named in such a way that he can tell by looking at /proc/interrupts
> output that these interrupts are related.
> 
> So on a multiqueue card with 2 RX queues and 2 TX queues we'd
> have names like:
> 
>        eth0-rx-0
>        eth0-rx-1
>        eth0-tx-0
>        eth0-tx-1
> 
> So let's make an effort to get this done right in 2.6.28 and meanwhile
> Arjan can add the irqbalanced code.

Instead of having magic names, how about we put something in
/proc/irq/nnn/ that lets us tell which interrupts are connected to which
queues?

Another idea I've been thinking about is a flag to tell irqbalance to
leave stuff alone, and we just set stuff up right the first time.

We were discussing various options around multiqueue, first at the SCSI
multiqueue BOF and later at the PCI MSI BOF.  There's a general feeling
that drivers should be given some guidance about how many queues they
should be enabling, and the sysadmin needs to be the one telling the
PCI layer, which drivers should then query.  The use cases vary wildly
depending on whether you're doing routing or are an end node, whether
you're doing virtualization or NUMA or both, and on just how many cards
and cpus you have.

In a storage / NUMA configuration we really want to set up one queue per
cpu / package / node (depending on resource constraints) and know that
the interrupt is going to come back to the same cpu / package / node.
We definitely don't want irqbalanced moving the interrupt around.
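
For reference, pinning an interrupt by hand goes through the existing /proc/irq/N/smp_affinity file, which takes a hexadecimal CPU mask. The sketch below only illustrates that mechanism; the helper names are hypothetical, it handles CPUs 0-31 only (a single 32-bit mask word), and the actual write needs root.

```c
#include <stdio.h>

/*
 * Build the /proc path and hex CPU mask needed to pin one IRQ to one CPU.
 * Sketch only: handles CPUs 0-31, i.e. a single 32-bit mask word.
 * Returns 0 on success, -1 on bad input or truncation.
 */
static int format_irq_affinity(unsigned int irq, unsigned int cpu,
			       char *path, size_t path_len,
			       char *mask, size_t mask_len)
{
	int n;

	if (cpu >= 32)
		return -1;
	n = snprintf(path, path_len, "/proc/irq/%u/smp_affinity", irq);
	if (n < 0 || (size_t)n >= path_len)
		return -1;
	n = snprintf(mask, mask_len, "%x", 1u << cpu);
	if (n < 0 || (size_t)n >= mask_len)
		return -1;
	return 0;
}

/* Write the mask; needs root and an existing IRQ.  Returns 0 on success. */
static int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
	char path[64], mask[16];
	FILE *f;
	int ok;

	if (format_irq_affinity(irq, cpu, path, sizeof(path), mask, sizeof(mask)))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Nothing in this sketch stops a balancing daemon from rewriting the mask later, which is the concern raised above.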

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: multiqueue interrupts...
  2008-09-19 11:38 ` Ben Hutchings
  2008-09-19 12:29   ` Brice Goglin
@ 2008-09-19 20:12   ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2008-09-19 20:12 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 19 Sep 2008 12:38:01 +0100

> On Thu, 2008-09-18 at 19:38 -0700, David Miller wrote:
> [...]
> > So on a multiqueue card with 2 RX queues and 2 TX queues we'd
> > have names like:
> > 
> > 	eth0-rx-0
> > 	eth0-rx-1
> > 	eth0-tx-0
> > 	eth0-tx-1
> > 
> > So let's make an effort to get this done right in 2.6.28 and meanwhile
> > Arjan can add the irqbalanced code.
> 
> What about the case where an interrupt is shared between RX and TX
> completions?  Our hardware is very flexible in this regard, but based on
> performance testing prior to the introduction of TX multiqueue we
> currently allocate multiple interrupts for RX completions and share the
> first with TX completions.

It would probably be sufficient to use just eth0-N; the point is only
that irqbalanced can tell that the interrupts are related.


* Re: multiqueue interrupts...
  2008-09-19 12:29   ` Brice Goglin
@ 2008-09-19 20:12     ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2008-09-19 20:12 UTC (permalink / raw)
  To: brice; +Cc: bhutchings, netdev

From: Brice Goglin <brice@myri.com>
Date: Fri, 19 Sep 2008 14:29:07 +0200

> Ben Hutchings wrote:
> > On Thu, 2008-09-18 at 19:38 -0700, David Miller wrote:
> > [...]
> >   
> >> So on a multiqueue card with 2 RX queues and 2 TX queues we'd
> >> have names like:
> >>
> >> 	eth0-rx-0
> >> 	eth0-rx-1
> >> 	eth0-tx-0
> >> 	eth0-tx-1
> >>
> >> So let's make an effort to get this done right in 2.6.28 and meanwhile
> >> Arjan can add the irqbalanced code.
> >>     
> >
> > What about the case where an interrupt is shared between RX and TX
> > completions?  Our hardware is very flexible in this regard, but based on
> > performance testing prior to the introduction of TX multiqueue we
> > currently allocate multiple interrupts for RX completions and share the
> > first with TX completions.
> >   
> 
> myri10ge uses the same interrupts for TX and RX. The current name is
> eth%d:slice-%d but we could change it if there's a consensus.

You might be OK as-is.  If we make irqbalanced look for X-%d, you'll
likely be fine.
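
A looser "X-%d" match could be sketched as follows. This is only an illustration of the idea, with a hypothetical helper name and no claim about what irqbalance actually does: strip a trailing "-<digits>" and treat whatever precedes it as the group key, so that both "eth0-3" and "eth0:slice-3" are recognized.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from irqbalance: if name ends in
 * "-<digits>", store the length of the part before the final dash in
 * *prefix_len and the number in *index, and return 1.  Return 0 otherwise.
 */
static int split_numeric_suffix(const char *name, size_t *prefix_len,
				long *index)
{
	const char *dash = strrchr(name, '-');
	const char *p;

	/* Need a non-empty prefix and at least one character after the dash. */
	if (!dash || dash == name || dash[1] == '\0')
		return 0;
	for (p = dash + 1; *p; p++)
		if (!isdigit((unsigned char)*p))
			return 0;
	*prefix_len = (size_t)(dash - name);
	*index = strtol(dash + 1, NULL, 10);
	return 1;
}
```

Note that under this rule "eth0-rx-1" and "eth0-tx-1" get different prefixes ("eth0-rx" and "eth0-tx"), so a consumer that wants RX and TX grouped together would still need to look at the device part.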


* Re: multiqueue interrupts...
  2008-09-19 18:18   ` Matthew Wilcox
@ 2008-09-19 20:14     ` David Miller
  2008-09-19 20:57     ` Brandeburg, Jesse
  2008-09-19 22:11     ` Arjan van de Ven
  2 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2008-09-19 20:14 UTC (permalink / raw)
  To: matthew; +Cc: netdev, arjan

From: Matthew Wilcox <matthew@wil.cx>
Date: Fri, 19 Sep 2008 12:18:41 -0600

> On Thu, Sep 18, 2008 at 7:38 PM, David Miller wrote:
> > During kernel summit I was speaking with Arjan van de Ven
> > about irqbalanced and networking card multiqueue interrupts.
> > 
> > In order for irqbalanced to make smart decisions, what needs to
> > happen in drivers is that the individual interrupts need to be
> > named in such a way that he can tell by looking at /proc/interrupts
> > output that these interrupts are related.
> > 
> > So on a multiqueue card with 2 RX queues and 2 TX queues we'd
> > have names like:
> > 
> >        eth0-rx-0
> >        eth0-rx-1
> >        eth0-tx-0
> >        eth0-tx-1
> > 
> > So let's make an effort to get this done right in 2.6.28 and meanwhile
> > Arjan can add the irqbalanced code.
> 
> Instead of having magic names, how about we put something in
> /proc/irq/nnn/ that lets us tell which interrupts are connected to which
> queues?

That means more code to write for irqbalanced, to look for the files
and directories under there, etc.  It is also harder to make work with
simple individual driver backports, which you know dists are going to
want to do.

That's why.

No, this new subdirectory thing isn't a good idea.


* RE: multiqueue interrupts...
  2008-09-19 18:18   ` Matthew Wilcox
  2008-09-19 20:14     ` David Miller
@ 2008-09-19 20:57     ` Brandeburg, Jesse
  2008-09-19 21:09       ` David Miller
  2008-09-19 22:11     ` Arjan van de Ven
  2 siblings, 1 reply; 13+ messages in thread
From: Brandeburg, Jesse @ 2008-09-19 20:57 UTC (permalink / raw)
  To: Matthew Wilcox, David Miller
  Cc: netdev, Arjan van de Ven, Brice Goglin, Ben Hutchings

Matthew Wilcox wrote:
>> So let's make an effort to get this done right in 2.6.28 and
>> meanwhile Arjan can add the irqbalanced code.
 
> Another idea I've been thinking about is a flag to tell irqbalance to
> leave stuff alone, and we just set stuff up right the first time.

There is already this flag called IRQF_NOBALANCING, at least I think
that's what we want.  irqbalanced's treatment of this flag is another
matter.
 
> We were discussing various options around multiqueue, first at the
> SCSI multiqueue BOF and later at the PCI MSI BOF.  There's a general
> feeling that drivers should be given some guidance about how many
> queues they should be enabling, and the sysadmin needs to be the one
> telling the PCI layer, which drivers should then query.  The use
> cases vary wildly depending on whether you're doing routing or are an
> end node, whether you're doing virtualization or NUMA or both, and on
> just how many cards and cpus you have.

Not a bad idea, but I can appreciate why DaveM thinks this is
unnecessary.  However, all we are left with right now is code changes or
module parameters when trying to configure the number of queues.

How about some new ethtool options for multiqueue configuration?  I
haven't spent much time thinking about this before, but here is a
proposal.

query multiqueue capabilities:
ethtool -q ethX

set multiqueue capabilities:
ethtool -Q tx N rx N int <fixedcpu|pairs|somethingelse?>

tx N and rx N are pretty self-explanatory
int fixedcpu - each queue gets a cpu and is registered IRQF_NOBALANCING
int pairs - tx and rx queues are allocated per cpu and (probably) share
a vector
There should be others here; I'm not sure how/if we would want to make a
pluggable way to do this within ethtool's design without opening buffer
overflow kinds of holes.
 
> In a storage / NUMA configuration we really want to set up one queue
> per cpu / package / node (depending on resource constraints) and know
> that the interrupt is going to come back to the same cpu / package /
> node. We definitely don't want irqbalanced moving the interrupt
> around. 

ethtool doesn't help storage, but I read this on netdev anyway...

Jesse


* Re: multiqueue interrupts...
  2008-09-19 20:57     ` Brandeburg, Jesse
@ 2008-09-19 21:09       ` David Miller
  2008-09-19 21:15         ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2008-09-19 21:09 UTC (permalink / raw)
  To: jesse.brandeburg; +Cc: matthew, netdev, arjan, brice, bhutchings

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Fri, 19 Sep 2008 13:57:58 -0700

> set multiqueue capabilities:
> ethtool -Q tx N rx N int <fixedcpu|pairs|somethingelse?>

No way.

Let irqbalanced implement the policy.  Just configure as many
queues as possible and let it figure out what to do.


* Re: multiqueue interrupts...
  2008-09-19 21:09       ` David Miller
@ 2008-09-19 21:15         ` Matthew Wilcox
  0 siblings, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2008-09-19 21:15 UTC (permalink / raw)
  To: David Miller; +Cc: jesse.brandeburg, netdev, arjan, brice, bhutchings

On Fri, Sep 19, 2008 at 02:09:22PM -0700, David Miller wrote:
> Let irqbalanced implement the policy.  Just configure as many
> queues as possible and let it figure out what to do.

That isn't a great idea.  Some cards consume a significant amount of
resources per queue and the cost of each queue may be larger than the
benefit.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: multiqueue interrupts...
  2008-09-19 18:18   ` Matthew Wilcox
  2008-09-19 20:14     ` David Miller
  2008-09-19 20:57     ` Brandeburg, Jesse
@ 2008-09-19 22:11     ` Arjan van de Ven
  2008-09-19 22:24       ` Andy Fleming
  2 siblings, 1 reply; 13+ messages in thread
From: Arjan van de Ven @ 2008-09-19 22:11 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: David Miller, netdev

On Fri, 19 Sep 2008 12:18:41 -0600
Matthew Wilcox <matthew@wil.cx> wrote:
> 
> Another idea I've been thinking about is a flag to tell irqbalance to
> leave stuff alone, and we just set stuff up right the first time.

that's not a good answer. There are reasons for moving interrupts that
are a sysadmin choice (like a power management policy under which, if
the system is seriously idle, all interrupts go to one of the sockets so
that the others can stay in low power mode). Putting the policy in the
kernel to prohibit such admin choices sounds like a bad idea to me.

There are better ways to do what you want, for example by exposing a
"preferred cpu" somewhere so that irqbalance will place it there
"unless <...>". That is, if such kernel policy binding is right in the
first place.

> In a storage / NUMA configuration we really want to set up one queue
> per cpu / package / node (depending on resource constraints) and know
> that the interrupt is going to come back to the same cpu / package /
> node. We definitely don't want irqbalanced moving the interrupt
> around.

irqbalance is NUMA aware and places a penalty on placing an interrupt
"wrongly". We can argue on how strong this penalty should be, but
thinking that irqbalance doesn't use the numa info the kernel exposes
is incorrect.
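
To make the "penalty" idea concrete, here is a deliberately simplified cost model. It is an illustration only and not irqbalance's actual algorithm; the struct, the function name and the penalty value are all assumptions made for the example.

```c
#include <stddef.h>

struct cpu_info {
	int node;		/* NUMA node this CPU belongs to */
	unsigned long load;	/* interrupt load already placed here */
};

/*
 * Illustrative cost model only, not irqbalance's actual algorithm:
 * choose the CPU with the lowest load, where CPUs outside the
 * device's NUMA node are charged an extra remote_penalty.
 * Returns the chosen index, or -1 if ncpus is 0.
 */
static int pick_cpu(const struct cpu_info *cpus, size_t ncpus,
		    int dev_node, unsigned long remote_penalty)
{
	unsigned long best_cost = 0;
	int best = -1;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		unsigned long cost = cpus[i].load;

		if (cpus[i].node != dev_node)
			cost += remote_penalty;
		if (best < 0 || cost < best_cost) {
			best = (int)i;
			best_cost = cost;
		}
	}
	return best;
}
```

With a large penalty the interrupt stays on the device's node even when remote CPUs are idler; with a small one it migrates. How strong the penalty should be is exactly the tunable under discussion.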



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: multiqueue interrupts...
  2008-09-19 22:11     ` Arjan van de Ven
@ 2008-09-19 22:24       ` Andy Fleming
  2008-09-19 22:28         ` Arjan van de Ven
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Fleming @ 2008-09-19 22:24 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Matthew Wilcox, David Miller, netdev

On Fri, Sep 19, 2008 at 5:11 PM, Arjan van de Ven <arjan@infradead.org> wrote:
> On Fri, 19 Sep 2008 12:18:41 -0600
> Matthew Wilcox <matthew@wil.cx> wrote:

>> In a storage / NUMA configuration we really want to set up one queue
>> per cpu / package / node (depending on resource constraints) and know
>> that the interrupt is going to come back to the same cpu / package /
>> node. We definitely don't want irqbalanced moving the interrupt
>> around.
>
> irqbalance is NUMA aware and places a penalty on placing an interrupt
> "wrongly". We can argue on how strong this penalty should be, but
> thinking that irqbalance doesn't use the numa info the kernel exposes
> is incorrect.
>

I'm only just now wading into this area, but I thought one of the
advantages of multiple hardware queues was that we don't have to worry
about multiple cpus trying to access the buffer rings at the same
time, thus eliminating locking.  If the driver can't rely on that,
don't we lose that advantage?

Andy


* Re: multiqueue interrupts...
  2008-09-19 22:24       ` Andy Fleming
@ 2008-09-19 22:28         ` Arjan van de Ven
  0 siblings, 0 replies; 13+ messages in thread
From: Arjan van de Ven @ 2008-09-19 22:28 UTC (permalink / raw)
  To: Andy Fleming; +Cc: Matthew Wilcox, David Miller, netdev

On Fri, 19 Sep 2008 17:24:00 -0500
"Andy Fleming" <afleming@gmail.com> wrote:

> On Fri, Sep 19, 2008 at 5:11 PM, Arjan van de Ven
> <arjan@infradead.org> wrote:
> > On Fri, 19 Sep 2008 12:18:41 -0600
> > Matthew Wilcox <matthew@wil.cx> wrote:
> 
> >> In a storage / NUMA configuration we really want to set up one
> >> queue per cpu / package / node (depending on resource constraints)
> >> and know that the interrupt is going to come back to the same
> >> cpu / package / node. We definitely don't want irqbalanced moving
> >> the interrupt around.
> >
> > irqbalance is NUMA aware and places a penalty on placing an
> > interrupt "wrongly". We can argue on how strong this penalty should
> > be, but thinking that irqbalance doesn't use the numa info the
> > kernel exposes is incorrect.
> >
> 
> I'm only just now wading into this area, but I thought one of the
> advantages of multiple hardware queues was that we don't have to worry
> about multiple cpus trying to access the buffer rings at the same
> time, thus eliminating locking.  If the driver can't rely on that,
> don't we lose that advantage?

that's only true if you have at least as many queues as you have
logical cpus. Ask SGI about how many cpus they have in 3 years, and
then ask your NIC vendor how many queues they have planned for ;-)

and a per-cpu lock isn't really all THAT expensive.
the really big advantage is that you no longer cacheline bounce to hell
and back... and you get that either way.
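
The counting argument can be sketched as below. The round-robin mapping is an assumption chosen for illustration, not taken from any driver; the point is only that once there are fewer queues than cpus, several cpus land on the same queue and some serialization is needed. `num_queues` must be nonzero.

```c
/*
 * Illustrative mapping, not taken from any driver: spread CPUs over
 * the available queues round-robin.  num_queues must be nonzero.
 */
static unsigned int cpu_to_queue(unsigned int cpu, unsigned int num_queues)
{
	return cpu % num_queues;
}

/* Number of CPUs that share the given queue under cpu_to_queue(). */
static unsigned int cpus_sharing_queue(unsigned int queue,
				       unsigned int num_cpus,
				       unsigned int num_queues)
{
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < num_cpus; cpu++)
		if (cpu_to_queue(cpu, num_queues) == queue)
			count++;
	return count;
}
```

With 8 cpus and 8 queues each queue has exactly one user, so no lock is required; with 64 cpus and 8 queues each queue is shared by 8 cpus.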


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

