From: Andrew Gallatin <gallatin@myri.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Bill Fink <billfink@mindspring.com>,
Brice Goglin <Brice.Goglin@inria.fr>,
Linux Network Developers <netdev@vger.kernel.org>,
Yinghai Lu <yhlu.kernel@gmail.com>
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
Date: Sat, 08 Aug 2009 14:21:36 -0400
Message-ID: <4A7DC230.6060206@myri.com>
In-Reply-To: <20090808112636.GB18518@localhost.localdomain>

Neil Horman wrote:
> On Sat, Aug 08, 2009 at 07:08:20AM -0400, Andrew Gallatin wrote:
>> Bill Fink wrote:
>>> On Fri, 07 Aug 2009, Andrew Gallatin wrote:
>>>
>>>> Bill Fink wrote:
>>>>
>>>>> All sysfs local_cpus values are the same (00000000,000000ff),
>>>>> so yes they are also wrong.
>>>> How were you handling IRQ binding? If local_cpus is wrong,
>>>> then irqbalance will not be able to make good decisions about
>>>> where to bind the NICs' IRQs. Did you try manually binding
>>>> each NIC's interrupt to a separate CPU on the correct node?
>>> Yes, all the NIC IRQs were bound to a CPU on the local NUMA node,
>>> and the nuttcp application had its CPU affinity set to the same
>>> CPU with its memory affinity bound to the same local NUMA node.
>>> And the irqbalance daemon wasn't running.
>> I must be misunderstanding something. I had thought that
>> alloc_pages() on NUMA would wind up doing alloc_pages_current(), which
>> would allocate based on default policy which (if not interleaved)
>> should allocate from the current NUMA node. And since restocking the
>> RX ring happens from the driver's NAPI softirq context, then it
>> should always be restocking on the same node the memory is destined to
>> be consumed on.
>>
>> Do I just not understand how alloc_pages() works on NUMA?
>>
>
> That's how alloc_pages works, but most drivers use netdev_alloc_skb to refill
> their rx ring in their napi context. netdev_alloc_skb specifically allocates an
> skb from memory in the node that the actual NIC is local to (rather than the cpu
> that the interrupt is running on). That cuts out cross numa node chatter when
> the device is dma-ing a frame from the hardware to the allocated skb. The
> offshoot of that however (especially in 10G cards with lots of rx queues whose
> interrupts are spread out through the system) is that the irq affinity for a
> given irq has an increased risk of not being on the same node as the skb memory.
> The ftrace module I referenced earlier will help illustrate this, as well as
> cases where it's causing applications to run on processors that create lots of
> cross-node chatter.

One thing worth noting is that myri10ge is rather unusual in that
it fills its RX rings with pages, then attaches them to skbs after
the receive is done. Given how (I think) alloc_page() works, I
don't understand why correct CPU binding does not have the same
benefit as Bill's patch to assign the NUMA node manually.

I'm certainly willing to change myri10ge to use alloc_pages_node()
based on NIC locality, if that provides a benefit, but I'd really
like to understand why CPU binding is not helping.

Drew
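
For anyone wanting to reproduce the binding check discussed in this thread, here is a minimal Python sketch. It assumes the usual Linux files /sys/class/net/<iface>/device/local_cpus and /proc/irq/<irq>/smp_affinity, both of which hold comma-separated hex cpumasks such as 00000000,000000ff. The function names (parse_cpumask, irq_is_local, check_interface) are made up for illustration. It is only as good as the locality the platform reports: on Bill's system local_cpus was the same for every NIC, so a check like this would have been misleading there. It also says nothing about which node the RX pages are actually allocated from.

```python
from pathlib import Path
from typing import Set


def parse_cpumask(mask: str) -> Set[int]:
    """Turn a kernel hex cpumask like '00000000,000000ff' into a set of CPU ids."""
    # Groups are 32-bit words, most significant first, so dropping the
    # commas yields one big hexadecimal number.
    value = int(mask.strip().replace(",", ""), 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


def irq_is_local(irq_mask: str, local_mask: str) -> bool:
    """True if every CPU the IRQ may run on is in the device-local CPU set."""
    irq_cpus = parse_cpumask(irq_mask)
    return bool(irq_cpus) and irq_cpus <= parse_cpumask(local_mask)


def check_interface(iface: str, irq: int) -> bool:
    """Compare one IRQ's affinity with the NIC's reported local CPUs.

    Hypothetical helper: the caller has to know which IRQ number belongs
    to which interface (for example from /proc/interrupts).
    """
    local = (Path("/sys/class/net") / iface / "device" / "local_cpus").read_text()
    affinity = Path("/proc/irq/{}/smp_affinity".format(irq)).read_text()
    return irq_is_local(affinity, local)
```

With the mask quoted earlier in the thread, parse_cpumask("00000000,000000ff") gives CPUs 0 through 7. An IRQ bound with mask 00000100 (CPU 8) would therefore be reported as not local to that NIC.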