From: Bill Fink <billfink@mindspring.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Neil Horman <nhorman@tuxdriver.com>,
Andrew Gallatin <gallatin@myri.com>,
Brice Goglin <Brice.Goglin@inria.fr>,
Linux Network Developers <netdev@vger.kernel.org>,
Yinghai Lu <yhlu.kernel@gmail.com>
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
Date: Wed, 12 Aug 2009 00:30:49 -0400
Message-ID: <20090812003049.185cd52a.billfink@mindspring.com>
In-Reply-To: <87ws5af0km.fsf@basil.nowhere.org>
On Wed, 12 Aug 2009, Andi Kleen wrote:
> Bill Fink <billfink@mindspring.com> writes:
> >
> > I originally tried to just use alloc_pages_node() instead of alloc_pages(),
> > but it didn't help. As mentioned in an earlier e-mail, that seems to
> > be because I discovered that doing:
> >
> > find /sys -name numa_node -exec grep . {} /dev/null \;
> >
> > revealed that the NUMA node associated with _all_ the PCI devices was
> > always 0, when at least some of them should have been associated with
> > NUMA node 2, including 6 of the 12 Myricom 10-GigE devices.
>
> > I discovered today that the NUMA node cpulist/cpumap is also wrong.
> > A cat of /sys/devices/system/node/node0/cpulist returns "0-7" (with a
> > cpumask of 00000000,000000ff), while the cpulist for node2 is empty
> > (with a cpumask of 00000000,00000000). The distance is correct,
> > with "10 20" for node 0 and "20 10" for node2.
>
> When the CPU nodes are not correct the device nodes are unlikely
> to correct either. In fact your system likely has no node 1 configured,
> right?
That was right. There was no node 1, only nodes 0 and 2.
> This information comes from the BIOS. So either your BIOS is broken
> or you simply didn't enable NUMA mode in the BIOS, but configured
> memory interleaving.
>
> If you post dmesg output somewhere I can take a look.
I did have NUMA enabled, and memory was configured as independent
rather than interleaved.

Based on all the discussions, it seemed a good possibility that the
BIOS was broken. Today a colleague checked the SuperMicro site, and
discovered and installed a newer version of the BIOS. Things seem
better now, but not totally correct.

There are now NUMA nodes 0 and 1 instead of 0 and 2, and the CPUs
for node 0 are 0 through 3 while the CPUs for node 1 are 4 through 7
(previously the even CPUs were on the first Xeon 5580 processor while
the odd CPUs were on the second processor).
[root@xeontest1 ~]# numastat
                           node0           node1
numa_hit                28087735        27195340
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit             12065           11978
local_node              28081559        27182572
other_node                  6176           12768
[root@xeontest1 ~]# grep 'physical id' /proc/cpuinfo
physical id : 0
physical id : 0
physical id : 0
physical id : 0
physical id : 1
physical id : 1
physical id : 1
physical id : 1
[root@xeontest1 ~]# cat /sys/devices/system/node/node0/cpulist
0-3
[root@xeontest1 ~]# cat /sys/devices/system/node/node1/cpulist
4-7
But _all_ the PCI devices are still just on node 0.

[root@xeontest1 ~]# find /sys -name numa_node -exec grep . {} /dev/null \;

shows numa_node is always 0.

[root@xeontest1 ~]# find /sys -name local_cpulist -exec grep . {} /dev/null \;

shows local_cpulist is always 0-3.
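
For anyone who wants the same information in a more digestible form
than the find output, here is a minimal Python sketch that groups PCI
devices by the numa_node value sysfs reports for them, along with each
device's local_cpulist. It is only an illustration: the function names
are made up for this example, and it assumes the usual
/sys/bus/pci/devices layout with numa_node and local_cpulist
attributes per device.

```python
#!/usr/bin/env python3
"""Group PCI devices by the NUMA node that sysfs reports for them."""
import glob
import os
from collections import defaultdict


def read_text(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def pci_devices_by_node(pci_root="/sys/bus/pci/devices"):
    """Map numa_node value -> list of (device address, local_cpulist)."""
    by_node = defaultdict(list)
    for dev in sorted(glob.glob(os.path.join(pci_root, "*"))):
        node = read_text(os.path.join(dev, "numa_node"))
        if node is None:
            continue  # attribute missing or unreadable; skip this device
        cpus = read_text(os.path.join(dev, "local_cpulist"))
        # numa_node may be -1 when the kernel has no affinity information
        by_node[int(node)].append((os.path.basename(dev), cpus))
    return dict(by_node)


if __name__ == "__main__":
    for node, devs in sorted(pci_devices_by_node().items()):
        print(f"node {node}: {len(devs)} device(s)")
        for addr, cpus in devs:
            print(f"  {addr}  local_cpulist={cpus}")
```

If the device affinity were being reported correctly, the output would
be expected to list devices under more than one node rather than
everything under node 0.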
I can now get basically the same level of aggregate receive side
performance (55 Gbps) without my patch that I could previously get
only with my hacked workaround in the myri10ge driver. But this
still seems significantly below what I believe the system should be
capable of.
BTW, when I first booted the test system after upgrading the BIOS,
I got a kernel oops because it was still using my hacked myri10ge
driver, and apparently it didn't like that I was asking it to use
node 2, which no longer existed (I was checking for success of the
alloc_pages_node() call and falling back to the original alloc_pages()
call on failure). Or the oops could have been in the __alloc_skb() call,
where I had a similar hack for the skb allocation.
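
The general lesson is to confirm that a node ID exists before using it,
rather than relying on the allocation failing cleanly. The following is
a userspace Python sketch of that check, not the driver code: it reads
/sys/devices/system/node/online and returns "no preference" when the
requested node is not listed. The names parse_id_list and pick_node are
hypothetical and exist only for this example.

```python
def parse_id_list(text):
    """Expand a sysfs list such as "0-1" or "0,2" into a sorted list of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare "2" has no upper bound, so treat it as the range 2-2.
        ids.update(range(int(lo), int(hi or lo) + 1))
    return sorted(ids)


def pick_node(wanted, online_path="/sys/devices/system/node/online"):
    """Return wanted if that node is online, else None (no preference)."""
    try:
        with open(online_path) as f:
            online = parse_id_list(f.read())
    except OSError:
        return None  # no NUMA information available at all
    return wanted if wanted in online else None


if __name__ == "__main__":
    # Node 2 was valid with the old BIOS but not after the upgrade.
    print("node to use:", pick_node(2))
```

With the old BIOS the online list would have contained node 2; after
the upgrade the same request would be expected to come back as None,
and the caller would then allocate without a node preference.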
Are you still interested in me posting the dmesg output?
-Thanks
-Bill