From: Bill Fink <billfink@mindspring.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Neil Horman <nhorman@tuxdriver.com>,
Andrew Gallatin <gallatin@myri.com>,
Brice Goglin <Brice.Goglin@inria.fr>,
Linux Network Developers <netdev@vger.kernel.org>,
Yinghai Lu <yhlu.kernel@gmail.com>
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
Date: Wed, 12 Aug 2009 00:30:49 -0400
Message-ID: <20090812003049.185cd52a.billfink@mindspring.com>
In-Reply-To: <87ws5af0km.fsf@basil.nowhere.org>
On Wed, 12 Aug 2009, Andi Kleen wrote:
> Bill Fink <billfink@mindspring.com> writes:
> >
> > I originally tried to just use alloc_pages_node() instead of alloc_pages(),
> > but it didn't help. As mentioned in an earlier e-mail, that seems to
> > be because I discovered that doing:
> >
> > find /sys -name numa_node -exec grep . {} /dev/null \;
> >
> > revealed that the NUMA node associated with _all_ the PCI devices was
> > always 0, when at least some of them should have been associated with
> > NUMA node 2, including 6 of the 12 Myricom 10-GigE devices.
>
> > I discovered today that the NUMA node cpulist/cpumap is also wrong.
> > A cat of /sys/devices/system/node/node0/cpulist returns "0-7" (with a
> > cpumask of 00000000,000000ff), while the cpulist for node2 is empty
> > (with a cpumask of 00000000,00000000). The distance is correct,
> > with "10 20" for node 0 and "20 10" for node2.
>
> When the CPU nodes are not correct the device nodes are unlikely
> to be correct either. In fact your system likely has no node 1 configured,
> right?
That was right. There was no node 1, only nodes 0 and 2.
> This information comes from the BIOS. So either your BIOS is broken
> or you simply didn't enable NUMA mode in the BIOS, but configured
> memory interleaving.
>
> If you post dmesg output somewhere I can take a look.
I did have NUMA enabled, and memory was configured as independent
rather than interleaved.
Based on all the discussions, it seemed a good possibility that the
BIOS was broken. Today a colleague checked the SuperMicro site, and
discovered and installed a newer version of the BIOS. Things seem
better now, but not totally correct.
There are now NUMA nodes 0 and 1 instead of 0 and 2, and the CPUs
for node 0 are 0 through 3 while the CPUs for node 1 are 4 through 7
(previously the even CPUs were on the first Xeon 5580 processor while
the odd CPUs were on the second processor).
[root@xeontest1 ~]# numastat
                           node0           node1
numa_hit                28087735        27195340
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit             12065           11978
local_node              28081559        27182572
other_node                  6176           12768
[root@xeontest1 ~]# grep 'physical id' /proc/cpuinfo
physical id : 0
physical id : 0
physical id : 0
physical id : 0
physical id : 1
physical id : 1
physical id : 1
physical id : 1
[root@xeontest1 ~]# cat /sys/devices/system/node/node0/cpulist
0-3
[root@xeontest1 ~]# cat /sys/devices/system/node/node1/cpulist
4-7
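
For anyone who wants to check the node-to-CPU mapping programmatically rather than by cat'ing each file, here is a minimal sketch in Python. It assumes the cpulist files contain only comma-separated CPU numbers and ranges (as in the "0-3" and "4-7" output above); the function names and the root parameter are made up for illustration and are not from any existing tool.

```python
import os
import re


def parse_cpulist(text):
    """Expand a cpulist string such as "0-3" or "0,2,4-5" into a sorted list.

    Assumes only plain numbers and lo-hi ranges separated by commas.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty cpulist, e.g. a node with no CPUs assigned
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_cpus(root="/sys/devices/system/node"):
    """Return {node id: [cpu, ...]} read from <root>/nodeN/cpulist."""
    result = {}
    for name in sorted(os.listdir(root)):
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue  # skip non-node entries in the directory
        with open(os.path.join(root, name, "cpulist")) as f:
            result[int(match.group(1))] = parse_cpulist(f.read())
    return result
```

Calling node_cpus() with no argument reads the live sysfs tree; passing a different root makes it easy to run against a saved copy of the directory from another machine.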
But _all_ the PCI devices are still just on node 0.
[root@xeontest1 ~]# find /sys -name numa_node -exec grep . {} /dev/null \;
shows numa_node is always 0.
[root@xeontest1 ~]# find /sys -name local_cpulist -exec grep . {} /dev/null \;
shows local_cpulist is always 0-3.
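
The find/grep check above can also be summarized per node. The following is only a sketch, assuming each PCI device directory under /sys/bus/pci/devices carries a numa_node file holding a single integer (-1 when the kernel has no node information); devices_by_numa_node is a hypothetical helper name, not part of any existing tool.

```python
import os
from collections import defaultdict


def devices_by_numa_node(root="/sys/bus/pci/devices"):
    """Group device directory names under root by their numa_node value."""
    groups = defaultdict(list)
    for dev in sorted(os.listdir(root)):
        path = os.path.join(root, dev, "numa_node")
        try:
            with open(path) as f:
                node = int(f.read().strip())
        except (OSError, ValueError):
            continue  # no numa_node attribute, or unreadable contents
        groups[node].append(dev)
    return dict(groups)
```

On a system where the firmware reports device locality correctly, the result should have more than one key; a single key 0 corresponds to the "numa_node is always 0" situation described here.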
I can now get basically the same level of aggregate receive-side
performance (55 Gbps) without my patch that I could previously get
only with my hacked workaround in the myri10ge driver. But this
still seems significantly below what I believe the system should be
capable of.
BTW, when I first booted the test system after upgrading the BIOS,
I got a kernel oops because it was still using my hacked myri10ge
driver, and apparently it didn't like being told to allocate on the
by then nonexistent node 2 (I was checking for success of the
alloc_pages_node() call and falling back to the original alloc_pages()
call on failure). Or the oops could have been in the __alloc_skb() call,
where I had a similar hack for the skb allocation.
Are you still interested in me posting the dmesg output?
-Thanks
-Bill