netdev.vger.kernel.org archive mirror
From: Bill Fink <billfink@mindspring.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Neil Horman <nhorman@tuxdriver.com>,
	Andrew Gallatin <gallatin@myri.com>,
	Brice Goglin <Brice.Goglin@inria.fr>,
	Linux Network Developers <netdev@vger.kernel.org>,
	Yinghai Lu <yhlu.kernel@gmail.com>
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
Date: Wed, 12 Aug 2009 00:30:49 -0400
Message-ID: <20090812003049.185cd52a.billfink@mindspring.com>
In-Reply-To: <87ws5af0km.fsf@basil.nowhere.org>

On Wed, 12 Aug 2009, Andi Kleen wrote:

> Bill Fink <billfink@mindspring.com> writes:
> >
> > I originally tried to just use alloc_pages_node() instead of alloc_pages(),
> > but it didn't help.  As mentioned in an earlier e-mail, that seems to
> > be because I discovered that doing:
> >
> > 	find /sys -name numa_node -exec grep . {} /dev/null \;
> >
> > revealed that the NUMA node associated with _all_ the PCI devices was
> > always 0, when at least some of them should have been associated with
> > NUMA node 2, including 6 of the 12 Myricom 10-GigE devices.
> 
> > I discovered today that the NUMA node cpulist/cpumap is also wrong.
> > A cat of /sys/devices/system/node/node0/cpulist returns "0-7" (with a
> > cpumask of 00000000,000000ff), while the cpulist for node2 is empty
> > (with a cpumask of 00000000,00000000).  The distance is correct,
> > with "10 20" for node 0 and "20 10" for node2.
> 
> When the CPU nodes are not correct the device nodes are unlikely
> to be correct either. In fact your system likely has no node 1 configured,
> right?

That was right.  There was no node 1, only nodes 0 and 2.

> This information comes from the BIOS. So either your BIOS is broken
> or you simply didn't enable NUMA mode in the BIOS, but configured
> memory interleaving.
> 
> If you post dmesg output somewhere I can take a look.

I did have NUMA enabled, and memory was configured as independent
rather than interleaved.

Based on all the discussions, it seemed a good possibility that the
BIOS was broken.  Today a colleague checked the SuperMicro site, and
discovered and installed a newer version of the BIOS.  Things seem
better now, but not totally correct.

There are now NUMA nodes 0 and 1 instead of 0 and 2, and the CPUs
for node 0 are 0 through 3 while the CPUs for node 1 are 4 through 7
(previously the even CPUs were on the first Xeon 5580 processor while
the odd CPUs were on the second processor).

[root@xeontest1 ~]# numastat
                           node0           node1
numa_hit                28087735        27195340
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit             12065           11978
local_node              28081559        27182572
other_node                  6176           12768
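The numastat counters above can be cross-checked. Assuming the usual kernel counter semantics (each allocation served from a node bumps exactly one of numa_hit/numa_miss and exactly one of local_node/other_node on that node), numa_hit + numa_miss should equal local_node + other_node for every node. A minimal C sketch of that check plus the node-local share; the struct and function names are illustrative, not part of numastat or any library.

```c
/* One column of the numastat output (raw event counts). */
struct numastat_node {
    const char *name;
    unsigned long long numa_hit, numa_miss, numa_foreign;
    unsigned long long interleave_hit, local_node, other_node;
};

/* Assumption: every allocation from this node is counted once as
 * hit-or-miss and once as local-or-other, so both sums must agree. */
int counters_consistent(const struct numastat_node *n)
{
    return n->numa_hit + n->numa_miss == n->local_node + n->other_node;
}

/* Percentage of this node's allocations made by tasks running on it. */
double local_percent(const struct numastat_node *n)
{
    unsigned long long total = n->local_node + n->other_node;

    if (total == 0)
        return 0.0;
    return 100.0 * (double)n->local_node / (double)total;
}
```

Fed the node0 and node1 columns above, both consistency checks pass and the local share is about 99.98% and 99.95% respectively, i.e. allocations are almost entirely node-local.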

[root@xeontest1 ~]# grep 'physical id' /proc/cpuinfo
physical id     : 0
physical id     : 0
physical id     : 0
physical id     : 0
physical id     : 1
physical id     : 1
physical id     : 1
physical id     : 1

[root@xeontest1 ~]# cat /sys/devices/system/node/node0/cpulist
0-3
[root@xeontest1 ~]# cat /sys/devices/system/node/node1/cpulist
4-7
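The package layout from /proc/cpuinfo can be compared mechanically with the node cpulists. This sketch assumes /proc/cpuinfo lists processors in order, so the Nth "physical id" line belongs to logical CPU N (the grep output above omits the processor numbers). The helper names are hypothetical, and only CPUs 0-63 are handled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a sysfs cpulist such as "0-3", "4-7" or "0,2,4-6" into a bitmask.
 * An empty list yields 0; a malformed list also yields 0. */
uint64_t cpulist_to_mask(const char *s)
{
    uint64_t mask = 0;

    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10), hi = lo;

        if (end == s || lo < 0 || lo > 63)
            return 0;
        s = end;
        if (*s == '-') {                 /* range "lo-hi" */
            hi = strtol(s + 1, &end, 10);
            if (end == s + 1 || hi < lo || hi > 63)
                return 0;
            s = end;
        }
        for (long cpu = lo; cpu <= hi; cpu++)
            mask |= UINT64_C(1) << cpu;
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return 0;
    }
    return mask;
}

/* Mask of logical CPUs whose "physical id" equals socket;
 * physical_id[i] is the value reported for logical CPU i. */
uint64_t socket_mask(const int *physical_id, int ncpus, int socket)
{
    uint64_t mask = 0;

    for (int cpu = 0; cpu < ncpus && cpu < 64; cpu++)
        if (physical_id[cpu] == socket)
            mask |= UINT64_C(1) << cpu;
    return mask;
}
```

With the new BIOS, socket_mask() gives 0x0f for physical id 0 and 0xf0 for physical id 1, matching cpulist_to_mask("0-3") and cpulist_to_mask("4-7"). The old BIOS's node0 cpulist "0-7" (cpumap 00000000,000000ff) parses to 0xff, which covers both packages, while the old even/odd layout would have put 0x55 on the first package.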

But _all_ the PCI devices are still just on node 0.

[root@xeontest1 ~]# find /sys -name numa_node -exec grep . {} /dev/null \;

shows numa_node is always 0.

[root@xeontest1 ~]# find /sys -name local_cpulist -exec grep . {} /dev/null \;

shows local_cpulist is always 0-3.
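The find/grep survey above can also be done from C. This sketch reads the numa_node attribute of every file matched by a glob pattern, such as /sys/bus/pci/devices/*/numa_node (restricting the search to PCI devices), and tallies devices per node. The function name is hypothetical, and counting negative values as "no node information" is an assumption about how the attribute reports a missing affinity.

```c
#include <glob.h>
#include <stdio.h>

/* Increment counts[n] for each file whose value is node n (0 <= n < max_nodes);
 * other values (e.g. -1) go into *unknown.  Returns the number of files read. */
int tally_pci_numa_nodes(const char *pattern, int *counts, int max_nodes,
                         int *unknown)
{
    glob_t g;
    int nread = 0;

    if (glob(pattern, 0, NULL, &g) != 0) {
        globfree(&g);                  /* no matches or glob error */
        return 0;
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        int node;

        if (f == NULL)
            continue;
        if (fscanf(f, "%d", &node) == 1) {
            nread++;
            if (node >= 0 && node < max_nodes)
                counts[node]++;
            else
                (*unknown)++;
        }
        fclose(f);
    }
    globfree(&g);
    return nread;
}
```

Per the find output, on this system a call such as tally_pci_numa_nodes("/sys/bus/pci/devices/*/numa_node", counts, 2, &unknown) would put every PCI device under node 0.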

Without my patch, I can now get basically the same aggregate
receive-side performance (55 Gbps) that I could previously get only
with my hacked workaround in the myri10ge driver.  But that still
seems significantly below what I believe the system should be
capable of.

BTW, when I first booted the test system after upgrading the BIOS,
I got a kernel oops because it was still running my hacked myri10ge
driver, which apparently didn't like my specifying a node 2 that no
longer existed (I was checking the alloc_pages_node() call for success
and falling back to the original alloc_pages() call on failure).  Or
the oops could have come from the __alloc_skb() call, where I had a
similar hack for the skb allocation.
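Checking the return value of alloc_pages_node() cannot help if the allocator faults on a bad node id before it returns. Assuming that is what happened here, one alternative is to validate the node id before using it. The sketch below models that check in userspace; usable_node(), NO_NODE and the bitmask of online nodes are hypothetical, and the comment shows only the rough kernel-side shape using node_online(), alloc_pages_node() and alloc_pages().

```c
#define NO_NODE   (-1)   /* hypothetical "no node preference" value */
#define MAX_NODES 64

/*
 * Return nid if it names an online node, otherwise NO_NODE so the caller
 * falls back to a node-agnostic allocation.  Bit n of online_nodes is set
 * when node n is online.  Rough kernel-side shape (not compiled here):
 *
 *   if (nid >= 0 && node_online(nid))
 *           page = alloc_pages_node(nid, gfp, order);
 *   else
 *           page = alloc_pages(gfp, order);
 */
int usable_node(int nid, unsigned long long online_nodes)
{
    if (nid < 0 || nid >= MAX_NODES)
        return NO_NODE;
    return ((online_nodes >> nid) & 1ULL) ? nid : NO_NODE;
}
```

With the new BIOS the online nodes are 0 and 1 (mask 0x3), so a stale request for node 2 maps to NO_NODE instead of reaching the allocator. Under the old BIOS (nodes 0 and 2, mask 0x5) the same request would have been accepted.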

Are you still interested in me posting the dmesg output?

						-Thanks

						-Bill
