From: Brice Goglin <Brice.Goglin@inria.fr>
To: Paul Mundt <lethal@linux-sh.org>, Chris Snook <csnook@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: Purpose of numa_node?
Date: Thu, 31 Jan 2008 14:42:18 +0100
Message-ID: <47A1D03A.3020508@inria.fr>
In-Reply-To: <20080131074045.GA13788@linux-sh.org>

Paul Mundt wrote:
> On Wed, Jan 30, 2008 at 07:48:13PM -0500, Chris Snook wrote:
>
>> While pondering ways to optimize I/O and swapping on large NUMA machines, I
>> noticed that the numa_node field in struct device isn't actually used
>> anywhere. We just have a couple dozen lines of code to conditionally
>> create a sysfs file that will always return -1. Is anyone even working on
>> code to actually use this field? I think it's a good piece of information
>> to keep track of, so I'm not suggesting we remove it, but I want to make
>> sure I'm not stepping on toes or duplicating effort if I try to make it
>> useful.
>>
> It's manipulated with accessors. If you look at the users of
> dev_to_node()/set_dev_node() you can see where it's being used. It's
> primarily used in allocation paths for node locality, and the existing
> set_dev_node() callsites are places where node locality information
> already exists (ie, which node a given controller sits on). You can see
> this in places like PCI (pcibus_to_node()) and USB, with node allocation
> hints used in places like the dmapool and skb alloc paths.
>
> The in-kernel use looks perfectly sane in that regard, though I'm not
> sure what the point of exporting this as a RO attribute to userspace is.
> Presumably someone has a tool somewhere that cares about this.
>
I originally added the numa_node sysfs attribute to make it easier to
bind processes near a given device, so yes, I have a user-space tool
that uses it. It is much easier to use than the local_cpus field on large
machines, especially when you bind things through the libnuma interface,
since you don't have to translate between NUMA node numbers and cpumasks.

It works fine on regular machines such as dual Opterons. However, I
recently noticed that the value was wrong on some quad-Opteron machines
(see http://marc.info/?l=linux-pci&m=119072400008538&w=2) because
something is not initialized in the right order. I haven't tested 2.6.24
on this hardware yet, so I don't know whether this has changed.

Brice
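
For readers who want to see what "binding near a device" looks like from user space, here is a minimal sketch; it is not the tool mentioned in the message. It assumes a device directory under sysfs that exposes a numa_node file, and it assumes the per-node CPU list lives at /sys/devices/system/node/nodeN/cpulist. The helper names (read_numa_node, parse_cpulist, bind_near_device) are made up for illustration. Because it uses only the Python standard library, it has to translate the node into a CPU set itself; with libnuma that step would collapse into a single call such as numa_run_on_node(node), which is the convenience the message describes.

```python
import os
from pathlib import Path


def read_numa_node(device_dir):
    """Return the NUMA node reported for a device, or -1 if unknown."""
    try:
        return int(Path(device_dir, "numa_node").read_text().strip())
    except (OSError, ValueError):
        # Missing attribute or unparsable content: treat as "no locality info".
        return -1


def parse_cpulist(text):
    """Expand a sysfs cpulist such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def bind_near_device(device_dir, node_root="/sys/devices/system/node"):
    """Pin the calling process to the CPUs of the device's NUMA node.

    Returns the node number, or None if the device reports no node (-1),
    in which case the affinity is left unchanged. Linux only.
    """
    node = read_numa_node(device_dir)
    if node < 0:
        return None
    # Assumed layout: one directory per node containing a 'cpulist' file.
    cpulist = Path(node_root, f"node{node}", "cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    if cpus:
        os.sched_setaffinity(0, cpus)  # 0 means the calling process
    return node
```

A caller would pass a device directory, for example one of the entries under /sys/bus/pci/devices/, and check for None before assuming the binding took effect. A value of -1, as discussed earlier in the thread, simply means the kernel has no locality information for that device.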