linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: peterz@infradead.org, Yunsheng Lin <linyunsheng@huawei.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	robin.murphy@arm.com, geert@linux-m68k.org,
	gregkh@linuxfoundation.org, paul.burton@mips.com
Subject: Re: [PATCH] PCI: Warn about host bridge device when its numa node is NO_NODE
Date: Thu, 24 Oct 2019 12:40:13 -0500	[thread overview]
Message-ID: <20191024174013.GA149516@google.com> (raw)
In-Reply-To: <20191024092013.GW17610@dhcp22.suse.cz>

On Thu, Oct 24, 2019 at 11:20:13AM +0200, Michal Hocko wrote:
> On Wed 23-10-19 12:10:39, Bjorn Helgaas wrote:
> > On Wed, Oct 23, 2019 at 04:22:43PM +0800, Yunsheng Lin wrote:
> > > On 2019/10/23 5:04, Bjorn Helgaas wrote:
> > > > On Sat, Oct 19, 2019 at 02:45:43PM +0800, Yunsheng Lin wrote:
> > 
> > > > I think the underlying problem you're addressing is that:
> > > > 
> > > >   - NUMA_NO_NODE == -1,
> > > >   - dev_to_node(dev) may return NUMA_NO_NODE,
> > > >   - kmalloc(dev) relies on cpumask_of_node(dev_to_node(dev)), and
> > > >   - cpumask_of_node(NUMA_NO_NODE) makes an invalid array reference
> > > > 
> > > > For example, on arm64, mips loongson, s390, and x86,
> > > > cpumask_of_node(node) returns "node_to_cpumask_map[node]", and -1 is
> > > > an invalid array index.
> > > 
> > > The invalid array index of -1 is the underlying problem here when
> > > cpumask_of_node(dev_to_node(dev)) is called and cpumask_of_node()
> > > is not NUMA_NO_NODE aware yet.
> > > 
> > > In the "numa: make node_to_cpumask_map() NUMA_NO_NODE aware" thread
> > > disscusion, it is requested that it is better to warn about the pcie
> > > device without a node assigned by the firmware before making the
> > > cpumask_of_node() NUMA_NO_NODE aware, so that the system with pci
> > > devices of "NUMA_NO_NODE" node can be fixed by their vendor.
> > > 
> > > See: https://lore.kernel.org/lkml/20191011111539.GX2311@hirez.programming.kicks-ass.net/
> > 
> > Right.  We should warn if the NUMA node number would help us but DT or
> > the firmware didn't give us one.
> > 
> > But we can do that independently of any cpumask_of_node() changes.
> > There's no need to do one patch before the other.  Even if you make
> > cpumask_of_node() tolerate NUMA_NO_NODE, we'll still get the warning
> > because we're not actually changing any node assignments.
> 
> Agreed. And this has been proposed previously I believe but my
> understanding was that Petr was against that.

I don't think Peter was opposed to a warning.  I assume you're
referring to [1], which is about how cpumask_of_node() should work.
That's not my area, so I don't have a strong opinion.  From that
discussion:

Yunsheng> From what I can see, the problem can be fixed in three
Yunsheng> place:
Yunsheng> 1. Make user dev_to_node return a valid node id
Yunsheng>    even when proximity domain is not set by bios(or node id
Yunsheng>    set by buggy bios is not valid), which may need info from
Yunsheng>    the numa system to make sure it will return a valid node.
Yunsheng>
Yunsheng> 2. User that call cpumask_of_node should ensure the node
Yunsheng>    id is valid before calling cpumask_of_node, and user also
Yunsheng>    need some info to make ensure node id is valid.
Yunsheng>
Yunsheng> 3. Make sure cpumask_of_node deal with invalid node id as
Yunsheng>    this patchset.

Peter> 1) because even it is not set, the device really does belong
Peter> to a node.  It is impossible a device will have magic
Peter> uniform access to memory when CPUs cannot.
Peter> 
Peter> 2) is already true today, cpumask_of_node() requires a valid
Peter> node_id.
Peter> 
Peter> 3) is just wrong and increases overhead for everyone.

I think Peter is advocating (1), i.e., if firmware doesn't tell us a
node ID, we just pick one.  We can certainly do that in *addition* to
adding a warning.  I'd like it to be a separate patch because I think
we want the warning no matter what so users have some clue that
performance could be better.

If we pick one, I guess we either assign some default, like node 0, to
everything; or we somehow pick a random node to assign.

FWIW, (3) might be wrong, but it is what alpha, ia64, mips (ip27),
powerpc, sparc do already.  It'd be nice if they were all the same one
way or the other.

[1] https://lore.kernel.org/linux-arm-kernel/20190831161247.GM2369@hirez.programming.kicks-ass.net/

  reply	other threads:[~2019-10-24 17:40 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-19  6:45 [PATCH] PCI: Warn about host bridge device when its numa node is NO_NODE Yunsheng Lin
2019-10-19  8:34 ` Christoph Hellwig
2019-10-21  4:05   ` Yunsheng Lin
2019-10-22 13:55     ` Robin Murphy
2019-10-23  8:24       ` Yunsheng Lin
2019-10-22 21:04 ` Bjorn Helgaas
2019-10-23  8:22   ` Yunsheng Lin
2019-10-23 17:10     ` Bjorn Helgaas
2019-10-24  9:20       ` Michal Hocko
2019-10-24 17:40         ` Bjorn Helgaas [this message]
2019-10-25  8:16           ` Michal Hocko
2019-10-25  8:51             ` Yunsheng Lin
2019-10-24  9:39       ` Yunsheng Lin
2019-10-24 10:16       ` Robin Murphy
2019-10-25 12:51         ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191024174013.GA149516@google.com \
    --to=helgaas@kernel.org \
    --cc=geert@linux-m68k.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=mhocko@kernel.org \
    --cc=paul.burton@mips.com \
    --cc=peterz@infradead.org \
    --cc=robin.murphy@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).