All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: rafael@kernel.org, linux-kernel@vger.kernel.org,
	peterz@infradead.org, mingo@kernel.org, mhocko@kernel.org,
	linuxarm@huawei.com
Subject: Re: [PATCH RFC] driver core: ensure a device has valid node id in device_add()
Date: Fri, 6 Sep 2019 16:00:25 +0200	[thread overview]
Message-ID: <20190906140025.GB8570@kroah.com> (raw)
In-Reply-To: <26d7a539-96fc-8a92-d60d-7e76e418ab63@huawei.com>

On Fri, Sep 06, 2019 at 04:21:47PM +0800, Yunsheng Lin wrote:
> On 2019/9/6 14:52, Greg KH wrote:
> > On Fri, Sep 06, 2019 at 02:41:36PM +0800, Yunsheng Lin wrote:
> >> On 2019/9/5 15:33, Greg KH wrote:
> >>> On Thu, Sep 05, 2019 at 02:48:24PM +0800, Yunsheng Lin wrote:
> >>>> On 2019/9/5 13:57, Greg KH wrote:
> >>>>> On Thu, Sep 05, 2019 at 09:33:50AM +0800, Yunsheng Lin wrote:
> >>>>>> Currently a device does not belong to any of the numa nodes
> >>>>>> (dev->numa_node is NUMA_NO_NODE) when the FW does not provide
> >>>>>> the node id and the device has not no parent device.
> >>>>>>
> >>>>>> According to discussion in [1]:
> >>>>>> Even if a device's numa node is not set by fw, the device
> >>>>>> really does belong to a node.
> >>>>>>
> >>>>>> This patch sets the device node to node 0 in device_add() if
> >>>>>> the fw has not specified the node id and it either has no
> >>>>>> parent device, or the parent device also does not have a valid
> >>>>>> node id.
> >>>>>>
> >>>>>> There may be explicit handling out there relying on NUMA_NO_NODE,
> >>>>>> like in nvme_probe().
> >>>>>>
> >>>>>> [1] https://lkml.org/lkml/2019/9/2/466
> >>>>>>
> >>>>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >>>>>> ---
> >>>>>>  drivers/base/core.c  | 17 ++++++++++++++---
> >>>>>>  include/linux/numa.h |  2 ++
> >>>>>>  2 files changed, 16 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>>>> index 1669d41..466b8ff 100644
> >>>>>> --- a/drivers/base/core.c
> >>>>>> +++ b/drivers/base/core.c
> >>>>>> @@ -2107,9 +2107,20 @@ int device_add(struct device *dev)
> >>>>>>  	if (kobj)
> >>>>>>  		dev->kobj.parent = kobj;
> >>>>>>  
> >>>>>> -	/* use parent numa_node */
> >>>>>> -	if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
> >>>>>> -		set_dev_node(dev, dev_to_node(parent));
> >>>>>> +	/* use parent numa_node or default node 0 */
> >>>>>> +	if (!numa_node_valid(dev_to_node(dev))) {
> >>>>>> +		int nid = parent ? dev_to_node(parent) : NUMA_NO_NODE;
> >>>>>
> >>>>> Can you expand this to be a "real" if statement please?
> >>>>
> >>>> Sure. May I ask why "? :" is not appropriate here?
> >>>
> >>> Because it is a pain to read, just spell it out and make it obvious what
> >>> is happening.  You write code for developers first, and the compiler
> >>> second, and in this case, either way is identical to the compiler.
> >>>
> >>>>>> +
> >>>>>> +		if (numa_node_valid(nid)) {
> >>>>>> +			set_dev_node(dev, nid);
> >>>>>> +		} else {
> >>>>>> +			if (nr_node_ids > 1U)
> >>>>>> +				pr_err("device: '%s': has invalid NUMA node(%d)\n",
> >>>>>> +				       dev_name(dev), dev_to_node(dev));
> >>>>>
> >>>>> dev_err() will show you the exact device properly, instead of having to
> >>>>> rely on dev_name().
> >>>>>
> >>>>> And what is a user to do if this message happens?  How do they fix this?
> >>>>> If they can not, what good is this error message?
> >>>>
> >>>> If user know about their system's topology well enough and node 0
> >>>> is not the nearest node to the device, maybe user can readjust that by
> >>>> writing the nearest node to /sys/class/pci_bus/XXXX/device/numa_node,
> >>>> if not, then maybe user need to contact the vendor for info or updates.
> >>>>
> >>>> Maybe print error message as below:
> >>>>
> >>>> dev_err(dev, FW_BUG "has invalid NUMA node(%d). Readjust it by writing to sysfs numa_node or contact your vendor for updates.\n",
> >>>> 	dev_to_node(dev));
> >>>
> >>> FW_BUG?
> >>>
> >>> Anyway, if you make this change, how many machines start reporting this
> >>> error? 
> >>
> >> Any machines with more than one numa node will start reporting this error.
> >>
> >> 1) many virtual deivces maybe do not set the node id before calling
> >>    device_register(), such as vfio, tun, etc.
> >>
> >> 2) struct cpu has a dev, but does not set the dev' node according to
> >>    cpu_to_node().
> >>
> >> 3) Many platform Device also do not have a node id provided by FW.
> > 
> > Then this patch is not ok, as you are flooding the kernel log saying the
> > system is "broken" when this is just what it always has been like.  How
> > is anyone going to "fix" things?
> 
> cpu->node_id does not seem to be used, maybe we can fix the cpu device:
> 
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index cc37511d..ad0a841 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -41,7 +41,7 @@ static void change_cpu_under_node(struct cpu *cpu,
>         int cpuid = cpu->dev.id;
>         unregister_cpu_under_node(cpuid, from_nid);
>         register_cpu_under_node(cpuid, to_nid);
> -       cpu->node_id = to_nid;
> +       set_dev_node(&cpu->dev, to_nid);
>  }
> 
>  static int cpu_subsys_online(struct device *dev)
> @@ -367,7 +367,7 @@ int register_cpu(struct cpu *cpu, int num)
>  {
>         int error;
> 
> -       cpu->node_id = cpu_to_node(num);
> +       set_dev_node(&cpu->dev, cpu_to_node(num));
>         memset(&cpu->dev, 0x00, sizeof(struct device));
>         cpu->dev.id = num;
>         cpu->dev.bus = &cpu_subsys;
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index fcb1386..9a6fc51 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -24,7 +24,6 @@ struct device_node;
>  struct attribute_group;
> 
>  struct cpu {
> -       int node_id;            /* The node which contains the CPU */
>         int hotpluggable;       /* creates sysfs control file if hotpluggable */
>         struct device dev;
>  };

I have no idea what you are trying to do here, it feels like you are
flailing around trying to set something that almost no bios/firmware
sets or cares about.

If setting the proper node is a requirement, make sure your firmware
does this and then all should be fine.  Otherwise just use the default
node like what happens today, right?

thanks,

greg k-h

  reply	other threads:[~2019-09-06 14:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-05  1:33 [PATCH RFC] driver core: ensure a device has valid node id in device_add() Yunsheng Lin
2019-09-05  5:57 ` Greg KH
2019-09-05  6:48   ` Yunsheng Lin
2019-09-05  7:33     ` Greg KH
2019-09-05  8:57       ` Yunsheng Lin
2019-09-05  9:02         ` Greg KH
2019-09-05  9:07           ` Yunsheng Lin
2019-09-06  6:41       ` Yunsheng Lin
2019-09-06  6:52         ` Greg KH
2019-09-06  8:21           ` Yunsheng Lin
2019-09-06 14:00             ` Greg KH [this message]
2019-09-07  2:10               ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190906140025.GB8570@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=linyunsheng@huawei.com \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.