Linux CXL
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Fan Ni <nifan.cxl@gmail.com>, Dave Jiang <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <linux-cxl@vger.kernel.org>,
	<a.manzanares@samsung.com>, <dave@stgolabs.net>,
	<linux-kernel@vger.kernel.org>, <anisa.su887@gmail.com>
Subject: Re: [RFC] cxl/region: set numa node for target memdevs when a region is committed
Date: Fri, 21 Mar 2025 12:22:56 +0000
Message-ID: <20250321122256.00005b71@huawei.com>
In-Reply-To: <67da0ccb80781_201f029449@dwillia2-xfh.jf.intel.com.notmuch>

On Tue, 18 Mar 2025 17:16:11 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Fan Ni wrote:
> > On Tue, Mar 18, 2025 at 02:25:40PM -0700, Dan Williams wrote:  
> > > Dave Jiang wrote:  
> > > > 
> > > > 
> > > > On 3/14/25 9:40 AM, nifan.cxl@gmail.com wrote:  
> > > > > From: Fan Ni <fan.ni@samsung.com>
> > > > > 
> > > > > > There is a sysfs attribute named "numa_node" for the cxl memory device.
> > > > > > However, it is never set, so -1 is returned whenever it is read.
> > > > > > 
> > > > > > With this change, the numa_node of each target memdev is set based on the
> > > > > > start address of the hpa_range of the endpoint decoder it is associated
> > > > > > with when a cxl region is created; and it is reset when the region
> > > > > > decoders are reset.
> > > > > 
> > > > > > Open question: do we need to set the numa_node when the memdev is
> > > > > probed instead of waiting until a region is created?  
> > > > 
> > > > Typically, the numa node for a PCI device should be dev_to_node(),
> > > > where the device resides. So when the device is probed, it should be
> > > > set with that. See documentation [1]. Region should have its own NUMA
> > > > node based on phys_to_target_node() of the starting address.    
> > > 
> > > Right, the memdev node is the affinity of device-MMIO to a CPU. The
> > > HDM-memory that the device decodes may land in multiple proximity
> > > domains and is subject to CDAT, CXL QoS, HMAT Generic Port, etc...
> > > 
> > > If your memdev node is "NUMA_NO_NODE" then that likely means the
> > > affinity information for the PCI device is missing.
> > > 
> > > I would double check that first. See set_dev_node() in device_add().  
> > 
> > Thanks Dave and Dan for the explanation.
> > Then the issue must come from the qemu setup.
> > 
> > I added some debug code as below
> > ---------------------------------------------
> > fan:~/cxl/linux-fixes$ git diff
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 5a1f05198114..c86a9eb58e99 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -3594,6 +3594,10 @@ int device_add(struct device *dev)
> >         if (kobj)
> >                 dev->kobj.parent = kobj;
> >  
> > +        dev_dbg(dev, "device: '%s': %s XX node %d\n", dev_name(dev), __func__, dev_to_node(dev));
> > +        if (parent) {
> > +                dev_dbg(parent, "parent device: '%s': %s XX node %d\n", dev_name(parent), __func__, dev_to_node(parent));
> > +        }
> >         /* use parent numa_node */
> >         if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
> >                 set_dev_node(dev, dev_to_node(parent));
> > ---------------------------------------------
> > 
> > After loading the cxl related drivers, every numa_node in the cxl
> > topology is -1.
> > 
> > Hi Jonathan,
> >    am I missing something in the qemu setup?  
> 
> IIUC the typical expectation for communicating the affinity of PCI
> devices is an ACPI _PXM property for the host bridge object in the
> [DS]SDT. As far as I can see QEMU does not build _PXM information for
> its host bridges.
> 
First a side note.  _PXM on a device is in theory also an option, but
long ago the 'fix' for that was reverted due to some really broken old
AMD platforms that put devices in non-existent nodes. Hmm. I should
revisit that, as I 'think' all the allocation problems with broken numa
nodes are long fixed (that included an ACPI spec clarification, so it
took a while!)
https://lore.kernel.org/linux-pci/20181211094737.71554-1-Jonathan.Cameron@huawei.com/

As for _PXM on host bridges, the gpex ACPI code does assign them for
PCI Expander Bridges, if you pass in the node
https://elixir.bootlin.com/qemu/v9.2.2/source/hw/pci-host/gpex-acpi.c#L178
(So we are good on ARM :)
https://elixir.bootlin.com/qemu/v9.2.2/source/hw/i386/acpi-build.c#L1533
does the same on x86. 

Those go via some indirections to a callback here:
https://elixir.bootlin.com/qemu/v9.2.2/source/hw/pci-bridge/pci_expander_bridge.c#L80

So set numa_node=X for each of your PXB instances.
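
As a rough sketch of what that could look like on the QEMU command line:
the node has to exist (via -numa) and is then referenced from the pxb-cxl
device. The ids, bus_nr, memory sizes, CPU split and the pxb_numa_args
helper are placeholders made up for illustration, not taken from this
thread, and the rest of the CXL setup (fixed memory windows, type3
devices) is left out. The helper only prints the options, it does not
launch a VM.

```shell
#!/bin/sh
# Sketch only: print the QEMU options that tie a CXL PXB to a guest NUMA
# node. Nothing is launched. Ids, bus_nr and sizes are placeholders.

# pxb_numa_args NODE: emit one option group per line, with the PXB
# assigned to guest NUMA node NODE (0 or 1 in this two-node layout).
pxb_numa_args() {
    node="$1"
    cat <<EOF
-machine q35,cxl=on
-m 4G -smp 4
-object memory-backend-ram,id=mem0,size=2G
-object memory-backend-ram,id=mem1,size=2G
-numa node,nodeid=0,cpus=0-1,memdev=mem0
-numa node,nodeid=1,cpus=2-3,memdev=mem1
-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=12,numa_node=${node}
-device cxl-rp,id=rp0,bus=cxl.1,port=0,chassis=0,slot=2
EOF
}

# Example: put the PXB (and everything below it) in node 1.
pxb_numa_args 1
```

With something like this the PXB host bridge should get a _PXM, and the
PCI device and memdev below it should pick the node up through the
"use parent numa_node" fallback in device_add(), so reading the memdev's
numa_node attribute in the guest should return 1 rather than -1. Untested
as written.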

Jonathan




