From: Fan Ni <nifan.cxl@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>,
	nifan.cxl@gmail.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	Jonathan.Cameron@huawei.com, linux-cxl@vger.kernel.org,
	a.manzanares@samsung.com, dave@stgolabs.net,
	linux-kernel@vger.kernel.org, anisa.su887@gmail.com
Subject: Re: [RFC] cxl/region: set numa node for target memdevs when a region is committed
Date: Tue, 18 Mar 2025 16:11:38 -0700
Message-ID: <Z9n9qpySEkwbXN_F@debian>
In-Reply-To: <67d9e4d43360e_201f0294d6@dwillia2-xfh.jf.intel.com.notmuch>

On Tue, Mar 18, 2025 at 02:25:40PM -0700, Dan Williams wrote:
> Dave Jiang wrote:
> > 
> > 
> > On 3/14/25 9:40 AM, nifan.cxl@gmail.com wrote:
> > > From: Fan Ni <fan.ni@samsung.com>
> > > 
> > > There is a sysfs attribute named "numa_node" for cxl memory devices.
> > > However, it is never set, so -1 is returned whenever it is read.
> > > 
> > > With this change, the numa_node of each target memdev is set based on the
> > > start address of the hpa_range of the endpoint decoder it is associated
> > > with when a cxl region is created; it is reset when the region decoders
> > > are reset.
> > > 
> > > Open question: do we need to set the numa_node when the memdev is
> > > probed instead of waiting until a region is created?
> > 
> > Typically, the numa node for a PCI device should be dev_to_node(),
> > where the device resides. So when the device is probed, it should be
> > set with that. See documentation [1]. Region should have its own NUMA
> > node based on phys_to_target_node() of the starting address.  
> 
> Right, the memdev node is the affinity of device-MMIO to a CPU. The
> HDM-memory that the device decodes may land in multiple proximity
> domains and is subject to CDAT, CXL QoS, HMAT Generic Port, etc...
> 
> If your memdev node is "NUMA_NO_NODE" then that likely means the
> affinity information for the PCI device is missing.
> 
> I would double check that first. See set_dev_node() in device_add().

Thanks Dave and Dan for the explanation.
So the issue must come from my qemu setup.

I added some debug code to device_add(), shown below:
---------------------------------------------
fan:~/cxl/linux-fixes$ git diff
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5a1f05198114..c86a9eb58e99 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3594,6 +3594,10 @@ int device_add(struct device *dev)
        if (kobj)
                dev->kobj.parent = kobj;
 
+        dev_dbg(dev, "device: '%s': %s XX node %d\n", dev_name(dev), __func__, dev_to_node(dev));
+        if (parent) {
+                dev_dbg(parent, "parent device: '%s': %s XX node %d\n", dev_name(parent), __func__, dev_to_node(parent));
+        }
        /* use parent numa_node */
        if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
                set_dev_node(dev, dev_to_node(parent));
---------------------------------------------
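For anyone following along without a kernel tree at hand, here is a minimal
userspace sketch of the inheritance step the hunk above instruments. It is only
a model, not kernel code: struct model_dev, model_device_add() and
model_leaf_node() are hypothetical names made up for illustration, and
MODEL_NO_NODE stands in for NUMA_NO_NODE. The device names are taken from the
log in this mail.

```c
#include <stddef.h>

#define MODEL_NO_NODE (-1)	/* stands in for NUMA_NO_NODE */

/* Hypothetical stand-in for struct device: only the fields used here. */
struct model_dev {
	const char *name;
	struct model_dev *parent;
	int numa_node;
};

/* Models the "use parent numa_node" step in device_add(). */
static void model_device_add(struct model_dev *dev, struct model_dev *parent)
{
	dev->parent = parent;
	if (parent && dev->numa_node == MODEL_NO_NODE)
		dev->numa_node = parent->numa_node;
}

/*
 * Build the chain 0000:0d:00.0 -> mem0 -> pmem0 with the PCI function's
 * node set to pci_node, and return the node that pmem0 ends up with.
 */
static int model_leaf_node(int pci_node)
{
	struct model_dev pci = { "0000:0d:00.0", NULL, pci_node };
	struct model_dev mem = { "mem0", NULL, MODEL_NO_NODE };
	struct model_dev pmem = { "pmem0", NULL, MODEL_NO_NODE };

	model_device_add(&pci, NULL);
	model_device_add(&mem, &pci);
	model_device_add(&pmem, &mem);
	return pmem.numa_node;
}
```

In this model a child can only pick up a node that some ancestor already has,
so if the PCI function itself starts at -1, every device below it stays at -1.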

The output after loading the cxl related drivers is at the end of this
mail. Every numa_node in the cxl topology is -1.

Hi Jonathan,
   am I missing something in the qemu setup below?

qemu-system-x86_64 -s  -kernel bzImage -append "root=/dev/sda rw console=ttyS0,115200 ignore_loglevel nokaslr \
cxl_acpi.dyndbg=+fplm cxl_pci.dyndbg=+fplm cxl_core.dyndbg=+fplm cxl_mem.dyndbg=+fplm cxl_pmem.dyndbg=+fplm \
cxl_port.dyndbg=+fplm cxl_region.dyndbg=+fplm cxl_test.dyndbg=+fplm cxl_mock.dyndbg=+fplm \
cxl_mock_mem.dyndbg=+fplm dax.dyndbg=+fplm dax_cxl.dyndbg=+fplm device_dax.dyndbg=+fplm" \
-smp 8 -accel kvm -serial mon:stdio  -nographic  -qmp tcp:localhost:4445,server,wait=off \
-netdev user,id=network0,hostfwd=tcp::2024-:22 -device e1000,netdev=network0  -monitor telnet:127.0.0.1:12346,server,nowait \
-drive file=/home/fan/cxl/images/qemu-image.img,index=0,media=disk,format=raw -machine q35,cxl=on -cpu qemu64,mce=on \
-m 8G,maxmem=64G,slots=8  -virtfs local,path=/opt/lib/modules,mount_tag=modshare,security_model=mapped  \
-virtfs local,path=/home/fan,mount_tag=homeshare,security_model=mapped -object \
memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/host//cxltest.raw,size=512M  \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/host//lsa.raw,size=1M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true  \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2   \
-device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,sn=0xabcd    \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k

---------------------------------------------
fan:~/cxl/linux-fixes$ cat .config | grep CONFIG_NUMA
# CONFIG_NUMA_BALANCING is not set
CONFIG_NUMA=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_NUMA_MEMBLKS=y
# CONFIG_NUMA_EMU is not set
fan:~/cxl/linux-fixes$ 

---------------------------------------------
root@debian:~# echo 'file core.c +p' >> /sys/kernel/debug/dynamic_debug/control
root@debian:~# dmesg | grep XX
root@debian:~# dmesg | grep XX
[   44.939510] wakeup wakeup14: device: 'wakeup14': device_add XX node -1
[   44.940195] acpi ACPI0017:00: parent device: 'ACPI0017:00': device_add XX node -1
[   44.941402] cxl root0: device: 'root0': device_add XX node -1
[   44.942023] cxl_acpi ACPI0017:00: parent device: 'ACPI0017:00': device_add XX node -1
[   44.947546] cxl decoder0.0: device: 'decoder0.0': device_add XX node -1
[   44.948219] cxl root0: parent device: 'root0': device_add XX node -1
[   44.958637] cxl port1: device: 'port1': device_add XX node -1
[   44.959245] cxl root0: parent device: 'root0': device_add XX node -1
[   44.990326] cxl decoder1.0: device: 'decoder1.0': device_add XX node -1
[   44.991014] cxl_port port1: parent device: 'port1': device_add XX node -1
[   44.993947] cxl decoder1.1: device: 'decoder1.1': device_add XX node -1
[   44.994593] cxl_port port1: parent device: 'port1': device_add XX node -1
[   44.997521] cxl decoder1.2: device: 'decoder1.2': device_add XX node -1
[   44.998203] cxl_port port1: parent device: 'port1': device_add XX node -1
[   45.001142] cxl decoder1.3: device: 'decoder1.3': device_add XX node -1
[   45.001821] cxl_port port1: parent device: 'port1': device_add XX node -1
[   45.005465] cxl nvdimm-bridge0: device: 'nvdimm-bridge0': device_add XX node -1
[   45.006206] cxl root0: parent device: 'root0': device_add XX node -1
[   45.072975] cxl mem0: device: 'mem0': device_add XX node -1
[   45.073519] cxl_pci 0000:0d:00.0: parent device: '0000:0d:00.0': device_add XX node -1
[   45.074937] firmware mem0: device: 'mem0': device_add XX node -1
[   45.075525] cxl mem0: parent device: 'mem0': device_add XX node -1
[   45.095409] nd ndbus0: device: 'ndbus0': device_add XX node -1
[   45.096135] cxl_nvdimm_bridge nvdimm-bridge0: parent device: 'nvdimm-bridge0': device_add XX node -1
[   45.097476] nd ndctl0: device: 'ndctl0': device_add XX node -1
[   45.099208] nd_bus ndbus0: parent device: 'ndbus0': device_add XX node -1
[   45.101286] cxl pmem0: device: 'pmem0': device_add XX node -1
[   45.102633] cxl_mem mem0: parent device: 'mem0': device_add XX node -1
[   45.108757] nd nmem0: device: 'nmem0': device_add XX node -1
[   45.109317] nd_bus ndbus0: parent device: 'ndbus0': device_add XX node -1
[   45.119846] cxl endpoint2: device: 'endpoint2': device_add XX node -1
[   45.120474] cxl_port port1: parent device: 'port1': device_add XX node -1
[   45.149351] cxl decoder2.0: device: 'decoder2.0': device_add XX node -1
[   45.150029] cxl_port endpoint2: parent device: 'endpoint2': device_add XX node -1
[   45.153057] cxl decoder2.1: device: 'decoder2.1': device_add XX node -1
[   45.153700] cxl_port endpoint2: parent device: 'endpoint2': device_add XX node -1
[   45.156723] cxl decoder2.2: device: 'decoder2.2': device_add XX node -1
[   45.157384] cxl_port endpoint2: parent device: 'endpoint2': device_add XX node -1
[   45.160407] cxl decoder2.3: device: 'decoder2.3': device_add XX node -1
[   45.161073] cxl_port endpoint2: parent device: 'endpoint2': device_add XX node -1
root@debian:~# 
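
The same values can probably be cross-checked from sysfs without the debug
patch. Below is a small sketch of a helper that reads one numa_node attribute
file. read_numa_node() is a hypothetical name, and the paths mentioned after
the code are assumptions based on the device names in the log above.

```c
#include <stdio.h>

/*
 * Read a sysfs numa_node attribute file.
 * Returns 0 and stores the value in *node on success,
 * -1 if the file cannot be opened or does not start with an integer.
 */
static int read_numa_node(const char *path, int *node)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", node) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling it on /sys/bus/pci/devices/0000:0d:00.0/numa_node and on
/sys/bus/cxl/devices/mem0/numa_node (both paths assumed from the log) should
show whether the PCI function itself has an affinity that the memdev could
inherit.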