On Thu, 14 May 2026 15:31:11 +0800, Richard Cheng wrote:
> > On some platforms (e.g., RISC-V and ARM64) that use the generic
> > pci_acpi_scan_root() implementation, cxl_acpi_probe may run before
> > acpi_pci_root driver has bound to ACPI0016 (CXL host bridge) devices.
> > In this case, acpi_pci_find_root() returns NULL, causing
> > to_cxl_host_bridge() to skip the device silently. This results in
> > incomplete CXL port enumeration on first boot.
> >
> > Fix this by detecting the case where an ACPI0016 device exists but its
> > PCI root bridge is not yet ready, and returning -EPROBE_DEFER to trigger
> > a deferred probe retry.
> >
> > Signed-off-by: Chen Pei
> > ---
> >  drivers/cxl/acpi.c | 26 ++++++++++++++++++++++++--
> >  1 file changed, 24 insertions(+), 2 deletions(-)
> >
>
> Hi Chen Pei,
>
> Thanks for the patch.
> I have a few questions and suggestions regarding to your changes.
>
> First of all I would like in which scenario did you encounter the bug?
> Any specific CONFIG options and the devices ? what's the error log ?
>
> It would be nice if you can attach it for us.

Hi Richard,

Thanks for the review.

I'm currently working on bringing up CXL support on the RISC-V QEMU virt
platform with ACPI (EDK2 UEFI firmware). This is still in the early
debugging/enabling stage.

During testing, I found that cxl_acpi (ACPI0017) probes before
acpi_pci_root has bound to the ACPI0016 (CXL host bridge) device. RISC-V
uses the generic pci_acpi_scan_root() implementation, where the probe
ordering of acpi_pci_root relative to cxl_acpi is not guaranteed. On x86,
acpi_pci_root uses subsys_initcall and binds very early, so this race
does not manifest there.

This is a silent failure (no explicit error log), which makes it
particularly hard to diagnose. When cxl_acpi probes before acpi_pci_root
has bound the ACPI0016 device:

  1. acpi_pci_find_root() returns NULL in to_cxl_host_bridge()
  2. to_cxl_host_bridge() returns NULL
  3. Both add_host_bridge_dport() and add_host_bridge_uport() return 0
     (not an error), silently skipping the host bridge
  4. cxl_acpi_probe() returns success, but the CXL port topology is
     incomplete: no dports or uports are registered

The observable result after boot:

  # memdev is visible but decoder hierarchy is missing
  $ cxl list -M
  [
    {
      "memdev":"mem0",
      ...
    }
  ]

  # No decoders or ports registered for the host bridge
  $ cxl list -BDP
  []

  # Workaround: manually unbind/bind triggers re-probe after
  # acpi_pci_root is ready, and CXL topology enumerates correctly
  $ echo ACPI0017:00 > /sys/bus/platform/drivers/cxl_acpi/unbind
  $ echo ACPI0017:00 > /sys/bus/platform/drivers/cxl_acpi/bind

  # After re-probe, full topology is available and CXL memory
  # can be enabled/onlined successfully
  $ cxl enable-memdev mem0
  $ cxl create-region -m -t ram -d decoder0.0 -w 1 mem0 -s 4G
  $ daxctl online-memory dax0.0

> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 127537628817..9952d0cff903 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -631,8 +631,21 @@ static int add_host_bridge_dport(struct device *match, void *arg)
> >  	struct acpi_pci_root *pci_root;
> >  	struct cxl_port *root_port = arg;
> >  	struct device *host = root_port->dev.parent;
> > -	struct acpi_device *hb = to_cxl_host_bridge(host, match);
> > +	struct acpi_device *adev = to_acpi_device(match);
> > +	struct acpi_device *hb;
> >
> > +	/*
> > +	 * If this is an ACPI0016 device but acpi_pci_find_root() hasn't
> > +	 * found the PCI root yet (driver not probed), defer the probe
> > +	 * to allow acpi_pci_root to bind first.
> > +	 */
> > +	if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0 &&
> > +	    !acpi_pci_find_root(adev->handle)) {
> > +		dev_dbg(host, "deferring probe, ACPI0016 PCI root not ready\n");
> > +		return -EPROBE_DEFER;
> > +	}
>
> What about strncpy() here since we already know we're comparing against "ACPI0016" ?
> At the same time, why not just use "acpi_dev_hid_match()" ?
> it's widely used across
> numerous files.

Good point, thanks. I will switch to acpi_dev_hid_match() in v2.

> > +
> > +	hb = to_cxl_host_bridge(host, match);
> >  	if (!hb)
> >  		return 0;
> >
> > @@ -688,7 +701,8 @@ static int add_host_bridge_uport(struct device *match, void *arg)
> >  {
> >  	struct cxl_port *root_port = arg;
> >  	struct device *host = root_port->dev.parent;
> > -	struct acpi_device *hb = to_cxl_host_bridge(host, match);
> > +	struct acpi_device *adev = to_acpi_device(match);
> > +	struct acpi_device *hb;
> >  	struct acpi_pci_root *pci_root;
> >  	struct cxl_dport *dport;
> >  	struct cxl_port *port;
> > @@ -697,6 +711,14 @@ static int add_host_bridge_uport(struct device *match, void *arg)
> >  	resource_size_t component_reg_phys;
> >  	int rc;
> >
> > +	/* Same deferral check as in add_host_bridge_dport() */
> > +	if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0 &&
> > +	    !acpi_pci_find_root(adev->handle)) {
> > +		dev_dbg(host, "deferring probe, ACPI0016 PCI root not ready\n");
> > +		return -EPROBE_DEFER;
> > +	}
> > +
> > +	hb = to_cxl_host_bridge(host, match);
> >  	if (!hb)
> >  		return 0;
> >
> > --
> > 2.50.1
> >
> >
>
> These 2 checks are basically the same, can we put it in a static inline helper or
> a macro if possible? something like the following might be better
>
> ```
> static int cxl_acpi_defer_host_bridge(struct device *host,
> 				      struct acpi_device *adev)
> {
> 	if (acpi_dev_hid_match(adev, "ACPI0016") &&
> 	    !acpi_pci_find_root(adev->handle)) {
> 		dev_dbg(host, "deferring probe, ACPI0016 PCI root not ready\n");
> 		return -EPROBE_DEFER;
> 	}
> 	return 0;
> }
> ```
> and use it in your code like
>
> ```
> int rc = cxl_acpi_defer_host_bridge(host, adev);
> if (rc)
> 	return rc;
> ```

Agreed, will extract it into a helper in v2.

> Last but not least, have you run the kselftests of CXL ?
> some mock bridges
> are platform devices, not ACPI devices, you are using "to_acpi_device(match)", this
> is not a safe runtime check when "match" is a platform_device, the code will read the memory
> layout wrongly.

Good catch, I haven't run the CXL kselftests. You're right that the mock
bridges are platform devices and unconditionally calling
to_acpi_device(match) would be unsafe there.

I'll fix this in v2 by adding a dev_is_platform() guard in the helper,
so it only applies to real ACPI devices. I'll also run the CXL
kselftests to validate before sending v2.

Thanks,
Pei
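
For illustration only, here is a minimal user-space sketch of the decision
the guarded helper discussed above would make. It is not the v2 patch and
not kernel code: struct model_dev, model_defer_host_bridge() and
model_self_check() are hypothetical names, the three fields are stand-ins
for dev_is_platform(), the ACPI _HID and the result of
acpi_pci_find_root(), and EPROBE_DEFER is defined locally because user
space does not provide it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Kernel-internal errno value (517); user space does not define it. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for the device handed to the bus iterator. */
struct model_dev {
	bool is_platform;    /* models dev_is_platform(match) */
	const char *hid;     /* models the ACPI _HID, NULL if there is none */
	bool pci_root_bound; /* models acpi_pci_find_root() != NULL */
};

/*
 * Returns -EPROBE_DEFER only for a real ACPI0016 device whose PCI root
 * is not bound yet; every other device is left to the normal path.
 */
static int model_defer_host_bridge(const struct model_dev *dev)
{
	/* Mock bridges are platform devices: never treat them as ACPI. */
	if (dev->is_platform)
		return 0;

	if (dev->hid && strcmp(dev->hid, "ACPI0016") == 0 &&
	    !dev->pci_root_bound)
		return -EPROBE_DEFER;

	return 0;
}

/* Exercises the three cases discussed in the thread. */
static void model_self_check(void)
{
	struct model_dev early = { false, "ACPI0016", false };
	struct model_dev ready = { false, "ACPI0016", true };
	struct model_dev mock  = { true, NULL, false };

	assert(model_defer_host_bridge(&early) == -EPROBE_DEFER);
	assert(model_defer_host_bridge(&ready) == 0);
	assert(model_defer_host_bridge(&mock) == 0);
}
```

The platform check comes first so that the ACPI-specific fields are never
consulted for a mock bridge, which mirrors the concern about calling
to_acpi_device() on a platform_device.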