* [PATCH v2 0/6] cxl: Initialization and shutdown fixes
@ 2024-10-23 1:43 Dan Williams
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Dan Williams @ 2024-10-23 1:43 UTC (permalink / raw)
To: ira.weiny
Cc: Dave Jiang, Davidlohr Bueso, Jonathan Cameron, stable,
Greg Kroah-Hartman, Jonathan Cameron, Alison Schofield,
Gregory Price, Zijun Hu, Vishal Verma, dave.jiang, linux-cxl
Changes since v1 [1]:
- Fix some misspellings missed by checkpatch in changelogs (Jonathan)
- Add comments explaining the order of objects in drivers/cxl/Makefile
(Jonathan)
- Rename attach_device => cxl_rescan_attach (Jonathan)
- Fixup Zijun's email (Zijun)
[1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
---
Original cover:
Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
delayed arrival of the CXL "root" infrastructure [1] prompted questions
of how the existing mechanism for retrying cxl_mem_probe() could be
failing.
The critical missing piece in the debug was that Gregory's setup had
almost all CXL modules built-in to the kernel.
On the way to that discovery several other bugs and init-order corner
cases were discovered.
The main fix is to make sure the drivers/cxl/Makefile object order
supports root CXL ports being fully initialized upon cxl_acpi_probe()
exit. The modular case has some similar potential holes that are fixed
with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
cxl_test to reproduce the original report resulted in the discovery of a
separate long standing use after free bug in cxl_region_detach().
[2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
---
Dan Williams (6):
cxl/port: Fix CXL port initialization order when the subsystem is built-in
cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
cxl/port: Prevent out-of-order decoder allocation
cxl/test: Improve init-order fidelity relative to real-world systems
drivers/base/core.c | 35 +++++++
drivers/cxl/Kconfig | 1
drivers/cxl/Makefile | 20 +++-
drivers/cxl/acpi.c | 7 +
drivers/cxl/core/hdm.c | 50 +++++++++--
drivers/cxl/core/port.c | 13 ++-
drivers/cxl/core/region.c | 91 ++++++++++---------
drivers/cxl/cxl.h | 3 -
include/linux/device.h | 3 +
tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++-------------------
tools/testing/cxl/test/mem.c | 1
11 files changed, 269 insertions(+), 155 deletions(-)
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-23 1:43 [PATCH v2 0/6] cxl: Initialization and shutdown fixes Dan Williams
@ 2024-10-23 1:43 ` Dan Williams
2024-10-24 9:42 ` Jonathan Cameron
` (3 more replies)
2024-10-23 1:43 ` [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
2024-10-23 13:12 ` [PATCH v2 0/6] cxl: Initialization and shutdown fixes Robert Richter
2 siblings, 4 replies; 17+ messages in thread
From: Dan Williams @ 2024-10-23 1:43 UTC (permalink / raw)
To: ira.weiny
Cc: Gregory Price, Gregory Price, stable, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
dave.jiang, linux-cxl
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
the race cxl_mem will find the enabled CXL root ports it needs and if
cxl_acpi loses the race it will retrigger cxl_mem to attach via
cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
enabled immediately upon cxl_acpi_probe() return. That in turn can only
happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
before the cxl_acpi object in the Makefile.
Fix up the order to prevent initialization failures, and make sure that
cxl_port is built-in if cxl_acpi is also built-in.
As for what contributed to this not being found earlier, the CXL
regression environment, cxl_test, builds all CXL functionality as a
module to allow to symbol mocking and other dynamic reload tests. As a
result there is no regression coverage for the built-in case.
Reported-by: Gregory Price <gourry@gourry.net>
Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
Tested-by: Gregory Price <gourry@gourry.net>
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: <stable@vger.kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/Kconfig | 1 +
drivers/cxl/Makefile | 20 ++++++++++++++------
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 29c192f20082..876469e23f7a 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select CXL_PORT
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,13 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
+
+# Order is important here for the built-in case:
+# - 'core' first for fundamental init
+# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
+# are immediately enabled
+# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
+# immediately enabled
+# - 'pci' last, also mirrors the hardware enumeration hierarchy
obj-y += core/
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
-obj-$(CONFIG_CXL_PORT) += cxl_port.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-cxl_mem-y := mem.o
-cxl_pci-y := pci.o
+cxl_port-y := port.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o security.o
-cxl_port-y := port.o
+cxl_mem-y := mem.o
+cxl_pci-y := pci.o
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
2024-10-23 1:43 [PATCH v2 0/6] cxl: Initialization and shutdown fixes Dan Williams
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
@ 2024-10-23 1:43 ` Dan Williams
2024-10-24 15:55 ` Ira Weiny
2024-10-23 13:12 ` [PATCH v2 0/6] cxl: Initialization and shutdown fixes Robert Richter
2 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2024-10-23 1:43 UTC (permalink / raw)
To: ira.weiny
Cc: stable, Jonathan Cameron, Greg Kroah-Hartman, Davidlohr Bueso,
Dave Jiang, Alison Schofield, Zijun Hu, vishal.l.verma,
dave.jiang, linux-cxl
In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
[..]
cxld_unregister: cxl decoder14.0:
cxl_region_decode_reset: cxl_region region3:
mock_decoder_reset: cxl_port port3: decoder3.0 reset
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
cxl_endpoint_decoder_release: cxl decoder14.0:
[..]
cxld_unregister: cxl decoder7.0:
3) cxl_region_decode_reset: cxl_region region3:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
[..]
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_decode_reset+0x69/0x190 [cxl_core]
cxl_region_detach+0xe8/0x210 [cxl_core]
cxl_decoder_kill_region+0x27/0x40 [cxl_core]
cxld_unregister+0x5d/0x60 [cxl_core]
At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.
The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.
In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.
A new device_for_each_child_reverse_from() is added to cleanup
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: <stable@vger.kernel.org>
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/base/core.c | 35 +++++++++++++++++++++++++++++
drivers/cxl/core/hdm.c | 50 +++++++++++++++++++++++++++++++++++-------
drivers/cxl/core/region.c | 48 +++++++++++-----------------------------
drivers/cxl/cxl.h | 3 ++-
include/linux/device.h | 3 +++
tools/testing/cxl/test/cxl.c | 14 ++++--------
6 files changed, 100 insertions(+), 53 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index a4c853411a6b..e42f1ad73078 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data,
}
EXPORT_SYMBOL_GPL(device_for_each_child_reverse);
+/**
+ * device_for_each_child_reverse_from - device child iterator in reversed order.
+ * @parent: parent struct device.
+ * @from: optional starting point in child list
+ * @fn: function to be called for each device.
+ * @data: data for the callback.
+ *
+ * Iterate over @parent's child devices, starting at @from, and call @fn
+ * for each, passing it @data. This helper is identical to
+ * device_for_each_child_reverse() when @from is NULL.
+ *
+ * @fn is checked each iteration. If it returns anything other than 0,
+ * iteration stop and that value is returned to the caller of
+ * device_for_each_child_reverse_from();
+ */
+int device_for_each_child_reverse_from(struct device *parent,
+ struct device *from, const void *data,
+ int (*fn)(struct device *, const void *))
+{
+ struct klist_iter i;
+ struct device *child;
+ int error = 0;
+
+ if (!parent->p)
+ return 0;
+
+ klist_iter_init_node(&parent->p->klist_children, &i,
+ (from ? &from->p->knode_parent : NULL));
+ while ((child = prev_device(&i)) && !error)
+ error = fn(child, data);
+ klist_iter_exit(&i);
+ return error;
+}
+EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from);
+
/**
* device_find_child - device iterator for locating a particular device.
* @parent: parent struct device
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 3df10517a327..223c273c0cd1 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld)
return 0;
}
-static int cxl_decoder_reset(struct cxl_decoder *cxld)
+static int commit_reap(struct device *dev, const void *data)
+{
+ struct cxl_port *port = to_cxl_port(dev->parent);
+ struct cxl_decoder *cxld;
+
+ if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev))
+ return 0;
+
+ cxld = to_cxl_decoder(dev);
+ if (port->commit_end == cxld->id &&
+ ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
+ port->commit_end--;
+ dev_dbg(&port->dev, "reap: %s commit_end: %d\n",
+ dev_name(&cxld->dev), port->commit_end);
+ }
+
+ return 0;
+}
+
+void cxl_port_commit_reap(struct cxl_decoder *cxld)
+{
+ struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+ lockdep_assert_held_write(&cxl_region_rwsem);
+
+ /*
+ * Once the highest committed decoder is disabled, free any other
+ * decoders that were pinned allocated by out-of-order release.
+ */
+ port->commit_end--;
+ dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev),
+ port->commit_end);
+ device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL,
+ commit_reap);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL);
+
+static void cxl_decoder_reset(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
@@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
u32 ctrl;
if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
- return 0;
+ return;
- if (port->commit_end != id) {
+ if (port->commit_end == id)
+ cxl_port_commit_reap(cxld);
+ else
dev_dbg(&port->dev,
"%s: out of order reset, expected decoder%d.%d\n",
dev_name(&cxld->dev), port->id, port->commit_end);
- return -EBUSY;
- }
down_read(&cxl_dpa_rwsem);
ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
@@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
up_read(&cxl_dpa_rwsem);
- port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
/* Userspace is now responsible for reconfiguring this decoder */
@@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
cxled = to_cxl_endpoint_decoder(&cxld->dev);
cxled->state = CXL_DECODER_STATE_MANUAL;
}
-
- return 0;
}
static int cxl_setup_hdm_decoder_from_dvsec(
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e701e4b04032..3478d2058303 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
return 0;
} else {
- dev_err(&cxlr->dev,
- "Failed to synchronize CPU cache state\n");
+ dev_WARN(&cxlr->dev,
+ "Failed to synchronize CPU cache state\n");
return -ENXIO;
}
}
@@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
return 0;
}
-static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
+static void cxl_region_decode_reset(struct cxl_region *cxlr, int count)
{
struct cxl_region_params *p = &cxlr->params;
- int i, rc = 0;
+ int i;
/*
- * Before region teardown attempt to flush, and if the flush
- * fails cancel the region teardown for data consistency
- * concerns
+ * Before region teardown attempt to flush, evict any data cached for
+ * this region, or scream loudly about missing arch / platform support
+ * for CXL teardown.
*/
- rc = cxl_region_invalidate_memregion(cxlr);
- if (rc)
- return rc;
+ cxl_region_invalidate_memregion(cxlr);
for (i = count - 1; i >= 0; i--) {
struct cxl_endpoint_decoder *cxled = p->targets[i];
@@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
cxl_rr = cxl_rr_load(iter, cxlr);
cxld = cxl_rr->decoder;
if (cxld->reset)
- rc = cxld->reset(cxld);
- if (rc)
- return rc;
+ cxld->reset(cxld);
set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
endpoint_reset:
- rc = cxled->cxld.reset(&cxled->cxld);
- if (rc)
- return rc;
+ cxled->cxld.reset(&cxled->cxld);
set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
/* all decoders associated with this region have been torn down */
clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
-
- return 0;
}
static int commit_decoder(struct cxl_decoder *cxld)
@@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
* still pending.
*/
if (p->state == CXL_CONFIG_RESET_PENDING) {
- rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
- /*
- * Revert to committed since there may still be active
- * decoders associated with this region, or move forward
- * to active to mark the reset successful
- */
- if (rc)
- p->state = CXL_CONFIG_COMMIT;
- else
- p->state = CXL_CONFIG_ACTIVE;
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
+ p->state = CXL_CONFIG_ACTIVE;
}
}
@@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
get_device(&cxlr->dev);
if (p->state > CXL_CONFIG_ACTIVE) {
- /*
- * TODO: tear down all impacted regions if a device is
- * removed out of order
- */
- rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
- if (rc)
- goto out;
+ cxl_region_decode_reset(cxlr, p->interleave_ways);
p->state = CXL_CONFIG_ACTIVE;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 0d8b810a51f0..5406e3ab3d4a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -359,7 +359,7 @@ struct cxl_decoder {
struct cxl_region *region;
unsigned long flags;
int (*commit)(struct cxl_decoder *cxld);
- int (*reset)(struct cxl_decoder *cxld);
+ void (*reset)(struct cxl_decoder *cxld);
};
/*
@@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
int cxl_num_decoders_committed(struct cxl_port *port);
bool is_cxl_port(const struct device *dev);
struct cxl_port *to_cxl_port(const struct device *dev);
+void cxl_port_commit_reap(struct cxl_decoder *cxld);
struct pci_bus;
int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
struct pci_bus *bus);
diff --git a/include/linux/device.h b/include/linux/device.h
index b4bde8d22697..667cb6db9019 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data,
int (*fn)(struct device *dev, void *data));
int device_for_each_child_reverse(struct device *dev, void *data,
int (*fn)(struct device *dev, void *data));
+int device_for_each_child_reverse_from(struct device *parent,
+ struct device *from, const void *data,
+ int (*fn)(struct device *, const void *));
struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
struct device *device_find_child_by_name(struct device *parent,
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 90d5afd52dd0..c5bbd89b3192 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
return 0;
}
-static int mock_decoder_reset(struct cxl_decoder *cxld)
+static void mock_decoder_reset(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
int id = cxld->id;
if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
- return 0;
+ return;
dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
- if (port->commit_end != id) {
+ if (port->commit_end == id)
+ cxl_port_commit_reap(cxld);
+ else
dev_dbg(&port->dev,
"%s: out of order reset, expected decoder%d.%d\n",
dev_name(&cxld->dev), port->id, port->commit_end);
- return -EBUSY;
- }
-
- port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
-
- return 0;
}
static void default_mock_decoder(struct cxl_decoder *cxld)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes
2024-10-23 1:43 [PATCH v2 0/6] cxl: Initialization and shutdown fixes Dan Williams
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-23 1:43 ` [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
@ 2024-10-23 13:12 ` Robert Richter
2024-10-23 16:00 ` Gregory Price
2024-10-23 20:34 ` Dan Williams
2 siblings, 2 replies; 17+ messages in thread
From: Robert Richter @ 2024-10-23 13:12 UTC (permalink / raw)
To: Dan Williams
Cc: ira.weiny, Dave Jiang, Davidlohr Bueso, Jonathan Cameron, stable,
Greg Kroah-Hartman, Alison Schofield, Gregory Price, Zijun Hu,
Vishal Verma, linux-cxl
On 22.10.24 18:43:15, Dan Williams wrote:
> Changes since v1 [1]:
> - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> - Add comments explaining the order of objects in drivers/cxl/Makefile
> (Jonathan)
> - Rename attach_device => cxl_rescan_attach (Jonathan)
> - Fixup Zijun's email (Zijun)
>
> [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
>
> ---
>
> Original cover:
>
> Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> delayed arrival of the CXL "root" infrastructure [1] prompted questions
> of how the existing mechanism for retrying cxl_mem_probe() could be
> failing.
I found a similar issue with the region creation.
A region is created with the first endpoint found and immediately
added as device which triggers cxl_region_probe(). Now, in
interleaving setups the region state comes into commit state only
after the last endpoint was probed. So the probe must be repeated
until all endpoints were enumerated. I ended up with this change:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a07b62254596..c78704e435e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
}
if (p->state < CXL_CONFIG_COMMIT) {
- dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
- rc = -ENXIO;
+ rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
+ "region config state: %d\n", p->state);
goto out;
}
--
2.39.5
I don't see an init order issue here as the mem module is always up
before the regions are probed.
-Robert
>
> The critical missing piece in the debug was that Gregory's setup had
> almost all CXL modules built-in to the kernel.
>
> On the way to that discovery several other bugs and init-order corner
> cases were discovered.
>
> The main fix is to make sure the drivers/cxl/Makefile object order
> supports root CXL ports being fully initialized upon cxl_acpi_probe()
> exit. The modular case has some similar potential holes that are fixed
> with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> cxl_test to reproduce the original report resulted in the discovery of a
> separate long standing use after free bug in cxl_region_detach().
>
> [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
>
> ---
>
> Dan Williams (6):
> cxl/port: Fix CXL port initialization order when the subsystem is built-in
> cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
> cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
> cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
> cxl/port: Prevent out-of-order decoder allocation
> cxl/test: Improve init-order fidelity relative to real-world systems
>
>
> drivers/base/core.c | 35 +++++++
> drivers/cxl/Kconfig | 1
> drivers/cxl/Makefile | 20 +++-
> drivers/cxl/acpi.c | 7 +
> drivers/cxl/core/hdm.c | 50 +++++++++--
> drivers/cxl/core/port.c | 13 ++-
> drivers/cxl/core/region.c | 91 ++++++++++---------
> drivers/cxl/cxl.h | 3 -
> include/linux/device.h | 3 +
> tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++-------------------
> tools/testing/cxl/test/mem.c | 1
> 11 files changed, 269 insertions(+), 155 deletions(-)
>
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes
2024-10-23 13:12 ` [PATCH v2 0/6] cxl: Initialization and shutdown fixes Robert Richter
@ 2024-10-23 16:00 ` Gregory Price
2024-10-23 20:34 ` Dan Williams
1 sibling, 0 replies; 17+ messages in thread
From: Gregory Price @ 2024-10-23 16:00 UTC (permalink / raw)
To: Robert Richter
Cc: Dan Williams, ira.weiny, Dave Jiang, Davidlohr Bueso,
Jonathan Cameron, stable, Greg Kroah-Hartman, Alison Schofield,
Zijun Hu, Vishal Verma, linux-cxl
On Wed, Oct 23, 2024 at 03:12:07PM +0200, Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> > (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> >
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> >
> > ---
> >
> > Original cover:
> >
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
>
> I found a similar issue with the region creation.
>
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
> }
>
> if (p->state < CXL_CONFIG_COMMIT) {
> - dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> - rc = -ENXIO;
> + rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> + "region config state: %d\n", p->state);
> goto out;
> }
>
I have not experienced any out of order operations since applying Dan's
v1 of this patch set, do you still see this after applying the existing
set?
Probably this is indicative of needing another SOFTDEP / ordering issue.
~Gregory
> --
> 2.39.5
>
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.
>
> -Robert
>
> >
> > The critical missing piece in the debug was that Gregory's setup had
> > almost all CXL modules built-in to the kernel.
> >
> > On the way to that discovery several other bugs and init-order corner
> > cases were discovered.
> >
> > The main fix is to make sure the drivers/cxl/Makefile object order
> > supports root CXL ports being fully initialized upon cxl_acpi_probe()
> > exit. The modular case has some similar potential holes that are fixed
> > with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> > cxl_test to reproduce the original report resulted in the discovery of a
> > separate long standing use after free bug in cxl_region_detach().
> >
> > [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> >
> > ---
> >
> > Dan Williams (6):
> > cxl/port: Fix CXL port initialization order when the subsystem is built-in
> > cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
> > cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
> > cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
> > cxl/port: Prevent out-of-order decoder allocation
> > cxl/test: Improve init-order fidelity relative to real-world systems
> >
> >
> > drivers/base/core.c | 35 +++++++
> > drivers/cxl/Kconfig | 1
> > drivers/cxl/Makefile | 20 +++-
> > drivers/cxl/acpi.c | 7 +
> > drivers/cxl/core/hdm.c | 50 +++++++++--
> > drivers/cxl/core/port.c | 13 ++-
> > drivers/cxl/core/region.c | 91 ++++++++++---------
> > drivers/cxl/cxl.h | 3 -
> > include/linux/device.h | 3 +
> > tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++-------------------
> > tools/testing/cxl/test/mem.c | 1
> > 11 files changed, 269 insertions(+), 155 deletions(-)
> >
> > base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes
2024-10-23 13:12 ` [PATCH v2 0/6] cxl: Initialization and shutdown fixes Robert Richter
2024-10-23 16:00 ` Gregory Price
@ 2024-10-23 20:34 ` Dan Williams
2024-10-24 11:56 ` Robert Richter
1 sibling, 1 reply; 17+ messages in thread
From: Dan Williams @ 2024-10-23 20:34 UTC (permalink / raw)
To: Robert Richter, Dan Williams
Cc: ira.weiny, Dave Jiang, Davidlohr Bueso, Jonathan Cameron, stable,
Greg Kroah-Hartman, Alison Schofield, Gregory Price, Zijun Hu,
Vishal Verma, linux-cxl
Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> > (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> >
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> >
> > ---
> >
> > Original cover:
> >
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
>
> I found a similar issue with the region creation.
>
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
> }
>
> if (p->state < CXL_CONFIG_COMMIT) {
> - dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> - rc = -ENXIO;
> + rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> + "region config state: %d\n", p->state);
I would argue EPROBE_DEFER is not appropriate because there is no
guarantee that the other members of the region show up, and if they do
they will re-trigger probe. So "probe must be repeated until all
endpoints were enumerated" is the case either way. I.e. either more
endpoint arrival triggers re-probe or EPROBE_DEFER triggers extra
redundant probing *and* still results in a probe attempts as endpoints
arrive.
So a dev_dbg() plus -ENXIO return on uncommited region state is
expected.
> goto out;
> }
>
> --
> 2.39.5
>
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.
Right, cxl_endpoint_port_probe() triggers region discovery and
cxl_endpoint_port_probe() currently only triggers after cxl_mem has
registered an endpoint port.
The failure this set is address is unwanted cxl_mem_probe() failures.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
@ 2024-10-24 9:42 ` Jonathan Cameron
2024-10-24 16:19 ` Dan Williams
2024-10-24 10:36 ` Alejandro Lucero Palau
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron @ 2024-10-24 9:42 UTC (permalink / raw)
To: Dan Williams
Cc: ira.weiny, Gregory Price, stable, Davidlohr Bueso, Dave Jiang,
Alison Schofield, Vishal Verma, linux-cxl
On Tue, 22 Oct 2024 18:43:24 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> When the CXL subsystem is built-in the module init order is determined
> by Makefile order. That order violates expectations. The expectation is
> that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> the race cxl_mem will find the enabled CXL root ports it needs and if
> cxl_acpi loses the race it will retrigger cxl_mem to attach via
> cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> enabled immediately upon cxl_acpi_probe() return. That in turn can only
> happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> before the cxl_acpi object in the Makefile.
>
> Fix up the order to prevent initialization failures, and make sure that
> cxl_port is built-in if cxl_acpi is also built-in.
>
> As for what contributed to this not being found earlier, the CXL
> regression environment, cxl_test, builds all CXL functionality as a
> module to allow to symbol mocking and other dynamic reload tests. As a
> result there is no regression coverage for the built-in case.
>
> Reported-by: Gregory Price <gourry@gourry.net>
> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> Tested-by: Gregory Price <gourry@gourry.net>
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
I don't like this due to likely long term fragility, but any other
solution is probably more painful. Long term we should really get
a regression test for these ordering issues in place in one of
the CIs.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: <stable@vger.kernel.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/Kconfig | 1 +
> drivers/cxl/Makefile | 20 ++++++++++++++------
> 2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 29c192f20082..876469e23f7a 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -60,6 +60,7 @@ config CXL_ACPI
> default CXL_BUS
> select ACPI_TABLE_LIB
> select ACPI_HMAT
> + select CXL_PORT
> help
> Enable support for host managed device memory (HDM) resources
> published by a platform's ACPI CXL memory layout description. See
> diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> index db321f48ba52..2caa90fa4bf2 100644
> --- a/drivers/cxl/Makefile
> +++ b/drivers/cxl/Makefile
> @@ -1,13 +1,21 @@
> # SPDX-License-Identifier: GPL-2.0
> +
> +# Order is important here for the built-in case:
> +# - 'core' first for fundamental init
> +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
> +# are immediately enabled
> +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
> +# immediately enabled
> +# - 'pci' last, also mirrors the hardware enumeration hierarchy
> obj-y += core/
> -obj-$(CONFIG_CXL_PCI) += cxl_pci.o
> -obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> +obj-$(CONFIG_CXL_PORT) += cxl_port.o
> obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> -obj-$(CONFIG_CXL_PORT) += cxl_port.o
> +obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> +obj-$(CONFIG_CXL_PCI) += cxl_pci.o
>
> -cxl_mem-y := mem.o
> -cxl_pci-y := pci.o
> +cxl_port-y := port.o
> cxl_acpi-y := acpi.o
> cxl_pmem-y := pmem.o security.o
> -cxl_port-y := port.o
> +cxl_mem-y := mem.o
> +cxl_pci-y := pci.o
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-24 9:42 ` Jonathan Cameron
@ 2024-10-24 10:36 ` Alejandro Lucero Palau
2024-10-24 16:32 ` Dan Williams
2024-10-24 14:14 ` Ira Weiny
2024-10-25 19:32 ` [PATCH v3 " Dan Williams
3 siblings, 1 reply; 17+ messages in thread
From: Alejandro Lucero Palau @ 2024-10-24 10:36 UTC (permalink / raw)
To: Dan Williams, ira.weiny
Cc: Gregory Price, stable, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Alison Schofield, Vishal Verma, linux-cxl
On 10/23/24 02:43, Dan Williams wrote:
> When the CXL subsystem is built-in the module init order is determined
> by Makefile order. That order violates expectations. The expectation is
> that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> the race cxl_mem will find the enabled CXL root ports it needs and if
> cxl_acpi loses the race it will retrigger cxl_mem to attach via
> cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> enabled immediately upon cxl_acpi_probe() return. That in turn can only
> happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> before the cxl_acpi object in the Makefile.
I'm having problems with understanding this. The acpi module is
initialised following the initcall levels, as defined by the code with
the subsys_initcall(cxl_acpi_init), and the cxl_mem module is not, so
AFAIK, there should not be any race there with the acpi module always
being initialised first. It I'm right, the problem should be another one
we do not know yet ...
> Fix up the order to prevent initialization failures, and make sure that
> cxl_port is built-in if cxl_acpi is also built-in.
... or forcing cxl_port to be built-in is enough. I wonder how, without
it, the cxl root ports can be there for cxl_mem ...
>
> As for what contributed to this not being found earlier, the CXL
> regression environment, cxl_test, builds all CXL functionality as a
> module to allow to symbol mocking and other dynamic reload tests. As a
> result there is no regression coverage for the built-in case.
>
> Reported-by: Gregory Price <gourry@gourry.net>
> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> Tested-by: Gregory Price <gourry@gourry.net>
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Cc: <stable@vger.kernel.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/Kconfig | 1 +
> drivers/cxl/Makefile | 20 ++++++++++++++------
> 2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 29c192f20082..876469e23f7a 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -60,6 +60,7 @@ config CXL_ACPI
> default CXL_BUS
> select ACPI_TABLE_LIB
> select ACPI_HMAT
> + select CXL_PORT
> help
> Enable support for host managed device memory (HDM) resources
> published by a platform's ACPI CXL memory layout description. See
> diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> index db321f48ba52..2caa90fa4bf2 100644
> --- a/drivers/cxl/Makefile
> +++ b/drivers/cxl/Makefile
> @@ -1,13 +1,21 @@
> # SPDX-License-Identifier: GPL-2.0
> +
> +# Order is important here for the built-in case:
> +# - 'core' first for fundamental init
> +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
> +# are immediately enabled
> +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
> +# immediately enabled
> +# - 'pci' last, also mirrors the hardware enumeration hierarchy
> obj-y += core/
> -obj-$(CONFIG_CXL_PCI) += cxl_pci.o
> -obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> +obj-$(CONFIG_CXL_PORT) += cxl_port.o
> obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> -obj-$(CONFIG_CXL_PORT) += cxl_port.o
> +obj-$(CONFIG_CXL_MEM) += cxl_mem.o
> +obj-$(CONFIG_CXL_PCI) += cxl_pci.o
>
> -cxl_mem-y := mem.o
> -cxl_pci-y := pci.o
> +cxl_port-y := port.o
> cxl_acpi-y := acpi.o
> cxl_pmem-y := pmem.o security.o
> -cxl_port-y := port.o
> +cxl_mem-y := mem.o
> +cxl_pci-y := pci.o
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes
2024-10-23 20:34 ` Dan Williams
@ 2024-10-24 11:56 ` Robert Richter
0 siblings, 0 replies; 17+ messages in thread
From: Robert Richter @ 2024-10-24 11:56 UTC (permalink / raw)
To: Dan Williams
Cc: ira.weiny, Dave Jiang, Davidlohr Bueso, Jonathan Cameron, stable,
Greg Kroah-Hartman, Alison Schofield, Gregory Price, Zijun Hu,
Vishal Verma, linux-cxl
On 23.10.24 13:34:36, Dan Williams wrote:
> Robert Richter wrote:
> > if (p->state < CXL_CONFIG_COMMIT) {
> > - dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> > - rc = -ENXIO;
> > + rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> > + "region config state: %d\n", p->state);
>
> I would argue EPROBE_DEFER is not appropriate because there is no
> guarantee that the other members of the region show up, and if they do
> they will re-trigger probe. So "probe must be repeated until all
> endpoints were enumerated" is the case either way. I.e. either more
> endpoint arrival triggers re-probe or EPROBE_DEFER triggers extra
> redundant probing *and* still results in a probe attempts as endpoints
> arrive.
>
> So a dev_dbg() plus -ENXIO return on uncommited region state is
> expected.
So, the region device keeps failing a probe until all endpoints are
collected. This triggered by cxl_add_to_region() after the region went
into CXL_CONFIG_COMMIT state. Looks reasonable.
The setup I was using showed various probe failures so I 'fixed' this
issue without noticing the region device was reprobed later
successfully. Thanks for explaining.
-Robert
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-24 9:42 ` Jonathan Cameron
2024-10-24 10:36 ` Alejandro Lucero Palau
@ 2024-10-24 14:14 ` Ira Weiny
2024-10-25 19:32 ` [PATCH v3 " Dan Williams
3 siblings, 0 replies; 17+ messages in thread
From: Ira Weiny @ 2024-10-24 14:14 UTC (permalink / raw)
To: Dan Williams, ira.weiny
Cc: Gregory Price, stable, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Alison Schofield, Vishal Verma, linux-cxl
Dan Williams wrote:
> When the CXL subsystem is built-in the module init order is determined
> by Makefile order. That order violates expectations. The expectation is
> that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> the race cxl_mem will find the enabled CXL root ports it needs and if
> cxl_acpi loses the race it will retrigger cxl_mem to attach via
> cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> enabled immediately upon cxl_acpi_probe() return. That in turn can only
> happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> before the cxl_acpi object in the Makefile.
>
> Fix up the order to prevent initialization failures, and make sure that
> cxl_port is built-in if cxl_acpi is also built-in.
>
> As for what contributed to this not being found earlier, the CXL
> regression environment, cxl_test, builds all CXL functionality as a
> module to allow to symbol mocking and other dynamic reload tests. As a
> result there is no regression coverage for the built-in case.
>
> Reported-by: Gregory Price <gourry@gourry.net>
> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> Tested-by: Gregory Price <gourry@gourry.net>
> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> Cc: <stable@vger.kernel.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
2024-10-23 1:43 ` [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
@ 2024-10-24 15:55 ` Ira Weiny
0 siblings, 0 replies; 17+ messages in thread
From: Ira Weiny @ 2024-10-24 15:55 UTC (permalink / raw)
To: Dan Williams, ira.weiny
Cc: stable, Jonathan Cameron, Greg Kroah-Hartman, Davidlohr Bueso,
Dave Jiang, Alison Schofield, Zijun Hu, vishal.l.verma, linux-cxl
Dan Williams wrote:
> In support of investigating an initialization failure report [1],
> cxl_test was updated to register mock memory-devices after the mock
> root-port/bus device had been registered. That led to cxl_test crashing
> with a use-after-free bug with the following signature:
>
> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
> cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
> 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
> [..]
> cxld_unregister: cxl decoder14.0:
> cxl_region_decode_reset: cxl_region region3:
> mock_decoder_reset: cxl_port port3: decoder3.0 reset
> 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
> cxl_endpoint_decoder_release: cxl decoder14.0:
> [..]
> cxld_unregister: cxl decoder7.0:
> 3) cxl_region_decode_reset: cxl_region region3:
> Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
> [..]
> RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> cxl_region_decode_reset+0x69/0x190 [cxl_core]
> cxl_region_detach+0xe8/0x210 [cxl_core]
> cxl_decoder_kill_region+0x27/0x40 [cxl_core]
> cxld_unregister+0x5d/0x60 [cxl_core]
>
> At 1) a region has been established with 2 endpoint decoders (7.0 and
> 14.0). Those endpoints share a common switch-decoder in the topology
> (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
> the "out of order reset case" in the switch decoder. The effect though
> is that region3 cleanup is aborted leaving it in-tact and
> referencing decoder14.0. At 3) the second attempt to teardown region3
> trips over the stale decoder14.0 object which has long since been
> deleted.
>
> The fix here is to recognize that the CXL specification places no
> mandate on in-order shutdown of switch-decoders, the driver enforces
> in-order allocation, and hardware enforces in-order commit. So, rather
> than fail and leave objects dangling, always remove them.
>
> In support of making cxl_region_decode_reset() always succeed,
> cxl_region_invalidate_memregion() failures are turned into warnings.
> Crashing the kernel is ok there since system integrity is at risk if
> caches cannot be managed around physical address mutation events like
> CXL region destruction.
>
> A new device_for_each_child_reverse_from() is added to cleanup
> port->commit_end after all dependent decoders have been disabled. In
> other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
> port->commit_end only decrements from 2 after 2 has been disabled, and
> it decrements all the way to zero since 1 was disabled previously.
>
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Cc: <stable@vger.kernel.org>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Zijun Hu <quic_zijuhu@quicinc.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-24 9:42 ` Jonathan Cameron
@ 2024-10-24 16:19 ` Dan Williams
2024-10-24 16:39 ` Jonathan Cameron
0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2024-10-24 16:19 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: ira.weiny, Gregory Price, stable, Davidlohr Bueso, Dave Jiang,
Alison Schofield, Vishal Verma, linux-cxl
Jonathan Cameron wrote:
> On Tue, 22 Oct 2024 18:43:24 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > When the CXL subsystem is built-in the module init order is determined
> > by Makefile order. That order violates expectations. The expectation is
> > that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> > the race cxl_mem will find the enabled CXL root ports it needs and if
> > cxl_acpi loses the race it will retrigger cxl_mem to attach via
> > cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> > enabled immediately upon cxl_acpi_probe() return. That in turn can only
> > happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> > before the cxl_acpi object in the Makefile.
> >
> > Fix up the order to prevent initialization failures, and make sure that
> > cxl_port is built-in if cxl_acpi is also built-in.
> >
> > As for what contributed to this not being found earlier, the CXL
> > regression environment, cxl_test, builds all CXL functionality as a
> > module to allow to symbol mocking and other dynamic reload tests. As a
> > result there is no regression coverage for the built-in case.
> >
> > Reported-by: Gregory Price <gourry@gourry.net>
> > Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> > Tested-by: Gregory Price <gourry@gourry.net>
> > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
>
> I don't like this due to likely long term fragility, but any other
Please be specific about the fragility and how is this different than
any other Makefile order fragility, like the many cases in
drivers/Makefile/, or patches fixing up initcall order?
Now, an argument can be made that there are too many CXL sub-objects and
more can be merged into a monolithic cxl_core object. The flipside of
that is reduced testability, at least via symbol mocking techniques.
Just look at the recent case where the fact that
drivers/cxl/core/region.c is built into cxl_core.o rather than its own
cxl_region.o object results in an in-line code change to support
cxl_test [1]. There are tradeoffs.
> solution is probably more painful. Long term we should really get
> a regression test for these ordering issues in place in one of
> the CIs.
The final patch in this series does improve cxl_test's ability to catch
regressions in module init order, and that ordering change did uncover a
bug. The system works! 😅
Going further the test mode that is needed, in addition to QEMU
emulation and cxl_test interface mocking, is kunit or similar [2]
infrastructure for some function-scope unit tests.
[1]: http://lore.kernel.org/20241022030054.258942-1-lizhijian@fujitsu.com
[2]: http://lore.kernel.org/170795677066.3697776.12587812713093908173.stgit@ubuntu
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-24 10:36 ` Alejandro Lucero Palau
@ 2024-10-24 16:32 ` Dan Williams
2024-10-25 8:43 ` Alejandro Lucero Palau
0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2024-10-24 16:32 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, ira.weiny
Cc: Gregory Price, stable, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Alison Schofield, Vishal Verma, linux-cxl
Alejandro Lucero Palau wrote:
>
> On 10/23/24 02:43, Dan Williams wrote:
> > When the CXL subsystem is built-in the module init order is determined
> > by Makefile order. That order violates expectations. The expectation is
> > that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> > the race cxl_mem will find the enabled CXL root ports it needs and if
> > cxl_acpi loses the race it will retrigger cxl_mem to attach via
> > cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> > enabled immediately upon cxl_acpi_probe() return. That in turn can only
> > happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> > before the cxl_acpi object in the Makefile.
>
>
> I'm having problems with understanding this. The acpi module is
> initialised following the initcall levels, as defined by the code with
> the subsys_initcall(cxl_acpi_init), and the cxl_mem module is not, so
> AFAIK, there should not be any race there with the acpi module always
> being initialised first. It I'm right, the problem should be another one
> we do not know yet ...
This is a valid point, and I do think that cxl_port should also move to
subsys_initcall() for completeness.
However, the reason this Makefile change works, even though cxl_acpi
finishes init before cxl_port when both are built-in, is due to device
discovery order.
With the old Makefile order it is possible for cxl_mem to race
cxl_acpi_probe() in a way that defeats the cxl_bus_rescan() that is
there to resolve device discovery races.
> > Fix up the order to prevent initialization failures, and make sure that
> > cxl_port is built-in if cxl_acpi is also built-in.
>
> ... or forcing cxl_port to be built-in is enough. I wonder how, without
> it, the cxl root ports can be there for cxl_mem ...
It does not need to be there for cxl_mem. It is ok for cxl_mem to load
and complete enumeration well before cxl_acpi ever arrives. As long as
cxl_bus_rescan() enables those devices after the fact then everything is
ok.
The problematic case being fixed is the opposite, i.e. that
cxl_bus_rescan() completes and never triggers again after cxl_mem has
failed to find the root ports.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-24 16:19 ` Dan Williams
@ 2024-10-24 16:39 ` Jonathan Cameron
0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron @ 2024-10-24 16:39 UTC (permalink / raw)
To: Dan Williams
Cc: ira.weiny, Gregory Price, stable, Davidlohr Bueso, Dave Jiang,
Alison Schofield, Vishal Verma, linux-cxl
On Thu, 24 Oct 2024 09:19:17 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > On Tue, 22 Oct 2024 18:43:24 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > When the CXL subsystem is built-in the module init order is determined
> > > by Makefile order. That order violates expectations. The expectation is
> > > that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> > > the race cxl_mem will find the enabled CXL root ports it needs and if
> > > cxl_acpi loses the race it will retrigger cxl_mem to attach via
> > > cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> > > enabled immediately upon cxl_acpi_probe() return. That in turn can only
> > > happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> > > before the cxl_acpi object in the Makefile.
> > >
> > > Fix up the order to prevent initialization failures, and make sure that
> > > cxl_port is built-in if cxl_acpi is also built-in.
> > >
> > > As for what contributed to this not being found earlier, the CXL
> > > regression environment, cxl_test, builds all CXL functionality as a
> > > module to allow to symbol mocking and other dynamic reload tests. As a
> > > result there is no regression coverage for the built-in case.
> > >
> > > Reported-by: Gregory Price <gourry@gourry.net>
> > > Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> > > Tested-by: Gregory Price <gourry@gourry.net>
> > > Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
> >
> > I don't like this due to likely long term fragility, but any other
>
> Please be specific about the fragility and how is this different than
> any other Makefile order fragility, like the many cases in
> drivers/Makefile/, or patches fixing up initcall order?
Sure, I don't like any of them ;) Mostly was having a grumpy day
rather than suggesting a change in this.
>
> Now, an argument can be made that there are too many CXL sub-objects and
> more can be merged into a monolithic cxl_core object. The flipside of
> that is reduced testability, at least via symbol mocking techniques.
> Just look at the recent case where the fact that
> drivers/cxl/core/region.c is built into cxl_core.o rather than its own
> cxl_region.o object results in an in-line code change to support
> cxl_test [1]. There are tradeoffs.
Absolutely agree.
>
> > solution is probably more painful. Long term we should really get
> > a regression test for these ordering issues in place in one of
> > the CIs.
>
> The final patch in this series does improve cxl_test's ability to catch
> regressions in module init order, and that ordering change did uncover a
> bug. The system works! 😅
That was indeed good to see!
>
> Going further the test mode that is needed, in addition to QEMU
> emulation and cxl_test interface mocking, is kunit or similar [2]
> infrastructure for some function-scope unit tests.
>
> [1]: http://lore.kernel.org/20241022030054.258942-1-lizhijian@fujitsu.com
> [2]: http://lore.kernel.org/170795677066.3697776.12587812713093908173.stgit@ubuntu
>
Yeah, we have a long way to go on testing (common problem!)
Jonathan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-24 16:32 ` Dan Williams
@ 2024-10-25 8:43 ` Alejandro Lucero Palau
2024-10-25 15:19 ` Dan Williams
0 siblings, 1 reply; 17+ messages in thread
From: Alejandro Lucero Palau @ 2024-10-25 8:43 UTC (permalink / raw)
To: Dan Williams, ira.weiny
Cc: Gregory Price, stable, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Alison Schofield, Vishal Verma, linux-cxl
On 10/24/24 17:32, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
>> On 10/23/24 02:43, Dan Williams wrote:
>>> When the CXL subsystem is built-in the module init order is determined
>>> by Makefile order. That order violates expectations. The expectation is
>>> that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
>>> the race cxl_mem will find the enabled CXL root ports it needs and if
>>> cxl_acpi loses the race it will retrigger cxl_mem to attach via
>>> cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
>>> enabled immediately upon cxl_acpi_probe() return. That in turn can only
>>> happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
>>> before the cxl_acpi object in the Makefile.
>>
>> I'm having problems with understanding this. The acpi module is
>> initialised following the initcall levels, as defined by the code with
>> the subsys_initcall(cxl_acpi_init), and the cxl_mem module is not, so
>> AFAIK, there should not be any race there with the acpi module always
>> being initialised first. It I'm right, the problem should be another one
>> we do not know yet ...
> This is a valid point, and I do think that cxl_port should also move to
> subsys_initcall() for completeness.
>
> However, the reason this Makefile change works, even though cxl_acpi
> finishes init before cxl_port when both are built-in, is due to device
> discovery order.
>
> With the old Makefile order it is possible for cxl_mem to race
> cxl_acpi_probe() in a way that defeats the cxl_bus_rescan() that is
> there to resolve device discovery races.
OK. Then rephrasing the commit would help.
Apart from that:
Tested-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
>>> Fix up the order to prevent initialization failures, and make sure that
>>> cxl_port is built-in if cxl_acpi is also built-in.
>> ... or forcing cxl_port to be built-in is enough. I wonder how, without
>> it, the cxl root ports can be there for cxl_mem ...
> It does not need to be there for cxl_mem. It is ok for cxl_mem to load
> and complete enumeration well before cxl_acpi ever arrives. As long as
> cxl_bus_rescan() enables those devices after the fact then everything is
> ok.
>
> The problematic case being fixed is the opposite, i.e. that
> cxl_bus_rescan() completes and never triggers again after cxl_mem has
> failed to find the root ports.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-25 8:43 ` Alejandro Lucero Palau
@ 2024-10-25 15:19 ` Dan Williams
0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2024-10-25 15:19 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, ira.weiny
Cc: Gregory Price, stable, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Alison Schofield, Vishal Verma, linux-cxl
Alejandro Lucero Palau wrote:
>
> On 10/24/24 17:32, Dan Williams wrote:
> > Alejandro Lucero Palau wrote:
> >> On 10/23/24 02:43, Dan Williams wrote:
> >>> When the CXL subsystem is built-in the module init order is determined
> >>> by Makefile order. That order violates expectations. The expectation is
> >>> that cxl_acpi and cxl_mem can race to attach and that if cxl_acpi wins
> >>> the race cxl_mem will find the enabled CXL root ports it needs and if
> >>> cxl_acpi loses the race it will retrigger cxl_mem to attach via
> >>> cxl_bus_rescan(). That only works if cxl_acpi can assume ports are
> >>> enabled immediately upon cxl_acpi_probe() return. That in turn can only
> >>> happen in the CONFIG_CXL_ACPI=y case if the cxl_port object appears
> >>> before the cxl_acpi object in the Makefile.
> >>
> >> I'm having problems with understanding this. The acpi module is
> >> initialised following the initcall levels, as defined by the code with
> >> the subsys_initcall(cxl_acpi_init), and the cxl_mem module is not, so
> >> AFAIK, there should not be any race there with the acpi module always
> >> being initialised first. It I'm right, the problem should be another one
> >> we do not know yet ...
> > This is a valid point, and I do think that cxl_port should also move to
> > subsys_initcall() for completeness.
> >
> > However, the reason this Makefile change works, even though cxl_acpi
> > finishes init before cxl_port when both are built-in, is due to device
> > discovery order.
> >
> > With the old Makefile order it is possible for cxl_mem to race
> > cxl_acpi_probe() in a way that defeats the cxl_bus_rescan() that is
> > there to resolve device discovery races.
>
>
> OK. Then rephrasing the commit would help.
That and moving cxl_port to subsys_initcall(). Will respin this one.
> Apart from that:
>
> Tested-by: Alejandro Lucero <alucerop@amd.com>
>
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Thanks!
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
` (2 preceding siblings ...)
2024-10-24 14:14 ` Ira Weiny
@ 2024-10-25 19:32 ` Dan Williams
3 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2024-10-25 19:32 UTC (permalink / raw)
To: ira.weiny
Cc: Gregory Price, Gregory Price, stable, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Jonathan Cameron, Alejandro Lucero, Alejandro Lucero, dave.jiang,
linux-cxl
When the CXL subsystem is built-in the module init order is determined
by Makefile order. That order violates expectations. The expectation is
that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race,
cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses
the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That
flow only works if cxl_acpi can assume ports are enabled immediately
upon cxl_acpi_probe() return. That in turn can only happen in the
CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before
cxl_acpi_probe() runs.
Fix up the order to prevent initialization failures. Ensure that
cxl_port is built-in when cxl_acpi is also built-in, arrange for
Makefile order to resolve the subsys_initcall() order of cxl_port and
cxl_acpi, and arrange for Makefile order to resolve the
device_initcall() (module_init()) order of the remaining objects.
As for what contributed to this not being found earlier, the CXL
regression environment, cxl_test, builds all CXL functionality as a
module to allow to symbol mocking and other dynamic reload tests. As a
result there is no regression coverage for the built-in case.
Reported-by: Gregory Price <gourry@gourry.net>
Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
Tested-by: Gregory Price <gourry@gourry.net>
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: stable@vger.kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v2
- Partial re-roll patch1 to address comments and collect tags
- Move cxl_port to subsys_initcall, clarify changelog (Alejandro)
drivers/cxl/Kconfig | 1 +
drivers/cxl/Makefile | 20 ++++++++++++++------
drivers/cxl/port.c | 17 ++++++++++++++++-
3 files changed, 31 insertions(+), 7 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 29c192f20082..876469e23f7a 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -60,6 +60,7 @@ config CXL_ACPI
default CXL_BUS
select ACPI_TABLE_LIB
select ACPI_HMAT
+ select CXL_PORT
help
Enable support for host managed device memory (HDM) resources
published by a platform's ACPI CXL memory layout description. See
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..2caa90fa4bf2 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,13 +1,21 @@
# SPDX-License-Identifier: GPL-2.0
+
+# Order is important here for the built-in case:
+# - 'core' first for fundamental init
+# - 'port' before platform root drivers like 'acpi' so that CXL-root ports
+# are immediately enabled
+# - 'mem' and 'pmem' before endpoint drivers so that memdevs are
+# immediately enabled
+# - 'pci' last, also mirrors the hardware enumeration hierarchy
obj-y += core/
-obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
-obj-$(CONFIG_CXL_PORT) += cxl_port.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
+obj-$(CONFIG_CXL_PCI) += cxl_pci.o
-cxl_mem-y := mem.o
-cxl_pci-y := pci.o
+cxl_port-y := port.o
cxl_acpi-y := acpi.o
cxl_pmem-y := pmem.o security.o
-cxl_port-y := port.o
+cxl_mem-y := mem.o
+cxl_pci-y := pci.o
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 861dde65768f..9dc394295e1f 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = {
},
};
-module_cxl_driver(cxl_port_driver);
+static int __init cxl_port_init(void)
+{
+ return cxl_driver_register(&cxl_port_driver);
+}
+/*
+ * Be ready to immediately enable ports emitted by the platform CXL root
+ * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y.
+ */
+subsys_initcall(cxl_port_init);
+
+static void __exit cxl_port_exit(void)
+{
+ cxl_driver_unregister(&cxl_port_driver);
+}
+module_exit(cxl_port_exit);
+
MODULE_DESCRIPTION("CXL: Port enumeration and services");
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-10-25 19:32 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23 1:43 [PATCH v2 0/6] cxl: Initialization and shutdown fixes Dan Williams
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-24 9:42 ` Jonathan Cameron
2024-10-24 16:19 ` Dan Williams
2024-10-24 16:39 ` Jonathan Cameron
2024-10-24 10:36 ` Alejandro Lucero Palau
2024-10-24 16:32 ` Dan Williams
2024-10-25 8:43 ` Alejandro Lucero Palau
2024-10-25 15:19 ` Dan Williams
2024-10-24 14:14 ` Ira Weiny
2024-10-25 19:32 ` [PATCH v3 " Dan Williams
2024-10-23 1:43 ` [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
2024-10-24 15:55 ` Ira Weiny
2024-10-23 13:12 ` [PATCH v2 0/6] cxl: Initialization and shutdown fixes Robert Richter
2024-10-23 16:00 ` Gregory Price
2024-10-23 20:34 ` Dan Williams
2024-10-24 11:56 ` Robert Richter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox