* [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints
@ 2026-01-30 4:23 Alison Schofield
  2026-01-30 4:23 ` [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails Alison Schofield
  2026-01-30 4:58 ` [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints dan.j.williams
  0 siblings, 2 replies; 15+ messages in thread
From: Alison Schofield @ 2026-01-30 4:23 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl

Currently, if not all expected endpoints arrive for an auto-created
region, the region remains registered but disabled indefinitely. The
region continues to reserve its memory resource, preventing DAX from
registering the memory, and provides no indication that endpoints
failed to arrive (useful for non-DAX configurations).

Start a 30 second timeout when the first endpoint attaches. If the
remaining endpoints have not attached before the timeout expires,
abort region assembly and unregister the region.

Cancel the timeout when all expected endpoints attach or the region is
unregistered for any reason.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/region.c | 51 ++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |  2 ++
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ae899f68551f..183cb0b49d8b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1968,6 +1968,35 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr)
 	return rc;
 }
 
+static void unregister_region(void *dev);
+
+#define CXL_REGION_EP_WAIT_MS (30 * 1000) /* 30 seconds */
+
+static void cxl_region_endpoint_wait_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct cxl_region *cxlr =
+		container_of(dwork, struct cxl_region, endpoint_wait_work);
+	struct cxl_region_params *p = &cxlr->params;
+	bool timeout_expired;
+
+	scoped_guard(rwsem_read, &cxl_rwsem.region)
+	{
+		timeout_expired = (p->nr_targets < p->interleave_ways);
+	}
+
+	if (timeout_expired) {
+		struct cxl_root_decoder *cxlrd =
+			to_cxl_root_decoder(cxlr->dev.parent);
+		struct cxl_port *port = cxlrd_to_port(cxlrd);
+
+		dev_err(&cxlr->dev,
+			"timeout waiting for endpoints: %d of %d arrived, unregistering region\n",
+			p->nr_targets, p->interleave_ways);
+		devm_release_action(port->uport_dev, unregister_region, cxlr);
+	}
+}
+
 static int cxl_region_attach(struct cxl_region *cxlr,
 			     struct cxl_endpoint_decoder *cxled, int pos)
 {
@@ -2059,8 +2088,23 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return rc;
 
 	/* await more targets to arrive... */
-	if (p->nr_targets < p->interleave_ways)
+	if (p->nr_targets < p->interleave_ways) {
+		if (cxlr->endpoint_wait_armed)
+			return 0;
+
+		INIT_DELAYED_WORK(&cxlr->endpoint_wait_work,
+				  cxl_region_endpoint_wait_work);
+		schedule_delayed_work(
+			&cxlr->endpoint_wait_work,
+			msecs_to_jiffies(CXL_REGION_EP_WAIT_MS));
+		cxlr->endpoint_wait_armed = true;
+		dev_dbg(&cxlr->dev,
+			"waiting %d ms for %d more endpoints\n",
+			CXL_REGION_EP_WAIT_MS,
+			p->interleave_ways - p->nr_targets);
 		return 0;
+	}
 
 	/*
 	 * All targets are here, which implies all PCI enumeration that
@@ -2444,6 +2488,11 @@ static void unregister_region(void *_cxlr)
 	struct cxl_region_params *p = &cxlr->params;
 	int i;
 
+	if (cxlr->endpoint_wait_armed) {
+		cancel_delayed_work_sync(&cxlr->endpoint_wait_work);
+		cxlr->endpoint_wait_armed = false;
+	}
+
 	device_del(&cxlr->dev);
 
 	/*
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ba17fa86d249..f6441a18b3bb 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -551,6 +551,8 @@ struct cxl_region {
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
 	struct notifier_block node_notifier;
 	struct notifier_block adist_notifier;
+	struct delayed_work endpoint_wait_work;
+	bool endpoint_wait_armed;
 };
 
 struct cxl_nvdimm_bridge {

base-commit: 0f61b1860cc3f52aef9036d7235ed1f017632193
-- 
2.37.3
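For readers following the timer logic in the patch above, here is a minimal userspace sketch of the arm-once timeout that the commit message describes. It is a model under stated assumptions, not kernel code: `struct region_model`, `region_init()`, `region_attach()`, and `region_poll_timeout()` are hypothetical names, a caller-supplied millisecond clock stands in for delayed work, and the cancel-on-completion step follows the commit message text rather than any particular line of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EP_WAIT_MS (30 * 1000) /* same value as CXL_REGION_EP_WAIT_MS in the patch */

/* Hypothetical userspace stand-in for the region state the patch touches. */
struct region_model {
	int nr_targets;       /* endpoints attached so far */
	int interleave_ways;  /* endpoints expected */
	bool wait_armed;      /* true while the timeout is pending */
	uint64_t deadline_ms; /* absolute expiry, valid while wait_armed */
	bool registered;      /* false once the region has been torn down */
};

static void region_init(struct region_model *r, int ways)
{
	assert(ways > 0);
	r->nr_targets = 0;
	r->interleave_ways = ways;
	r->wait_armed = false;
	r->deadline_ms = 0;
	r->registered = true;
}

/* Attach one endpoint at time now_ms. Returns true once all targets are present. */
static bool region_attach(struct region_model *r, uint64_t now_ms)
{
	if (!r->registered || r->nr_targets >= r->interleave_ways)
		return false;

	r->nr_targets++;
	if (r->nr_targets < r->interleave_ways) {
		/* Arm only once: later arrivals do not extend the deadline. */
		if (!r->wait_armed) {
			r->wait_armed = true;
			r->deadline_ms = now_ms + EP_WAIT_MS;
		}
		return false;
	}

	/* All targets arrived: cancel the pending timeout. */
	r->wait_armed = false;
	return true;
}

/* Check the timer at now_ms. Returns true if it expired and the region was dropped. */
static bool region_poll_timeout(struct region_model *r, uint64_t now_ms)
{
	if (!r->registered || !r->wait_armed || now_ms < r->deadline_ms)
		return false;

	/* Disarm first so teardown never tries to cancel the timer it runs from. */
	r->wait_armed = false;
	if (r->nr_targets < r->interleave_ways) {
		r->registered = false;
		return true;
	}
	return false;
}
```

The deadline is set only by the first attach, which mirrors the `endpoint_wait_armed` early return in the diff. Whether a fixed 30 second window is the right policy at all is the question the replies in this thread debate; the sketch takes no position on that.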
* [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails
  2026-01-30 4:23 [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints Alison Schofield
@ 2026-01-30 4:23 ` Alison Schofield
  2026-01-30 17:45 ` dan.j.williams
  2026-01-30 4:58 ` [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints dan.j.williams
  1 sibling, 1 reply; 15+ messages in thread
From: Alison Schofield @ 2026-01-30 4:23 UTC
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl

When auto-created region assembly fails the region remains registered
but disabled. The region continues to reserve its memory resource,
preventing DAX from registering the memory.

Unregister the region on assembly failure to release the resource.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/region.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 183cb0b49d8b..f222aa9cbda7 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3714,6 +3714,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 	return cxlr;
 }
 
+static void unregister_region(void *dev);
+
 static struct cxl_region *
 cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
 {
@@ -3754,7 +3756,17 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 	if (rc)
 		return rc;
 
-	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
+	rc = attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
+	if (rc) {
+		struct cxl_port *root_port = cxlrd_to_port(cxlrd);
+
+		/* Messages at the point of failure offer more detail */
+		dev_err(&cxlr->dev,
+			"assembly failed %d, unregistering region\n", rc);
+		devm_release_action(root_port->uport_dev, unregister_region,
+				    cxlr);
+		return rc;
+	}
 
 	scoped_guard(rwsem_read, &cxl_rwsem.region) {
 		p = &cxlr->params;
-- 
2.37.3
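The control flow this second patch adds (check the attach result, and on error unwind the region so it stops holding its resource) can be sketched in userspace as below. This is an illustrative model only: `struct auto_region`, `attach_target_model()`, `unregister_region_model()`, and `add_to_region_model()` are hypothetical names, the failure is injected by the caller, and the model says nothing about whether the held resource actually blocks DAX, which is disputed later in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an auto-created region and the range it reserves. */
struct auto_region {
	bool registered;   /* the region object exists */
	bool hpa_reserved; /* the region still holds its memory resource */
};

/* Hypothetical attach step: a nonzero attach_err simulates an assembly failure. */
static int attach_target_model(struct auto_region *r, int attach_err)
{
	assert(r != NULL);
	return attach_err;
}

/* Model of teardown: drop the reservation, then the region itself. */
static void unregister_region_model(struct auto_region *r)
{
	r->hpa_reserved = false;
	r->registered = false;
}

/*
 * Model of the patched flow: create the region, try to attach, and unwind
 * on failure instead of leaving a disabled region behind.
 */
static int add_to_region_model(struct auto_region *r, int attach_err)
{
	int rc;

	r->registered = true;
	r->hpa_reserved = true;

	rc = attach_target_model(r, attach_err);
	if (rc) {
		unregister_region_model(r);
		return rc;
	}
	return 0;
}
```

Before the patch the return value of the attach step was ignored, which in this model would correspond to returning with `registered` and `hpa_reserved` both still true after a failure.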
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails
  2026-01-30 4:23 ` [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails Alison Schofield
@ 2026-01-30 17:45 ` dan.j.williams
  2026-01-31 1:04 ` Alison Schofield
  0 siblings, 1 reply; 15+ messages in thread
From: dan.j.williams @ 2026-01-30 17:45 UTC
To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl

Alison Schofield wrote:
> When auto-created region assembly fails the region remains registered
> but disabled.

Right, that is good forensics, administrator action is needed to figure
out what to do next.

> The region continues to reserve its memory resource, preventing DAX
> from registering the memory.

I would rather have the partially assembled region continue to
exist. It can help debug the expected catastrophic error reports from
DAX enabling access to a memory range that the CXL side can see has
completely failed (lost an interleave member). If the failure is more
benign and DAX side access is viable, then the forensics matter and
userspace can clean up.
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails 2026-01-30 17:45 ` dan.j.williams @ 2026-01-31 1:04 ` Alison Schofield 2026-01-31 15:49 ` Gregory Price 2026-02-03 3:07 ` dan.j.williams 0 siblings, 2 replies; 15+ messages in thread From: Alison Schofield @ 2026-01-31 1:04 UTC (permalink / raw) To: dan.j.williams, gourry Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma, Ira Weiny, linux-cxl On Fri, Jan 30, 2026 at 09:45:29AM -0800, Dan Williams wrote: > Alison Schofield wrote: > > When auto-created region assembly fails the region remains registered > > but disabled. > > Right, that is good forensics, administrator action is needed to figure > out what to do next. > > > The region continues to reserve its memory resource, preventing DAX > > from registering the memory. > > I would rather have the partially assemebled region to continue to > exist. It can help debug the expected catastrophic error reports from > DAX enabling access to a memory range that the CXL side can see has > completely failed (lost an interleave member). If the failure is more > benign and DAX side access is viable, then the forensics matter and > userspace can cleanup. Thanks for all the feedback, Dan & Greg - I'm responding here because this is the overriding topic of do we want to behave better upon region assembly failures. If there is a path here to becoming a better region driver then I'll take a look at the implementation comments, like if or how to timeout. One point I should have led with: while we are focused on failover to DAX, the issue here is more general. It is about the region driver leaving behind an unrecoverable partial configuration on assembly failure, independent of consumers. Neither of these failures are recoverable from userspace today. If they should be recoverable from userspace, prove me wrong, but I'm doubtful that we are just one smart admin or one good cxl-cli update away from handling this in userspace. 
That's why these patches make the region driver fail gracefully. And I
do think it is the region driver’s job to fail gracefully.

When auto-created region assembly fails, the region remains registered
with decoders still enabled. In that state, userspace does not have a
supported way to unwind the configuration. cxl destroy-region fails because
the decoders are still enabled (--force fails). So while “administrator action
is needed” is true in principle, the admin has no effective action available.
Leaving the region behind does not provide a viable recovery path because
it leaves all the things related to this region stuck. All the things being
the HPA resource, the DPA resources, and the decoders.

On the forensics point, the most actionable diagnostic information is not in
cxl-list output. cxl-list can show the existence of a disabled region, but
it does not show why assembly failed, which endpoint is missing, or what
happened at the time of failure.

The forensic info is in the kernel log, because that’s where the
assembly failure is detected and where the relevant context exists.
With the changes here, the kernel messaging is improved so that the
failure is explicit rather than requiring the admin to infer the
situation from a disabled region in cxl-list.

Of course, the timely topic is that unregistering the failed auto-created
region releases the memory resource for DAX.

To be complete, I'm not suggesting that these failures should lead to the
“teardown of all regions and give all the goodies to DAX” path.

Also, if you all think the endpoint arrival timeout is silly (I kind of
inferred that from Dan's comment about BIOS seeing it only a few seconds
ago), then I can drop that.

Thoughts?

--Alison
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails
  2026-01-31 1:04 ` Alison Schofield
@ 2026-01-31 15:49 ` Gregory Price
  2026-02-05 0:32 ` Alison Schofield
  2026-02-03 3:07 ` dan.j.williams
  1 sibling, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-31 15:49 UTC
To: Alison Schofield
Cc: dan.j.williams, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, linux-cxl

On Fri, Jan 30, 2026 at 05:04:48PM -0800, Alison Schofield wrote:
> ... snip ...

Logging in on a Saturday, will respond fully on Monday, just wanted to
inject a question for the masses here.

> When auto-created region assembly fails,

I wonder how much work we want to do to try to make the auto-region
path more reliable for complex setups (interleaving being one example).

In other paths, in particular when there is some implied use-case for
the device, we already take the opinion the BIOS should do nothing.
Should we draw a hard line on when that should be the official opinion?

i.e. auto-decoders should be intended for trivial SysRAM regions only

(especially given that a user can't even select zone-isolation except
via a global kernel build option that affects all hotplug memory)

~Gregory
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails
  2026-01-31 15:49 ` Gregory Price
@ 2026-02-05 0:32 ` Alison Schofield
  2026-02-05 4:22 ` Gregory Price
  0 siblings, 1 reply; 15+ messages in thread
From: Alison Schofield @ 2026-02-05 0:32 UTC
To: Gregory Price
Cc: dan.j.williams, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, linux-cxl

On Sat, Jan 31, 2026 at 10:49:11AM -0500, Gregory Price wrote:
> On Fri, Jan 30, 2026 at 05:04:48PM -0800, Alison Schofield wrote:
>
> > ... snip ...
> Logging in on a Saturday, will respond fully on Monday, just wanted to
> inject a question for the masses here.
>
> > When auto-created region assembly fails,
>
> I wonder how much work we want to do to try to make the auto-region
> path more reliable for complex setups (interleaving being one example).
>
> In other paths, in particular when there is some implied use-case for
> the device, we already take the opinion the BIOS should do nothing.
> Should we draw a hard line on when that should be the official opinion?
>
> i.e. auto-decoders should be intended for trivial SysRAM regions only

It has not been my impression that we have that much control over what
BIOS may present, i.e. if it is CXL Spec legal they may build it. I
recall you wrote a doc of Linux Expectations of BIOS. Did you actually
try to limit what BIOS does?

>
> (especially given that a user can't even select zone-isolation except
> via a global kernel build option that affects all hotplug memory)
>
> ~Gregory
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails
  2026-02-05 0:32 ` Alison Schofield
@ 2026-02-05 4:22 ` Gregory Price
  0 siblings, 0 replies; 15+ messages in thread
From: Gregory Price @ 2026-02-05 4:22 UTC
To: Alison Schofield
Cc: dan.j.williams, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, linux-cxl

On Wed, Feb 04, 2026 at 04:32:39PM -0800, Alison Schofield wrote:
> On Sat, Jan 31, 2026 at 10:49:11AM -0500, Gregory Price wrote:
> > i.e. auto-decoders should be intended for trivial SysRAM regions only
>
> It has not been my impression that we have that much control over what
> BIOS may present, i.e. if it is CXL Spec legal they may build it. I
> recall you wrote a doc of Linux Expectations of BIOS. Did you actually
> try to limit what BIOS does?
>

I should rephrase - barring any new specification updates that actually
allow the BIOS to say what a region may be used for, the only two
reasonable endpoints are sysram and dax - and realistically only sysram
unless you build kmem out of the dax driver (kmem is the auto
destination).

But for non-trivial setups, recovery might not actually get you
anything. If a device fails to actually come up after having been
programmed by BIOS, it may not even be feasible to tear down and
recreate what the BIOS tried to do because of platform specifics (Zen5).

So while you can time out and clean up, I'm not sure you can actually do
anything after that reliably in the general case - and I'm not sure how
much effort we should put into fighting those fires.

I suppose you could leave the driver in a state that allows a user to
see it's broken.

~Gregory
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails 2026-01-31 1:04 ` Alison Schofield 2026-01-31 15:49 ` Gregory Price @ 2026-02-03 3:07 ` dan.j.williams 2026-02-05 0:20 ` Alison Schofield 1 sibling, 1 reply; 15+ messages in thread From: dan.j.williams @ 2026-02-03 3:07 UTC (permalink / raw) To: Alison Schofield, dan.j.williams, gourry Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma, Ira Weiny, linux-cxl Alison Schofield wrote: > On Fri, Jan 30, 2026 at 09:45:29AM -0800, Dan Williams wrote: > > Alison Schofield wrote: > > > When auto-created region assembly fails the region remains registered > > > but disabled. > > > > Right, that is good forensics, administrator action is needed to figure > > out what to do next. > > > > > The region continues to reserve its memory resource, preventing DAX > > > from registering the memory. > > > > I would rather have the partially assemebled region to continue to > > exist. It can help debug the expected catastrophic error reports from > > DAX enabling access to a memory range that the CXL side can see has > > completely failed (lost an interleave member). If the failure is more > > benign and DAX side access is viable, then the forensics matter and > > userspace can cleanup. > > Thanks for all the feedback, Dan & Greg - > > I'm responding here because this is the overriding topic of do we want to > behave better upon region assembly failures. If there is a path here to > becoming a better region driver then I'll take a look at the implementation > comments, like if or how to timeout. > > One point I should have led with: while we are focused on failover to DAX, > the issue here is more general. It is about the region driver leaving behind > an unrecoverable partial configuration on assembly failure, independent of > consumers. This gets to the heart of the question of what practical problem is being solved with this and is the solution suitable? 
Outside of the "platform is doing something strange" case like
"Normalized Addressing" or "Non-CXL Interleave Target" I am struggling
to imagine an end user benefiting from this automatic cleanup. That
would be a system so flaky that it cannot arrange for a BIOS configured
interleave to stay alive through Linux boot. At that point I expect the
end user to decommission that system, and flag it for remediation, not
recover it and keep running.

> Neither of these failures are recoverable from userspace today. If they
> should be recoverable from userspace, prove me wrong, but I'm doubtful
> that we are just one smart admin or one good cxl-cli update away from handling
> this in userspace.

That is my bad. I mixed some unverified wishes in with my replies, but
the end goal for me remains the same. Userspace should be able to undo
every step that auto-region assembly performs.

> That's why these patches make the region driver fail gracefully. And I
> do think it is the region driver’s job to fail gracefully.

This is where you lose me. It fails gracefully today. It stops in a safe
configuration same as if userspace stopped short of fully configuring a
region. I keep coming back to the RAID example because CXL region
assembly is roughly patterned after RAID assembly. In that example a
RAID0 array does not disappear after 30 seconds if auto-assembly fails,
it waits for administrator action.

The potential conflict with a DAX takeover is a separate problem that
also might not need full teardown if we can make it work with
incremental fixes.

> When auto-created region assembly fails, the region remains registered
> with decoders still enabled. In that state, userspace does not have a
> supported way to unwind the configuration. cxl destroy-region fails because
> the decoders are still enabled (--force fails). So while “administrator action
> is needed” is true in principle, the admin has no effective action available.
> Leaving the region behind does not provide a viable recovery path because
> it leaves all the things related to this region stuck. All the things being
> the HPA resource, the DPA resources, and the decoders.

Right, I think that is a gap worth fixing to have all the same tools
available for partial creation recovery available to partial assembly
recovery. A "gap" and not a "bug" because only a unit test might care
about this presently.

> On the forensics point, the most actionable diagnostic information is not in
> cxl-list output. cxl-list can show the existence of a disabled region, but
> it does not show why assembly failed, which endpoint is missing, or what
> happened at the time of failure.

    cxl list -RDi -r $region

...shows the region, the number of expected targets, and the ones that
have arrived.

> The forensic info is in the kernel log, because that’s where the
> assembly failure is detected and where the relevant context exists.
> With the changes here, the kernel messaging is improved so that the
> failure is explicit rather than requiring the admin to infer the
> situation from a disabled region in cxl-list.

The kernel log does not know the device that was meant to arrive. The
kernel log likely has debug disabled by default. This situation should
be debuggable without the kernel log.

Likely the first notification of something wrong is operations tooling
noticing that serverX came up with less memory than expected, not a
kernel log message.
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails
  2026-02-03 3:07 ` dan.j.williams
@ 2026-02-05 0:20 ` Alison Schofield
  2026-02-05 1:03 ` dan.j.williams
  0 siblings, 1 reply; 15+ messages in thread
From: Alison Schofield @ 2026-02-05 0:20 UTC
To: dan.j.williams
Cc: gourry, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, linux-cxl

On Mon, Feb 02, 2026 at 07:07:57PM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> > On Fri, Jan 30, 2026 at 09:45:29AM -0800, Dan Williams wrote:
> > > Alison Schofield wrote:
> > > > When auto-created region assembly fails the region remains registered
> > > > but disabled.
> > >
> > > Right, that is good forensics, administrator action is needed to figure
> > > out what to do next.
> > >
> > > > The region continues to reserve its memory resource, preventing DAX
> > > > from registering the memory.
> > >
> > > I would rather have the partially assembled region continue to
> > > exist. It can help debug the expected catastrophic error reports from
> > > DAX enabling access to a memory range that the CXL side can see has
> > > completely failed (lost an interleave member). If the failure is more
> > > benign and DAX side access is viable, then the forensics matter and
> > > userspace can clean up.
> >
> > Thanks for all the feedback, Dan & Greg -
> >
> > I'm responding here because this is the overriding topic of do we want to
> > behave better upon region assembly failures. If there is a path here to
> > becoming a better region driver then I'll take a look at the implementation
> > comments, like if or how to timeout.
> >
> > One point I should have led with: while we are focused on failover to DAX,
> > the issue here is more general. It is about the region driver leaving behind
> > an unrecoverable partial configuration on assembly failure, independent of
> > consumers.
>
> This gets to the heart of the question of what practical problem is
> being solved with this and is the solution suitable? Outside of the
> "platform is doing something strange" case like "Normalized Addressing"
> or "Non-CXL Interleave Target" I am struggling to imagine an end user
> benefiting from this automatic cleanup. A system which is so flaky that
> it can not arrange for BIOS configured interleave to stay alive through
> Linux boot. At that point I expect the end user to decommission that
> system, and flag it for remediation, not recover it and keep running.

Hi Dan,

Why is the response different here than with DAX failover due to wonky
BIOS usage of Soft Reserved resources? When BIOS is unclear(?), we give
up on the CXL regions and give all the memory directly to DAX so the
system can come up w all its expected resources, yet for these region
assembly failures we are willing to strand memory, even though the
option to give to DAX is so easily available.

In the past, we've seen BIOS-defined regions that the kernel was unable
to assemble for reasons other than outright hardware failure. While we
can hope the worst of those issues are behind us, the region assembly
code path has not been quiet, so assuming no more issues seems
ill-fated.

>
> > Neither of these failures are recoverable from userspace today. If they
> > should be recoverable from userspace, prove me wrong, but I'm doubtful
> > that we are just one smart admin or one good cxl-cli update away from handling
> > this in userspace.
>
> That is my bad. I mixed some unverified wishes in with my replies, but
> the end goal for me remains the same. Userspace should be able to undo
> every step that auto-region assembly performs.

I haven't heard that before. Sounds good to do. (Repeating myself I know,
but that'll take a coordinated effort, i.e. changes to both region driver
and userspace.)

>
> > That's why these patches make the region driver fail gracefully. And I
> > do think it is the region driver’s job to fail gracefully.
>
> This is where you lose me. It fails gracefully today. It stops in a safe
> configuration same as if userspace stopped short of fully configuring a
> region. I keep coming back to the RAID example because CXL region
> assembly is roughly patterned after RAID assembly. In that example a
> RAID0 array does not disappear after 30 seconds if auto-assembly fails,
> it waits for administrator action.
>

This is where you lose me. An unrepairable config may be safe but its use
is limited.

wrt the RAID analogy: I’m not familiar with md internals. IIUC RAID tooling
provides supported admin actions to stop, tear down, and rebuild incomplete
arrays. This CXL failure mode leaves a partial configuration that userspace
cannot repair. So leaving the object behind is not comparable without a
supported repair path. Aspirational, but not within reach like this
soln.

> The potential conflict with a DAX takeover is a separate problem that
> also might not need full teardown if we can make it work with
> incremental fixes.

This is intended to work with and is tested w the DAX takeover patches.
As said above, it seems odd to let DAX take over if BIOS gives us
wonky Soft Reserved boundaries, but not let DAX take over for
region assembly failures.

>
> > When auto-created region assembly fails, the region remains registered
> > with decoders still enabled. In that state, userspace does not have a
> > supported way to unwind the configuration. cxl destroy-region fails because
> > the decoders are still enabled (--force fails). So while “administrator action
> > is needed” is true in principle, the admin has no effective action available.
> > Leaving the region behind does not provide a viable recovery path because
> > it leaves all the things related to this region stuck. All the things being
> > the HPA resource, the DPA resources, and the decoders.
> > Right, I think that is a gap worth fixing to have all the same tools > available for partial creation recovery available to partial assembly > recovery. A "gap" and not a "bug" because only a unit test might care > about this presently. This is not a unit test driven issue. It was not found in unit testing and is not motivated by trying to make a contrived unit test pass. This was observed in real configurations where BIOS-defined regions failed to assemble and memory was not recoverable. Agree to call it a gap. Did I call it a bug? I'm not reading any intent into why the region was not unregistered upon assembly failure previously. If you tell me that it was with the intent that user space tooling would pick up the pieces, I believe you and it's worth examining: Which will work better: -- improve the existing stop on assembly failure so userspace can repair -- or unregistering completely with a fail-over to DAX. Non DAX users can recreate at cmdline. It's difficult not to be biased towards these patches, when they are simple and within reach and the other is aspirational. > > > On the forensics point, the most actionable diagnostic information is not in > > cxl-list output. cxl-list can show the existence of a disabled region, but > > it does not show why assembly failed, which endpoint is missing, or what > > happened at the time of failure. > > cxl list -RDi -r $region > > ...shows the region, the number of expected targets, and the ones that > have arrived. > > > The forensic info is in the kernel log, because that’s where the > > assembly failure is detected and where the relevant context exists. > > With the changes here, the kernel messaging is improved so that the > > failure is explicit rather than requiring the admin to infer the > > situation from a disabled region in cxl-list. > > The kernel log does not know the device that was meant to arrive. . The > kernel log likely has debug disabled by default. 
This situation should > be debuggable without the kernel log. > > Likely the first notification of something wrong is operations tooling > noticing that serverX came up with less memory than expected, not a > kernel log message. (feels out of order here, but to finish response on comments) I agree the cxl list output is useful, but not as useful as making the failure explicit in a non-debug kernel log message, nor as useful as giving the user their expected memory via failover to DAX, nor as useful as allowing the user to create a new region from userspace with same resources. --Alison ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails 2026-02-05 0:20 ` Alison Schofield @ 2026-02-05 1:03 ` dan.j.williams 0 siblings, 0 replies; 15+ messages in thread From: dan.j.williams @ 2026-02-05 1:03 UTC (permalink / raw) To: Alison Schofield, dan.j.williams Cc: gourry, Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma, Ira Weiny, linux-cxl Alison Schofield wrote: [..] > > This gets to the heart of the question of what practical problem is > > being solved with this and is the solution suitable? Outside of the > > "platform is doing something strange" case like "Normalized Addressing" > > or "Non-CXL Interleave Target" I am struggling to imagine an end user > > benefiting from this automatic cleanup. A system which is so flaky that > > it can not arrange for BIOS configured interleave to stay alive through > > Linux boot. At that point I expect the end user to decommission that > > system, and flag it for remediation, not recover it and keep running. > > Hi Dan, > > Why is the response different here that with DAX failover due to wonky > BIOS usage of Soft Reserved resources. When BIOS is unclear(?), we give > up on the CXL regions and give all the memory directly to DAX so the > system can come up w all it's expected resources, yet for these region > assembly failures we are willing to strand memory, even though the > option to give to DAX is so easily available. Right, that is the question I had to Smita, can we solve the conflict problem with less violence because it saves her series from having to figure out the teardown race. As I mentioned, that was prompted by your review: http://lore.kernel.org/697a9d46b147e_309510027@dwillia2-mobl4.notmuch > This is where you lose me. An unrepairable config may be safe but it's use > is limited. > > wrt the RAID analogy: I’m not familiar with md internals. IIUC RAID tooling > provides supported admin actions to stop, tear-down, and rebuild incomplete > arrays. 
This CXL failure mode leaves a partial configuration that userspace > cannot repair. So leaving the object behind is not comparable without a > supported repair path. Aspirational, but not within reach like this > soln. I am interested in fixing those mechanisms as a first stop. [..] > This is not a unit test driven issue. It was not found in unit testing and > is not motivated by trying to make a contrived unit test pass. This was > observed in real configurations where BIOS-defined regions failed to > assemble and memory was not recoverable. That was missing from the description that a real world use case would benefit from this separate from the dax failover being discussed in Smita's patches. Are you saying the dax failover patches are insufficient for this case? Is this an alternate proposal? > Agree to call it a gap. Did I call it a bug? I'm not reading any intent into > why the region was not unregistered upon assembly failure previously. If you > tell me that it was with the intent that user space tooling would pick up the > pieces, I believe you and it's worth examining: > > Which will work better: > -- improve the existing stop on assembly failure so userspace can repair > -- or unregistering completely with a fail-over to DAX. Non DAX users can > recreate at cmdline. > > It's difficult not to be biased towards these patches, when they are simple > and within reach and the other is aspirational. I think we have time to do the improvement. Deployments that want to forfeit CXL driver operation already have the "disable cxl_acpi" workaround. That has bought us the time to do the dax failover patches which are nothing if not "try to get dax going after some timeout (wait_for_device_probe())". 
> (feels out of order here, but to finish response on comments)
> I agree the cxl list output is useful, but not as useful as making the
> failure explicit in a non-debug kernel log message, nor as useful
> as giving the user their expected memory via failover to DAX, nor as
> useful as allowing the user to create a new region from userspace
> with same resources.

If dax hmem attaches there is no opportunity to create a new region from
userspace, right, that resource is burned?

The summary for me is:

- If the CXL assembly failure leftovers can be made to co-exist with DAX
  takeover, great. If not, or if it proves too complicated, sure,
  unregister regions. It is simply a bug that insert_resource() is
  optional in construct_region() *and* blocks DAX.

- CXL auto assembly should be as recoverable from userspace as manual
  assembly failures.

- If auto-assembled CXL regions are not ready to go by the
  wait_for_device_probe() timeout point that Smita's series is adding,
  there is no point in waiting any longer.

- If Dave merges this 30 second timeout as a temporary stop-gap while the
  "wait_for_device_probe() fine grained failover work" plays out, I will
  grin and bear it while grumbling something about the "disable cxl_acpi"
  workaround.
* Re: [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints
  2026-01-30 4:23 [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints Alison Schofield
  2026-01-30 4:23 ` [PATCH 2/2] cxl/region: Unregister auto-created region when assembly fails Alison Schofield
@ 2026-01-30 4:58 ` dan.j.williams
  2026-01-30 17:42 ` Gregory Price
  1 sibling, 1 reply; 15+ messages in thread
From: dan.j.williams @ 2026-01-30 4:58 UTC
To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
    Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl

Alison Schofield wrote:
> Currently, if not all expected endpoints arrive for an auto-created
> region, the region remains registered but disabled indefinitely. The
> region continues to reserve its memory resource, preventing DAX from
> registering the memory, and provides no indication that endpoints
> failed to arrive (useful for non-DAX configurations).

Does it block DAX? The HPA reservation does not mark the resource busy,
so DAX should still be able to operate. It might make a mess of the
resource tree, but not block unless we are in this case of the
boundaries misaligning.

I would rather rethink the insert_resource() in __construct_region() if
it is indeed a blocking problem. It is already the case that an
insert_resource() failure is not fatal to region creation.

> Start a 30 second timeout when the first endpoint attaches. If the
> remaining endpoints have not attached before the timeout expires,
> abort region assembly and unregister the region.

The time to give up on regions present at boot should be at the end of
wait_for_device_probe(). There is no point waiting any longer for a
device that was alive a few seconds ago according to the BIOS.

> Cancel the timeout when all expected endpoints attach or the region is
> unregistered for any reason.
Setting aside the above, this looks like policy, and every time I see
policy the first question is "can userspace do it?". It would be
straightforward for userspace to kick a 30 second watchdog upon each
region KOBJ_ADD event. Each time that fires, go clean up partially
assembled regions.

For example there is no automatic cleanup of partially assembled RAID
arrays. So, precedent leans towards letting userspace decide what
happens when composite devices fail assembly.
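For illustration only, the userspace watchdog policy described above
could be modeled roughly as follows. This is a minimal sketch under
stated assumptions, not code from the patch or from cxl-cli:
`struct region_watch`, `region_needs_cleanup()` and `REGION_WAIT_SECS`
are hypothetical names, and how the KOBJ_ADD event and the attached
target count would reach such a daemon (udev, sysfs) is left out.

```c
#include <stdbool.h>

/* Assumption: reuse the 30 second value proposed in the patch. */
#define REGION_WAIT_SECS 30

/* Hypothetical userspace bookkeeping for one auto-assembled region. */
struct region_watch {
	const char *name;    /* e.g. "region0" */
	long added_at;       /* seconds, when the KOBJ_ADD event was seen */
	int nr_targets;      /* endpoints attached so far */
	int interleave_ways; /* endpoints the region expects */
};

/*
 * Policy check run when the watchdog fires: clean up only if the
 * deadline has passed AND the region is still missing endpoints.
 */
static bool region_needs_cleanup(const struct region_watch *w, long now)
{
	if (now - w->added_at < REGION_WAIT_SECS)
		return false; /* still inside the grace period */

	return w->nr_targets < w->interleave_ways;
}
```

A daemon built on this would arm one `struct region_watch` per region
add event and call `region_needs_cleanup()` when its timer expires. A
fully assembled region is never a cleanup candidate, no matter how much
time has passed.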
* Re: [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints
  2026-01-30 4:58 ` [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints dan.j.williams
@ 2026-01-30 17:42 ` Gregory Price
  2026-01-30 18:26 ` dan.j.williams
  0 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-30 17:42 UTC
To: dan.j.williams
Cc: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
    Vishal Verma, Ira Weiny, linux-cxl

On Thu, Jan 29, 2026 at 08:58:09PM -0800, dan.j.williams@intel.com wrote:
>
> > Cancel the timeout when all expected endpoints attach or the region is
> > unregistered for any reason.
>
> Setting aside the above, this looks like policy, and every time I see
> policy the first question is "can userspace do it?". It would be
> straightforward for userspace to kick a 30 second watchdog upon each
> region KOBJ_ADD event. Each time that fires go cleanup partially
> assembled regions.
>
> For example there is no automatic cleanup of partially assembled RAID
> arrays. So, precedent leans towards letting userspace decide what
> happens when composite devices fail assembly.

Sounds like there'll be a nasty race implied here.
Let's assume a kmem region that gets auto-onlined

0) Region is waiting for a device
1) Final device arrives and starts probing, locking the region
2) Userspace timeout occurs, firing a cleanup request
2) Region finishes probing
   2a) this creates the dax region
   2b) this creates the dax_kmem device
   2c) this may auto-hotplug into ZONE_NORMAL
   2d) kernel page gets allocated on the memory region
3) Userspace cleanup arrives to unbind
   3a) dax_kmem is online and can't hot-unplug
   3b) dax_kmem abandons hope and leaves the memory online
       (see dev_dax_kmem_remove and remove_memory)
   3c) region cleans up

Final state: region can't be rebound because memory is left online
and unassociated with any device

This will be hard to get right

~Gregory
* Re: [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints
  2026-01-30 17:42 ` Gregory Price
@ 2026-01-30 18:26 ` dan.j.williams
  2026-01-30 19:03 ` Gregory Price
  0 siblings, 1 reply; 15+ messages in thread
From: dan.j.williams @ 2026-01-30 18:26 UTC
To: Gregory Price, dan.j.williams
Cc: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
    Vishal Verma, Ira Weiny, linux-cxl

Gregory Price wrote:
> On Thu, Jan 29, 2026 at 08:58:09PM -0800, dan.j.williams@intel.com wrote:
> >
> > > Cancel the timeout when all expected endpoints attach or the region is
> > > unregistered for any reason.
> >
> > Setting aside the above, this looks like policy, and every time I see
> > policy the first question is "can userspace do it?". It would be
> > straightforward for userspace to kick a 30 second watchdog upon each
> > region KOBJ_ADD event. Each time that fires go cleanup partially
> > assembled regions.
> >
> > For example there is no automatic cleanup of partially assembled RAID
> > arrays. So, precedent leans towards letting userspace decide what
> > happens when composite devices fail assembly.
>
> Sounds like there'll be a nasty race implied here.
>
> Let's assume a kmem region that gets auto-onlined
>
> 0) Region is waiting for a device
> 1) Final device arrives and starts probing, locking the region

"locking?". There is no IORESOURCE_BUSY contention until the final
driver attaches.

> 2) Userspace timeout occurs, firing a cleanup request
> 2) Region finishes probing
>    2a) this creates the dax region
>    2b) this creates the dax_kmem device
>    2c) this may auto-hotplug into ZONE_NORMAL
>    2d) kernel page gets allocated on the memory region
> 3) Userspace cleanup arrives to unbind
>    3a) dax_kmem is online and can't hot-unplug

Right, userspace needs to honor typical managed hotplug and not force
removal.
>    3b) dax_kmem abandons hope and leaves the memory online
>        (see dev_dax_kmem_remove and remove_memory)
>    3c) region cleans up
>
> Final state: region can't be rebound because memory is left online
> and unassociated with any device

I do not see how we get into this situation. If dax_kmem comes up, then
there is nothing to clean up. Yes, these can race, but typical locking
should ensure full forward progress or cleanup.
* Re: [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints
  2026-01-30 18:26 ` dan.j.williams
@ 2026-01-30 19:03 ` Gregory Price
  2026-01-30 22:46 ` dan.j.williams
  0 siblings, 1 reply; 15+ messages in thread
From: Gregory Price @ 2026-01-30 19:03 UTC
To: dan.j.williams
Cc: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
    Vishal Verma, Ira Weiny, linux-cxl

On Fri, Jan 30, 2026 at 10:26:06AM -0800, dan.j.williams@intel.com wrote:
> I do not see how we get into this situation. If dax_kmem comes up, then
> there is nothing to clean up. Yes, these can race, but typical locking
> should ensure full forward progress or cleanup.

Yes, as long as the watchdog checks whether the region has actually come
up before unbinding, it should be ok - but then it seems a little
odd to push that to userland if the driver basically already has all
that knowledge.

30 seconds is quite arbitrary though, so you have a point.

~Gregory
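The condition discussed above, that a watchdog must confirm the region
has not come up before tearing it down, amounts to making "commit" and
"teardown" mutually exclusive transitions out of one assembling state.
The sketch below is only a model of that idea and makes no claim about
how the CXL driver or cxl-cli implement it: it uses a C11
compare-and-exchange as a stand-in for the "typical locking" mentioned
in the quoted text, and `struct region_model`, `region_commit()` and
`region_watchdog_teardown()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lifecycle states for an auto-assembled region. */
enum region_state {
	REGION_ASSEMBLING, /* waiting for remaining endpoints */
	REGION_ACTIVE,     /* final endpoint attached, region is live */
	REGION_TORN_DOWN,  /* watchdog gave up and cleaned up */
};

struct region_model {
	_Atomic int state;
};

/* Final endpoint arrived: succeeds only if nobody tore the region down. */
static bool region_commit(struct region_model *r)
{
	int expected = REGION_ASSEMBLING;

	return atomic_compare_exchange_strong(&r->state, &expected,
					      REGION_ACTIVE);
}

/* Watchdog fired: succeeds only if the region never came up. */
static bool region_watchdog_teardown(struct region_model *r)
{
	int expected = REGION_ASSEMBLING;

	return atomic_compare_exchange_strong(&r->state, &expected,
					      REGION_TORN_DOWN);
}
```

Whichever caller transitions the state first wins, and the loser sees a
failed exchange and must back off. In this model the watchdog can never
unbind a region whose memory is already online, which is the final state
the race scenario earlier in the thread worries about.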
* Re: [PATCH 1/2] cxl/region: Timeout auto region assembly waiting for endpoints
  2026-01-30 19:03 ` Gregory Price
@ 2026-01-30 22:46 ` dan.j.williams
  0 siblings, 0 replies; 15+ messages in thread
From: dan.j.williams @ 2026-01-30 22:46 UTC
To: Gregory Price, dan.j.williams
Cc: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
    Vishal Verma, Ira Weiny, linux-cxl

Gregory Price wrote:
> On Fri, Jan 30, 2026 at 10:26:06AM -0800, dan.j.williams@intel.com wrote:
> > I do not see how we get into this situation. If dax_kmem comes up, then
> > there is nothing to clean up. Yes, these can race, but typical locking
> > should ensure full forward progress or cleanup.
>
> yes as long as the watchdog checks for the region actually having come
> up or not before unbinding, it should be ok - but then it seems a little
> odd to push that to userland if the driver basically already has all
> that knowledge.

A couple observations:

1/ cxl-cli already knows that raw region unbind is problematic, hence
   the doom and gloom documentation around the --force option.

2/ The driver does not know the error recovery policy for region
   assembly.

The mechanism being enabled is that, in the case of CXL assembly failing
due to potential platform quirks, there is a chance that the default DAX
fallback could recover operation. If the fallback is always broken
because of the insert_resource() in the construct_region() path, then
that constrained problem needs to be fixed first.

Once that mechanism is fixed the rest of this becomes a pure policy
problem, which I argue is suitable for userspace to handle. The
incomplete region is the kernel telling the admin the unvarnished truth
about what it knows about the CXL topology. Same as an incomplete RAID
array.