* [PATCH] cxl/region: Delay inserting iomem resource until auto region commit
@ 2026-02-12 6:22 Alison Schofield
2026-02-12 17:18 ` Gregory Price
2026-02-12 19:29 ` dan.j.williams
0 siblings, 2 replies; 6+ messages in thread
From: Alison Schofield @ 2026-02-12 6:22 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams, Smita Koralahalli
Cc: linux-cxl
During auto region assembly the region driver inserts the region
resource into the iomem tree when the first endpoint arrives and
region assembly begins. If the region later fails to assemble, the
resource can remain stranded in the iomem tree, making it appear like
a DAX region is a child of the CXL region, when that is not true.
For example:
68e80000000-8d37fffffff : CXL Window 9
  68e80000000-70e7fffffff : Soft Reserved
    68e80000000-70e7fffffff : region9
      68e80000000-70e7fffffff : dax19.0
        68e80000000-70e7fffffff : System RAM (kmem)
In the above case, region9 failed to assemble, yet /proc/iomem shows
the DAX region as being parented under a CXL region. In reality, the
CXL region is in a disabled state and the DAX region is managed by the
HMEM driver.
Examining /proc/iomem is one way users inspect the memory topology,
and with this patch that view remains accurate.
Delay insertion of the iomem resource until the auto region reaches
the commit state. Introduce the res_want_insert field to track whether
the region's resource should be inserted into the iomem tree.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
Putting this out for comments and I expect to rebase on 7.0-rc1 if
this is wanted.
Today it is built upon Smita's v6 Soft Reserved set [1] because that
set is where the failover to DAX starts happening and the
confusing /proc/iomem output can appear. Without that set, the resource of
the failed region appears in /proc/iomem, but it's less confusing
since it doesn't show any children.
There is an option for Smita's set to tear down the CXL regions when
it takes over the resource for HMEM DAX; however, the latest revision, v6,
has taken a gentler approach and leaves the regions intact.
[1] https://lore.kernel.org/linux-cxl/20260210064501.157591-1-Smita.KoralahalliChannabasappa@amd.com/
drivers/cxl/core/region.c | 32 ++++++++++++++++++++------------
drivers/cxl/cxl.h | 4 ++++
2 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 96ed550bfd2e..9ecc1748e9de 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -666,6 +666,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
 	}
 	p->res = res;
+	p->res_want_insert = false;
 	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
 	return 0;
@@ -2094,6 +2095,24 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		p->state = CXL_CONFIG_COMMIT;
 		cxl_region_shared_upstream_bandwidth_update(cxlr);
+		/*
+		 * Insert iomem resource only once at first commit. The
+		 * resource remains for the lifetime of this region, across
+		 * disable/enable cycles, and is only removed at unregister.
+		 *
+		 * Set res_want_insert to false on the first attempt, even if
+		 * it fails, to avoid retries if the platform firmware did
+		 * not split resources like "System RAM" on CXL window
+		 * boundaries. Resource is not required to be in iomem tree.
+		 */
+		if (p->res && p->res_want_insert) {
+			rc = insert_resource(cxlrd->res, p->res);
+			if (rc)
+				dev_warn(&cxlr->dev,
+					 "cannot insert iomem resource\n");
+			p->res_want_insert = false;
+		}
+
 		return 0;
 	}
@@ -3604,19 +3623,8 @@ static int __construct_region(struct cxl_region *cxlr,
 	if (rc)
 		return rc;
-	rc = insert_resource(cxlrd->res, res);
-	if (rc) {
-		/*
-		 * Platform-firmware may not have split resources like "System
-		 * RAM" on CXL window boundaries see cxl_region_iomem_release()
-		 */
-		dev_warn(cxlmd->dev.parent,
-			 "%s:%s: %s %s cannot insert resource\n",
-			 dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
-			 __func__, dev_name(&cxlr->dev));
-	}
-
 	p->res = res;
+	p->res_want_insert = true;
 	p->interleave_ways = cxled->cxld.interleave_ways;
 	p->interleave_granularity = cxled->cxld.interleave_granularity;
 	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index c796c3db36e0..2b977ab33af6 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -480,6 +480,9 @@ enum cxl_config_state {
  * @interleave_ways: number of endpoints in the region
  * @interleave_granularity: capacity each endpoint contributes to a stripe
  * @res: allocated iomem capacity for this region
+ * @res_want_insert: true if the resource should be inserted into the iomem
+ * tree. Set to false after the first attempt to insert or if
+ * res originates from the iomem tree via alloc_free_mem_region()
  * @targets: active ordered targets in current decoder configuration
  * @nr_targets: number of targets
  * @cache_size: extended linear cache size if exists, otherwise zero.
@@ -492,6 +495,7 @@ struct cxl_region_params {
 	int interleave_ways;
 	int interleave_granularity;
 	struct resource *res;
+	bool res_want_insert;
 	struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
 	int nr_targets;
 	resource_size_t cache_size;
--
2.37.3
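For readers skimming the diff, the one-shot behaviour it proposes can be modelled outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the patch itself: struct toy_region, toy_construct(), toy_commit() and toy_insert() are hypothetical stand-ins for the region params, __construct_region(), the commit path in cxl_region_attach() and insert_resource(). It only illustrates that the insert is attempted once, at commit, and is not retried after a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the region params; not the kernel's structure. */
struct toy_region {
	bool has_res;         /* models p->res being set */
	bool res_want_insert; /* models the proposed one-shot flag */
	bool in_tree;         /* true once the fake insert succeeded */
	int insert_attempts;  /* number of calls to the fake insert */
};

/* Fake insert: fails when 'fail' is set, e.g. a firmware layout conflict. */
static int toy_insert(struct toy_region *r, bool fail)
{
	r->insert_attempts++;
	if (fail)
		return -1;
	r->in_tree = true;
	return 0;
}

/* Start of auto assembly: record that an insert is wanted, do not insert. */
static void toy_construct(struct toy_region *r)
{
	assert(r != NULL);
	r->has_res = true;
	r->res_want_insert = true;
}

/* Commit: try the insert at most once; clear the flag even on failure. */
static void toy_commit(struct toy_region *r, bool fail)
{
	assert(r != NULL);
	if (r->has_res && r->res_want_insert) {
		(void)toy_insert(r, fail);
		r->res_want_insert = false;
	}
}
```

In this model a region that never reaches toy_commit() never appears in the tree, which is the behaviour change the patch argues for.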
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] cxl/region: Delay inserting iomem resource until auto region commit
2026-02-12 6:22 [PATCH] cxl/region: Delay inserting iomem resource until auto region commit Alison Schofield
@ 2026-02-12 17:18 ` Gregory Price
2026-02-12 18:58 ` Alison Schofield
2026-02-12 19:29 ` dan.j.williams
1 sibling, 1 reply; 6+ messages in thread
From: Gregory Price @ 2026-02-12 17:18 UTC (permalink / raw)
To: Alison Schofield
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, Dan Williams, Smita Koralahalli, linux-cxl
On Wed, Feb 11, 2026 at 10:22:46PM -0800, Alison Schofield wrote:
> During auto region assembly the region driver inserts the region
> resource into the iomem tree when the first endpoint arrives and
> region assembly begins. If the region later fails to assemble, the
> resource can remain stranded in the iomem tree, making it appear like
> a DAX region is a child of the CXL region, when that is not true.
>
> For example:
> 68e80000000-8d37fffffff : CXL Window 9
>   68e80000000-70e7fffffff : Soft Reserved
>     68e80000000-70e7fffffff : region9
>       68e80000000-70e7fffffff : dax19.0
>         68e80000000-70e7fffffff : System RAM (kmem)
>
> In the above case, region9 failed to assemble, yet /proc/iomem shows
> the DAX region as being parented under a CXL region. In reality, the
> CXL region is in a disabled state and the DAX region is managed by the
> HMEM driver.
>
> Examining /proc/iomem is one way users inspect the memory topology,
> and with this patch that view remains accurate.
>
> Delay insertion of the iomem resource until the auto region reaches
> the commit state. Introduce the res_want_insert field to track whether
> the region's resource should be inserted into the iomem tree.
>
Can you show the /proc/iomem output for the failure case before and after
the patch, just to make explicit what the user sees? I think I understand, but
it is worth having the explicit data for the failure case in the
changelog.
Thanks!
~Gregory
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] cxl/region: Delay inserting iomem resource until auto region commit
2026-02-12 17:18 ` Gregory Price
@ 2026-02-12 18:58 ` Alison Schofield
0 siblings, 0 replies; 6+ messages in thread
From: Alison Schofield @ 2026-02-12 18:58 UTC (permalink / raw)
To: Gregory Price
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, Dan Williams, Smita Koralahalli, linux-cxl
On Thu, Feb 12, 2026 at 12:18:51PM -0500, Gregory Price wrote:
> On Wed, Feb 11, 2026 at 10:22:46PM -0800, Alison Schofield wrote:
> > During auto region assembly the region driver inserts the region
> > resource into the iomem tree when the first endpoint arrives and
> > region assembly begins. If the region later fails to assemble, the
> > resource can remain stranded in the iomem tree, making it appear like
> > a DAX region is a child of the CXL region, when that is not true.
> >
> > For example:
> > 68e80000000-8d37fffffff : CXL Window 9
> >   68e80000000-70e7fffffff : Soft Reserved
> >     68e80000000-70e7fffffff : region9
> >       68e80000000-70e7fffffff : dax19.0
> >         68e80000000-70e7fffffff : System RAM (kmem)
> >
> > In the above case, region9 failed to assemble, yet /proc/iomem shows
> > the DAX region as being parented under a CXL region. In reality, the
> > CXL region is in a disabled state and the DAX region is managed by the
> > HMEM driver.
> >
> > Examining /proc/iomem is one way users inspect the memory topology,
> > and with this patch that view remains accurate.
> >
> > Delay insertion of the iomem resource until the auto region reaches
> > the commit state. Introduce the res_want_insert field to track whether
> > the region's resource should be inserted into the iomem tree.
> >
>
> Can you show the /proc/iomem output for the failure case before and after
> the patch, just to make explicit what the user sees? I think I understand, but
> it is worth having the explicit data for the failure case in the
> changelog.
Sure, will do in a v2.
Thanks for taking a look!
>
> Thanks!
> ~Gregory
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] cxl/region: Delay inserting iomem resource until auto region commit
2026-02-12 6:22 [PATCH] cxl/region: Delay inserting iomem resource until auto region commit Alison Schofield
2026-02-12 17:18 ` Gregory Price
@ 2026-02-12 19:29 ` dan.j.williams
2026-02-23 21:25 ` Alison Schofield
1 sibling, 1 reply; 6+ messages in thread
From: dan.j.williams @ 2026-02-12 19:29 UTC (permalink / raw)
To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Smita Koralahalli
Cc: linux-cxl
Alison Schofield wrote:
> During auto region assembly the region driver inserts the region
> resource into the iomem tree when the first endpoint arrives and
> region assembly begins. If the region later fails to assemble, the
> resource can remain stranded in the iomem tree, making it appear like
> a DAX region is a child of the CXL region, when that is not true.
>
> For example:
> 68e80000000-8d37fffffff : CXL Window 9
>   68e80000000-70e7fffffff : Soft Reserved
>     68e80000000-70e7fffffff : region9
>       68e80000000-70e7fffffff : dax19.0
>         68e80000000-70e7fffffff : System RAM (kmem)
...but it *is* telling the truth. The truth is that multiple objects
have laid a claim to that address range. The region9 object has reserved
part of the CXL window for its use. If the "Soft Reserved" and "dax19.0"
reservations are removed then the address space should still be reserved
for the region. The collision only occurs when drivers for those
competing objects try to mark the range IORESOURCE_BUSY. Competing
claims to the same address range are why IORESOURCE_BUSY exists; benign
overlaps are OK.
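The distinction drawn here, that overlapping reservations are harmless until two owners both try to mark the range busy, can be shown with a toy model. This is a simplified sketch and not the kernel's resource code: struct toy_map, toy_overlaps() and toy_claim_range() are hypothetical names, and the rule that only a busy-versus-busy overlap is rejected is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CLAIMS 8

/* One recorded claim on an inclusive address range. */
struct toy_claim {
	uint64_t start, end;
	bool busy; /* models a driver marking the range busy */
};

struct toy_map {
	struct toy_claim claims[TOY_MAX_CLAIMS];
	size_t nr;
};

/* True when the inclusive ranges share at least one address. */
static bool toy_overlaps(const struct toy_claim *c, uint64_t start, uint64_t end)
{
	return c->start <= end && start <= c->end;
}

/*
 * Record a claim. Non-busy overlaps are always accepted; a busy claim is
 * rejected only if it overlaps a claim that is already busy.
 * Returns 0 on success, -1 on conflict or when the map is full.
 */
static int toy_claim_range(struct toy_map *m, uint64_t start, uint64_t end,
			   bool busy)
{
	assert(m != NULL && start <= end);
	if (m->nr == TOY_MAX_CLAIMS)
		return -1;
	if (busy) {
		for (size_t i = 0; i < m->nr; i++)
			if (m->claims[i].busy &&
			    toy_overlaps(&m->claims[i], start, end))
				return -1;
	}
	m->claims[m->nr++] = (struct toy_claim){ start, end, busy };
	return 0;
}
```

With the range from the example, two non-busy claims and one busy claim on 68e80000000-70e7fffffff all succeed in this model; only a second busy claim on the same range is refused.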
Note that CXL Window 9 is also reserving more address space than is
actually in use. Like region9, 70e80000000-8d37fffffff is reserved
address space with no active decode.
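The sizes behind that observation can be checked directly from the two ranges quoted in the example. The helpers below are a hypothetical sketch (parse_iomem_line() and range_size() are names made up here, not part of any tool); they assume the "start-end : name" hex layout shown in the example output and treat both bounds as inclusive.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed "start-end : name" entry; bounds are inclusive. */
struct iomem_range {
	uint64_t start, end;
	char name[64];
};

/* Parse a single line of the form "start-end : name" (hex, no 0x prefix). */
static int parse_iomem_line(const char *line, struct iomem_range *out)
{
	assert(line != NULL && out != NULL);
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %63[^\n]",
		   &out->start, &out->end, out->name) != 3)
		return -1;
	return out->end >= out->start ? 0 : -1;
}

/* Number of bytes covered by an inclusive range. */
static uint64_t range_size(const struct iomem_range *r)
{
	return r->end - r->start + 1;
}
```

Applied to the example, region9 covers 0x8000000000 bytes (512 GiB), and the part of CXL Window 9 beyond region9, 70e80000000-8d37fffffff, covers 0x1c500000000 bytes (1812 GiB).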
> In the above case, region9 failed to assemble, yet /proc/iomem shows
> the DAX region as being parented under a CXL region. In reality, the
> CXL region is in a disabled state and the DAX region is managed by the
> HMEM driver.
iomem resource parenting shows registration ordering, not actual
parenting. Most of the time the registration order and the parenting
line up, but I would be surprised if any use case depends on this.
> Examining /proc/iomem is one way users inspect the memory topology,
> and with this patch that view remains accurate.
The ambiguity of iomem resource parenting can be resolved by looking at the
device-path for dax19.0 ("daxctl list -RDu"). It is also resolved by
noticing that the CXL region and the DAX region are using separate
memregion_id values (9 vs 19).
The extra effort for incremental precision is not needed; the contents
of /proc/iomem are accurate.
> Delay insertion of the iomem resource until the auto region reaches
> the commit state. Introduce the res_want_insert field to track whether
> the region's resource should be inserted into the iomem tree.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>
> Putting this out for comments and I expect to rebase on 7.0-rc1 if
> this is wanted.
>
> Today it is built upon Smita's v6 Soft Reserved set [1] because that
> set is where the failover to DAX starts happening and the
> confusing /proc/iomem output can appear. Without that set, the resource of
> the failed region appears in /proc/iomem, but it's less confusing
> since it doesn't show any children.
>
> There is an option for Smita's set to tear down the CXL regions when
> it takes over the resource for HMEM DAX; however, the latest revision, v6,
> has taken a gentler approach and leaves the regions intact.
>
>
> [1] https://lore.kernel.org/linux-cxl/20260210064501.157591-1-Smita.KoralahalliChannabasappa@amd.com/
>
>
> drivers/cxl/core/region.c | 32 ++++++++++++++++++++------------
> drivers/cxl/cxl.h | 4 ++++
> 2 files changed, 24 insertions(+), 12 deletions(-)
>
[..]
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index c796c3db36e0..2b977ab33af6 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -480,6 +480,9 @@ enum cxl_config_state {
> * @interleave_ways: number of endpoints in the region
> * @interleave_granularity: capacity each endpoint contributes to a stripe
> * @res: allocated iomem capacity for this region
> + * @res_want_insert: true if the resource should be inserted into the iomem
> + * tree. Set to false after the first attempt to insert or if
> + * res originates from the iomem tree via alloc_free_mem_region()
This is too much "control flow in a data structure" for my taste, but is
moot given the comments above that this extra effort is not necessary.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] cxl/region: Delay inserting iomem resource until auto region commit
2026-02-12 19:29 ` dan.j.williams
@ 2026-02-23 21:25 ` Alison Schofield
2026-03-12 23:55 ` Dan Williams
0 siblings, 1 reply; 6+ messages in thread
From: Alison Schofield @ 2026-02-23 21:25 UTC (permalink / raw)
To: dan.j.williams
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, Smita Koralahalli, linux-cxl
On Thu, Feb 12, 2026 at 11:29:04AM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> > During auto region assembly the region driver inserts the region
> > resource into the iomem tree when the first endpoint arrives and
> > region assembly begins. If the region later fails to assemble, the
> > resource can remain stranded in the iomem tree, making it appear like
> > a DAX region is a child of the CXL region, when that is not true.
> >
> > For example:
> > 68e80000000-8d37fffffff : CXL Window 9
> >   68e80000000-70e7fffffff : Soft Reserved
> >     68e80000000-70e7fffffff : region9
> >       68e80000000-70e7fffffff : dax19.0
> >         68e80000000-70e7fffffff : System RAM (kmem)
>
> ...but it *is* telling the truth. The truth is that multiple objects
> have laid a claim to that address range. The region9 object has reserved
> part of the CXL window for its use. If the "Soft Reserved" and "dax19.0"
> reservations are removed then the address space should still be reserved
> for the region. The collision only occurs when drivers for those
> competing objects try to mark the range IORESOURCE_BUSY. Competing
> claims to the same address range are why IORESOURCE_BUSY exists; benign
> overlaps are OK.
>
> Note that CXL Window 9 is also reserving more address space than is
> actually in use. Like region9, 70e80000000-8d37fffffff is reserved
> address space with no active decode.
>
> > In the above case, region9 failed to assemble, yet /proc/iomem shows
> > the DAX region as being parented under a CXL region. In reality, the
> > CXL region is in a disabled state and the DAX region is managed by the
> > HMEM driver.
>
> iomem resource parenting shows registration ordering, not actual
> parenting. Most of the time the registration order and the parenting
> line up, but I would be surprised if any use case depends on this.
>
> > Examining /proc/iomem is one way users inspect the memory topology,
> > and with this patch that view remains accurate.
>
> The ambiguity of iomem resource parenting can be resolved by looking at the
> device-path for dax19.0 ("daxctl list -RDu"). It is also resolved by
> noticing that the CXL region and the DAX region are using separate
> memregion_id values (9 vs 19).
>
> The extra effort for incremental precision is not needed; the contents
> of /proc/iomem are accurate.
>
Incremental Precision? Isn't it just wrong? The region driver does not
have a claim on that address range. It inserted the resource before it
became a decoder owner. If it wanted to place a provisional hold and then
drop it on failure to commit, that would at least be internally
consistent.
In retrospect, inserting the resource prior to commit was a bug in the
region lifecycle. I can re-present this as that, so folks do not get
side-tracked debating how users interpret memory topology.
-- Alison
> > Delay insertion of the iomem resource until the auto region reaches
> > the commit state. Introduce the res_want_insert field to track whether
> > the region's resource should be inserted into the iomem tree.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >
> > Putting this out for comments and I expect to rebase on 7.0-rc1 if
> > this is wanted.
> >
> > Today it is built upon Smita's v6 Soft Reserved set [1] because that
> > set is where the failover to DAX starts happening and the
> > confusing /proc/iomem output can appear. Without that set, the resource of
> > the failed region appears in /proc/iomem, but it's less confusing
> > since it doesn't show any children.
> >
> > There is an option for Smita's set to tear down the CXL regions when
> > it takes over the resource for HMEM DAX; however, the latest revision, v6,
> > has taken a gentler approach and leaves the regions intact.
> >
> >
> > [1] https://lore.kernel.org/linux-cxl/20260210064501.157591-1-Smita.KoralahalliChannabasappa@amd.com/
> >
> >
> > drivers/cxl/core/region.c | 32 ++++++++++++++++++++------------
> > drivers/cxl/cxl.h | 4 ++++
> > 2 files changed, 24 insertions(+), 12 deletions(-)
> >
> [..]
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index c796c3db36e0..2b977ab33af6 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -480,6 +480,9 @@ enum cxl_config_state {
> > * @interleave_ways: number of endpoints in the region
> > * @interleave_granularity: capacity each endpoint contributes to a stripe
> > * @res: allocated iomem capacity for this region
> > + * @res_want_insert: true if the resource should be inserted into the iomem
> > + * tree. Set to false after the first attempt to insert or if
> > + * res originates from the iomem tree via alloc_free_mem_region()
>
> This is too much "control flow in a data structure" for my taste, but is
> moot given the comments above that this extra effort is not necessary.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] cxl/region: Delay inserting iomem resource until auto region commit
2026-02-23 21:25 ` Alison Schofield
@ 2026-03-12 23:55 ` Dan Williams
0 siblings, 0 replies; 6+ messages in thread
From: Dan Williams @ 2026-03-12 23:55 UTC (permalink / raw)
To: Alison Schofield, dan.j.williams
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, Smita Koralahalli, linux-cxl
Alison Schofield wrote:
[..]
> >
> > The ambiguity of iomem resource parenting can be resolved by looking at the
> > device-path for dax19.0 ("daxctl list -RDu"). It is also resolved by
> > noticing that the CXL region and the DAX region are using separate
> > memregion_id values (9 vs 19).
> >
> > The extra effort for incremental precision is not needed; the contents
> > of /proc/iomem are accurate.
> >
>
> Incremental Precision? Isn't it just wrong?
I would say it is "acceptably" wrong. More below...
> The region driver does not have a claim on that address range. It
> inserted the resource before it became a decoder owner. If it wanted
> to place a provisional hold and then drop it on failure to commit,
> that would at least be internally consistent.
When I say it is "acceptably" wrong, I mean that the user-created region flow also has
an unbounded time between when size_store() inserts the resource from
alloc_free_mem_region() and whenever the region becomes committed. In the
same way that the kernel cannot know that "cxl create-region" crashed
and is never going to commit the region it started, the kernel cannot
know when some wayward device is going to finally arrive to finish off
the region creation process.
The determination of when to give up and when to clean up is policy.
The kernel is allergic to policy; it is a userspace responsibility.
/proc/iomem correctly shows that the CXL subsystem was interested in this
address range for a region, and userspace can decide "I am done waiting for
regionX, let me go clean that up and repurpose the devices and address
space for something else."
> In retrospect, inserting the resource prior to commit was a bug in the
> region lifecycle. I can re-present this as that, so folks do not get
> side-tracked debating how users interpret memory topology.
I do not think it is a bug. It serves exactly the same purpose as
alloc_hpa() for manual region creation. It says "CXL subsystem is
interested in instantiating a region here, stay tuned...".
^ permalink raw reply [flat|nested] 6+ messages in thread