public inbox for linux-cxl@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
@ 2026-02-21  4:30 Gregory Price
  2026-02-21  4:30 ` [PATCH 2/2] cxl/region: skip default driver attach for memdev with attach callbacks Gregory Price
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Gregory Price @ 2026-02-21  4:30 UTC (permalink / raw)
  To: linux-cxl
  Cc: linux-kernel, kernel-team, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams

cxl_add_to_region() ignores the return value of attach_target().  When
attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
the auto-discovered region remains registered with its HPA resource
consumed but never reaches COMMIT state.  Subsequent region creation
attempts fail with -ENOSPC because the HPA range is already reserved.

Track whether this call to cxl_add_to_region() created the region, and
call drop_region() on attach_target() failure to unregister it and
release the HPA resource.  Pre-existing regions are left alone since
other endpoints may already be attached.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/core/region.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 385be9cb44cd..276046d49f88 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3923,6 +3923,7 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 	struct cxl_region_context ctx;
 	struct cxl_region_params *p;
 	bool attach = false;
+	bool newly_created = false;
 	int rc;
 
 	ctx = (struct cxl_region_context) {
@@ -3946,15 +3947,23 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 	mutex_lock(&cxlrd->range_lock);
 	struct cxl_region *cxlr __free(put_cxl_region) =
 		cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
-	if (!cxlr)
+	if (!cxlr) {
 		cxlr = construct_region(cxlrd, &ctx);
+		newly_created = !IS_ERR(cxlr);
+	}
 	mutex_unlock(&cxlrd->range_lock);
 
 	rc = PTR_ERR_OR_ZERO(cxlr);
 	if (rc)
 		return rc;
 
-	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
+	rc = attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
+	if (rc) {
+		/* If endpoint was just created, tear it down to release HPA */
+		if (newly_created)
+			drop_region(cxlrd, cxlr);
+		return rc;
+	}
 
 	scoped_guard(rwsem_read, &cxl_rwsem.region) {
 		p = &cxlr->params;
@@ -3972,7 +3981,7 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 				p->res);
 	}
 
-	return rc;
+	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] cxl/region: skip default driver attach for memdev with attach callbacks
  2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
@ 2026-02-21  4:30 ` Gregory Price
  2026-02-21  5:17 ` [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Gregory Price @ 2026-02-21  4:30 UTC (permalink / raw)
  To: linux-cxl
  Cc: linux-kernel, kernel-team, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams

When a CXL memdev has a custom attach callback, cxl_add_to_region()
should not call device_attach() on the auto-discovered region.

The default device_attach() binds the dax driver, which may online
memory via dax_kmem.  The custom attach callback then has to tear down
the dax stack to convert the region to sysram, but dax_kmem refuses to
offline memory during its remove path, leaving regions stuck online.

Skip device_attach() when cxlmd->attach is set.  The attach callback
is responsible for setting up the region after auto-discovery completes
(e.g. adding it as sysram directly).

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 drivers/cxl/core/region.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 276046d49f88..e5edeabd9262 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3971,6 +3971,12 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
 	}
 
 	if (attach) {
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+
+		/* Skip device_attach if memdev has is own attach callback */
+		if (cxlmd->attach)
+			return 0;
+
 		/*
 		 * If device_attach() fails the range may still be active via
 		 * the platform-firmware memory map, otherwise the driver for
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
  2026-02-21  4:30 ` [PATCH 2/2] cxl/region: skip default driver attach for memdev with attach callbacks Gregory Price
@ 2026-02-21  5:17 ` Gregory Price
  2026-02-24 16:15   ` Alejandro Lucero Palau
  2026-02-21 10:36 ` kernel test robot
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2026-02-21  5:17 UTC (permalink / raw)
  To: linux-cxl
  Cc: linux-kernel, kernel-team, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams

On Fri, Feb 20, 2026 at 11:30:12PM -0500, Gregory Price wrote:
> cxl_add_to_region() ignores the return value of attach_target().  When
> attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
> the auto-discovered region remains registered with its HPA resource
> consumed but never reaches COMMIT state.  Subsequent region creation
> attempts fail with -ENOSPC because the HPA range is already reserved.
> 
> Track whether this call to cxl_add_to_region() created the region, and
> call drop_region() on attach_target() failure to unregister it and
> release the HPA resource.  Pre-existing regions are left alone since
> other endpoints may already be attached.
> 
> Signed-off-by: Gregory Price <gourry@gourry.net>

BAH - disregard this patch, it uses drop_region which is introduced by
Alejandro here:

https://lore.kernel.org/linux-cxl/20260201155438.2664640-20-alejandro.lucero-palau@amd.com/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
  2026-02-21  4:30 ` [PATCH 2/2] cxl/region: skip default driver attach for memdev with attach callbacks Gregory Price
  2026-02-21  5:17 ` [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
@ 2026-02-21 10:36 ` kernel test robot
  2026-02-21 11:49 ` kernel test robot
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2026-02-21 10:36 UTC (permalink / raw)
  To: Gregory Price, linux-cxl
  Cc: llvm, oe-kbuild-all, linux-kernel, kernel-team, dave,
	jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
	ira.weiny, dan.j.williams

Hi Gregory,

kernel test robot noticed the following build errors:

[auto build test ERROR on cxl/next]
[also build test ERROR on linus/master next-20260220]
[cannot apply to cxl/pending v6.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Gregory-Price/cxl-region-skip-default-driver-attach-for-memdev-with-attach-callbacks/20260221-123300
base:   https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git next
patch link:    https://lore.kernel.org/r/20260221043013.1420169-1-gourry%40gourry.net
patch subject: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20260221/202602211806.TN3ENfmn-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260221/202602211806.TN3ENfmn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602211806.TN3ENfmn-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/cxl/core/region.c:3964:4: error: call to undeclared function 'drop_region'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    3964 |                         drop_region(cxlrd, cxlr);
         |                         ^
   1 error generated.


vim +/drop_region +3964 drivers/cxl/core/region.c

  3920	
  3921	int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
  3922	{
  3923		struct cxl_region_context ctx;
  3924		struct cxl_region_params *p;
  3925		bool attach = false;
  3926		bool newly_created = false;
  3927		int rc;
  3928	
  3929		ctx = (struct cxl_region_context) {
  3930			.cxled = cxled,
  3931			.hpa_range = cxled->cxld.hpa_range,
  3932			.interleave_ways = cxled->cxld.interleave_ways,
  3933			.interleave_granularity = cxled->cxld.interleave_granularity,
  3934		};
  3935	
  3936		struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
  3937			get_cxl_root_decoder(cxled, &ctx);
  3938	
  3939		if (IS_ERR(cxlrd))
  3940			return PTR_ERR(cxlrd);
  3941	
  3942		/*
  3943		 * Ensure that, if multiple threads race to construct_region()
  3944		 * for the HPA range, one does the construction and the others
  3945		 * add to that.
  3946		 */
  3947		mutex_lock(&cxlrd->range_lock);
  3948		struct cxl_region *cxlr __free(put_cxl_region) =
  3949			cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
  3950		if (!cxlr) {
  3951			cxlr = construct_region(cxlrd, &ctx);
  3952			newly_created = !IS_ERR(cxlr);
  3953		}
  3954		mutex_unlock(&cxlrd->range_lock);
  3955	
  3956		rc = PTR_ERR_OR_ZERO(cxlr);
  3957		if (rc)
  3958			return rc;
  3959	
  3960		rc = attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
  3961		if (rc) {
  3962			/* If endpoint was just created, tear it down to release HPA */
  3963			if (newly_created)
> 3964				drop_region(cxlrd, cxlr);
  3965			return rc;
  3966		}
  3967	
  3968		scoped_guard(rwsem_read, &cxl_rwsem.region) {
  3969			p = &cxlr->params;
  3970			attach = p->state == CXL_CONFIG_COMMIT;
  3971		}
  3972	
  3973		if (attach) {
  3974			/*
  3975			 * If device_attach() fails the range may still be active via
  3976			 * the platform-firmware memory map, otherwise the driver for
  3977			 * regions is local to this file, so driver matching can't fail.
  3978			 */
  3979			if (device_attach(&cxlr->dev) < 0)
  3980				dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
  3981					p->res);
  3982		}
  3983	
  3984		return 0;
  3985	}
  3986	EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
  3987	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
                   ` (2 preceding siblings ...)
  2026-02-21 10:36 ` kernel test robot
@ 2026-02-21 11:49 ` kernel test robot
  2026-02-21 11:53 ` kernel test robot
  2026-02-23 19:48 ` Alison Schofield
  5 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2026-02-21 11:49 UTC (permalink / raw)
  To: Gregory Price, linux-cxl
  Cc: oe-kbuild-all, linux-kernel, kernel-team, dave, jonathan.cameron,
	dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
	dan.j.williams

Hi Gregory,

kernel test robot noticed the following build errors:

[auto build test ERROR on cxl/next]
[also build test ERROR on linus/master next-20260220]
[cannot apply to cxl/pending v6.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Gregory-Price/cxl-region-skip-default-driver-attach-for-memdev-with-attach-callbacks/20260221-123300
base:   https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git next
patch link:    https://lore.kernel.org/r/20260221043013.1420169-1-gourry%40gourry.net
patch subject: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
config: riscv-randconfig-001-20260221 (https://download.01.org/0day-ci/archive/20260221/202602211956.uMrPm1yM-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260221/202602211956.uMrPm1yM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602211956.uMrPm1yM-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/cxl/core/region.c: In function 'cxl_add_to_region':
>> drivers/cxl/core/region.c:3964:4: error: implicit declaration of function 'drop_region'; did you mean 'to_nd_region'? [-Werror=implicit-function-declaration]
       drop_region(cxlrd, cxlr);
       ^~~~~~~~~~~
       to_nd_region
   cc1: some warnings being treated as errors


vim +3964 drivers/cxl/core/region.c

  3920	
  3921	int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
  3922	{
  3923		struct cxl_region_context ctx;
  3924		struct cxl_region_params *p;
  3925		bool attach = false;
  3926		bool newly_created = false;
  3927		int rc;
  3928	
  3929		ctx = (struct cxl_region_context) {
  3930			.cxled = cxled,
  3931			.hpa_range = cxled->cxld.hpa_range,
  3932			.interleave_ways = cxled->cxld.interleave_ways,
  3933			.interleave_granularity = cxled->cxld.interleave_granularity,
  3934		};
  3935	
  3936		struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
  3937			get_cxl_root_decoder(cxled, &ctx);
  3938	
  3939		if (IS_ERR(cxlrd))
  3940			return PTR_ERR(cxlrd);
  3941	
  3942		/*
  3943		 * Ensure that, if multiple threads race to construct_region()
  3944		 * for the HPA range, one does the construction and the others
  3945		 * add to that.
  3946		 */
  3947		mutex_lock(&cxlrd->range_lock);
  3948		struct cxl_region *cxlr __free(put_cxl_region) =
  3949			cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
  3950		if (!cxlr) {
  3951			cxlr = construct_region(cxlrd, &ctx);
  3952			newly_created = !IS_ERR(cxlr);
  3953		}
  3954		mutex_unlock(&cxlrd->range_lock);
  3955	
  3956		rc = PTR_ERR_OR_ZERO(cxlr);
  3957		if (rc)
  3958			return rc;
  3959	
  3960		rc = attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
  3961		if (rc) {
  3962			/* If endpoint was just created, tear it down to release HPA */
  3963			if (newly_created)
> 3964				drop_region(cxlrd, cxlr);
  3965			return rc;
  3966		}
  3967	
  3968		scoped_guard(rwsem_read, &cxl_rwsem.region) {
  3969			p = &cxlr->params;
  3970			attach = p->state == CXL_CONFIG_COMMIT;
  3971		}
  3972	
  3973		if (attach) {
  3974			/*
  3975			 * If device_attach() fails the range may still be active via
  3976			 * the platform-firmware memory map, otherwise the driver for
  3977			 * regions is local to this file, so driver matching can't fail.
  3978			 */
  3979			if (device_attach(&cxlr->dev) < 0)
  3980				dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
  3981					p->res);
  3982		}
  3983	
  3984		return 0;
  3985	}
  3986	EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
  3987	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
                   ` (3 preceding siblings ...)
  2026-02-21 11:49 ` kernel test robot
@ 2026-02-21 11:53 ` kernel test robot
  2026-02-23 19:48 ` Alison Schofield
  5 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2026-02-21 11:53 UTC (permalink / raw)
  To: Gregory Price, linux-cxl
  Cc: oe-kbuild-all, linux-kernel, kernel-team, dave, jonathan.cameron,
	dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
	dan.j.williams

Hi Gregory,

kernel test robot noticed the following build errors:

[auto build test ERROR on cxl/next]
[also build test ERROR on linus/master next-20260220]
[cannot apply to cxl/pending v6.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Gregory-Price/cxl-region-skip-default-driver-attach-for-memdev-with-attach-callbacks/20260221-123300
base:   https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git next
patch link:    https://lore.kernel.org/r/20260221043013.1420169-1-gourry%40gourry.net
patch subject: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260221/202602211252.7qKUA8jp-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260221/202602211252.7qKUA8jp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602211252.7qKUA8jp-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/cxl/core/region.c: In function 'cxl_add_to_region':
>> drivers/cxl/core/region.c:3964:25: error: implicit declaration of function 'drop_region' [-Wimplicit-function-declaration]
    3964 |                         drop_region(cxlrd, cxlr);
         |                         ^~~~~~~~~~~


vim +/drop_region +3964 drivers/cxl/core/region.c

  3920	
  3921	int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
  3922	{
  3923		struct cxl_region_context ctx;
  3924		struct cxl_region_params *p;
  3925		bool attach = false;
  3926		bool newly_created = false;
  3927		int rc;
  3928	
  3929		ctx = (struct cxl_region_context) {
  3930			.cxled = cxled,
  3931			.hpa_range = cxled->cxld.hpa_range,
  3932			.interleave_ways = cxled->cxld.interleave_ways,
  3933			.interleave_granularity = cxled->cxld.interleave_granularity,
  3934		};
  3935	
  3936		struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
  3937			get_cxl_root_decoder(cxled, &ctx);
  3938	
  3939		if (IS_ERR(cxlrd))
  3940			return PTR_ERR(cxlrd);
  3941	
  3942		/*
  3943		 * Ensure that, if multiple threads race to construct_region()
  3944		 * for the HPA range, one does the construction and the others
  3945		 * add to that.
  3946		 */
  3947		mutex_lock(&cxlrd->range_lock);
  3948		struct cxl_region *cxlr __free(put_cxl_region) =
  3949			cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
  3950		if (!cxlr) {
  3951			cxlr = construct_region(cxlrd, &ctx);
  3952			newly_created = !IS_ERR(cxlr);
  3953		}
  3954		mutex_unlock(&cxlrd->range_lock);
  3955	
  3956		rc = PTR_ERR_OR_ZERO(cxlr);
  3957		if (rc)
  3958			return rc;
  3959	
  3960		rc = attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
  3961		if (rc) {
  3962			/* If endpoint was just created, tear it down to release HPA */
  3963			if (newly_created)
> 3964				drop_region(cxlrd, cxlr);
  3965			return rc;
  3966		}
  3967	
  3968		scoped_guard(rwsem_read, &cxl_rwsem.region) {
  3969			p = &cxlr->params;
  3970			attach = p->state == CXL_CONFIG_COMMIT;
  3971		}
  3972	
  3973		if (attach) {
  3974			/*
  3975			 * If device_attach() fails the range may still be active via
  3976			 * the platform-firmware memory map, otherwise the driver for
  3977			 * regions is local to this file, so driver matching can't fail.
  3978			 */
  3979			if (device_attach(&cxlr->dev) < 0)
  3980				dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
  3981					p->res);
  3982		}
  3983	
  3984		return 0;
  3985	}
  3986	EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
  3987	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
                   ` (4 preceding siblings ...)
  2026-02-21 11:53 ` kernel test robot
@ 2026-02-23 19:48 ` Alison Schofield
  2026-02-23 20:15   ` Gregory Price
  5 siblings, 1 reply; 11+ messages in thread
From: Alison Schofield @ 2026-02-23 19:48 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-cxl, linux-kernel, kernel-team, dave, jonathan.cameron,
	dave.jiang, vishal.l.verma, ira.weiny, dan.j.williams

On Fri, Feb 20, 2026 at 11:30:12PM -0500, Gregory Price wrote:
> cxl_add_to_region() ignores the return value of attach_target().  When
> attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
> the auto-discovered region remains registered with its HPA resource
> consumed but never reaches COMMIT state.  Subsequent region creation
> attempts fail with -ENOSPC because the HPA range is already reserved.
> 
> Track whether this call to cxl_add_to_region() created the region, and
> call drop_region() on attach_target() failure to unregister it and
> release the HPA resource.  Pre-existing regions are left alone since
> other endpoints may already be attached.

I see you dropping this, perhaps just for the moment, because
the drop_region() you wanted to use is not available yet.

This looks a lot like 
	https://lore.kernel.org/linux-cxl/2a613604c0cdda6d9f838ae9b47ea6d936c5e4ce.1769746294.git.alison.schofield@intel.com/
	cxl/region: Unregister auto-created region when assembly fails
	When auto-created region assembly fails the region remains registered
	but disabled. The region continues to reserve its memory resource,
	preventing DAX from registering the memory.
	Unregister the region on assembly failure to release the resource.

And the review comments on that one, or at least on that thread in
general, was to leave all the broken things in place.
I didn't agree with that, and hope to see this version move ahead
when you have the drop_region you need.

-- Alison






> 
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
>  drivers/cxl/core/region.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 385be9cb44cd..276046d49f88 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3923,6 +3923,7 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
>  	struct cxl_region_context ctx;
>  	struct cxl_region_params *p;
>  	bool attach = false;
> +	bool newly_created = false;
>  	int rc;
>  
>  	ctx = (struct cxl_region_context) {
> @@ -3946,15 +3947,23 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
>  	mutex_lock(&cxlrd->range_lock);
>  	struct cxl_region *cxlr __free(put_cxl_region) =
>  		cxl_find_region_by_range(cxlrd, &ctx.hpa_range);
> -	if (!cxlr)
> +	if (!cxlr) {
>  		cxlr = construct_region(cxlrd, &ctx);
> +		newly_created = !IS_ERR(cxlr);
> +	}
>  	mutex_unlock(&cxlrd->range_lock);
>  
>  	rc = PTR_ERR_OR_ZERO(cxlr);
>  	if (rc)
>  		return rc;
>  
> -	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
> +	rc = attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);
> +	if (rc) {
> +		/* If endpoint was just created, tear it down to release HPA */
> +		if (newly_created)
> +			drop_region(cxlrd, cxlr);
> +		return rc;
> +	}
>  
>  	scoped_guard(rwsem_read, &cxl_rwsem.region) {
>  		p = &cxlr->params;
> @@ -3972,7 +3981,7 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
>  				p->res);
>  	}
>  
> -	return rc;
> +	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
>  
> -- 
> 2.47.3
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-23 19:48 ` Alison Schofield
@ 2026-02-23 20:15   ` Gregory Price
  2026-02-24  0:18     ` Alison Schofield
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2026-02-23 20:15 UTC (permalink / raw)
  To: Alison Schofield
  Cc: linux-cxl, linux-kernel, kernel-team, dave, jonathan.cameron,
	dave.jiang, vishal.l.verma, ira.weiny, dan.j.williams

On Mon, Feb 23, 2026 at 11:48:42AM -0800, Alison Schofield wrote:
> On Fri, Feb 20, 2026 at 11:30:12PM -0500, Gregory Price wrote:
> > cxl_add_to_region() ignores the return value of attach_target().  When
> > attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
> > the auto-discovered region remains registered with its HPA resource
> > consumed but never reaches COMMIT state.  Subsequent region creation
> > attempts fail with -ENOSPC because the HPA range is already reserved.
> > 
> > Track whether this call to cxl_add_to_region() created the region, and
> > call drop_region() on attach_target() failure to unregister it and
> > release the HPA resource.  Pre-existing regions are left alone since
> > other endpoints may already be attached.
> 
> I see you dropping this, perhaps just for the moment, because
> the drop_region() you wanted to use is not available yet.
> 

Yeah it's not a particularly useful cleanup in the current
infrastructure because nothing actually uses this pattern (yet).

> This looks a lot like 
> 	https://lore.kernel.org/linux-cxl/2a613604c0cdda6d9f838ae9b47ea6d936c5e4ce.1769746294.git.alison.schofield@intel.com/
> 	cxl/region: Unregister auto-created region when assembly fails
> 	When auto-created region assembly fails the region remains registered
> 	but disabled. The region continues to reserve its memory resource,
> 	preventing DAX from registering the memory.
> 	Unregister the region on assembly failure to release the resource.
> 
> And the review comments on that one, or at least on that thread in
> general, was to leave all the broken things in place.
> I didn't agree with that, and hope to see this version move ahead
> when you have the drop_region you need.
> 
> 

The important note here is the difference between auto-regions and
manually created regions.  For auto-regions, you might have another
endpoint show up looking for the partially created region - and then
just go off and create it anyway because it thinks it was first.

But in my driver, i'm explicitly converting these auto-regions into
other things, and if that fails it causes *all other* region creation to
fail - even if it wasn't actually dependent on that original region.

This is only an issue if you have two devices unbind/bind cycling at
the same time - i.e.

   echo 0000:d0:00.00 > cxl_pci/unbind
   echo 0000:e0:00.00 > cxl_pci/unbind
   echo 0000:d0:00.00 > mydriver/bind
   echo 0000:e0:00.00 > mydriver/bind

If the platform has pre-programmed and locked the decoders, and one of
the two devices fails to probe and leaves a hanging partially
created region, the other device will fail too.

It's a pretty narrow failure scenario.

~Gregory

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-23 20:15   ` Gregory Price
@ 2026-02-24  0:18     ` Alison Schofield
  0 siblings, 0 replies; 11+ messages in thread
From: Alison Schofield @ 2026-02-24  0:18 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-cxl, linux-kernel, kernel-team, dave, jonathan.cameron,
	dave.jiang, vishal.l.verma, ira.weiny, dan.j.williams

On Mon, Feb 23, 2026 at 03:15:16PM -0500, Gregory Price wrote:
> On Mon, Feb 23, 2026 at 11:48:42AM -0800, Alison Schofield wrote:
> > On Fri, Feb 20, 2026 at 11:30:12PM -0500, Gregory Price wrote:
> > > cxl_add_to_region() ignores the return value of attach_target().  When
> > > attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
> > > the auto-discovered region remains registered with its HPA resource
> > > consumed but never reaches COMMIT state.  Subsequent region creation
> > > attempts fail with -ENOSPC because the HPA range is already reserved.
> > > 
> > > Track whether this call to cxl_add_to_region() created the region, and
> > > call drop_region() on attach_target() failure to unregister it and
> > > release the HPA resource.  Pre-existing regions are left alone since
> > > other endpoints may already be attached.
> > 
> > I see you dropping this, perhaps just for the moment, because
> > the drop_region() you wanted to use is not available yet.
> > 
> 
> Yeah it's not a particularly useful cleanup in the current
> infrastructure because nothing actually uses this pattern (yet).
> 
> > This looks a lot like 
> > 	https://lore.kernel.org/linux-cxl/2a613604c0cdda6d9f838ae9b47ea6d936c5e4ce.1769746294.git.alison.schofield@intel.com/
> > 	cxl/region: Unregister auto-created region when assembly fails
> > 	When auto-created region assembly fails the region remains registered
> > 	but disabled. The region continues to reserve its memory resource,
> > 	preventing DAX from registering the memory.
> > 	Unregister the region on assembly failure to release the resource.
> > 
> > And the review comments on that one, or at least on that thread in
> > general, was to leave all the broken things in place.
> > I didn't agree with that, and hope to see this version move ahead
> > when you have the drop_region you need.
> > 
> > 
> 
> The important note here is the difference between auto-regions and
> manually created regions.  For auto-regions, you might have another
> endpoint show up looking for the partially created region - and then
> just go off and create it anyway because it thinks it was first.

That's by design, and that'll eventually fail too.

But - is see how your case is different. Thanks for the explanation.

> 
> But in my driver, i'm explicitly converting these auto-regions into
> other things, and if that fails it causes *all other* region creation to
> fail - even if it wasn't actually dependent on that original region.
> 
> This is only an issue if you have two devices unbind/bind cycling at
> the same time - i.e.
> 
>    echo 0000:d0:00.00 > cxl_pci/unbind
>    echo 0000:e0:00.00 > cxl_pci/unbind
>    echo 0000:d0:00.00 > mydriver/bind
>    echo 0000:e0:00.00 > mydriver/bind
> 
> If the platform has pre-programmed and locked the decoders, and one of
> the two devices fails to probe and leaves a hanging partially
> created region, the other device will fail too.
> 
> It's a pretty narrow failure scenario.
> 
> ~Gregory
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-21  5:17 ` [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
@ 2026-02-24 16:15   ` Alejandro Lucero Palau
  2026-02-25  1:42     ` Gregory Price
  0 siblings, 1 reply; 11+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-24 16:15 UTC (permalink / raw)
  To: Gregory Price, linux-cxl
  Cc: linux-kernel, kernel-team, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams


On 2/21/26 05:17, Gregory Price wrote:
> On Fri, Feb 20, 2026 at 11:30:12PM -0500, Gregory Price wrote:
>> cxl_add_to_region() ignores the return value of attach_target().  When
>> attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
>> the auto-discovered region remains registered with its HPA resource
>> consumed but never reaches COMMIT state.  Subsequent region creation
>> attempts fail with -ENOSPC because the HPA range is already reserved.
>>
>> Track whether this call to cxl_add_to_region() created the region, and
>> call drop_region() on attach_target() failure to unregister it and
>> release the HPA resource.  Pre-existing regions are left alone since
>> other endpoints may already be attached.
>>
>> Signed-off-by: Gregory Price <gourry@gourry.net>
> BAH - disregard this patch, it uses drop_region which is introduced by
> Alejandro here:
>
> https://lore.kernel.org/linux-cxl/20260201155438.2664640-20-alejandro.lucero-palau@amd.com/
>
Feel free to add it to this series. I have started to send individual 
series as you know but the part changing the region creation will 
require more work than the already sent.

About this fix, it looks good to me, although I have to admit I'm a bit 
lost after following the discussion Allison points to. If we want to 
keep the state of failure for forensics, not sure if the 
debugging/tracing or default error info in this case will be enough.

In any case:

Reviewed-by: Alejandro Lucero <alucerop@amd.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region
  2026-02-24 16:15   ` Alejandro Lucero Palau
@ 2026-02-25  1:42     ` Gregory Price
  0 siblings, 0 replies; 11+ messages in thread
From: Gregory Price @ 2026-02-25  1:42 UTC (permalink / raw)
  To: Alejandro Lucero Palau
  Cc: linux-cxl, linux-kernel, kernel-team, dave, jonathan.cameron,
	dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
	dan.j.williams

On Tue, Feb 24, 2026 at 04:15:33PM +0000, Alejandro Lucero Palau wrote:
> 
> On 2/21/26 05:17, Gregory Price wrote:
> > On Fri, Feb 20, 2026 at 11:30:12PM -0500, Gregory Price wrote:
> > > cxl_add_to_region() ignores the return value of attach_target().  When
> > > attach_target() fails (e.g. cxl_port_setup_targets() returns -ENXIO),
> > > the auto-discovered region remains registered with its HPA resource
> > > consumed but never reaches COMMIT state.  Subsequent region creation
> > > attempts fail with -ENOSPC because the HPA range is already reserved.
> > > 
> > > Track whether this call to cxl_add_to_region() created the region, and
> > > call drop_region() on attach_target() failure to unregister it and
> > > release the HPA resource.  Pre-existing regions are left alone since
> > > other endpoints may already be attached.
> > > 
> > > Signed-off-by: Gregory Price <gourry@gourry.net>
> > BAH - disregard this patch, it uses drop_region which is introduced by
> > Alejandro here:
> > 
> > https://lore.kernel.org/linux-cxl/20260201155438.2664640-20-alejandro.lucero-palau@amd.com/
> > 
> Feel free to add it to this series. I have started to send individual series
> as you know but the part changing the region creation will require more work
> than the already sent.
> 
> About this fix, it looks good to me, although I have to admit I'm a bit lost
> after following the discussion Allison points to. If we want to keep the
> state of failure for forensics, not sure if the debugging/tracing or default
> error info in this case will be enough.
> 
> In any case:
> 
> Reviewed-by: Alejandro Lucero <alucerop@amd.com>
> 

Yeah i don't quite follow the want to keep the objects around, it seems
to cause more issues than it solves - but then i also don't think this
is going to be a particularly common problem

~Gregory

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-02-25  1:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-21  4:30 [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
2026-02-21  4:30 ` [PATCH 2/2] cxl/region: skip default driver attach for memdev with attach callbacks Gregory Price
2026-02-21  5:17 ` [PATCH 1/2] cxl/region: fix region leak when attach_target fails in cxl_add_to_region Gregory Price
2026-02-24 16:15   ` Alejandro Lucero Palau
2026-02-25  1:42     ` Gregory Price
2026-02-21 10:36 ` kernel test robot
2026-02-21 11:49 ` kernel test robot
2026-02-21 11:53 ` kernel test robot
2026-02-23 19:48 ` Alison Schofield
2026-02-23 20:15   ` Gregory Price
2026-02-24  0:18     ` Alison Schofield

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox