public inbox for linux-pm@vger.kernel.org
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org,
	linux-pm@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <len.brown@intel.com>, Pavel Machek <pavel@kernel.org>,
	Li Ming <ming.li@zohomail.com>,
	Jeff Johnson <jeff.johnson@oss.qualcomm.com>,
	Ying Huang <huang.ying.caritas@gmail.com>,
	Yao Xingtao <yaoxt.fnst@fujitsu.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Nathan Fontenot <nathan.fontenot@amd.com>,
	Terry Bowman <terry.bowman@amd.com>,
	Robert Richter <rrichter@amd.com>,
	Benjamin Cheatham <benjamin.cheatham@amd.com>,
	Zhijian Li <lizhijian@fujitsu.com>,
	Borislav Petkov <bp@alien8.de>,
	Tomasz Wolski <tomasz.wolski@fujitsu.com>
Subject: Re: [PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
Date: Fri, 23 Jan 2026 11:59:08 +0000	[thread overview]
Message-ID: <e38625c5-16fd-4fa2-bec0-6773d91fd2b4@amd.com> (raw)
In-Reply-To: <20260122045543.218194-7-Smita.KoralahalliChannabasappa@amd.com>


On 1/22/26 04:55, Smita Koralahalli wrote:
> The current probe time ownership check for Soft Reserved memory based
> solely on CXL window intersection is insufficient. dax_hmem probing is not
> always guaranteed to run after CXL enumeration and region assembly, which
> can lead to incorrect ownership decisions before the CXL stack has
> finished publishing windows and assembling committed regions.
>
> Introduce deferred ownership handling for Soft Reserved ranges that
> intersect CXL windows at probe time by scheduling deferred work from
> dax_hmem and waiting for the CXL stack to complete enumeration and region
> assembly before deciding ownership.
>
> Evaluate ownership of Soft Reserved ranges based on CXL region
> containment.
>
>     - If all Soft Reserved ranges are fully contained within committed CXL
>       regions, DROP handling Soft Reserved ranges from dax_hmem and allow
>       dax_cxl to bind.
>
>     - If any Soft Reserved range is not fully claimed by committed CXL
>       region, tear down all CXL regions and REGISTER the Soft Reserved
>       ranges with dax_hmem instead.


I was not sure I was understanding this properly, but after looking 
at the code I think I do ... and yet I do not understand the reasoning 
behind it. If I'm right, there could be two devices, and therefore 
different Soft Reserved ranges, with one getting an automatic CXL 
region covering its whole range and the other not, and the outcome 
would be the first one getting its region torn down and registered 
with hmem as well. Maybe I'm missing something obvious, but why? If 
there is a good reason, I think it should be documented in the commit 
message and somewhere else.


I also have problems understanding the concurrency when handling the 
global dax_cxl_mode variable. It is modified inside process_defer_work(), 
which I think can have different instances for different devices 
executing concurrently on different cores/workers (the system_wq used 
is not ordered). If I'm right, race conditions are likely.


>
> While ownership resolution is pending, gate dax_cxl probing to avoid
> binding prematurely.
>
> This enforces strict ownership. Either CXL fully claims the Soft
> Reserved ranges or it relinquishes them entirely.
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>   drivers/cxl/core/region.c | 25 ++++++++++++
>   drivers/cxl/cxl.h         |  2 +
>   drivers/dax/cxl.c         |  9 +++++
>   drivers/dax/hmem/hmem.c   | 81 ++++++++++++++++++++++++++++++++++++++-
>   4 files changed, 115 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 9827a6dd3187..6c22a2d4abbb 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3875,6 +3875,31 @@ static int cxl_region_debugfs_poison_clear(void *data, u64 offset)
>   DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
>   			 cxl_region_debugfs_poison_clear, "%llx\n");
>   
> +static int cxl_region_teardown_cb(struct device *dev, void *data)
> +{
> +	struct cxl_root_decoder *cxlrd;
> +	struct cxl_region *cxlr;
> +	struct cxl_port *port;
> +
> +	if (!is_cxl_region(dev))
> +		return 0;
> +
> +	cxlr = to_cxl_region(dev);
> +
> +	cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> +	port = cxlrd_to_port(cxlrd);
> +
> +	devm_release_action(port->uport_dev, unregister_region, cxlr);
> +
> +	return 0;
> +}
> +
> +void cxl_region_teardown_all(void)
> +{
> +	bus_for_each_dev(&cxl_bus_type, NULL, NULL, cxl_region_teardown_cb);
> +}
> +EXPORT_SYMBOL_GPL(cxl_region_teardown_all);
> +
>   static int cxl_region_contains_sr_cb(struct device *dev, void *data)
>   {
>   	struct resource *res = data;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b0ff6b65ea0b..1864d35d5f69 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -907,6 +907,7 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
>   struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
>   u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
>   bool cxl_region_contains_soft_reserve(const struct resource *res);
> +void cxl_region_teardown_all(void);
>   #else
>   static inline bool is_cxl_pmem_region(struct device *dev)
>   {
> @@ -933,6 +934,7 @@ static inline bool cxl_region_contains_soft_reserve(const struct resource *res)
>   {
>   	return false;
>   }
> +static inline void cxl_region_teardown_all(void) { }
>   #endif
>   
>   void cxl_endpoint_parse_cdat(struct cxl_port *port);
> diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
> index 13cd94d32ff7..b7e90d6dd888 100644
> --- a/drivers/dax/cxl.c
> +++ b/drivers/dax/cxl.c
> @@ -14,6 +14,15 @@ static int cxl_dax_region_probe(struct device *dev)
>   	struct dax_region *dax_region;
>   	struct dev_dax_data data;
>   
> +	switch (dax_cxl_mode) {
> +	case DAX_CXL_MODE_DEFER:
> +		return -EPROBE_DEFER;
> +	case DAX_CXL_MODE_REGISTER:
> +		return -ENODEV;
> +	case DAX_CXL_MODE_DROP:
> +		break;
> +	}
> +
>   	if (nid == NUMA_NO_NODE)
>   		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
>   
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1e3424358490..bcb57d8678d7 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -3,6 +3,7 @@
>   #include <linux/memregion.h>
>   #include <linux/module.h>
>   #include <linux/dax.h>
> +#include "../../cxl/cxl.h"
>   #include "../bus.h"
>   
>   static bool region_idle;
> @@ -58,9 +59,15 @@ static void release_hmem(void *pdev)
>   	platform_device_unregister(pdev);
>   }
>   
> +struct dax_defer_work {
> +	struct platform_device *pdev;
> +	struct work_struct work;
> +};
> +
>   static int hmem_register_device(struct device *host, int target_nid,
>   				const struct resource *res)
>   {
> +	struct dax_defer_work *work = dev_get_drvdata(host);
>   	struct platform_device *pdev;
>   	struct memregion_info info;
>   	long id;
> @@ -69,8 +76,18 @@ static int hmem_register_device(struct device *host, int target_nid,
>   	if (IS_ENABLED(CONFIG_DEV_DAX_CXL) &&
>   	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
>   			      IORES_DESC_CXL) != REGION_DISJOINT) {
> -		dev_dbg(host, "deferring range to CXL: %pr\n", res);
> -		return 0;
> +		switch (dax_cxl_mode) {
> +		case DAX_CXL_MODE_DEFER:
> +			dev_dbg(host, "deferring range to CXL: %pr\n", res);
> +			schedule_work(&work->work);
> +			return 0;
> +		case DAX_CXL_MODE_REGISTER:
> +			dev_dbg(host, "registering CXL range: %pr\n", res);
> +			break;
> +		case DAX_CXL_MODE_DROP:
> +			dev_dbg(host, "dropping CXL range: %pr\n", res);
> +			return 0;
> +		}
>   	}
>   
>   	rc = region_intersects_soft_reserve(res->start, resource_size(res));
> @@ -123,8 +140,67 @@ static int hmem_register_device(struct device *host, int target_nid,
>   	return rc;
>   }
>   
> +static int cxl_contains_soft_reserve(struct device *host, int target_nid,
> +				     const struct resource *res)
> +{
> +	if (region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
> +			      IORES_DESC_CXL) != REGION_DISJOINT) {
> +		if (!cxl_region_contains_soft_reserve(res))
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static void process_defer_work(struct work_struct *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +	struct platform_device *pdev = work->pdev;
> +	int rc;
> +
> +	/* relies on cxl_acpi and cxl_pci having had a chance to load */
> +	wait_for_device_probe();
> +
> +	rc = walk_hmem_resources(&pdev->dev, cxl_contains_soft_reserve);
> +
> +	if (!rc) {
> +		dax_cxl_mode = DAX_CXL_MODE_DROP;
> +		rc = bus_rescan_devices(&cxl_bus_type);
> +		if (rc)
> +			dev_warn(&pdev->dev, "CXL bus rescan failed: %d\n", rc);
> +	} else {
> +		dax_cxl_mode = DAX_CXL_MODE_REGISTER;
> +		cxl_region_teardown_all();
> +	}
> +
> +	walk_hmem_resources(&pdev->dev, hmem_register_device);
> +}
> +
> +static void kill_defer_work(void *_work)
> +{
> +	struct dax_defer_work *work = container_of(_work, typeof(*work), work);
> +
> +	cancel_work_sync(&work->work);
> +	kfree(work);
> +}
> +
>   static int dax_hmem_platform_probe(struct platform_device *pdev)
>   {
> +	struct dax_defer_work *work = kzalloc(sizeof(*work), GFP_KERNEL);
> +	int rc;
> +
> +	if (!work)
> +		return -ENOMEM;
> +
> +	work->pdev = pdev;
> +	INIT_WORK(&work->work, process_defer_work);
> +
> +	rc = devm_add_action_or_reset(&pdev->dev, kill_defer_work, work);
> +	if (rc)
> +		return rc;
> +
> +	platform_set_drvdata(pdev, work);
> +
>   	return walk_hmem_resources(&pdev->dev, hmem_register_device);
>   }
>   
> @@ -174,3 +250,4 @@ MODULE_ALIAS("platform:hmem_platform*");
>   MODULE_DESCRIPTION("HMEM DAX: direct access to 'specific purpose' memory");
>   MODULE_LICENSE("GPL v2");
>   MODULE_AUTHOR("Intel Corporation");
> +MODULE_IMPORT_NS("CXL");
