From: <dan.j.williams@intel.com>
To: "Koralahalli Channabasappa, Smita" <skoralah@amd.com>,
	<dan.j.williams@intel.com>,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<nvdimm@lists.linux.dev>, <linux-fsdevel@vger.kernel.org>,
	<linux-pm@vger.kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"Ira Weiny" <ira.weiny@intel.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <len.brown@intel.com>, Pavel Machek <pavel@kernel.org>,
	Li Ming <ming.li@zohomail.com>,
	Jeff Johnson <jeff.johnson@oss.qualcomm.com>,
	"Ying Huang" <huang.ying.caritas@gmail.com>,
	Yao Xingtao <yaoxt.fnst@fujitsu.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Nathan Fontenot <nathan.fontenot@amd.com>,
	Terry Bowman <terry.bowman@amd.com>,
	Robert Richter <rrichter@amd.com>,
	Benjamin Cheatham <benjamin.cheatham@amd.com>,
	Zhijian Li <lizhijian@fujitsu.com>,
	Borislav Petkov <bp@alien8.de>,
	Tomasz Wolski <tomasz.wolski@fujitsu.com>
Subject: Re: [PATCH v5 6/7] dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
Date: Thu, 29 Jan 2026 14:01:27 -0800
Message-ID: <697bd8b7fb6f_1d6f100e9@dwillia2-mobl4.notmuch>
In-Reply-To: <b137dd39-dcf6-4203-adab-8c9ee2b3e6ef@amd.com>

Koralahalli Channabasappa, Smita wrote:
[..]
> > I was thinking through what Alison asked about what to do later in boot
> > when other regions are being dynamically created. It made me wonder if
> > this safety can be achieved more easily by just making sure that the
> > alloc_dax_region() call fails.
> 
> Agreed with all the points above, including making alloc_dax_region() 
> fail as the safety mechanism. This also cleanly avoids the no Soft 
> Reserved case Alison pointed out, where dax_cxl_mode can remain stuck in 
> DEFER and return -EPROBE_DEFER.
> 
> What I’m still trying to understand is the case of “other regions being 
> dynamically created.” Once HMEM has claimed the relevant HPA range, any 
> later userspace attempts to create regions (via cxl create-region) 
> should naturally fail due to the existing HPA allocation. This already 
> shows up as an HPA allocation failure currently.
> 
> # cxl create-region -d decoder0.0 -m mem2 -w 1 -g256
> cxl region: create_region: region0: set_size failed: Numerical result out of range
> cxl region: cmd_create_region: created 0 regions
> 
> And in the dmesg:
> [  466.819353] alloc_hpa: cxl region0: HPA allocation error (-34) for size:0x0000002000000000 in CXL Window 0 [mem 0x850000000-0x284fffffff flags 0x200]
> 
> Also, at this point, with the probe-ordering fixes and the use of 
> wait_for_device_probe(), region probing should have fully completed.
> 
> Am I missing any other scenario where regions could still be created 
> dynamically beyond this?

The concern is what to do about regions and memory devices that are
completely innocent. So, for example, imagine deviceA is handled by BIOS
and deviceB is ignored by BIOS. If deviceB was ignored by BIOS then it
would be rude to tear down any regions that might be established for
deviceB. So if alloc_dax_region() exclusion and HPA space reservation
prevent future collisions while not disturbing innocent devices, then I
think userspace can pick up the pieces from there.
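
To make the exclusion idea concrete, here is a minimal userspace sketch
in C. It is not kernel code and does not use the real resource API;
struct claim_table, claim_range() and the owner strings are hypothetical
names for illustration only. The point is just that an overlapping claim
fails while a claim for a disjoint range (the "innocent" deviceB case)
still succeeds:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; this is NOT the kernel's struct resource. */
struct claim {
	uint64_t start;
	uint64_t end;		/* inclusive, like a kernel range */
	const char *owner;
};

#define MAX_CLAIMS 8

struct claim_table {
	struct claim claims[MAX_CLAIMS];
	size_t count;
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
static int ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && s2 <= e1;
}

/*
 * Record [start, end] for @owner.
 * Returns 0 on success, -EINVAL for an inverted range, -EBUSY if the
 * range overlaps an existing claim, -ENOSPC if the table is full.
 */
static int claim_range(struct claim_table *t, uint64_t start, uint64_t end,
		       const char *owner)
{
	size_t i;

	if (start > end)
		return -EINVAL;

	for (i = 0; i < t->count; i++)
		if (ranges_overlap(start, end, t->claims[i].start,
				   t->claims[i].end))
			return -EBUSY;

	if (t->count == MAX_CLAIMS)
		return -ENOSPC;

	t->claims[t->count++] = (struct claim){ start, end, owner };
	return 0;
}
```

In this model, claiming the Window 0 span from the dmesg above
(0x850000000-0x284fffffff) for "dax_hmem" and then attempting an
overlapping "dax_cxl" claim returns -EBUSY, while a claim for a disjoint,
made-up deviceB range such as 0x2850000000-0x304fffffff returns 0.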

> > Something like (untested / incomplete, needs cleanup handling!)
> > 
> > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> > index fde29e0ad68b..fd18343e0538 100644
> > --- a/drivers/dax/bus.c
> > +++ b/drivers/dax/bus.c
> > @@ -10,6 +10,7 @@
> >   #include "dax-private.h"
> >   #include "bus.h"
> >   
> > +static struct resource dax_regions = DEFINE_RES_MEM_NAMED(0, -1, "DAX Regions");
> >   static DEFINE_MUTEX(dax_bus_lock);
> >   
> >   /*
> > @@ -661,11 +662,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
> >          dax_region->dev = parent;
> >          dax_region->target_node = target_node;
> >          ida_init(&dax_region->ida);
> > -       dax_region->res = (struct resource) {
> > -               .start = range->start,
> > -               .end = range->end,
> > -               .flags = IORESOURCE_MEM | flags,
> > -       };
> > +       dax_region->res = __request_region(&dax_regions, range->start, range->end, flags);
> >   
> >          if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) {
> >                  kfree(dax_region);
> > 
> > ...which will result in enforcing only one of dax_hmem or dax_cxl being
> > able to register a dax_region.
> > 
> > Yes, this would leave a mess of disabled cxl_dax_region devices lying
> > around, but it would leave more breadcrumbs for debug, and reduce the
> > number of races you need to worry about.
> > 
> > In other words, I thought total teardown would be simpler, but as the
> > feedback keeps coming in, I think that brings a different set of
> > complexity. So just inject failures for dax_cxl to trip over and then we
> > can go further later to effect total teardown if that proves to not be
> > enough.
> 
> One concern with the approach of not tearing down CXL regions is the 
> state it leaves behind in /proc/iomem. Soft Reserved ranges are 
> REGISTERed to HMEM while CXL regions remain present. The resulting 
> nesting (dax under region, region under window and window under SR) 
> visually suggests a coherent CXL hierarchy, even though ownership has 
> effectively moved to HMEM. When users then attempt to tear regions down 
> and recreate them from userspace, they hit the same HPA allocation 
> failures described above.

So this gets back to the question of whether we really need "Soft Reserved"
to show up in /proc/iomem. It is an ABI change to stop publishing it
altogether, so at a minimum we need to be prepared to keep publishing it
if it causes someone's working setup to regress.

The current state of the for-7.0/cxl-init branch drops publishing "Soft
Reserved". I am cautiously optimistic no one notices as long as DAX
devices keep appearing, but at the first sign of regression we need a
plan B.

> If we decide not to tear down regions in the REGISTER case, should we 
> gate decoder resets during user initiated region teardown? Today, 
> decoders are reset when regions are torn down dynamically, and 
> subsequent attempts to recreate regions can trigger a large amount of 
> mailbox traffic. Much of this shows up as repeated “Reading event logs/ 
> Clearing …” messages that end up interleaved with the HPA allocation 
> failure, which can be confusing.

One of the nice side effects of installing the "Soft Reserved" entries
late, when HMEM takes over, is that they are easier to remove.

So the flow, if you know what you are doing, would be to disable the
HMEM device, which uninstalls the "Soft Reserved" entries, before trying
to decommit the region and reclaim the HPA space.
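
The ordering in that flow can be sketched as a tiny userspace model in C.
All names here (struct hpa_window, window_claim(), window_release()) are
hypothetical, and the errno values are my choice for the model rather
than what the kernel returns (the dmesg earlier in the thread shows -34,
i.e. -ERANGE, for the real HPA allocation failure):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one HPA window with a single current owner. */
struct hpa_window {
	const char *owner;	/* NULL while the window is free */
};

/* Take the window for @owner; fails with -EBUSY while someone holds it. */
static int window_claim(struct hpa_window *w, const char *owner)
{
	if (w->owner)
		return -EBUSY;
	w->owner = owner;
	return 0;
}

/* Only the current owner may give the window back. */
static int window_release(struct hpa_window *w, const char *owner)
{
	if (!w->owner || strcmp(w->owner, owner) != 0)
		return -EPERM;
	w->owner = NULL;
	return 0;
}
```

While "hmem" holds the window, a claim by "cxl" fails in this model; only
after "hmem" releases it (the analogue of disabling the HMEM device) does
the "cxl" claim succeed.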
