From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Gregory Price <gourry@gourry.net>
Cc: linux-cxl@vger.kernel.org, dan.j.williams@intel.com,
dave.jiang@intel.com, jonathan.cameron@huawei.com,
alison.schofield@intel.com, ira.weiny@intel.com,
dave@stgolabs.net, linux-kernel@vger.kernel.org,
kernel-team@meta.com, vishal.l.verma@intel.com,
benjamin.cheatham@amd.com, David Rientjes <rientjes@google.com>
Subject: Re: cxl/region.c improvements and DAX/Hotplug plumbing
Date: Wed, 18 Mar 2026 09:53:05 +0100
Message-ID: <217b154d-e4d1-4824-b2e3-fea82bc44402@kernel.org>
In-Reply-To: <aXLAtVZ_bVwF9nBG@gourry-fedora-PF4VCD3F>
On 1/23/26 01:28, Gregory Price wrote:
> On Thu, Jan 22, 2026 at 11:14:15PM +0100, David Hildenbrand (Red Hat) wrote:
>> Some of that (especially the interaction with core-mm) feels like it would
>> be a good fit to discuss with the wider MM community in one of the bi-weekly
>> mm meetings. (CCing David R.)
>>
>
> There is a monthly Linux-DAX meeting and a monthly Linux-CXL meeting;
> obviously there is a lot of cross-attendance.
>
> Happy to attend additional discussion. I was trying to shore up some of
> the cxl-region plumbing aspects before going wider.
Oh hey, I found an unanswered mail in my inbox :)
Sorry for stumbling over this so late.
>
>>> - hiding memory blocks? (discussed in last meeting)
>>
>> What is that about and what was the result of that discussion? :)
>>
>
> It was just a question as to whether memory blocks are still useful
> if the intent is to provide a collective hotplug interface. I don't
> think there are any real proposals for this, just making note of it.
Okay, thanks.
>
>>> Solution 2: Make a dedicated sysram_region with policy
>>
>> What kind of region would that be?
>
> plumbing between regionN and dax_region kobjects
>
> right now the kobject relationship is:
>
> region0            <- cxl driver created kobject
> └dax_region0       <- default selects IORESOURCE_DAX_KMEM
>   └dax0.0          <- auto-probes on discovery
>
> But there is baggage in the existing plumbing:
>
> 1) dax/cxl.c => hard-coded IORESOURCE_DAX_KMEM for dax_region
> 2) dax/bus.c => devdax is probed on discovery w/o manual bind step
> 3) cxl/core/region.c => BIOS-configured CXL regions automatically
> generate a dax_region, and this auto-creates a dax_kmem device
> which is subject to system-wide MHP policy.
>
> This creates a backwards compatibility headache.
Agreed.
>
> The same auto-plumbing is used in the manual creation path, so:
>
> echo regionN > cxl/decoder0.0/create_ram_region
> /* program decoders */
> echo regionN > cxl/drivers/region/bind
>
> will pump the whole thing directly into dax_kmem and auto-online
> according to system default MHP policy. There's no intermediate
> step in which the user can define preferences (unless you add
> them as attributes to regionN - which is another option).
>
> Adding the intermediate object:
>
> regionN
> └sysram_region     <- encodes policy like hotplug and dax drv
>   └dax_regionN     <- which would be passed here on creation
>     └dax0.0
>
> lets the cxl-cli command be more expressive:
> `cxl-cli create-region -t ram --driver=sysram` => kmem
> `cxl-cli create-region -t ram --driver=dax` => device_dax
>
> and would change the sysfs pattern to
> echo regionN > cxl/decoder0.0/create_ram_region
> echo regionN > cxl/drivers/sysram_region/bind
> echo online_movable > cxl/devices/dax_regionN/hotplug
> echo dax_regionN > cxl/drivers/dax_region/bind
>
> and gives the user a chance to configure a policy before the region
> is pumped all the way through to the endpoint dax driver.
Would that still be backwards-compatible?
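
To make the staged flow quoted above concrete, here is a minimal userspace sketch in C. It is not kernel code: every name in it (region_model, model_set_hotplug(), the HOTPLUG_* values) is hypothetical and exists only for illustration. It assumes that the hotplug attribute is writable only between the sysram_region bind and the dax_region bind, and that a region whose policy is never written keeps the system default MHP policy, which would be one way to keep today's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stages mirroring the quoted sysfs sequence. */
enum region_stage {
	REGION_CREATED,      /* echo regionN > .../create_ram_region */
	REGION_SYSRAM_BOUND, /* echo regionN > .../sysram_region/bind */
	REGION_DAX_BOUND,    /* echo dax_regionN > .../dax_region/bind */
};

/* Hypothetical policy values; SYSTEM_DEFAULT stands for "never written". */
enum hotplug_policy {
	HOTPLUG_SYSTEM_DEFAULT,
	HOTPLUG_OFFLINE,
	HOTPLUG_ONLINE,
	HOTPLUG_ONLINE_MOVABLE,
};

struct region_model {
	enum region_stage stage;
	enum hotplug_policy policy;
};

void model_create(struct region_model *r)
{
	assert(r != NULL);
	r->stage = REGION_CREATED;
	r->policy = HOTPLUG_SYSTEM_DEFAULT;
}

int model_bind_sysram(struct region_model *r)
{
	if (r->stage != REGION_CREATED)
		return -EBUSY;
	r->stage = REGION_SYSRAM_BOUND;
	return 0;
}

/* Policy may only change in the window before the dax_region is bound. */
int model_set_hotplug(struct region_model *r, enum hotplug_policy p)
{
	if (r->stage != REGION_SYSRAM_BOUND)
		return -EBUSY;
	r->policy = p;
	return 0;
}

/* After this point the policy is frozen and the endpoint driver would act on it. */
int model_bind_dax_region(struct region_model *r)
{
	if (r->stage != REGION_SYSRAM_BOUND)
		return -EINVAL;
	r->stage = REGION_DAX_BOUND;
	return 0;
}
```

In this model a caller that skips model_set_hotplug() still ends up with HOTPLUG_SYSTEM_DEFAULT, so whether the real proposal behaves that way is exactly the compatibility question.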
>>> Solution 2: dedicated sysram_region driver w/ or w/o DAX.
>>> Can support sparseness w/o DAX (see DCD problem)
>>> Could use DAX for tagged DCD regions.
>>> Tradeoff: May duplicate some DAX logic.
>>
>> What would that look like?
>
> For untagged extents w/o dax:
>
> sysram_region->nr_range
> sysram_region->ranges[0 : nr_range-1]
>
> Extents in this list would be hotpluggable individually and
> could be returned to the DCD device individually.
>
> sysram_region.c code would call hotplug directly, not via dax.
> - hence, this duplicates some DAX logic
>
> The above just prevents needlessly creating dax-indirection for sysram
> extents with only one destination: add_memory_driver_managed()
>
>
> For tagged extents:
> sysram_region->nr_regions
> sysram_region->dax_regions[0 : nr_regions-1]
>
> A set of tagged extents would only be hotpluggable as a group
> and could only be returned to the DCD as a group.
>
> it would also expose: dax0.0/uuid <- contains the tag
Interesting.
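
The difference between the two release rules quoted above (untagged extents individually, tagged extents only as a group) can be sketched as a small userspace model in C. This is only an illustration under stated assumptions: the struct layout, the fixed-size array, the tag stored as a string, and all model_* function names are hypothetical and are not taken from any existing or proposed kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_EXTENTS 8

/* Hypothetical extent record; an empty tag means "untagged". */
struct extent_model {
	uint64_t start;
	uint64_t len;
	bool present;   /* false once returned to the device */
	char tag[37];   /* room for a UUID string plus NUL */
};

struct sysram_region_model {
	size_t nr_range;
	struct extent_model ranges[MODEL_MAX_EXTENTS];
};

int model_add_extent(struct sysram_region_model *r, uint64_t start,
		     uint64_t len, const char *tag)
{
	assert(r != NULL);
	if (r->nr_range >= MODEL_MAX_EXTENTS)
		return -1;
	if (tag != NULL && strlen(tag) >= sizeof(r->ranges[0].tag))
		return -1;

	struct extent_model *e = &r->ranges[r->nr_range++];
	e->start = start;
	e->len = len;
	e->present = true;
	e->tag[0] = '\0';
	if (tag != NULL)
		strcpy(e->tag, tag);
	return 0;
}

/* Untagged extents can be released one at a time; tagged ones are refused. */
int model_release_extent(struct sysram_region_model *r, size_t idx)
{
	if (idx >= r->nr_range || !r->ranges[idx].present)
		return -1;
	if (r->ranges[idx].tag[0] != '\0')
		return -1;
	r->ranges[idx].present = false;
	return 0;
}

/* Tagged extents are released together; returns how many were released. */
int model_release_tag(struct sysram_region_model *r, const char *tag)
{
	int released = 0;

	if (tag == NULL || tag[0] == '\0')
		return -1;
	for (size_t i = 0; i < r->nr_range; i++) {
		struct extent_model *e = &r->ranges[i];

		if (e->present && strcmp(e->tag, tag) == 0) {
			e->present = false;
			released++;
		}
	}
	return released;
}
```

A real implementation would also have to offline and remove the backing memory before handing an extent back; the model skips that step and only tracks which extents may be released together.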
>
>
> from this you get a cli command like
>
> cxl release-extents regionN [--id=X] [--tag=Y]
>
> translates to something like
>
> echo "release" > regionN/sysram_region/extents/[X,Y]
>
> Something like this.
>
>>>
>>> Solution 4: Prevent non-driver actions from changing state.
>>> Also solves hotplug protection problem (see next)
>>
>> The crucial part is solving what you spelled out in the description: "race
>> conditions". Forbidding someone to re-configure system RAM sounds
>> unnecessary.
>>
>> For example, I use it a lot for testing issues with page migration while
>> offlining memory from ZONE_MOVABLE.
>>
>
> For most use-cases yes. For something like FAMFS (distributed shared
> memory), one system onlining a block as kmem could be potentially
> destructive to an entirely separate physical server.
Right. But shouldn't we fail this already at the add_memory() stage?
Failing it during onlining sounds a bit too late. Conceptually, the hotplug
as sysram was already wrong for famfs, or am I wrong?
>
>>> Example: Slow(er) memory
>>> Some memory is "just memory", but might be particularly slow and
>>> intended for use as a filesystem backend or only as a demotion
>>> target. Otherwise it's allocated / mapped like any other memory,
>>> but it still requires isolation, so it is isolated to the demotion
>>> path and is not a fallback allocation target.
>>
>> That doesn't quite fit the description of N_PRIVATE_MEMORY, though. Or what
>> am I missing?
>
> I suppose we could also explore a per-node fallback policy to accomplish
> this - but there was also the LPC talk about trying to deprecate that
> entirely.
I'm looking forward to that LPC talk!
--
Cheers,
David
Thread overview: 7+ messages
2026-01-21 19:38 cxl/region.c improvements and DAX/Hotplug plumbing Gregory Price
2026-01-22 16:28 ` Gregory Price
2026-01-22 22:14 ` David Hildenbrand (Red Hat)
2026-01-23 0:28 ` Gregory Price
2026-03-18 8:53 ` David Hildenbrand (Arm) [this message]
2026-03-19 15:14 ` Gregory Price
2026-03-19 19:35 ` David Hildenbrand (Arm)