From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Gregory Price <gourry@gourry.net>
Cc: linux-cxl@vger.kernel.org, dan.j.williams@intel.com,
	dave.jiang@intel.com, jonathan.cameron@huawei.com,
	alison.schofield@intel.com, ira.weiny@intel.com,
	dave@stgolabs.net, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, vishal.l.verma@intel.com,
	benjamin.cheatham@amd.com, David Rientjes <rientjes@google.com>
Subject: Re: cxl/region.c improvements and DAX/Hotplug plumbing
Date: Wed, 18 Mar 2026 09:53:05 +0100
Message-ID: <217b154d-e4d1-4824-b2e3-fea82bc44402@kernel.org>
In-Reply-To: <aXLAtVZ_bVwF9nBG@gourry-fedora-PF4VCD3F>

On 1/23/26 01:28, Gregory Price wrote:
> On Thu, Jan 22, 2026 at 11:14:15PM +0100, David Hildenbrand (Red Hat) wrote:
>> Some of that (especially the interaction with core-mm) feels like it would
>> be a good fit to discuss with the wider MM community in one of the bi-weekly
>> mm meetings. (CCing David R.)
>>
> 
> There is a monthly Linux-DAX meeting and a monthly Linux-CXL meeting;
> obviously there is a lot of cross-attendance.
> 
> Happy to attend additional discussion.  I was trying to shore up some of
> the cxl-region plumbing aspects before going wider.

Oh hey, I found an unanswered mail in my inbox :)

Sorry for stumbling over this so late.

> 
>>>        - hiding memory blocks? (discussed in last meeting)
>>
>> What is that about and what was the result of that discussion? :)
>>
> 
> It was just a question as to whether memory blocks are still useful
> if the intent is to provide a collective hotplug interface. I don't
> think there are any real proposals for this, just making note of it.

Okay, thanks.

> 
>>>    Solution 2:  Make a dedicated sysram_region with policy
>>
>> What kind of region would that be?
> 
> plumbing between regionN and dax_region kobjects
> 
> right now the kobject relationship is:
> 
> region0           <- cxl driver created kobject
>   └dax_region0    <- default selects IORESOURCE_DAX_KMEM
>   	└dax0.0   <- auto-probes on discovery
> 
> But there is baggage in the existing plumbing:
> 
> 1) dax/cxl.c =>  hard-coded IORESOURCE_DAX_KMEM for dax_region
> 2) dax/bus.c =>  devdax is probed on discovery w/o manual bind step
> 3) cxl/core/region.c => BIOS-configured CXL regions automatically
>    generate a dax_region, and this auto-creates a dax_kmem device
>    which is subject to system-wide MHP policy.
> 
> This creates a backwards compatibility headache.

Agreed.

> 
> The same auto-plumbing is used in the manual creation path, so:
> 
>    echo regionN > cxl/decoder0.0/create_ram_region
>    /* program decoders */
>    echo regionN > cxl/drivers/region/bind
> 
> will pump the whole thing directly into dax_kmem and auto-online
> according to system default MHP policy.  There's no intermediate
> step in which the user can define preferences (unless you add
> them as attributes to regionN - which is another option).
> 
> Adding the intermediate object:
> 
> regionN
>   └sysram_region      <- encodes policy like hotplug and dax drv
>   	└dax_regionN  <- which would be passed here on creation
> 		└dax0.0
> 
> lets the cxl-cli command be more expressive:
>    `cxl-cli create-region -t ram --driver=sysram` => kmem
>    `cxl-cli create-region -t ram --driver=dax`    => device_dax
> 
> and would change the sysfs pattern to
> 	echo regionN > cxl/decoder0.0/create_ram_region
> 	echo regionN > cxl/drivers/sysram_region/bind
> 	echo online_movable > cxl/devices/dax_regionN/hotplug
>         echo dax_regionN > cxl/drivers/dax_region/bind
> 
> and gives the user a chance to configure a policy before the region
> is pumped all the way through to the endpoint dax driver.

Would that still be backwards-compatible?
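
To make the proposed ordering concrete (bind sysram_region, set the hotplug
policy, then bind dax_region), here is a minimal userspace sketch. It is not
kernel code: every type and function name below is hypothetical, and only
"online_movable" is a policy value taken from the thread. The sketch assumes
that a region whose policy is never set falls back to the system-wide MHP
default, which is one possible way to stay backwards-compatible but is not
something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values; only "online_movable" appears in the thread. */
enum hotplug_policy {
	HP_SYSTEM_DEFAULT,	/* assumed fallback: system-wide MHP policy */
	HP_OFFLINE,
	HP_ONLINE_MOVABLE,
};

/* Hypothetical model of the intermediate sysram_region object. */
struct sysram_region_model {
	bool sysram_bound;	/* models: echo regionN > .../sysram_region/bind */
	bool dax_bound;		/* models: echo dax_regionN > .../dax_region/bind */
	enum hotplug_policy policy;
};

/* First bind step: creates the window in which policy can be configured. */
static int model_bind_sysram(struct sysram_region_model *r)
{
	assert(r != NULL);
	if (r->sysram_bound)
		return -1;
	r->sysram_bound = true;
	r->policy = HP_SYSTEM_DEFAULT;
	return 0;
}

/* Policy may only change between the two bind steps. */
static int model_set_policy(struct sysram_region_model *r,
			    enum hotplug_policy p)
{
	assert(r != NULL);
	if (!r->sysram_bound || r->dax_bound)
		return -1;
	r->policy = p;
	return 0;
}

/* Second bind step: the policy in effect is handed to the endpoint driver. */
static int model_bind_dax(struct sysram_region_model *r,
			  enum hotplug_policy *applied)
{
	assert(r != NULL && applied != NULL);
	if (!r->sysram_bound || r->dax_bound)
		return -1;
	r->dax_bound = true;
	*applied = r->policy;
	return 0;
}
```

In this model, a caller that skips model_set_policy() gets HP_SYSTEM_DEFAULT
at the second bind, while a caller that sets a policy sees it applied and
then frozen once the dax side is bound.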


>>>    Solution 2: dedicated sysram_region driver w/ or w/o DAX.
>>>                Can support sparseness w/o DAX (see DCD problem)
>>> 	      Could use DAX for tagged DCD regions.
>>>                Tradeoff: May duplicate some DAX logic.
>>
>> What would that look like?
> 
> For untagged extents w/o dax:
> 
>     sysram_region->nr_range
>     sysram_region->ranges[0 : nr_range-1]
> 
>     Extents in this list would be hotpluggable individually and
>     could be returned to the DCD device individually
> 
>     sysram_region.c code would call hotplug directly, not via dax.
>        - hence, this duplicates some DAX logic
> 
> The above just prevents needlessly creating dax-indirection for sysram
> extents with only one destination:  add_memory_driver_managed()
> 
> 
> For tagged extents:
>     sysram_region->nr_regions
>     sysram_region->dax_regions[0 : nr_regions-1]
> 
>     A set of tagged extents would only be hotpluggable as a group
>     and could only be returned to the DCD as a group.
> 
>     it would also expose:  dax0.0/uuid  <- contains the tag


Interesting.
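
The release rules described above (untagged extents go back individually,
tagged extents only as a group) can be sketched in userspace C. This is a
simplified illustration, not kernel code: the struct layout, the fixed-size
array, the "online" flag, and all function names are hypothetical. In
particular, it keeps tagged and untagged extents in one array, whereas the
proposal above places tagged extents behind dax_regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_EXTENTS 8

/* Hypothetical extent record; an empty tag means "untagged". */
struct extent_model {
	uint64_t start;
	uint64_t len;
	char tag[16];
	int online;		/* 1 while the extent is hotplugged */
};

/* Loosely mirrors sysram_region->nr_range / ranges[] from the proposal. */
struct sysram_extents_model {
	size_t nr_range;
	struct extent_model ranges[MODEL_MAX_EXTENTS];
};

/* Release one untagged extent by index; tagged extents are refused. */
static int model_release_extent(struct sysram_extents_model *r, size_t idx)
{
	assert(r != NULL);
	if (idx >= r->nr_range)
		return -1;
	if (r->ranges[idx].tag[0] != '\0')
		return -1;	/* tagged: must be released as a group */
	if (!r->ranges[idx].online)
		return -1;	/* already released */
	r->ranges[idx].online = 0;
	return 0;
}

/* Release every online extent carrying the tag; returns the count or -1. */
static int model_release_tag(struct sysram_extents_model *r, const char *tag)
{
	int released = 0;

	assert(r != NULL && tag != NULL && tag[0] != '\0');
	for (size_t i = 0; i < r->nr_range; i++) {
		struct extent_model *e = &r->ranges[i];

		if (e->online && strcmp(e->tag, tag) == 0) {
			e->online = 0;
			released++;
		}
	}
	return released ? released : -1;
}
```

A real implementation would also have to offline and remove the memory
before handing an extent back, and handle failure partway through a group;
the sketch only captures the individual-versus-group distinction.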

> 
> 
> from this you get a cli command like
> 
>     cxl release-extents regionN [--id=X] [--tag=Y]
> 
>          translates to something like
> 
>     echo "release" > regionN/sysram_region/extents/[X,Y]
> 
> Something like this.
> 
>>>
>>>    Solution 4: Prevent non-driver actions from changing state.
>>>                Also solves hotplug protection problem (see next)
>>
>> The crucial part is solving what you spelled out in the description: "race
>> conditions". Forbidding someone to re-configure system RAM sounds
>> unnecessary.
>>
>> For example, I use it a lot for testing issues with page migration while
>> offlining memory from ZONE_MOVABLE.
>>
> 
> For most use-cases yes.  For something like FAMFS (distributed shared
> memory), one system onlining a block as kmem could be potentially
> destructive to an entirely separate physical server.

Right. But shouldn't we fail this already at the add_memory() stage?
Failing during onlining sounds a bit too late. Conceptually, hotplugging
the memory as sysram was already wrong for famfs, or am I wrong?


> 
>>>     Example:  Slow(er) memory
>>>        Some memory is "just memory", but might be particularly slow and
>>>        intended for use as a filesystem backend or as only a demotion
>>>        target.  Otherwise it's allocated / mapped like any other memory,
>>>        but it still requires isolation, so it is restricted to the
>>>        demotion path and is not a fallback allocation target
>>
>> That doesn't quite fit the description of N_PRIVATE_MEMORY, though. Or what
>> am I missing?
> 
> I suppose we could also explore a per-node fallback policy to accomplish
> this - but there was also the LPC talk about trying to deprecate that
> entirely.

I'm looking forward to that LPC talk!

-- 
Cheers,

David
