public inbox for linux-fsdevel@vger.kernel.org
From: Gregory Price <gourry@gourry.net>
To: Hannes Reinecke <hare@suse.de>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>,
	lsf-pc <lsf-pc@lists.linux-foundation.org>,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Strategies for memory deallocation/movement for Dynamic Capacity Pooling
Date: Tue, 14 Apr 2026 20:26:59 -0400	[thread overview]
Message-ID: <ad7bU9Guk_csFsEG@gourry-fedora-PF4VCD3F> (raw)
In-Reply-To: <38952332-dad4-4d17-a9c1-5c25d79f67b4@suse.de>

On Tue, Apr 14, 2026 at 09:08:22AM +0200, Hannes Reinecke wrote:
> On 4/13/26 23:10, Gregory Price wrote:
> > On Mon, Apr 13, 2026 at 04:43:59PM +0100, Jonathan Cameron wrote:
> > > > 
> > > > So quite some things to discuss; however, not sure if this isn't too
> > > > much of an arcane topic which should rather be directed at places like
> > > > LPC. But I'll let the PC decide.
> > > 
> > > Superficially feels a bit arcane, particularly as we are currently
> > > kicking untagged memory into the long grass as there are too many
> > > open questions on how to present it at all (e.g. related to Gregory's
> > > recent work on private nodes).  On recent CXL sync calls the proposal
> > > has been to do tagged memory first and only support allocation of
> > > all memory with a given tag in one go and full release.
> > > 
> > 
> > General consensus after last few months seems to be:
> > 
> > "While technically possible, untagged memory is a bad idea for $REASONS"
> > 
> > I do not think the private node case changes this; if anything, it only
> > changes where the capacity ends up.
> > 
> Thing is, there will be things like CXL switches. And with that we'll get
> CXL memory behind the switch, making it possible to reshuffle memory
> 'behind the back' of the application.
> While the situation is similar to the current memory hotplug case
> (and, in fact, the mechanism on the host side will be the same I guess),
> the difference is that we now have a bit more flexibility.
> 
> The reason why one would want to reshuffle memory behind a CXL switch
> is to deallocate memory from one machine to reassign it to another
> machine. But as the request is just for 'memory' (not 'this particular
> CXL card holding _that_ memory'), the admin gets to decide _which_
> of the memory areas assigned to machine A should be moved to machine B.
> But how?
> 
> And that basically is the question: Can we give the admin / orchestration
> a better idea of which memory blocks should be preferred for
> reassignment?
> I'm sure there are applications which have a pretty flexible memory
> allocation strategy which, with some prodding, they would be happy to
> relinquish. But I'm equally sure there are applications that react
> extremely allergically to memory being pulled out from underneath them.
> And then there are 'modern' applications, which also don't like it,
> but for which it doesn't really matter, as one can simply restart them.
> 
> So it would be cool if we could address this, as then the admin
> /orchestration can make a far better choice of which memory area to
> reassign.
> And it might even help in other scenarios (VM ballooning?), too.
> 

I'm a little confused by how you imagine this memory actually gets used.

  1) Are you hotplugging directly into the buddy as a normal NUMA node
     and letting the kernel dole out allocations to anything?
     - i.e.: existing add_memory_driver_managed() interface

  2) Are you trying to plop the entire dynamically added extent into
     a specific workload?  Something like ioremap/mremap or ZONE_DEVICE
     exposed by a driver's /dev/fd ?

  3) Are you reserving this region specifically for in-kernel/driver
     use but not doled out to random users?

  4) Are you trying to just plop an entire extent into a VM (in which
     case you shouldn't even need to hotplug it on the host, in theory)?

  5) Are you trying to just decide which memory to release based on how
     much of it is used / hot / cold / etc?

I see a lot of "Wondering if..." here based on what a switch COULD do,
but divorced from real use cases, 99.999% of what COULD be done is useless.

There are basically an infinite number of ways we could shuffle this
memory around - the actual question is: what's useful?

Some use-case clarity here would be helpful.

~Gregory

Thread overview: 5+ messages
2026-03-30  7:59 [LSF/MM/BPF TOPIC] Strategies for memory deallocation/movement for Dynamic Capacity Pooling Hannes Reinecke
2026-04-13 15:43 ` Jonathan Cameron
2026-04-13 21:10   ` Gregory Price
2026-04-14  7:08     ` Hannes Reinecke
2026-04-15  0:26       ` Gregory Price [this message]
