From: Yury Norov <ynorov@nvidia.com>
To: Gregory Price <gourry@gourry.net>
Cc: Balbir Singh <balbirs@nvidia.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-cxl@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	kernel-team@meta.com, longman@redhat.com, tj@kernel.org,
	hannes@cmpxchg.org, mkoutny@suse.com, corbet@lwn.net,
	gregkh@linuxfoundation.org, rafael@kernel.org, dakr@kernel.org,
	dave@stgolabs.net, jonathan.cameron@huawei.com,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, akpm@linux-foundation.org,
	vbabka@suse.cz, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, ziy@nvidia.com, david@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	rppt@kernel.org, axelrasmussen@google.com, yuanchu@google.com,
	weixugc@google.com, yury.norov@gmail.com,
	linux@rasmusvillemoes.dk, rientjes@google.com,
	shakeel.butt@linux.dev, chrisl@kernel.org, kasong@tencent.com,
	shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
	baohua@kernel.org, yosry.ahmed@linux.dev,
	chengming.zhou@linux.dev, roman.gushchin@linux.dev,
	muchun.song@linux.dev, osalvador@suse.de,
	matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	rakie.kim@sk.com, byungchul@sk.com, ying.huang@linux.alibaba.com,
	apopple@nvidia.com, cl@gentwo.org, harry.yoo@oracle.com,
	zhengqi.arch@bytedance.com
Subject: Re: [RFC PATCH v3 0/8] mm,numa: N_PRIVATE node isolation for device-managed memory
Date: Mon, 12 Jan 2026 12:18:40 -0500
Message-ID: <aWUs8Fx2CG07F81e@yury>
In-Reply-To: <aWUHAboKw28XepWr@gourry-fedora-PF4VCD3F>

On Mon, Jan 12, 2026 at 09:36:49AM -0500, Gregory Price wrote:
> On Mon, Jan 12, 2026 at 10:12:23PM +1100, Balbir Singh wrote:
> > On 1/9/26 06:37, Gregory Price wrote:
> > > This series introduces N_PRIVATE, a new node state for memory nodes 
> > > whose memory is not intended for general system consumption.  Today,
> > > device drivers (CXL, accelerators, etc.) hotplug their memory to access
> > > mm/ services like page allocation and reclaim, but this exposes general
> > > workloads to memory with different characteristics and reliability
> > > guarantees than system RAM.
> > > 
> > > N_PRIVATE provides isolation by default while enabling explicit access
> > > via __GFP_THISNODE for subsystems that understand how to manage these
> > > specialized memory regions.
> > > 
> > 
> > I assume each class of N_PRIVATE is a separate set of NUMA nodes. Could
> > these be real or virtual memory nodes?
> >
> 
> This has been the topic of a long, long discussion on the CXL discord -
> how do we get extra nodes if we intend to make HPA space flexibly
> configurable by "intended use".
> 
> tl;dr:  open to discussion.  As of right now, there's no way (that I
> know of) to allocate additional NUMA nodes at boot without having some
> indication that one is needed in the ACPI table (SRAT touches a PXM, or
> CEDT defines a region not present in SRAT).
> 
> Best idea we have right now is to have a build config that reserves some
> extra nodes which can be used later (they're in N_POSSIBLE but otherwise
> not used by anything).
> 
> > > Design
> > > ======
> > > 
> > > The series introduces:
> > > 
> > >   1. N_PRIVATE node state (mutually exclusive with N_MEMORY)
> > 
> > We should call it N_PRIVATE_MEMORY
> >
> 
> Dan Williams convinced me to go with N_PRIVATE, but this is really a
> bikeshed topic

No it's not. To me (OK, an almost random reader in this discussion),
N_PRIVATE is a pretty confusing name. It doesn't answer the question:
private what? N_PRIVATE_MEMORY is better in that department, isn't it?

But taking into account isolcpus, maybe N_ISOLMEM?

> - we could call it N_BOBERT until we find consensus.

Please give it a proper name that clearly describes the scope and purpose
of the new restriction policy before moving forward.
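
Whatever the final name, the property under discussion is that the new
state is mutually exclusive with N_MEMORY and that default allocations
only consider N_MEMORY nodes. A minimal userspace sketch of that
property follows. Every identifier in it (the model_* names and
MODEL_MAX_NODES) is hypothetical and invented for illustration; this is
not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two node states discussed above. */
enum model_node_state {
	MODEL_N_MEMORY,		/* general-purpose system RAM */
	MODEL_N_PRIVATE,	/* isolated, device-managed memory */
	MODEL_NR_STATES,
};

#define MODEL_MAX_NODES 8

/* One bitmask per state; bit nid is set when node nid is in that state. */
static unsigned int model_state_mask[MODEL_NR_STATES];

static bool model_node_in_state(int nid, enum model_node_state state)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return (model_state_mask[state] & (1u << nid)) != 0;
}

/* Refuse to mark a node private if it is already general-purpose memory. */
static bool model_set_private(int nid)
{
	if (model_node_in_state(nid, MODEL_N_MEMORY))
		return false;
	model_state_mask[MODEL_N_PRIVATE] |= 1u << nid;
	return true;
}

/* Refuse to mark a node as general-purpose memory if it is private. */
static bool model_set_memory(int nid)
{
	if (model_node_in_state(nid, MODEL_N_PRIVATE))
		return false;
	model_state_mask[MODEL_N_MEMORY] |= 1u << nid;
	return true;
}

/* In this model, default allocations only ever consider N_MEMORY nodes. */
static bool model_default_alloc_allowed(int nid)
{
	return model_node_in_state(nid, MODEL_N_MEMORY);
}
```

In this model a node can be put in either state but never both, and a
private node is invisible to the default allocation check.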
 
> > >   enum private_memtype {
> > >       NODE_MEM_NOTYPE,      /* No type assigned (invalid state) */
> > >       NODE_MEM_ZSWAP,       /* Swap compression target */
> > >       NODE_MEM_COMPRESSED,  /* General compressed RAM */
> > >       NODE_MEM_ACCELERATOR, /* Accelerator-attached memory */
> > >       NODE_MEM_DEMOTE_ONLY, /* Memory-tier demotion target only */
> > >       NODE_MAX_MEMTYPE,
> > >   };
> > > 
> > > These types serve as policy hints for subsystems:
> > > 
> > 
> > Do these nodes have fallback(s)? Are these nodes prone to OOM when memory is exhausted
> > in one class of N_PRIVATE node(s)?
> > 
> 
> Right now, these nodes do not have fallbacks, and even if they did the
> use of __GFP_THISNODE would prevent this.  That's intended.
> 
> In theory you could have nodes of similar types fall back to each other,
> but that feels like increased complexity for questionable value.  The
> service requesting __GFP_THISNODE should be aware that it needs to manage
> fallback.

Yeah, and most __GFP_THISNODE users also pass __GFP_NOWARN, which makes it
look more like an emergency feature. Maybe add a symmetric GFP_PRIVATE
flag that would allow for more flexibility, and highlight the intention
better?
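
To make the "caller manages fallback" point concrete, here is a small
userspace sketch. It only models the behaviour described above: a
strict single-node allocation that fails rather than spilling to
another node, wrapped by a caller that picks its own fallback. The
model_* names, the page counts and MODEL_MAX_NODES are all hypothetical;
none of this is kernel API.

```c
#include <assert.h>

#define MODEL_MAX_NODES 4

/* Hypothetical free page count per node; node 2 plays the private node. */
static int model_free_pages[MODEL_MAX_NODES] = { 2, 0, 1, 0 };

/*
 * Strict allocation in the spirit of __GFP_THISNODE: take a page from
 * nid only. Returns nid on success and -1 when nid is exhausted; it
 * never tries a different node.
 */
static int model_alloc_thisnode(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	if (model_free_pages[nid] == 0)
		return -1;
	model_free_pages[nid]--;
	return nid;
}

/*
 * Caller-managed fallback: try the private node first, then the node
 * the caller itself chose as a fallback. The allocator does not do
 * this on the caller's behalf.
 */
static int model_alloc_with_fallback(int private_nid, int fallback_nid)
{
	int nid = model_alloc_thisnode(private_nid);

	if (nid < 0)
		nid = model_alloc_thisnode(fallback_nid);
	return nid;
}
```

With the counts above, the first request is served from node 2, the
next two from the caller's fallback node 0, and any further request
fails outright, which is the OOM-like case raised in the question.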

> > What about page cache allocation from these nodes? Since default allocations
> > never use them, a file system would need to do additional work to allocate
> > on them, if there was ever a desire to use them. 
> 
> Yes, in fact that is the intent.  Anything requesting memory from these
> nodes would need to be aware of how to manage them.
> 
> Similar to ZONE_DEVICE memory - which is wholly unmanaged by the page

This is quite the opposite of what you say in the motivation
section:

  Several emerging memory technologies require kernel memory management
  services but should not be used for general allocations

So, is it a completely unmanaged node, or is it only isolated from
general allocations?

Thanks,
Yury

> allocator.  There's potential for re-using some of the ZONE_DEVICE or
> HMM callback infrastructure to implement the callbacks for N_PRIVATE
> instead of re-inventing it.
> 
> > Would memory
> > migration work between N_PRIVATE and N_MEMORY using move_pages()?
> > 
> 
> N_PRIVATE -> N_MEMORY would probably be easy and trivial, but could also
> be a controllable bit.
> 
> A side-discussion not present in these notes has been whether memtype
> should be an enum or a bitfield.
> 
> N_MEMORY -> N_PRIVATE via migrate.c would probably require some changes
> to migration_target_control and the alloc callback (in vmscan.c, see
> alloc_migrate_folio) would need to be N_PRIVATE aware.
> 
> 
> Thanks for taking a look,
> ~Gregory

Thread overview: 42+ messages
2026-01-08 20:37 [RFC PATCH v3 0/8] mm,numa: N_PRIVATE node isolation for device-managed memory Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 1/8] numa,memory_hotplug: create N_PRIVATE (Private Nodes) Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 2/8] mm: constify oom_control, scan_control, and alloc_context nodemask Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 3/8] mm: restrict slub, compaction, and page_alloc to sysram Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 4/8] cpuset: introduce cpuset.mems.sysram Gregory Price
2026-01-12 17:56   ` Yury Norov
2026-01-08 20:37 ` [RFC PATCH v3 5/8] Documentation/admin-guide/cgroups: update docs for mems_allowed Gregory Price
2026-01-12 14:30   ` Michal Koutný
2026-01-12 15:25     ` Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 6/8] drivers/cxl/core/region: add private_region Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 7/8] mm/zswap: compressed ram direct integration Gregory Price
2026-01-09 16:00   ` Yosry Ahmed
2026-01-09 17:03     ` Gregory Price
2026-01-09 21:40     ` Gregory Price
2026-01-12 21:13       ` Yosry Ahmed
2026-01-12 23:33         ` Gregory Price
2026-01-12 23:46           ` Gregory Price
2026-01-13 16:24           ` Jonathan Cameron
2026-01-15 16:55           ` Yosry Ahmed
2026-01-15 17:26             ` Gregory Price
2026-01-15 22:09               ` Yosry Ahmed
2026-01-13  7:35         ` Nhat Pham
2026-01-13  7:49           ` Nhat Pham
2026-01-15 17:00             ` Yosry Ahmed
2026-01-15 17:32               ` Gregory Price
2026-01-08 20:37 ` [RFC PATCH v3 8/8] drivers/cxl: add zswap private_region type Gregory Price
2026-01-12 11:12 ` [RFC PATCH v3 0/8] mm,numa: N_PRIVATE node isolation for device-managed memory Balbir Singh
2026-01-12 14:36   ` Gregory Price
2026-01-12 17:18     ` Yury Norov [this message]
2026-01-12 17:36       ` Gregory Price
2026-01-12 21:24       ` dan.j.williams
2026-01-12 21:57         ` Balbir Singh
2026-01-12 22:10           ` dan.j.williams
2026-01-12 22:54             ` Balbir Singh
2026-01-12 23:40               ` Gregory Price
2026-01-13  1:12                 ` Balbir Singh
2026-01-13  1:17                 ` dan.j.williams
2026-01-13  2:30                   ` Gregory Price
2026-01-13  3:12                     ` dan.j.williams
2026-01-13 14:15                       ` Gregory Price
2026-01-13  3:24                     ` Balbir Singh
2026-01-13 14:21                       ` Gregory Price
