From: Gregory Price <gourry@gourry.net>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>,
lsf-pc@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
linux-cxl@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
damon@lists.linux.dev, kernel-team@meta.com,
gregkh@linuxfoundation.org, rafael@kernel.org, dakr@kernel.org,
dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
dan.j.williams@intel.com, longman@redhat.com,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, osalvador@suse.de,
ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, ying.huang@linux.alibaba.com,
apopple@nvidia.com, axelrasmussen@google.com, yuanchu@google.com,
weixugc@google.com, yury.norov@gmail.com,
linux@rasmusvillemoes.dk, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, tj@kernel.org,
hannes@cmpxchg.org, mkoutny@suse.com, jackmanb@google.com,
sj@kernel.org, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, muchun.song@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, jannh@google.com, linmiaohe@huawei.com,
nao.horiguchi@gmail.com, pfalcato@suse.de, rientjes@google.com,
shakeel.butt@linux.dev, riel@surriel.com, harry.yoo@oracle.com,
cl@gentwo.org, roman.gushchin@linux.dev, chrisl@kernel.org,
kasong@tencent.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
bhe@redhat.com, zhengqi.arch@bytedance.com, terry.bowman@amd.com
Subject: Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
Date: Wed, 10 Jun 2026 16:12:52 -0400
Message-ID: <ainFROZ3WrGioyuY@gourry-fedora-PF4VCD3F>
In-Reply-To: <d01fb1ed-2418-42ee-aea2-37f9a5c5729c@kernel.org>

On Wed, Jun 10, 2026 at 08:59:59PM +0200, David Hildenbrand (Arm) wrote:
> On 6/10/26 18:37, Gregory Price wrote:
> > On Wed, Jun 10, 2026 at 05:00:33PM +0200, David Hildenbrand (Arm) wrote:
> >> On 6/10/26 12:41, Gregory Price wrote:
> >
> > So, I remember this being asked, and I didn't fully grok the request.
> >
> > I'm still not sure I fully understand the question, so apologies if I'm
> > answering the wrong things here.
> >
> > I understand this question in two ways:
> >
> > 1) Can we disallow PAGE allocation and limit this to FOLIO allocation
>
> Yes. Can we only allow folios to be allocated from private memory nodes? So let
> me reply to that one below.
>
... snip ...
>
> At LSF/MM we talked about how GFP flags are bad and how deriving stuff from the
> context might be better. I think there was also talk about how the memalloc_*
> interface might be a better way forward. Maybe we would start giving the
> allocator more context ("we are allocating a folio").
>
> The following is incomplete (esp. hugetlb stuff I assume), just as some idea:
>
Ok, the mental gap I have is not knowing the full context behind
memalloc. I'll take this and do some reading / prototyping, but
this looks entirely reasonable.
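
For readers missing the same context: the memalloc_*_save()/restore()
helpers are scoped, per-task flag toggles. Below is a minimal userland
model of that pattern, not kernel code. It assumes the save helper returns
only the bits it newly set, so that a nested restore does not end the outer
scope early. `task_flags` and `in_folio_scope()` are hypothetical stand-ins
(for current->flags and for an allocator-side check); the flag value is the
one proposed in the quoted patch.

```c
#include <assert.h>

/* Bit value taken from the quoted patch; everything else is a model. */
#define PF_MEMALLOC_FOLIO 0x00800000u

static unsigned int task_flags;	/* hypothetical stand-in for current->flags */

/* Set the requested bits and return only those that were newly set. */
static unsigned int memalloc_flags_save(unsigned int flags)
{
	unsigned int newly_set = ~task_flags & flags;

	task_flags |= flags;
	return newly_set;
}

/* Clear exactly the bits that the matching save call set. */
static void memalloc_flags_restore(unsigned int flags)
{
	task_flags &= ~flags;
}

/* Hypothetical allocator-side check: are we inside a folio scope? */
static int in_folio_scope(void)
{
	return (task_flags & PF_MEMALLOC_FOLIO) != 0;
}

/* Returns 1 if the flag survives the inner restore and clears on the outer. */
static int nested_scope_ok(void)
{
	unsigned int outer = memalloc_flags_save(PF_MEMALLOC_FOLIO);
	unsigned int inner = memalloc_flags_save(PF_MEMALLOC_FOLIO);
	int held_after_inner, clear_after_outer;

	assert(inner == 0);	/* nested save found the bit already set */
	memalloc_flags_restore(inner);
	held_after_inner = in_folio_scope();
	memalloc_flags_restore(outer);
	clear_after_outer = !in_folio_scope();
	return held_after_inner && clear_after_outer;
}
```

The point of returning only the newly set bits is that callers never need
to know whether they are already inside such a scope.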

I will still probably send the next RFC version tomorrow or Friday,
as I want to get some eyes on the __GFP_PRIVATE-less pattern.

Also, I made a new `anondax` driver which enables userland testing
of this functionality without any specialty hardware.

tl;dr:

	fd = open("/dev/anondax0.0", ....);
	buf = mmap(fd, ...);
	buf[0] = 0xDEADBEEF; /* fault to anondax driver */

static vm_fault_t anon_dax_fault(struct vm_fault *vmf)
{
	struct dev_dax *dev_dax = vmf->vma->vm_file->private_data;
	vm_fault_t ret;
	int id;

	id = dax_read_lock();
	if (!dax_alive(dev_dax->dax_dev))
		ret = VM_FAULT_SIGBUS;
	else
		ret = do_anonymous_page_node(vmf, dev_dax->target_node);
	dax_read_unlock(id);

	if (ret & VM_FAULT_OOM)
		return VM_FAULT_SIGBUS;
	return ret ? ret : VM_FAULT_NOPAGE;
}

With:

	qemu-system-x86_64 -m 5G \
	  -object memory-backend-ram,id=m0,size=4G -numa node,nodeid=0,memdev=m0 \
	  -object memory-backend-ram,id=m1,size=1G -numa node,nodeid=1,memdev=m1 \
	  -append "... memmap=0x40000000!0x140000000"

Voila - buddy-managed private anonymous memory (1G region).
No need to reinvent page_alloc.c or fault handling :]
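
As a sanity check on the memmap= value, here is a small sketch of the
address arithmetic. It rests on an assumption not stated above: that the
guest places 3 GiB of RAM below the 4 GiB boundary and maps the remainder
from 4 GiB upward (which is what the default x86 "pc" machine is expected
to do for this RAM size; other machine types may split differently, so
check the guest's e820 map). `tail_node_range()` is a hypothetical helper
written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Guest-physical range occupied by the last `node_size` bytes of RAM,
 * assuming `lowmem` bytes sit below 4 GiB and the rest starts at 4 GiB.
 */
static struct range tail_node_range(uint64_t ram, uint64_t lowmem,
				    uint64_t node_size)
{
	struct range r;

	assert(ram > lowmem && node_size <= ram - lowmem);
	r.end = 4 * GiB + (ram - lowmem);
	r.start = r.end - node_size;
	return r;
}
```

Under that assumption, tail_node_range(5 * GiB, 3 * GiB, 1 * GiB) gives a
start of 0x140000000 and a length of 0x40000000, which matches the
memmap=0x40000000!0x140000000 argument covering node 1.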

This can be used to hammer on reclaim/compaction/whatever support
without needing any particular hardware setup, and in fact it gives
some memory devices a path to support in userland while standards
get worked out.
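
A slightly fuller version of the userland side, as a sketch only. The
mmap() flags are an assumption (MAP_SHARED, as device-dax style mappings
usually require; the anondax driver may want something else), and
`touch_first_word()` is a hypothetical helper name. It maps the device,
stores a marker in the first word so the access faults into the driver,
and reads the value back.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of `path`, write a marker to the first word, and
 * verify it reads back. Returns 0 on success or a negative errno value.
 */
static int touch_first_word(const char *path, size_t len)
{
	volatile uint32_t *buf;
	void *map;
	int fd, ret = 0;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	/* MAP_SHARED is an assumption; adjust to what the driver accepts. */
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		ret = -errno;
		close(fd);
		return ret;
	}

	buf = map;
	buf[0] = 0xDEADBEEF;	/* first touch: fault to the driver */
	if (buf[0] != 0xDEADBEEF)
		ret = -EIO;

	munmap(map, len);
	close(fd);
	return ret;
}
```

With the driver loaded this would be called as
touch_first_word("/dev/anondax0.0", 4096). On a machine without the driver,
pointing it at /dev/zero exercises the same open/mmap/touch sequence.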

do_anonymous_page_node() is a bit of a bodge right now, as I just haven't
fleshed it out yet. The idea is: don't reinvent the fault path, just
provide the appropriate context to memory.c to do the right thing.

If this is acceptable, I imagine whatever interface gets implemented
will be exported to in-tree drivers only, similar to hotplug/kmem.

> From 64aaff5f40497201ecc089c3339df6576184c433 Mon Sep 17 00:00:00 2001
> From: "David Hildenbrand (Arm)" <david@kernel.org>
> Date: Wed, 10 Jun 2026 20:55:49 +0200
> Subject: [PATCH] tmp
>
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> include/linux/sched.h | 2 +-
> include/linux/sched/mm.h | 11 +++++++++++
> mm/mempolicy.c | 14 ++++++++++++--
> mm/page_alloc.c | 7 ++++++-
> 4 files changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ee06cba5c6f5..9c850b7be6bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1778,7 +1778,7 @@ extern struct pid *cad_pid;
> * I am cleaning dirty pages from some other bdi. */
> #define PF_KTHREAD 0x00200000 /* I am a kernel thread */
> #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
> -#define PF__HOLE__00800000 0x00800000
> +#define PF_MEMALLOC_FOLIO 0x00800000 /* Allocating a folio that can end up on private memory nodes */
> #define PF__HOLE__01000000 0x01000000
> #define PF__HOLE__02000000 0x02000000
> #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 95d0040df584..2101a447c084 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -471,6 +471,17 @@ static inline void memalloc_pin_restore(unsigned int flags)
> 	memalloc_flags_restore(flags);
> }
>
> +static inline unsigned int memalloc_folio_save(void)
> +{
> +	return memalloc_flags_save(PF_MEMALLOC_FOLIO);
> +}
> +
> +static inline void memalloc_folio_restore(unsigned int flags)
> +{
> +	memalloc_flags_restore(flags);
> +}
> +
> +
> #ifdef CONFIG_MEMCG
> DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> /**
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 36699fabd3c2..a78b0e5a1fce 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2506,8 +2506,13 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
> struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
> 		struct mempolicy *pol, pgoff_t ilx, int nid)
> {
> -	struct page *page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
> +	struct page *page;
> +	int flags;
> +
> +	flags = memalloc_folio_save();
> +	page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
> 			ilx, nid);
> +	memalloc_folio_restore(flags);
> 	if (!page)
> 		return NULL;
>
> @@ -2588,7 +2593,12 @@ EXPORT_SYMBOL(alloc_pages_noprof);
>
> struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
> {
> -	return page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
> +	struct folio *folio;
> +	int flags;
> +
> +	flags = memalloc_folio_save();
> +	folio = page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
> +	memalloc_folio_restore(flags);
> +	return folio;
> }
> EXPORT_SYMBOL(folio_alloc_noprof);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee902a468c2f..37434b37f7af 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5345,8 +5345,13 @@ EXPORT_SYMBOL(__alloc_pages_noprof);
> struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
> 		nodemask_t *nodemask)
> {
> -	struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
> +	struct page *page;
> +	int flags;
> +
> +	flags = memalloc_folio_save();
> +	page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
> 			preferred_nid, nodemask);
> +	memalloc_folio_restore(flags);
> 	return page_rmappable_folio(page);
> }
> EXPORT_SYMBOL(__folio_alloc_noprof);
> --
> 2.43.0
>
>
> --
> Cheers,
>
> David