From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Gregory Price <gourry@gourry.net>
Cc: Balbir Singh <balbirs@nvidia.com>,
lsf-pc@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
linux-cxl@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
damon@lists.linux.dev, kernel-team@meta.com,
gregkh@linuxfoundation.org, rafael@kernel.org, dakr@kernel.org,
dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
dan.j.williams@intel.com, longman@redhat.com,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, osalvador@suse.de,
ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, ying.huang@linux.alibaba.com,
apopple@nvidia.com, axelrasmussen@google.com, yuanchu@google.com,
weixugc@google.com, yury.norov@gmail.com,
linux@rasmusvillemoes.dk, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, tj@kernel.org,
hannes@cmpxchg.org, mkoutny@suse.com, jackmanb@google.com,
sj@kernel.org, baolin.wang@linux.alibaba.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
lance.yang@linux.dev, muchun.song@linux.dev, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, jannh@google.com, linmiaohe@huawei.com,
nao.horiguchi@gmail.com, pfalcato@suse.de, rientjes@google.com,
shakeel.butt@linux.dev, riel@surriel.com, harry.yoo@oracle.com,
cl@gentwo.org, roman.gushchin@linux.dev, chrisl@kernel.org,
kasong@tencent.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
bhe@redhat.com, zhengqi.arch@bytedance.com, terry.bowman@amd.com
Subject: Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
Date: Wed, 10 Jun 2026 20:59:59 +0200 [thread overview]
Message-ID: <d01fb1ed-2418-42ee-aea2-37f9a5c5729c@kernel.org> (raw)
In-Reply-To: <aimSzvoJDrpeQsmM@gourry-fedora-PF4VCD3F>
On 6/10/26 18:37, Gregory Price wrote:
> On Wed, Jun 10, 2026 at 05:00:33PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/10/26 12:41, Gregory Price wrote:
>>>
>>> Notably: slub.c injects __GFP_THISNODE internally on behalf of kmalloc,
>>> which causes spillage into private nodes because slub allows private
>>> nodes in its mask. I think this is fixable.
>>>
>>> I have to inspect some other __GFP_THISNODE users (hugetlb, some arch
>>> code, etc), but it seems like fully dropping the FALLBACK entries and
>>> requiring __GFP_THISNODE might be sufficient.
>>
>> Sorry, I haven't been able to follow up so far, and not sure if that's what you
>> are discussing here ...
>>
>> After the LSF/MM session, I was wondering, whether if we focus on allowing only
>> folios allocations to end up on private memory nodes for now: could the
>> __GFP_THISNODE approach work there?
>>
>> Essentially, disallow any allocations on non-folio paths, and allow folio
>> allocation only with __GFP_THISNODE set.
>>
>> I have to find time to read the other mails in this thread, on my todo list.
>>
>> So sorry if that is precisely what is being discussed here.
>>
>
> So, I remember this being asked, and I didn't fully grok the request.
>
> I'm still not sure I fully understand the question, so apologies if I'm
> answer the wrong things here.
>
> I understand this question in two ways:
>
> 1) Can we disallow PAGE allocation and limit this to FOLIO allocation
Yes. Can we only allow folios to be allocated from private memory nodes. So let
me reply to that one below.
> 2) Can we disallow [Feature] (i.e. slab) allocation targeting the node.
>
>
> 1) Can we disallow page allocation and limit this to folios?
>
> No, I don't think so.
>
> Folio allocations are written in terms of page allocations, we would
> have to rewrite folio allocation interfaces and introduce a bunch of
> boilerplate for the sake of this.
>
> struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
> int preferred_nid, nodemask_t *nodemask)
> {
> struct page *page;
>
> page = __alloc_frozen_pages_noprof(gfp, order, preferred_nid, nodemask);
> if (page)
> set_page_refcounted(page);
> return page;
> }
>
> struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
> nodemask_t *nodemask)
> {
> struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
> preferred_nid, nodemask);
> return page_rmappable_folio(page);
> }
At LSF/MM we talked about how GFP flags are bad and how deriving stuff from the
context might be better. I think there was also talk about how the memalloc_*
interface might be a better way forward. Maybe we would start giving the
allocator more context ("we are allocating a folio").
The following is incomplete (esp. hugetlb stuff I assume), just as some idea:
From 64aaff5f40497201ecc089c3339df6576184c433 Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Wed, 10 Jun 2026 20:55:49 +0200
Subject: [PATCH] tmp
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
include/linux/sched.h | 2 +-
include/linux/sched/mm.h | 11 +++++++++++
mm/mempolicy.c | 14 ++++++++++++--
mm/page_alloc.c | 7 ++++++-
4 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ee06cba5c6f5..9c850b7be6bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1778,7 +1778,7 @@ extern struct pid *cad_pid;
* I am cleaning dirty pages from some other bdi. */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
-#define PF__HOLE__00800000 0x00800000
+#define PF__MEMALLOC_FOLIO 0x00800000 /* Allocating a folio that can end up on
private memory nodes */
#define PF__HOLE__01000000 0x01000000
#define PF__HOLE__02000000 0x02000000
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with
cpus_mask */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 95d0040df584..2101a447c084 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -471,6 +471,17 @@ static inline void memalloc_pin_restore(unsigned int flags)
memalloc_flags_restore(flags);
}
+static inline unsigned int memalloc_folio_save(void)
+{
+ return memalloc_flags_save(PF_MEMALLOC_FOLIO);
+}
+
+static inline void memalloc_folio_restore(unsigned int flags)
+{
+ memalloc_flags_restore(flags);
+}
+
+
#ifdef CONFIG_MEMCG
DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
/**
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36699fabd3c2..a78b0e5a1fce 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2506,8 +2506,13 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned
int order,
struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
struct mempolicy *pol, pgoff_t ilx, int nid)
{
- struct page *page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
+ struct page *page;
+ int flags;
+
+ flags = memalloc_folio_save();
+ page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
ilx, nid);
+ memalloc_folio_restore(flags);
if (!page)
return NULL;
@@ -2588,7 +2593,12 @@ EXPORT_SYMBOL(alloc_pages_noprof);
struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
{
- return page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
+ struct folio *folio;
+ int flags;
+
+ flags = memalloc_folio_save();
+ folio = page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
+ memalloc_folio_restore(flags);
+ return folio;
}
EXPORT_SYMBOL(folio_alloc_noprof);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee902a468c2f..37434b37f7af 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5345,8 +5345,13 @@ EXPORT_SYMBOL(__alloc_pages_noprof);
struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int
preferred_nid,
nodemask_t *nodemask)
{
- struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
+ struct page *page;
+ int flags;
+
+ flags = memalloc_folio_save();
+ page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
preferred_nid, nodemask);
+ memalloc_folio_restore(flags);
return page_rmappable_folio(page);
}
EXPORT_SYMBOL(__folio_alloc_noprof);
--
2.43.0
--
Cheers,
David
next prev parent reply other threads:[~2026-06-10 19:00 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20260427123800epcas5p1e1a2fed257091b31e2e6c3a7d1b0c2b0@epcas5p1.samsung.com>
2026-02-22 8:48 ` [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM) Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 01/27] numa: introduce N_MEMORY_PRIVATE node state Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 02/27] mm,cpuset: gate allocations from N_MEMORY_PRIVATE behind __GFP_PRIVATE Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 03/27] mm/page_alloc: add numa_zone_allowed() and wire it up Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 04/27] mm/page_alloc: Add private node handling to build_zonelists Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 05/27] mm: introduce folio_is_private_managed() unified predicate Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 06/27] mm/mlock: skip mlock for managed-memory folios Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 07/27] mm/madvise: skip madvise " Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 08/27] mm/ksm: skip KSM " Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 09/27] mm/khugepaged: skip private node folios when trying to collapse Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 10/27] mm/swap: add free_folio callback for folio release cleanup Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 11/27] mm/huge_memory.c: add private node folio split notification callback Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 12/27] mm/migrate: NP_OPS_MIGRATION - support private node user migration Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 13/27] mm/mempolicy: NP_OPS_MEMPOLICY - support private node mempolicy Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 14/27] mm/memory-tiers: NP_OPS_DEMOTION - support private node demotion Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 15/27] mm/mprotect: NP_OPS_PROTECT_WRITE - gate PTE/PMD write-upgrades Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 16/27] mm: NP_OPS_RECLAIM - private node reclaim participation Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 17/27] mm/oom: NP_OPS_OOM_ELIGIBLE - private node OOM participation Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 18/27] mm/memory: NP_OPS_NUMA_BALANCING - private node NUMA balancing Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 19/27] mm/compaction: NP_OPS_COMPACTION - private node compaction support Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 20/27] mm/gup: NP_OPS_LONGTERM_PIN - private node longterm pin support Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 21/27] mm/memory-failure: add memory_failure callback to node_private_ops Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 22/27] mm/memory_hotplug: add add_private_memory_driver_managed() Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 23/27] mm/cram: add compressed ram memory management subsystem Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 24/27] cxl/core: Add cxl_sysram region type Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 25/27] cxl/core: Add private node support to cxl_sysram Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 26/27] cxl: add cxl_mempolicy sample PCI driver Gregory Price
2026-02-22 8:48 ` [RFC PATCH v4 27/27] cxl: add cxl_compression " Gregory Price
2026-02-23 13:07 ` [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM) David Hildenbrand (Arm)
2026-02-23 14:54 ` Gregory Price
2026-02-23 16:08 ` Gregory Price
2026-03-17 13:05 ` David Hildenbrand (Arm)
2026-03-19 14:29 ` Gregory Price
2026-02-24 6:19 ` Alistair Popple
2026-02-24 15:17 ` Gregory Price
2026-02-24 16:54 ` Gregory Price
2026-02-25 22:21 ` Matthew Brost
2026-02-25 23:58 ` Gregory Price
2026-02-26 3:27 ` Alistair Popple
2026-02-26 5:54 ` Gregory Price
2026-02-26 22:49 ` Gregory Price
2026-03-03 20:36 ` Gregory Price
2026-02-25 12:40 ` Alejandro Lucero Palau
2026-02-25 14:43 ` Gregory Price
2026-05-06 14:43 ` Gregory Price
2026-03-17 13:25 ` David Hildenbrand (Arm)
2026-03-19 15:09 ` Gregory Price
2026-04-13 13:11 ` David Hildenbrand (Arm)
2026-04-13 17:05 ` Gregory Price
2026-04-15 9:49 ` David Hildenbrand (Arm)
2026-04-15 15:17 ` Gregory Price
2026-04-15 19:47 ` Frank van der Linden
2026-04-16 1:24 ` Gregory Price
2026-04-17 9:50 ` David Hildenbrand (Arm)
2026-04-17 15:07 ` Gregory Price
2026-04-16 20:23 ` Gregory Price
2026-04-17 9:39 ` David Hildenbrand (Arm)
2026-04-17 9:37 ` David Hildenbrand (Arm)
2026-04-17 14:45 ` Gregory Price
2026-04-20 2:56 ` Gregory Price
2026-04-27 12:32 ` Arun George
2026-04-27 22:28 ` Gregory Price
2026-04-29 6:15 ` Arun George/Arun George
2026-04-29 13:42 ` Gregory Price
2026-05-04 13:08 ` Arun George/Arun George
2026-05-05 7:45 ` Gregory Price
2026-05-22 8:40 ` Arun George/Arun George
2026-05-25 2:03 ` Gregory Price
2026-05-05 22:21 ` Yiannis Nikolakopoulos
2026-05-09 16:38 ` [LSF/MM/BPF TOPIC] Private Memory Nodes - follow up Gregory Price
2026-05-21 6:23 ` [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM) Balbir Singh
2026-05-25 1:50 ` Gregory Price
2026-06-02 2:16 ` Balbir Singh
2026-06-02 8:57 ` Gregory Price
2026-06-03 5:00 ` Balbir Singh
2026-06-03 7:02 ` Gregory Price
2026-06-04 1:43 ` Balbir Singh
2026-06-04 8:36 ` Gregory Price
2026-06-04 10:35 ` Balbir Singh
2026-06-04 12:18 ` Gregory Price
2026-06-10 10:41 ` Gregory Price
2026-06-10 15:00 ` David Hildenbrand (Arm)
2026-06-10 16:37 ` Gregory Price
2026-06-10 18:59 ` David Hildenbrand (Arm) [this message]
2026-06-10 20:12 ` Gregory Price
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d01fb1ed-2418-42ee-aea2-37f9a5c5729c@kernel.org \
--to=david@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alison.schofield@intel.com \
--cc=apopple@nvidia.com \
--cc=axelrasmussen@google.com \
--cc=balbirs@nvidia.com \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=bhe@redhat.com \
--cc=byungchul@sk.com \
--cc=cgroups@vger.kernel.org \
--cc=chengming.zhou@linux.dev \
--cc=chrisl@kernel.org \
--cc=cl@gentwo.org \
--cc=dakr@kernel.org \
--cc=damon@lists.linux.dev \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=dev.jain@arm.com \
--cc=gourry@gourry.net \
--cc=gregkh@linuxfoundation.org \
--cc=hannes@cmpxchg.org \
--cc=harry.yoo@oracle.com \
--cc=ira.weiny@intel.com \
--cc=jackmanb@google.com \
--cc=jannh@google.com \
--cc=jonathan.cameron@huawei.com \
--cc=joshua.hahnjy@gmail.com \
--cc=kasong@tencent.com \
--cc=kernel-team@meta.com \
--cc=lance.yang@linux.dev \
--cc=linmiaohe@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=longman@redhat.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=lsf-pc@lists.linux-foundation.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=matthew.brost@intel.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@suse.com \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=nao.horiguchi@gmail.com \
--cc=npache@redhat.com \
--cc=nphamcs@gmail.com \
--cc=osalvador@suse.de \
--cc=pfalcato@suse.de \
--cc=rafael@kernel.org \
--cc=rakie.kim@sk.com \
--cc=riel@surriel.com \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=shakeel.butt@linux.dev \
--cc=shikemeng@huaweicloud.com \
--cc=sj@kernel.org \
--cc=surenb@google.com \
--cc=terry.bowman@amd.com \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=vishal.l.verma@intel.com \
--cc=weixugc@google.com \
--cc=xu.xin16@zte.com.cn \
--cc=ying.huang@linux.alibaba.com \
--cc=yuanchu@google.com \
--cc=yury.norov@gmail.com \
--cc=zhengqi.arch@bytedance.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox