From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Brendan Jackman <jackmanb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Jason Wang <jasowang@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, virtualization@lists.linux.dev,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed()
Date: Tue, 14 Apr 2026 11:04:04 +0200	[thread overview]
Message-ID: <da935245-9c75-4cf4-8a42-2c6328bd93f9@kernel.org> (raw)
In-Reply-To: <20260413163644-mutt-send-email-mst@kernel.org>

On 4/13/26 22:37, Michael S. Tsirkin wrote:
> On Mon, Apr 13, 2026 at 11:05:40AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/13/26 00:50, Michael S. Tsirkin wrote:
>>> The previous patch skips zeroing in post_alloc_hook() when
>>> __GFP_ZERO is used.  However, several page allocation paths
>>> zero pages via folio_zero_user() or clear_user_highpage() after
>>> allocation, not via __GFP_ZERO.
>>>
>>> Add __GFP_PREZEROED gfp flag that tells post_alloc_hook() to
>>> preserve the MAGIC_PAGE_ZEROED sentinel in page->private so the
>>> caller can detect pre-zeroed pages and skip its own zeroing.
>>> Add folio_test_clear_prezeroed() helper to check and clear
>>> the sentinel.
>>
>> I really don't like __GFP_PREZEROED, and wonder how we can avoid it.
>>
>>
>> What you want is to allocate a folio (well, actually a page that becomes
>> a folio) and know whether zeroing for that folio (once we establish it
>> from a page) is still required.
>>
>> Or you just allocate a folio, specify __GFP_ZERO, and let the folio
>> allocation code deal with that.
>>
>>
>> I think we have two options:
>>
>> (1) Use an indication that can be sticky for callers that do not care.
>>
>> Assuming we would use a page flag that is only ever used on folios, all
>> we'd have to do is make sure that we clear the flag once we convert
>> the page to a folio.
>>
>> For example, PG_dropbehind is only ever set on folios in the pagecache. 
>>
>> Paths that allocate folios would have to clear the flag. For non-hugetlb
>> folios that happens through page_rmappable_folio().
>>
>> I'm not super-happy about that, but it would be doable.
>>
>>
>> (2) Use a dedicated allocation interface for user pages in the buddy.
>>
>> I hate the whole user_alloc_needs_zeroing()+folio_zero_user() handling.
>>
>> It shouldn't exist. We should just be passing __GFP_ZERO and let the buddy
>> handle all that.
>>
>>
>> For example, vma_alloc_folio() already gets passed the address in.
>>
>> Pass the address from vma_alloc_folio_noprof()->folio_alloc_noprof(), and let
>> folio_alloc_noprof() use a buddy interface that can handle it.
>>
>> Imagine if we had an alloc_user_pages_noprof() that consumes an address. It could just
>> do what folio_zero_user() does, and only if really required.
>>
>> The whole user_alloc_needs_zeroing() could go away and you could just handle the
>> pre-zeroed optimization internally.
>>
>> -- 
>> Cheers,
>>
>> David
> 
> I admit I only vaguely understand the core mm refactoring you are suggesting.
> 

Oh, I was hoping Claude would figure that out for you.


Essentially, we move the zeroing of folios back into the buddy, by using
__GFP_ZERO.

The user_alloc_needs_zeroing() logic would then reside in the buddy and would
no longer be required in callers.

E.g.,

diff --git a/mm/memory.c b/mm/memory.c
index 631205a384e1..44576ba3def5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5259,7 +5259,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
        gfp = vma_thp_gfp_mask(vma);
        while (orders) {
                addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-               folio = vma_alloc_folio(gfp, order, vma, addr);
+               folio = vma_alloc_folio(gfp | __GFP_ZERO, order, vma, addr);
                if (!folio)
                        goto next;
                if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
@@ -5272,15 +5272,6 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
                        goto fallback;
                }
                folio_throttle_swaprate(folio, gfp);
-               /*
-                * When a folio is not zeroed during allocation
-                * (__GFP_ZERO not used) or user folios require special
-                * handling, folio_zero_user() is used to make sure
-                * that the page corresponding to the faulting address
-                * will be hot in the cache after zeroing.
-                */
-               if (user_alloc_needs_zeroing())
-                       folio_zero_user(folio, vmf->address);
                return folio;
 next:
                count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);


folio_zero_user(), from which we would extract a function that operates on a
page+order chunk, requires the address hint.

So we would have to pass that address down. For the !CONFIG_NUMA case, for
example, something like the following could be done.

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 51ef13ed756e..29771c3240be 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -234,6 +234,10 @@ struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_
                nodemask_t *nodemask);
 #define __folio_alloc(...)                     alloc_hooks(__folio_alloc_noprof(__VA_ARGS__))
 
+struct folio *__folio_alloc_user_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
+               nodemask_t *nodemask, unsigned long addr);
+#define __folio_alloc_user(...)                        alloc_hooks(__folio_alloc_user_noprof(__VA_ARGS__))
+
 unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
                                nodemask_t *nodemask, int nr_pages,
                                struct page **page_array);
@@ -291,6 +295,18 @@ __alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
 
 #define  __alloc_pages_node(...)               alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
 
+static inline
+struct folio *__folio_alloc_user_node_noprof(gfp_t gfp, unsigned int order,
+               int nid, unsigned long addr)
+{
+       VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+       warn_if_node_offline(nid, gfp);
+
+       return __folio_alloc_user_noprof(gfp, order, nid, NULL, addr);
+}
+
+#define  __folio_alloc_user_node(...)          alloc_hooks(__folio_alloc_user_node_noprof(__VA_ARGS__))
+
 static inline
 struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
 {
@@ -342,7 +358,7 @@ static inline struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int orde
 static inline struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order,
                struct vm_area_struct *vma, unsigned long addr)
 {
-       return folio_alloc_noprof(gfp, order);
+       return __folio_alloc_user_node_noprof(gfp, order, numa_node_id(), addr);
 }
 #endif
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee81f5c67c18..28f448f40b75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5260,6 +5260,13 @@ struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_
 }
 EXPORT_SYMBOL(__folio_alloc_noprof);
 
+struct folio *__folio_alloc_user_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
+               nodemask_t *nodemask, unsigned long addr)
+{
+       /* TODO */
+}
+EXPORT_SYMBOL(__folio_alloc_user_noprof);
+
 /*
  * Common helper functions. Never use with __GFP_HIGHMEM because the returned
  * address cannot represent highmem pages. Use alloc_pages and then kmap if


As __folio_alloc_user_noprof() resides in the buddy, it can just honor any
buddy-internal "pre-zeroed" flag.


Once you are in page_alloc.c, you can access internal allocation functions and
take care of that without GFP flags.

-- 
Cheers,

David


Thread overview: 32+ messages
2026-04-12 22:50 [PATCH RFC 0/9] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 1/9] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
2026-04-13 19:11   ` David Hildenbrand (Arm)
2026-04-13 20:32     ` Michael S. Tsirkin
2026-04-14  9:08       ` David Hildenbrand (Arm)
2026-04-12 22:50 ` [PATCH RFC 2/9] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-04-13  8:00   ` David Hildenbrand (Arm)
2026-04-13  8:10     ` Michael S. Tsirkin
2026-04-13  8:15       ` David Hildenbrand (Arm)
2026-04-13  8:29         ` Michael S. Tsirkin
2026-04-13 20:35     ` Michael S. Tsirkin
2026-04-14  9:18       ` David Hildenbrand (Arm)
2026-04-14 10:22         ` Michael S. Tsirkin
2026-04-14 15:32           ` David Hildenbrand (Arm)
2026-04-14 15:36             ` Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 3/9] mm: add __GFP_PREZEROED flag and folio_test_clear_prezeroed() Michael S. Tsirkin
2026-04-13  9:05   ` David Hildenbrand (Arm)
2026-04-13 20:37     ` Michael S. Tsirkin
2026-04-14  9:04       ` David Hildenbrand (Arm) [this message]
2026-04-14 10:29         ` Michael S. Tsirkin
2026-04-14 15:46           ` David Hildenbrand (Arm)
2026-04-16  8:45             ` Michael S. Tsirkin
2026-04-13 21:37     ` Michael S. Tsirkin
2026-04-13 22:06     ` Michael S. Tsirkin
2026-04-13 23:43       ` Michael S. Tsirkin
2026-04-14  9:06         ` David Hildenbrand (Arm)
2026-04-12 22:50 ` [PATCH RFC 4/9] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 5/9] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
2026-04-12 22:50 ` [PATCH RFC 6/9] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
2026-04-12 22:51 ` [PATCH RFC 7/9] mm: hugetlb: skip zeroing of pre-zeroed hugetlb pages Michael S. Tsirkin
2026-04-12 22:51 ` [PATCH RFC 8/9] mm: page_reporting: add flush parameter to trigger immediate reporting Michael S. Tsirkin
2026-04-12 22:51 ` [PATCH RFC 9/9] virtio_balloon: a hack to enable host-zeroed page optimization Michael S. Tsirkin
