From: "Michael S. Tsirkin" <mst@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org
Subject: [PATCH v5 16/28] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages
Date: Thu, 7 May 2026 18:23:12 -0400
Message-ID: <da0482ba6f868a12417e46a8eb86594f5eb58d83.1778192416.git.mst@redhat.com>
In-Reply-To: <cover.1778192416.git.mst@redhat.com>

When a guest reports free pages to the hypervisor via the page reporting
framework (used by virtio-balloon and hv_balloon), the host typically
zeroes those pages when it reclaims their backing memory. When the guest
later allocates those pages, post_alloc_hook() zeroes them again if
__GFP_ZERO is set, even if the host has already zeroed them. This
double zeroing is wasteful, especially for large pages.

Avoid the redundant zeroing:

- Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
  drivers to declare that their host zeroes reported pages on reclaim.
  A static key (page_reporting_host_zeroes) gates the fast path.

- Add a PG_zeroed page flag (sharing the PG_private bit) to mark pages
  that have been zeroed by the host. Set it in page_reporting_drain()
  after the pages have been reported to the host.

- Thread a zeroed bool through rmqueue -> prep_new_page ->
  post_alloc_hook, where it suppresses the redundant zeroing for
  __GFP_ZERO allocations.

No driver sets host_zeroes_pages yet; a follow-up patch to
virtio_balloon is needed to opt in.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
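Note for readers: the flow in the commit message can be modeled in
userspace roughly as below. This is only a sketch under stated
assumptions, not kernel code. Every sketch_* name, struct sketch_page,
and SKETCH_PAGE_SIZE are hypothetical stand-ins. The want_init parameter
collapses the want_init_on_free()/want_init_on_alloc()/should_skip_init()
checks into one bool, and the tag-zeroing path is reduced to "the guest
still initializes the page".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page plus its backing memory. */
struct sketch_page {
	bool zeroed;				/* models PG_zeroed */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/*
 * Models the page_reporting_drain() step: after a report completes,
 * mark the page as zeroed only if the driver declared that its host
 * zeroes reported pages.
 */
static void sketch_drain(struct sketch_page *page, bool host_zeroes_pages)
{
	assert(page != NULL);
	if (host_zeroes_pages) {
		/* Stands in for the host zeroing the backing memory. */
		memset(page->data, 0, sizeof(page->data));
		page->zeroed = true;
	}
}

/*
 * Models the dequeue step in the rmqueue paths: sample the flag once,
 * then clear it so it never outlives the free-list state.
 */
static bool sketch_take_zeroed(struct sketch_page *page)
{
	bool zeroed = page->zeroed;

	page->zeroed = false;
	return zeroed;
}

/*
 * Models the decision added to post_alloc_hook(). Returns true if the
 * guest had to initialize the page itself.
 */
static bool sketch_post_alloc(struct sketch_page *page, bool want_init,
			      bool zero_tags, bool zeroed)
{
	bool init = want_init;

	/* A host-zeroed page needs no second pass unless tags are involved. */
	if (zeroed && init && !zero_tags)
		init = false;

	if (init)
		memset(page->data, 0, sizeof(page->data));

	return init;
}
```

In this model a page goes through sketch_drain(), then
sketch_take_zeroed(), and the returned bool is passed to
sketch_post_alloc(). Paths that cannot trust the flag (compaction,
contiguous allocation) correspond to clearing it and passing false.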
include/linux/page-flags.h | 9 +++++
include/linux/page_reporting.h | 3 ++
mm/compaction.c | 6 ++--
mm/internal.h | 2 +-
mm/page_alloc.c | 66 +++++++++++++++++++++++-----------
mm/page_reporting.c | 14 +++++++-
mm/page_reporting.h | 12 +++++++
7 files changed, 87 insertions(+), 25 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f7a0e4af0c73..eef2499cba8b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -135,6 +135,8 @@ enum pageflags {
PG_swapcache = PG_owner_priv_1, /* Swap page: swp_entry_t in private */
/* Some filesystems */
PG_checked = PG_owner_priv_1,
+ /* Page contents are known to be zero */
+ PG_zeroed = PG_private,
/*
* Depending on the way an anonymous folio can be mapped into a page
@@ -679,6 +681,13 @@ FOLIO_TEST_CLEAR_FLAG_FALSE(young)
FOLIO_FLAG_FALSE(idle)
#endif
+/*
+ * PageZeroed() tracks pages known to be zero. The allocator
+ * uses this to skip redundant zeroing in post_alloc_hook().
+ */
+__PAGEFLAG(Zeroed, zeroed, PF_NO_COMPOUND)
+#define __PG_ZEROED (1UL << PG_zeroed)
+
/*
* PageReported() is used to track reported free pages within the Buddy
* allocator. We can use the non-atomic version of the test and set
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 306468b6c7d8..81e5a2819b3c 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -13,6 +13,9 @@ struct page_reporting_dev_info {
int (*report)(struct page_reporting_dev_info *prdev,
struct scatterlist *sg, unsigned int nents);
+ /* If true, host zeros reported pages on reclaim */
+ bool host_zeroes_pages;
+
/* work struct for processing reports */
struct delayed_work work;
diff --git a/mm/compaction.c b/mm/compaction.c
index c1039a9373e5..61209cd408ea 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -82,7 +82,8 @@ static inline bool is_via_compact_memory(int order) { return false; }
static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
{
- post_alloc_hook(page, order, __GFP_MOVABLE, USER_ADDR_NONE);
+ __ClearPageZeroed(page);
+ post_alloc_hook(page, order, __GFP_MOVABLE, false, USER_ADDR_NONE);
set_page_refcounted(page);
return page;
}
@@ -1832,7 +1833,8 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
set_page_private(&freepage[size], start_order);
}
dst = (struct folio *)freepage;
- post_alloc_hook(&dst->page, order, __GFP_MOVABLE, USER_ADDR_NONE);
+ __ClearPageZeroed(&dst->page);
+ post_alloc_hook(&dst->page, order, __GFP_MOVABLE, false, USER_ADDR_NONE);
set_page_refcounted(&dst->page);
if (order)
prep_compound_page(&dst->page, order);
diff --git a/mm/internal.h b/mm/internal.h
index e39abab956e7..a01bc2c85cf2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -895,7 +895,7 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
}
void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned long user_addr);
+ bool zeroed, unsigned long user_addr);
extern bool free_pages_prepare(struct page *page, unsigned int order);
extern int user_min_free_kbytes;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 127b343d3783..e5db2601d673 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1774,6 +1774,7 @@ static __always_inline void page_del_and_expand(struct zone *zone,
bool was_reported = page_reported(page);
__del_page_from_free_list(page, zone, high, migratetype);
+
nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
account_freepages(zone, -nr_pages, migratetype);
}
@@ -1846,8 +1847,10 @@ static inline bool should_skip_init(gfp_t flags)
return (flags & __GFP_SKIP_ZERO);
}
+
inline void post_alloc_hook(struct page *page, unsigned int order,
- gfp_t gfp_flags, unsigned long user_addr)
+ gfp_t gfp_flags, bool zeroed,
+ unsigned long user_addr)
{
bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
!should_skip_init(gfp_flags);
@@ -1856,6 +1859,14 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_private(page, 0);
+ /*
+ * If the page is zeroed, skip memory initialization.
+ * We still need to handle tag zeroing separately since the host
+ * does not know about memory tags.
+ */
+ if (zeroed && init && !zero_tags)
+ init = false;
+
arch_alloc_page(page, order);
debug_pagealloc_map_pages(page, 1 << order);
@@ -1913,13 +1924,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
}
static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
- unsigned int alloc_flags,
- unsigned long user_addr)
+ unsigned int alloc_flags, bool zeroed,
+ unsigned long user_addr)
{
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
- post_alloc_hook(page, order, gfp_flags, user_addr);
+ post_alloc_hook(page, order, gfp_flags, zeroed, user_addr);
/*
* page is set pfmemalloc when ALLOC_NO_WATERMARKS was necessary to
@@ -3190,6 +3201,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
}
del_page_from_free_list(page, zone, order, mt);
+ __ClearPageZeroed(page);
/*
* Set the pageblock if the isolated page is at least half of a
@@ -3262,7 +3274,7 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
static __always_inline
struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
unsigned int order, unsigned int alloc_flags,
- int migratetype)
+ int migratetype, bool *zeroed)
{
struct page *page;
unsigned long flags;
@@ -3297,6 +3309,8 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
}
}
spin_unlock_irqrestore(&zone->lock, flags);
+ *zeroed = PageZeroed(page);
+ __ClearPageZeroed(page);
} while (check_new_pages(page, order));
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3358,10 +3372,9 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
/* Remove page from the per-cpu list, caller must protect the list */
static inline
struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
- int migratetype,
- unsigned int alloc_flags,
+ int migratetype, unsigned int alloc_flags,
struct per_cpu_pages *pcp,
- struct list_head *list)
+ struct list_head *list, bool *zeroed)
{
struct page *page;
@@ -3382,6 +3395,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
page = list_first_entry(list, struct page, pcp_list);
list_del(&page->pcp_list);
pcp->count -= 1 << order;
+ *zeroed = PageZeroed(page);
+ __ClearPageZeroed(page);
} while (check_new_pages(page, order));
return page;
@@ -3390,7 +3405,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
/* Lock and remove page from the per-cpu list */
static struct page *rmqueue_pcplist(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
- int migratetype, unsigned int alloc_flags)
+ int migratetype, unsigned int alloc_flags,
+ bool *zeroed)
{
struct per_cpu_pages *pcp;
struct list_head *list;
@@ -3409,7 +3425,8 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
*/
pcp->free_count >>= 1;
list = &pcp->lists[order_to_pindex(migratetype, order)];
- page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+ page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags,
+ pcp, list, zeroed);
pcp_spin_unlock(pcp, UP_flags);
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3434,19 +3451,19 @@ static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
gfp_t gfp_flags, unsigned int alloc_flags,
- int migratetype)
+ int migratetype, bool *zeroed)
{
struct page *page;
if (likely(pcp_allowed_order(order))) {
page = rmqueue_pcplist(preferred_zone, zone, order,
- migratetype, alloc_flags);
+ migratetype, alloc_flags, zeroed);
if (likely(page))
goto out;
}
page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
- migratetype);
+ migratetype, zeroed);
out:
/* Separate test+clear to avoid unnecessary atomics */
@@ -3837,6 +3854,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
struct pglist_data *last_pgdat = NULL;
bool last_pgdat_dirty_ok = false;
bool no_fallback;
+ bool zeroed;
bool skip_kswapd_nodes = nr_online_nodes > 1;
bool skipped_kswapd_nodes = false;
@@ -3981,10 +3999,11 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
try_this_zone:
page = rmqueue(zonelist_zone(ac->preferred_zoneref), zone, order,
- gfp_mask, alloc_flags, ac->migratetype);
+ gfp_mask, alloc_flags, ac->migratetype,
+ &zeroed);
if (page) {
prep_new_page(page, order, gfp_mask, alloc_flags,
- ac->user_addr);
+ zeroed, ac->user_addr);
/*
* If this is a high-order atomic allocation then check
@@ -4218,9 +4237,11 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
count_vm_event(COMPACTSTALL);
/* Prep a captured page if available */
- if (page)
- prep_new_page(page, order, gfp_mask, alloc_flags,
+ if (page) {
+ __ClearPageZeroed(page);
+ prep_new_page(page, order, gfp_mask, alloc_flags, false,
ac->user_addr);
+ }
/* Try get a page from the freelist if available */
if (!page)
@@ -5194,6 +5215,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
/* Attempt the batch allocation */
pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
while (nr_populated < nr_pages) {
+ bool zeroed = false;
/* Skip existing pages */
if (page_array[nr_populated]) {
@@ -5202,7 +5224,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
}
page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
- pcp, pcp_list);
+ pcp, pcp_list, &zeroed);
if (unlikely(!page)) {
/* Try and allocate at least one page */
if (!nr_account) {
@@ -5213,7 +5235,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
}
nr_account++;
- prep_new_page(page, 0, gfp, 0, USER_ADDR_NONE);
+ prep_new_page(page, 0, gfp, 0, zeroed, USER_ADDR_NONE);
set_page_refcounted(page);
page_array[nr_populated++] = page;
}
@@ -6975,7 +6997,8 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
list_for_each_entry_safe(page, next, &list[order], lru) {
int i;
- post_alloc_hook(page, order, gfp_mask, USER_ADDR_NONE);
+ __ClearPageZeroed(page);
+ post_alloc_hook(page, order, gfp_mask, false, USER_ADDR_NONE);
if (!order)
continue;
@@ -7180,8 +7203,9 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
struct page *head = pfn_to_page(start);
+ __ClearPageZeroed(head);
check_new_pages(head, order);
- prep_new_page(head, order, gfp_mask, 0, USER_ADDR_NONE);
+ prep_new_page(head, order, gfp_mask, 0, false, USER_ADDR_NONE);
} else {
ret = -EINVAL;
WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 247cda44e9de..1f48fcd7c042 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -50,6 +50,8 @@ EXPORT_SYMBOL_GPL(page_reporting_order);
#define PAGE_REPORTING_DELAY (2 * HZ)
static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
+DEFINE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
enum {
PAGE_REPORTING_IDLE = 0,
PAGE_REPORTING_REQUESTED,
@@ -129,8 +131,11 @@ page_reporting_drain(struct page_reporting_dev_info *prdev,
* report on the new larger page when we make our way
* up to that higher order.
*/
- if (PageBuddy(page) && buddy_order(page) == order)
+ if (PageBuddy(page) && buddy_order(page) == order) {
__SetPageReported(page);
+ if (page_reporting_host_zeroes_pages())
+ __SetPageZeroed(page);
+ }
} while ((sg = sg_next(sg)));
/* reinitialize scatterlist now that it is empty */
@@ -390,6 +395,10 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
/* Assign device to allow notifications */
rcu_assign_pointer(pr_dev_info, prdev);
+ /* enable zeroed page optimization if host zeroes reported pages */
+ if (prdev->host_zeroes_pages)
+ static_branch_enable(&page_reporting_host_zeroes);
+
/* enable page reporting notification */
if (!static_key_enabled(&page_reporting_enabled)) {
static_branch_enable(&page_reporting_enabled);
@@ -414,6 +423,9 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
/* Flush any existing work, and lock it out */
cancel_delayed_work_sync(&prdev->work);
+
+ if (prdev->host_zeroes_pages)
+ static_branch_disable(&page_reporting_host_zeroes);
}
mutex_unlock(&page_reporting_mutex);
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index c51dbc228b94..736ea7b37e9e 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -15,6 +15,13 @@ DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
extern unsigned int page_reporting_order;
void __page_reporting_notify(void);
+DECLARE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
+static inline bool page_reporting_host_zeroes_pages(void)
+{
+ return static_branch_unlikely(&page_reporting_host_zeroes);
+}
+
static inline bool page_reported(struct page *page)
{
return static_branch_unlikely(&page_reporting_enabled) &&
@@ -46,6 +53,11 @@ static inline void page_reporting_notify_free(unsigned int order)
#else /* CONFIG_PAGE_REPORTING */
#define page_reported(_page) false
+static inline bool page_reporting_host_zeroes_pages(void)
+{
+ return false;
+}
+
static inline void page_reporting_notify_free(unsigned int order)
{
}
--
MST