public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC v2 01/18] mm: page_alloc: propagate PageReported flag across buddy splits
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API Michael S. Tsirkin
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

When a reported free page is split via expand() to satisfy a
smaller allocation, the sub-pages placed back on the free lists
lose the PageReported flag.  This means they will be unnecessarily
re-reported to the hypervisor in the next reporting cycle, wasting
work.

Propagate the PageReported flag to sub-pages during expand() so
that they are recognized as already-reported.
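As a toy sketch of the rule this patch adds (not kernel code — `toy_page` and `toy_expand` are invented stand-ins for `struct page` and `expand()`): each halving step frees the upper half at one order lower, and that half inherits the parent's reported bit.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	int order;
	bool free;
	bool reported;
};

/*
 * Split the block at pages[0] from order `high` down to order `low`.
 * Each freed upper half inherits the parent's reported state, mirroring
 * the __SetPageReported() added to expand() in this patch.
 */
static void toy_expand(struct toy_page *pages, int low, int high,
		       bool reported)
{
	unsigned int size = 1u << high;

	while (high > low) {
		size >>= 1;
		high--;
		pages[size].order = high;
		pages[size].free = true;
		pages[size].reported = reported;
	}
}
```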

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/page_alloc.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..edbb1edf463d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1730,7 +1730,7 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
  * -- nyc
  */
 static inline unsigned int expand(struct zone *zone, struct page *page, int low,
-				  int high, int migratetype)
+				  int high, int migratetype, bool reported)
 {
 	unsigned int size = 1 << high;
 	unsigned int nr_added = 0;
@@ -1752,6 +1752,15 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
 		__add_to_free_list(&page[size], zone, high, migratetype, false);
 		set_buddy_order(&page[size], high);
 		nr_added += size;
+
+		/*
+		 * The parent page has been reported to the host.  The
+		 * sub-pages are part of the same reported block, so mark
+		 * them reported too.  This avoids re-reporting pages that
+		 * the host already knows about.
+		 */
+		if (reported)
+			__SetPageReported(&page[size]);
 	}
 
 	return nr_added;
@@ -1762,9 +1771,10 @@ static __always_inline void page_del_and_expand(struct zone *zone,
 						int high, int migratetype)
 {
 	int nr_pages = 1 << high;
+	bool was_reported = page_reported(page);
 
 	__del_page_from_free_list(page, zone, high, migratetype);
-	nr_pages -= expand(zone, page, low, high, migratetype);
+	nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
 	account_freepages(zone, -nr_pages, migratetype);
 }
 
@@ -2322,7 +2332,8 @@ try_to_claim_block(struct zone *zone, struct page *page,
 
 		del_page_from_free_list(page, zone, current_order, block_type);
 		change_pageblock_range(page, current_order, start_type);
-		nr_added = expand(zone, page, order, current_order, start_type);
+		nr_added = expand(zone, page, order, current_order, start_type,
+				  false);
 		account_freepages(zone, nr_added, start_type);
 		return page;
 	}
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 01/18] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-21  0:58   ` Huang, Ying
  2026-04-20 12:50 ` [PATCH RFC v2 03/18] mm: add PG_zeroed page flag for known-zero pages Michael S. Tsirkin
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple

Add pghint_t, a bitwise type for communicating page allocation hints
between the allocator and callers.  Define PGHINT_ZEROED to indicate
that the allocated page contents are known to be zero.

Add _hints variants of the allocation functions, each taking a
pghint_t *hints output parameter:

  vma_alloc_folio_hints()  -> alloc_pages_mpol_hints() (internal)
                           -> __alloc_frozen_pages_hints()

The existing APIs are unchanged and continue to work without hints.
For now, *hints is always set to 0.  A subsequent patch will set
PGHINT_ZEROED when the page was pre-zeroed by the host.
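The out-parameter convention can be sketched in isolation (toy code — `toy_alloc` is invented, only the NULL-tolerant, zero-first pattern matches the patch): callers that do not care pass NULL, and the allocator writes 0 before anything else so the hints are well defined on every return path.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int pghint_t;
#define PGHINT_ZEROED (1u << 0)

/*
 * Toy allocator entry point: *hints is defaulted to 0 up front, then
 * bits are OR-ed in only when the allocator learns something useful.
 */
static int toy_alloc(int page_was_prezeroed, pghint_t *hints)
{
	if (hints)
		*hints = 0;	/* well-defined even on failure paths */

	/* ... allocation would happen here ... */

	if (page_was_prezeroed && hints)
		*hints |= PGHINT_ZEROED;
	return 0;
}
```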

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 include/linux/gfp.h | 15 ++++++++
 mm/internal.h       |  4 +++
 mm/mempolicy.c      | 85 +++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c     | 15 ++++++--
 4 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 51ef13ed756e..14433a20e60c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -226,6 +226,9 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
+typedef unsigned int __bitwise pghint_t;
+#define PGHINT_ZEROED	((__force pghint_t)BIT(0))
+
 struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask);
 #define __alloc_pages(...)			alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
@@ -325,6 +328,9 @@ struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
 		struct mempolicy *mpol, pgoff_t ilx, int nid);
 struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned long addr);
+struct folio *vma_alloc_folio_hints_noprof(gfp_t gfp, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		pghint_t *hints);
 #else
 static inline struct page *alloc_pages_noprof(gfp_t gfp_mask, unsigned int order)
 {
@@ -344,12 +350,21 @@ static inline struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order,
 {
 	return folio_alloc_noprof(gfp, order);
 }
+static inline struct folio *vma_alloc_folio_hints_noprof(gfp_t gfp, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		pghint_t *hints)
+{
+	if (hints)
+		*hints = 0;
+	return folio_alloc_noprof(gfp, order);
+}
 #endif
 
 #define alloc_pages(...)			alloc_hooks(alloc_pages_noprof(__VA_ARGS__))
 #define folio_alloc(...)			alloc_hooks(folio_alloc_noprof(__VA_ARGS__))
 #define folio_alloc_mpol(...)			alloc_hooks(folio_alloc_mpol_noprof(__VA_ARGS__))
 #define vma_alloc_folio(...)			alloc_hooks(vma_alloc_folio_noprof(__VA_ARGS__))
+#define vma_alloc_folio_hints(...)		alloc_hooks(vma_alloc_folio_hints_noprof(__VA_ARGS__))
 
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..686667b956c0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -894,8 +894,12 @@ extern int user_min_free_kbytes;
 
 struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid,
 		nodemask_t *);
+struct page *__alloc_frozen_pages_hints_noprof(gfp_t, unsigned int order,
+		int nid, nodemask_t *, pghint_t *hints);
 #define __alloc_frozen_pages(...) \
 	alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__))
+#define __alloc_frozen_pages_hints(...) \
+	alloc_hooks(__alloc_frozen_pages_hints_noprof(__VA_ARGS__))
 void free_frozen_pages(struct page *page, unsigned int order);
 void free_unref_folios(struct folio_batch *fbatch);
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index cf92bd6a8226..b918639eef71 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2547,6 +2547,91 @@ struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct
 }
 EXPORT_SYMBOL(vma_alloc_folio_noprof);
 
+static struct page *alloc_pages_preferred_many_hints(gfp_t gfp,
+		unsigned int order, int nid, nodemask_t *nodemask,
+		pghint_t *hints)
+{
+	struct page *page;
+	gfp_t preferred_gfp;
+
+	preferred_gfp = gfp | __GFP_NOWARN;
+	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
+	page = __alloc_frozen_pages_hints_noprof(preferred_gfp, order, nid,
+						 nodemask, hints);
+	if (!page)
+		page = __alloc_frozen_pages_hints_noprof(gfp, order, nid, NULL,
+							 hints);
+
+	return page;
+}
+
+static struct page *alloc_pages_mpol_hints(gfp_t gfp, unsigned int order,
+		struct mempolicy *pol, pgoff_t ilx, int nid,
+		pghint_t *hints)
+{
+	nodemask_t *nodemask;
+	struct page *page;
+
+	nodemask = policy_nodemask(gfp, pol, ilx, &nid);
+
+	if (pol->mode == MPOL_PREFERRED_MANY)
+		return alloc_pages_preferred_many_hints(gfp, order, nid,
+						       nodemask, hints);
+
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+	    order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) {
+		if (pol->mode != MPOL_INTERLEAVE &&
+		    pol->mode != MPOL_WEIGHTED_INTERLEAVE &&
+		    (!nodemask || node_isset(nid, *nodemask))) {
+			page = __alloc_frozen_pages_hints_noprof(
+				gfp | __GFP_THISNODE | __GFP_NORETRY, order,
+				nid, NULL, hints);
+			if (page || !(gfp & __GFP_DIRECT_RECLAIM))
+				return page;
+		}
+	}
+
+	page = __alloc_frozen_pages_hints_noprof(gfp, order, nid, nodemask,
+						 hints);
+
+	if (unlikely(pol->mode == MPOL_INTERLEAVE ||
+		     pol->mode == MPOL_WEIGHTED_INTERLEAVE) && page) {
+		if (static_branch_likely(&vm_numa_stat_key) &&
+		    page_to_nid(page) == nid) {
+			preempt_disable();
+			__count_numa_event(page_zone(page), NUMA_INTERLEAVE_HIT);
+			preempt_enable();
+		}
+	}
+
+	return page;
+}
+
+struct folio *vma_alloc_folio_hints_noprof(gfp_t gfp, int order,
+		struct vm_area_struct *vma, unsigned long addr,
+		pghint_t *hints)
+{
+	struct mempolicy *pol;
+	pgoff_t ilx;
+	struct folio *folio;
+	struct page *page;
+
+	if (vma->vm_flags & VM_DROPPABLE)
+		gfp |= __GFP_NOWARN;
+
+	pol = get_vma_policy(vma, addr, order, &ilx);
+	page = alloc_pages_mpol_hints(gfp | __GFP_COMP, order, pol, ilx,
+				      numa_node_id(), hints);
+	mpol_cond_put(pol);
+	if (!page)
+		return NULL;
+
+	set_page_refcounted(page);
+	folio = page_rmappable_folio(page);
+	return folio;
+}
+EXPORT_SYMBOL(vma_alloc_folio_hints_noprof);
+
 struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned order)
 {
 	struct mempolicy *pol = &default_policy;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edbb1edf463d..f7abbc46e725 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5222,14 +5222,17 @@ EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
-struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
-		int preferred_nid, nodemask_t *nodemask)
+struct page *__alloc_frozen_pages_hints_noprof(gfp_t gfp, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask, pghint_t *hints)
 {
 	struct page *page;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	gfp_t alloc_gfp; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
 
+	if (hints)
+		*hints = (pghint_t)0;
+
 	/*
 	 * There are several places where we assume that the order value is sane
 	 * so bail out early if the request is out of bound.
@@ -5285,6 +5288,14 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 
 	return page;
 }
+EXPORT_SYMBOL(__alloc_frozen_pages_hints_noprof);
+
+struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
+		int preferred_nid, nodemask_t *nodemask)
+{
+	return __alloc_frozen_pages_hints_noprof(gfp, order, preferred_nid,
+						nodemask, NULL);
+}
 EXPORT_SYMBOL(__alloc_frozen_pages_noprof);
 
 struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
-- 
MST



* [PATCH RFC v2 03/18] mm: add PG_zeroed page flag for known-zero pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 01/18] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 04/18] mm: page_alloc: track PG_zeroed across buddy merges Michael S. Tsirkin
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport

Add PG_zeroed (aliased to PG_private) to track pages whose contents
are known to be zero.  Exclude __PG_ZEROED from PAGE_FLAGS_CHECK_AT_PREP
so the allocator does not flag zeroed pages sitting on the free list
as bad pages during allocation prep.
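The mask change can be illustrated with a toy check (invented TOY_* bit positions, not the real page-flags layout): a page carrying only excluded bits passes the prep-time bad-flags test, while any other flag still fails it.

```c
#include <assert.h>

#define TOY_PG_HWPOISON	(1ul << 1)
#define TOY_PG_ZEROED	(1ul << 2)
#define TOY_PG_LOCKED	(1ul << 3)
#define TOY_FLAGS_MASK	0xfful

/* Mirror of the PAGE_FLAGS_CHECK_AT_PREP change: carve both excluded
 * bits out of the mask that prep-time checking uses. */
#define TOY_CHECK_AT_PREP \
	(TOY_FLAGS_MASK & ~(TOY_PG_HWPOISON | TOY_PG_ZEROED))

static int toy_page_bad(unsigned long flags)
{
	return (flags & TOY_CHECK_AT_PREP) != 0;
}
```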

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 include/linux/page-flags.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f7a0e4af0c73..f87ecb740e7f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -157,6 +157,9 @@ enum pageflags {
 	 */
 	PG_fscache = PG_private_2,	/* page backed by cache */
 
+	/* Page contents are known to be zero (host-zeroed or balloon) */
+	PG_zeroed = PG_private,
+
 	/* XEN */
 	/* Pinned in Xen as a read-only pagetable page. */
 	PG_pinned = PG_owner_priv_1,
@@ -687,6 +690,14 @@ FOLIO_FLAG_FALSE(idle)
  */
 __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
 
+/*
+ * PageZeroed() tracks pages whose contents are known to be zero.
+ * Set on free-list pages by the balloon driver or page reporting.
+ * The allocator uses this to skip redundant zeroing.
+ */
+__PAGEFLAG(Zeroed, zeroed, PF_NO_COMPOUND)
+#define __PG_ZEROED (1UL << PG_zeroed)
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
 #else
@@ -1209,7 +1220,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
  * alloc-free cycle to prevent from reusing the page.
  */
 #define PAGE_FLAGS_CHECK_AT_PREP	\
-	((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK)
+	((PAGEFLAGS_MASK & ~(__PG_HWPOISON | __PG_ZEROED)) | LRU_GEN_MASK | LRU_REFS_MASK)
 
 /*
  * Flags stored in the second page of a compound page.  They may overlap
-- 
MST



* [PATCH RFC v2 04/18] mm: page_alloc: track PG_zeroed across buddy merges
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 03/18] mm: add PG_zeroed page flag for known-zero pages Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 05/18] mm: page_alloc: preserve PG_zeroed in try_to_claim_block Michael S. Tsirkin
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

Preserve PG_zeroed when two buddy pages merge in __free_one_page():
clear the flag on both halves as they are combined, and set it on the
merged page only if both buddies had PG_zeroed set.

Without this, a zeroed page freed via free_frozen_pages_hint could
merge with a non-zeroed buddy, and the merged higher-order page would
falsely appear zeroed.
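The merge rule reduces to an AND across every level of merging, which a toy loop shows (invented `toy_merge` helper, not the kernel's merge loop): the zeroed state survives only while every absorbed buddy was also zeroed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Walk up `nmerges` buddy-merge levels; the merged block stays
 * known-zero only if the page and every buddy it absorbed were zeroed.
 */
static bool toy_merge_zeroed(bool page_zeroed, const bool *buddy_zeroed,
			     int nmerges)
{
	for (int i = 0; i < nmerges; i++)
		page_zeroed = page_zeroed && buddy_zeroed[i];
	return page_zeroed;
}
```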

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/page_alloc.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f7abbc46e725..6adc894748c8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -984,6 +984,8 @@ static inline void __free_one_page(struct page *page,
 	unsigned long buddy_pfn = 0;
 	unsigned long combined_pfn;
 	struct page *buddy;
+	bool buddy_zeroed;
+	bool page_zeroed;
 	bool to_tail;
 
 	VM_BUG_ON(!zone_is_initialized(zone));
@@ -1022,6 +1024,8 @@ static inline void __free_one_page(struct page *page,
 				goto done_merging;
 		}
 
+		buddy_zeroed = PageZeroed(buddy);
+
 		/*
 		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 		 * merge with it and move up one order.
@@ -1040,10 +1044,17 @@ static inline void __free_one_page(struct page *page,
 			change_pageblock_range(buddy, order, migratetype);
 		}
 
+		page_zeroed = PageZeroed(page);
+		__ClearPageZeroed(page);
+		__ClearPageZeroed(buddy);
+
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
 		order++;
+
+		if (page_zeroed && buddy_zeroed)
+			__SetPageZeroed(page);
 	}
 
 done_merging:
@@ -1730,7 +1741,8 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
  * -- nyc
  */
 static inline unsigned int expand(struct zone *zone, struct page *page, int low,
-				  int high, int migratetype, bool reported)
+				  int high, int migratetype, bool reported,
+				  bool zeroed)
 {
 	unsigned int size = 1 << high;
 	unsigned int nr_added = 0;
@@ -1761,6 +1773,8 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low,
 		 */
 		if (reported)
 			__SetPageReported(&page[size]);
+		if (zeroed)
+			__SetPageZeroed(&page[size]);
 	}
 
 	return nr_added;
@@ -1772,9 +1786,11 @@ static __always_inline void page_del_and_expand(struct zone *zone,
 {
 	int nr_pages = 1 << high;
 	bool was_reported = page_reported(page);
+	bool was_zeroed = PageZeroed(page);
 
 	__del_page_from_free_list(page, zone, high, migratetype);
-	nr_pages -= expand(zone, page, low, high, migratetype, was_reported);
+	nr_pages -= expand(zone, page, low, high, migratetype, was_reported,
+			   was_zeroed);
 	account_freepages(zone, -nr_pages, migratetype);
 }
 
@@ -2333,7 +2349,7 @@ try_to_claim_block(struct zone *zone, struct page *page,
 		del_page_from_free_list(page, zone, current_order, block_type);
 		change_pageblock_range(page, current_order, start_type);
 		nr_added = expand(zone, page, order, current_order, start_type,
-				  false);
+				  false, false);
 		account_freepages(zone, nr_added, start_type);
 		return page;
 	}
-- 
MST



* [PATCH RFC v2 05/18] mm: page_alloc: preserve PG_zeroed in try_to_claim_block
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 04/18] mm: page_alloc: track PG_zeroed across buddy merges Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 06/18] mm: page_alloc: thread pghint_t through get_page_from_freelist Michael S. Tsirkin
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

try_to_claim_block() calls expand() with false for both reported
and zeroed, losing both states for claimed pageblocks.

Capture reported and zeroed state before del_page_from_free_list()
clears PageReported, and pass them to expand() for sub-page
propagation.
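The capture-before-delete ordering is the whole fix, and can be sketched in miniature (toy code — `toy_del` stands in for del_page_from_free_list(), which clears per-page state): read the flags into locals first, then delete, then hand the saved values on.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	bool reported;
	bool zeroed;
};

/* Deleting from the free list clears per-page state, as
 * del_page_from_free_list() clears PageReported. */
static void toy_del(struct toy_page *p)
{
	p->reported = false;
	p->zeroed = false;
}

/* Capture BEFORE deleting, so the saved state can be propagated. */
static void toy_claim(struct toy_page *p, bool *was_reported,
		      bool *was_zeroed)
{
	*was_reported = p->reported;
	*was_zeroed = p->zeroed;
	toy_del(p);
}
```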

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6adc894748c8..b0971a1eaa73 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2345,11 +2345,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
 		unsigned int nr_added;
+		bool was_reported = page_reported(page);
+		bool was_zeroed = PageZeroed(page);
 
 		del_page_from_free_list(page, zone, current_order, block_type);
 		change_pageblock_range(page, current_order, start_type);
 		nr_added = expand(zone, page, order, current_order, start_type,
-				  false, false);
+				  was_reported, was_zeroed);
 		account_freepages(zone, nr_added, start_type);
 		return page;
 	}
-- 
MST



* [PATCH RFC v2 06/18] mm: page_alloc: thread pghint_t through get_page_from_freelist
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 05/18] mm: page_alloc: preserve PG_zeroed in try_to_claim_block Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 07/18] mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t Michael S. Tsirkin
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

Add pghint_t *hints to get_page_from_freelist() and pass it to
prep_new_page().  All internal callers except the main fast path
in __alloc_frozen_pages_hints_noprof() pass NULL.

The next patch uses this to return hints from
post_alloc_hook() to callers via __alloc_frozen_pages_hints().

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/page_alloc.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b0971a1eaa73..ece61d02ea96 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2483,7 +2483,8 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 			continue;
 
 		page = get_page_from_free_area(area, fallback_mt);
-		page_del_and_expand(zone, page, order, current_order, fallback_mt);
+		page_del_and_expand(zone, page, order, current_order,
+				    fallback_mt);
 		trace_mm_page_alloc_extfrag(page, order, current_order,
 					    start_migratetype, fallback_mt);
 		return page;
@@ -3294,7 +3295,8 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 			 * high-order atomic allocation in the future.
 			 */
 			if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK)))
-				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+				page = __rmqueue_smallest(zone, order,
+							  MIGRATE_HIGHATOMIC);
 
 			if (!page) {
 				spin_unlock_irqrestore(&zone->lock, flags);
@@ -3414,7 +3416,8 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	 */
 	pcp->free_count >>= 1;
 	list = &pcp->lists[order_to_pindex(migratetype, order)];
-	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp,
+				list);
 	pcp_spin_unlock(pcp, UP_flags);
 	if (page) {
 		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3451,7 +3454,7 @@ struct page *rmqueue(struct zone *preferred_zone,
 	}
 
 	page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
-							migratetype);
+			     migratetype);
 
 out:
 	/* Separate test+clear to avoid unnecessary atomics */
@@ -3835,7 +3838,7 @@ static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
  */
 static struct page *
 get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
-						const struct alloc_context *ac)
+		       const struct alloc_context *ac, pghint_t *hints)
 {
 	struct zoneref *z;
 	struct zone *zone;
@@ -4084,14 +4087,14 @@ __alloc_pages_cpuset_fallback(gfp_t gfp_mask, unsigned int order,
 	struct page *page;
 
 	page = get_page_from_freelist(gfp_mask, order,
-			alloc_flags|ALLOC_CPUSET, ac);
+			alloc_flags|ALLOC_CPUSET, ac, NULL);
 	/*
 	 * fallback to ignore cpuset restriction if our nodes
 	 * are depleted
 	 */
 	if (!page)
 		page = get_page_from_freelist(gfp_mask, order,
-				alloc_flags, ac);
+				alloc_flags, ac, NULL);
 	return page;
 }
 
@@ -4129,7 +4132,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	 */
 	page = get_page_from_freelist((gfp_mask | __GFP_HARDWALL) &
 				      ~__GFP_DIRECT_RECLAIM, order,
-				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac);
+				      ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac,
+				      NULL);
 	if (page)
 		goto out;
 
@@ -4227,7 +4231,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 
 	/* Try get a page from the freelist if available */
 	if (!page)
-		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+		page = get_page_from_freelist(gfp_mask, order, alloc_flags,
+					     ac, NULL);
 
 	if (page) {
 		struct zone *zone = page_zone(page);
@@ -4477,7 +4482,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 		goto out;
 
 retry:
-	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac,
+				     NULL);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
@@ -4831,7 +4837,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * The adjusted alloc_flags might result in immediate success, so try
 	 * that first
 	 */
-	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac,
+				     NULL);
 	if (page)
 		goto got_pg;
 
@@ -5249,7 +5256,7 @@ struct page *__alloc_frozen_pages_hints_noprof(gfp_t gfp, unsigned int order,
 	struct alloc_context ac = { };
 
 	if (hints)
-		*hints = (pghint_t)0;
+		*hints = 0;
 
 	/*
 	 * There are several places where we assume that the order value is sane
@@ -5279,7 +5286,8 @@ struct page *__alloc_frozen_pages_hints_noprof(gfp_t gfp, unsigned int order,
 	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
 
 	/* First allocation attempt */
-	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac,
+				     hints);
 	if (likely(page))
 		goto out;
 
@@ -7855,7 +7863,8 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
 	 * Best effort allocation from percpu free list.
 	 * If it's empty attempt to spin_trylock zone->lock.
 	 */
-	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac,
+				     NULL);
 
 	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
 
-- 
MST



* [PATCH RFC v2 07/18] mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 06/18] mm: page_alloc: thread pghint_t through get_page_from_freelist Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 08/18] mm: hugetlb: thread pghint_t through buddy allocation chain Michael S. Tsirkin
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport

Add pghint_t *hints parameter to post_alloc_hook() and
prep_new_page(). post_alloc_hook() reads PageZeroed, clears
it, and returns PGHINT_ZEROED via hints.

This provides a single point where PG_zeroed is consumed and
cleared, regardless of whether the page came through PCP or
buddy. The flag is set in page_del_and_expand() and survives
both paths until post_alloc_hook() consumes it.

Only get_page_from_freelist() passes hints through
prep_new_page(); all other callers (compaction, bulk alloc,
split, contig) pass NULL.
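The consume-once behavior can be modeled outside the kernel (toy code — `toy_post_alloc_hook` and the 16-byte data array are invented; only the read/clear/report/skip-init order matches the patch): the zeroed bit is read and cleared at a single point, surfaced via hints, and used to skip the clearing pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int pghint_t;
#define PGHINT_ZEROED (1u << 0)

struct toy_page {
	bool zeroed;
	unsigned char data[16];
};

static unsigned int toy_memset_calls;

/* Single consumption point: read PG_zeroed, clear it, report it via
 * hints, and skip the init memset when contents are already zero. */
static void toy_post_alloc_hook(struct toy_page *page, bool want_init,
				pghint_t *hints)
{
	bool zeroed = page->zeroed;

	page->zeroed = false;
	if (hints)
		*hints = zeroed ? PGHINT_ZEROED : 0;
	if (zeroed)
		want_init = false;	/* redundant zeroing skipped */
	if (want_init) {
		memset(page->data, 0, sizeof(page->data));
		toy_memset_calls++;
	}
}
```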

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/compaction.c |  4 ++--
 mm/internal.h   |  3 ++-
 mm/page_alloc.c | 25 +++++++++++++++++--------
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c..6fcce7756613 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -82,7 +82,7 @@ static inline bool is_via_compact_memory(int order) { return false; }
 
 static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
 {
-	post_alloc_hook(page, order, __GFP_MOVABLE);
+	post_alloc_hook(page, order, __GFP_MOVABLE, NULL);
 	set_page_refcounted(page);
 	return page;
 }
@@ -1833,7 +1833,7 @@ static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long da
 	}
 	dst = (struct folio *)freepage;
 
-	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
+	post_alloc_hook(&dst->page, order, __GFP_MOVABLE, NULL);
 	set_page_refcounted(&dst->page);
 	if (order)
 		prep_compound_page(&dst->page, order);
diff --git a/mm/internal.h b/mm/internal.h
index 686667b956c0..2964cdfcd31f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -887,7 +887,8 @@ static inline void prep_compound_tail(struct page *head, int tail_idx)
 	set_page_private(p, 0);
 }
 
-void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
+void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags,
+		     pghint_t *hints);
 extern bool free_pages_prepare(struct page *page, unsigned int order);
 
 extern int user_min_free_kbytes;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ece61d02ea96..a4cfd645599a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1863,13 +1863,21 @@ static inline bool should_skip_init(gfp_t flags)
 }
 
 inline void post_alloc_hook(struct page *page, unsigned int order,
-				gfp_t gfp_flags)
+				gfp_t gfp_flags, pghint_t *hints)
 {
 	bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
 			!should_skip_init(gfp_flags);
 	bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+	bool zeroed = PageZeroed(page);
 	int i;
 
+	__ClearPageZeroed(page);
+	if (hints)
+		*hints = zeroed ? PGHINT_ZEROED : 0;
+
+	if (zeroed && !zero_tags)
+		init = false;
+
 	set_page_private(page, 0);
 
 	arch_alloc_page(page, order);
@@ -1918,9 +1926,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 }
 
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
-							unsigned int alloc_flags)
+					unsigned int alloc_flags, pghint_t *hints)
 {
-	post_alloc_hook(page, order, gfp_flags);
+	post_alloc_hook(page, order, gfp_flags, hints);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
@@ -3991,7 +3999,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		page = rmqueue(zonelist_zone(ac->preferred_zoneref), zone, order,
 				gfp_mask, alloc_flags, ac->migratetype);
 		if (page) {
-			prep_new_page(page, order, gfp_mask, alloc_flags);
+			prep_new_page(page, order, gfp_mask, alloc_flags,
+				      hints);
 
 			/*
 			 * If this is a high-order atomic allocation then check
@@ -4227,7 +4236,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 
 	/* Prep a captured page if available */
 	if (page)
-		prep_new_page(page, order, gfp_mask, alloc_flags);
+		prep_new_page(page, order, gfp_mask, alloc_flags, NULL);
 
 	/* Try get a page from the freelist if available */
 	if (!page)
@@ -5223,7 +5232,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 		}
 		nr_account++;
 
-		prep_new_page(page, 0, gfp, 0);
+		prep_new_page(page, 0, gfp, 0, NULL);
 		set_page_refcounted(page);
 		page_array[nr_populated++] = page;
 	}
@@ -6958,7 +6967,7 @@ static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
 		list_for_each_entry_safe(page, next, &list[order], lru) {
 			int i;
 
-			post_alloc_hook(page, order, gfp_mask);
+			post_alloc_hook(page, order, gfp_mask, NULL);
 			if (!order)
 				continue;
 
@@ -7164,7 +7173,7 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
 		struct page *head = pfn_to_page(start);
 
 		check_new_pages(head, order);
-		prep_new_page(head, order, gfp_mask, 0);
+		prep_new_page(head, order, gfp_mask, 0, NULL);
 	} else {
 		ret = -EINVAL;
 		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC v2 08/18] mm: hugetlb: thread pghint_t through buddy allocation chain
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 07/18] mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 09/18] mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing Michael S. Tsirkin
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Muchun Song,
	Oscar Salvador, Hugh Dickins, Baolin Wang

Thread pghint_t *hints through the hugetlb buddy allocation path:
  only_alloc_fresh_hugetlb_folio -> alloc_buddy_frozen_folio

alloc_buddy_frozen_folio now calls __alloc_frozen_pages_hints()
so the reported-page zeroed hint can propagate up.

Add pghint_t *hints to alloc_hugetlb_folio_reserve() for the
memfd path.  Callers that do not need hints pass NULL.

No functional change yet: hints are threaded but not acted upon.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 29 ++++++++++++++++++++---------
 mm/memfd.c              |  2 +-
 3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..7311ad87add4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -710,7 +710,8 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask,
 				bool allow_alloc_fallback);
 struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
-					  nodemask_t *nmask, gfp_t gfp_mask);
+				nodemask_t *nmask, gfp_t gfp_mask,
+				pghint_t *hints);
 
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
 			pgoff_t idx);
@@ -1124,7 +1125,8 @@ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 static inline struct folio *
 alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
-			    nodemask_t *nmask, gfp_t gfp_mask)
+			    nodemask_t *nmask, gfp_t gfp_mask,
+			    pghint_t *hints)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 327eaa4074d3..faa94a114fd4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1842,7 +1842,8 @@ struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio)
 }
 
 static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
-		int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry)
+		int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry,
+		pghint_t *hints)
 {
 	struct folio *folio;
 	bool alloc_try_hard = true;
@@ -1859,7 +1860,8 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
 	if (alloc_try_hard)
 		gfp_mask |= __GFP_RETRY_MAYFAIL;
 
-	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
+	folio = (struct folio *)__alloc_frozen_pages_hints(gfp_mask, order,
+							  nid, nmask, hints);
 
 	/*
 	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
@@ -1888,11 +1890,14 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
 
 static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
 		gfp_t gfp_mask, int nid, nodemask_t *nmask,
-		nodemask_t *node_alloc_noretry)
+		nodemask_t *node_alloc_noretry, pghint_t *hints)
 {
 	struct folio *folio;
 	int order = huge_page_order(h);
 
+	if (hints)
+		*hints = 0;
+
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 
@@ -1900,7 +1905,7 @@ static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
 		folio = alloc_gigantic_frozen_folio(order, gfp_mask, nid, nmask);
 	else
 		folio = alloc_buddy_frozen_folio(order, gfp_mask, nid, nmask,
-						 node_alloc_noretry);
+						 node_alloc_noretry, hints);
 	if (folio)
 		init_new_hugetlb_folio(folio);
 	return folio;
@@ -1918,7 +1923,8 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 {
 	struct folio *folio;
 
-	folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
+	folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask,
+					      NULL, NULL);
 	if (folio)
 		hugetlb_vmemmap_optimize_folio(h, folio);
 	return folio;
@@ -1958,7 +1964,8 @@ static struct folio *alloc_pool_huge_folio(struct hstate *h,
 		struct folio *folio;
 
 		folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, node,
-					nodes_allowed, node_alloc_noretry);
+					nodes_allowed, node_alloc_noretry,
+					NULL);
 		if (folio)
 			return folio;
 	}
@@ -2231,10 +2238,13 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
 }
 
 struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
-		nodemask_t *nmask, gfp_t gfp_mask)
+		nodemask_t *nmask, gfp_t gfp_mask, pghint_t *hints)
 {
 	struct folio *folio;
 
+	if (hints)
+		*hints = (pghint_t)0;
+
 	spin_lock_irq(&hugetlb_lock);
 	if (!h->resv_huge_pages) {
 		spin_unlock_irq(&hugetlb_lock);
@@ -3434,13 +3444,14 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 			gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 
 			folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid,
-					&node_states[N_MEMORY], NULL);
+					&node_states[N_MEMORY], NULL, NULL);
 			if (!folio && !list_empty(&folio_list) &&
 			    hugetlb_vmemmap_optimizable_size(h)) {
 				prep_and_add_allocated_folios(h, &folio_list);
 				INIT_LIST_HEAD(&folio_list);
 				folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid,
-						&node_states[N_MEMORY], NULL);
+						&node_states[N_MEMORY], NULL,
+						NULL);
 			}
 			if (!folio)
 				break;
diff --git a/mm/memfd.c b/mm/memfd.c
index 919c2a53eb96..f1c00600e19a 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -93,7 +93,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 		folio = alloc_hugetlb_folio_reserve(h,
 						    numa_node_id(),
 						    NULL,
-						    gfp_mask);
+						    gfp_mask, NULL);
 		if (folio) {
 			u32 hash;
 
-- 
MST



* [PATCH RFC v2 09/18] mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 08/18] mm: hugetlb: thread pghint_t through buddy allocation chain Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 10/18] mm: page_reporting: support host-zeroed reported pages Michael S. Tsirkin
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Muchun Song,
	Oscar Salvador

Set PG_zeroed on surplus hugetlb pages when buddy-allocated with
PGHINT_ZEROED (indicating the page was pre-zeroed by the host).
Clear PG_zeroed in free_huge_folio() when a user-mapped page
returns to the pool.

Check PG_zeroed in the fault and fallocate paths to skip redundant
folio_zero_user() calls.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 fs/hugetlbfs/inode.c |  5 ++++-
 mm/hugetlb.c         | 31 +++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..3f9bdb5a7c85 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -828,7 +828,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 			error = PTR_ERR(folio);
 			goto out;
 		}
-		folio_zero_user(folio, addr);
+		if (PageZeroed(&folio->page))
+			__ClearPageZeroed(&folio->page);
+		else
+			folio_zero_user(folio, addr);
 		__folio_mark_uptodate(folio);
 		error = hugetlb_add_to_page_cache(folio, mapping, index);
 		if (unlikely(error)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index faa94a114fd4..704ec0817c5e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1746,6 +1746,8 @@ void free_huge_folio(struct folio *folio)
 	bool restore_reserve;
 	unsigned long flags;
 
+	__ClearPageZeroed(&folio->page);
+
 	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
 	VM_BUG_ON_FOLIO(folio_mapcount(folio), folio);
 
@@ -1919,12 +1921,13 @@ static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
  * pages is zero, and the accounting must be done in the caller.
  */
 static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
-		gfp_t gfp_mask, int nid, nodemask_t *nmask)
+		gfp_t gfp_mask, int nid, nodemask_t *nmask,
+		pghint_t *hints)
 {
 	struct folio *folio;
 
 	folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask,
-					      NULL, NULL);
+					      NULL, hints);
 	if (folio)
 		hugetlb_vmemmap_optimize_folio(h, folio);
 	return folio;
@@ -2137,6 +2140,7 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 				gfp_t gfp_mask,	int nid, nodemask_t *nmask)
 {
 	struct folio *folio = NULL;
+	pghint_t hints;
 
 	if (hstate_is_gigantic_no_runtime(h))
 		return NULL;
@@ -2146,10 +2150,13 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 		goto out_unlock;
 	spin_unlock_irq(&hugetlb_lock);
 
-	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
+	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, &hints);
 	if (!folio)
 		return NULL;
 
+	if (hints & PGHINT_ZEROED)
+		__SetPageZeroed(&folio->page);
+
 	spin_lock_irq(&hugetlb_lock);
 	/*
 	 * nr_huge_pages needs to be adjusted within the same lock cycle
@@ -2189,7 +2196,7 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mas
 	if (hstate_is_gigantic(h))
 		return NULL;
 
-	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask);
+	folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
 	if (!folio)
 		return NULL;
 
@@ -2242,9 +2249,6 @@ struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
 {
 	struct folio *folio;
 
-	if (hints)
-		*hints = (pghint_t)0;
-
 	spin_lock_irq(&hugetlb_lock);
 	if (!h->resv_huge_pages) {
 		spin_unlock_irq(&hugetlb_lock);
@@ -2253,8 +2257,12 @@ struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
 
 	folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, preferred_nid,
 					       nmask);
-	if (folio)
+	if (folio) {
 		h->resv_huge_pages--;
+		if (hints)
+			*hints = PageZeroed(&folio->page) ? PGHINT_ZEROED : 0;
+		__ClearPageZeroed(&folio->page);
+	}
 
 	spin_unlock_irq(&hugetlb_lock);
 	return folio;
@@ -2748,7 +2756,7 @@ static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio,
 			spin_unlock_irq(&hugetlb_lock);
 			gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 			new_folio = alloc_fresh_hugetlb_folio(h, gfp_mask,
-							      nid, NULL);
+							      nid, NULL, NULL);
 			if (!new_folio)
 				return -ENOMEM;
 			goto retry;
@@ -5820,7 +5828,10 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 				ret = 0;
 			goto out;
 		}
-		folio_zero_user(folio, vmf->real_address);
+		if (PageZeroed(&folio->page))
+			__ClearPageZeroed(&folio->page);
+		else
+			folio_zero_user(folio, vmf->real_address);
 		__folio_mark_uptodate(folio);
 		new_folio = true;
 
-- 
MST



* [PATCH RFC v2 10/18] mm: page_reporting: support host-zeroed reported pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 09/18] mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 11/18] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Johannes Weiner, Zi Yan

Add a host_zeroes_pages flag to page_reporting_dev_info, allowing
drivers to declare that their host zeros reported pages on reclaim.
A static key gates the fast path.

Set PG_zeroed alongside PageReported in page_reporting_drain() when
host_zeroes_pages is enabled, so reported pages are marked as
known-zero at reporting time.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 include/linux/page_reporting.h |  3 +++
 mm/page_reporting.c            | 13 ++++++++++++-
 mm/page_reporting.h            | 11 +++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index fe648dfa3a7c..10faadfeb4fb 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -13,6 +13,9 @@ struct page_reporting_dev_info {
 	int (*report)(struct page_reporting_dev_info *prdev,
 		      struct scatterlist *sg, unsigned int nents);
 
+	/* If true, host zeros reported pages on reclaim */
+	bool host_zeroes_pages;
+
 	/* work struct for processing reports */
 	struct delayed_work work;
 
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index f0042d5743af..5e1e1a924b0c 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -50,6 +50,8 @@ EXPORT_SYMBOL_GPL(page_reporting_order);
 #define PAGE_REPORTING_DELAY	(2 * HZ)
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
 
+DEFINE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
+
 enum {
 	PAGE_REPORTING_IDLE = 0,
 	PAGE_REPORTING_REQUESTED,
@@ -129,8 +131,11 @@ page_reporting_drain(struct page_reporting_dev_info *prdev,
 		 * report on the new larger page when we make our way
 		 * up to that higher order.
 		 */
-		if (PageBuddy(page) && buddy_order(page) == order)
+		if (PageBuddy(page) && buddy_order(page) == order) {
 			__SetPageReported(page);
+			if (page_reporting_host_zeroes_pages())
+				__SetPageZeroed(page);
+		}
 	} while ((sg = sg_next(sg)));
 
 	/* reinitialize scatterlist now that it is empty */
@@ -386,6 +391,9 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
 	/* Assign device to allow notifications */
 	rcu_assign_pointer(pr_dev_info, prdev);
 
+	if (prdev->host_zeroes_pages)
+		static_branch_enable(&page_reporting_host_zeroes);
+
 	/* enable page reporting notification */
 	if (!static_key_enabled(&page_reporting_enabled)) {
 		static_branch_enable(&page_reporting_enabled);
@@ -410,6 +418,9 @@ void page_reporting_unregister(struct page_reporting_dev_info *prdev)
 
 		/* Flush any existing work, and lock it out */
 		cancel_delayed_work_sync(&prdev->work);
+
+		if (prdev->host_zeroes_pages)
+			static_branch_disable(&page_reporting_host_zeroes);
 	}
 
 	mutex_unlock(&page_reporting_mutex);
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index c51dbc228b94..a53ab22f4c49 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -12,9 +12,15 @@
 
 #ifdef CONFIG_PAGE_REPORTING
 DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
+DECLARE_STATIC_KEY_FALSE(page_reporting_host_zeroes);
 extern unsigned int page_reporting_order;
 void __page_reporting_notify(void);
 
+static inline bool page_reporting_host_zeroes_pages(void)
+{
+	return static_branch_unlikely(&page_reporting_host_zeroes);
+}
+
 static inline bool page_reported(struct page *page)
 {
 	return static_branch_unlikely(&page_reporting_enabled) &&
@@ -46,6 +52,11 @@ static inline void page_reporting_notify_free(unsigned int order)
 #else /* CONFIG_PAGE_REPORTING */
 #define page_reported(_page)	false
 
+static inline bool page_reporting_host_zeroes_pages(void)
+{
+	return false;
+}
+
 static inline void page_reporting_notify_free(unsigned int order)
 {
 }
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC v2 11/18] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 10/18] mm: page_reporting: support host-zeroed reported pages Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 12/18] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport

Use vma_alloc_folio_hints() and check PGHINT_ZEROED to skip
clear_user_highpage() when the page is already zeroed.

On x86, vma_alloc_zeroed_movable_folio is overridden by a macro
that uses __GFP_ZERO directly, so this change has no effect there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 include/linux/highmem.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index af03db851a1d..8bb67772c1cb 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -321,9 +321,11 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
 				   unsigned long vaddr)
 {
 	struct folio *folio;
+	pghint_t hints;
 
-	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr);
-	if (folio && user_alloc_needs_zeroing())
+	folio = vma_alloc_folio_hints(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr,
+				      &hints);
+	if (folio && user_alloc_needs_zeroing() && !(hints & PGHINT_ZEROED))
 		clear_user_highpage(&folio->page, vaddr);
 
 	return folio;
-- 
MST



* [PATCH RFC v2 12/18] mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (10 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 11/18] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 13/18] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport

Use vma_alloc_folio_hints() and check PGHINT_ZEROED to skip
folio_zero_user() in the mTHP anonymous page allocation path
when the page is already zeroed.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/memory.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c65e82c86fed..066e2c9781dc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5179,8 +5179,10 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	/* Try allocating the highest of the remaining orders. */
 	gfp = vma_thp_gfp_mask(vma);
 	while (orders) {
+		pghint_t hints;
+
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-		folio = vma_alloc_folio(gfp, order, vma, addr);
+		folio = vma_alloc_folio_hints(gfp, order, vma, addr, &hints);
 		if (folio) {
 			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
 				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
@@ -5188,14 +5190,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 				goto next;
 			}
 			folio_throttle_swaprate(folio, gfp);
-			/*
-			 * When a folio is not zeroed during allocation
-			 * (__GFP_ZERO not used) or user folios require special
-			 * handling, folio_zero_user() is used to make sure
-			 * that the page corresponding to the faulting address
-			 * will be hot in the cache after zeroing.
-			 */
-			if (user_alloc_needs_zeroing())
+			if (user_alloc_needs_zeroing() &&
+			    !(hints & PGHINT_ZEROED))
 				folio_zero_user(folio, vmf->address);
 			return folio;
 		}
-- 
MST



* [PATCH RFC v2 13/18] mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (11 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 12/18] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 14/18] mm: memfd: skip zeroing for pre-zeroed hugetlb pages Michael S. Tsirkin
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang

Use vma_alloc_folio_hints() and check PGHINT_ZEROED to skip
folio_zero_user() in the PMD THP anonymous page allocation path
when the page is already zeroed.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/huge_memory.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b298cba853ab..243592452ead 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1259,8 +1259,10 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 	gfp_t gfp = vma_thp_gfp_mask(vma);
 	const int order = HPAGE_PMD_ORDER;
 	struct folio *folio;
+	pghint_t hints;
 
-	folio = vma_alloc_folio(gfp, order, vma, addr & HPAGE_PMD_MASK);
+	folio = vma_alloc_folio_hints(gfp, order, vma, addr & HPAGE_PMD_MASK,
+				      &hints);
 
 	if (unlikely(!folio)) {
 		count_vm_event(THP_FAULT_FALLBACK);
@@ -1279,13 +1281,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 	}
 	folio_throttle_swaprate(folio, gfp);
 
-       /*
-	* When a folio is not zeroed during allocation (__GFP_ZERO not used)
-	* or user folios require special handling, folio_zero_user() is used to
-	* make sure that the page corresponding to the faulting address will be
-	* hot in the cache after zeroing.
-	*/
-	if (user_alloc_needs_zeroing())
+	if (user_alloc_needs_zeroing() && !(hints & PGHINT_ZEROED))
 		folio_zero_user(folio, addr);
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
-- 
MST



* [PATCH RFC v2 14/18] mm: memfd: skip zeroing for pre-zeroed hugetlb pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (12 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 13/18] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
@ 2026-04-20 12:50 ` Michael S. Tsirkin
  2026-04-20 12:51 ` [PATCH RFC v2 15/18] virtio_balloon: add host_zeroes_pages module parameter Michael S. Tsirkin
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Hugh Dickins,
	Baolin Wang

Use the pghint_t output from alloc_hugetlb_folio_reserve() to
skip folio_zero_user() when the page is already zeroed.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/memfd.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/memfd.c b/mm/memfd.c
index f1c00600e19a..546149897369 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -81,6 +81,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 		struct hstate *h = hstate_file(memfd);
 		int err = -ENOMEM;
 		long nr_resv;
+		pghint_t hints;
 
 		gfp_mask = htlb_alloc_mask(h);
 		gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
@@ -93,17 +94,12 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 		folio = alloc_hugetlb_folio_reserve(h,
 						    numa_node_id(),
 						    NULL,
-						    gfp_mask, NULL);
+						    gfp_mask, &hints);
 		if (folio) {
 			u32 hash;
 
-			/*
-			 * Zero the folio to prevent information leaks to userspace.
-			 * Use folio_zero_user() which is optimized for huge/gigantic
-			 * pages. Pass 0 as addr_hint since this is not a faulting path
-			 *  and we don't have a user virtual address yet.
-			 */
-			folio_zero_user(folio, 0);
+			if (!(hints & PGHINT_ZEROED))
+				folio_zero_user(folio, 0);
 
 			/*
 			 * Mark the folio uptodate before adding to page cache,
-- 
MST



* [PATCH RFC v2 15/18] virtio_balloon: add host_zeroes_pages module parameter
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (13 preceding siblings ...)
  2026-04-20 12:50 ` [PATCH RFC v2 14/18] mm: memfd: skip zeroing for pre-zeroed hugetlb pages Michael S. Tsirkin
@ 2026-04-20 12:51 ` Michael S. Tsirkin
  2026-04-20 12:51 ` [PATCH RFC v2 16/18] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Xuan Zhuo,
	Eugenio Pérez

Add a module parameter to opt in to the pre-zeroed page
optimization.  A proper virtio feature flag is needed before
this can be merged.

  insmod virtio_balloon.ko host_zeroes_pages=1

When host_zeroes_pages is set, callers skip folio_zero_user() for
pages that are known to have been zeroed by the host.  This is safe
on cache-aliasing architectures because the hypervisor invalidates
guest cache lines when reclaiming page backing (e.g. MADV_DONTNEED),
so no stale cache state exists when the guest re-maps the page.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 drivers/virtio/virtio_balloon.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d1fbc8fe8470..2e524bf6f934 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -19,6 +19,11 @@
 #include <linux/mm.h>
 #include <linux/page_reporting.h>
 
+static bool host_zeroes_pages;
+module_param(host_zeroes_pages, bool, 0444);
+MODULE_PARM_DESC(host_zeroes_pages,
+		 "Host zeroes reported pages, skip guest re-zeroing");
+
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
  * multiple balloon pages.  All memory counters in this driver are in balloon
@@ -1039,6 +1044,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		vb->pr_dev_info.order = 5;
 #endif
 
+		vb->pr_dev_info.host_zeroes_pages = host_zeroes_pages;
 		err = page_reporting_register(&vb->pr_dev_info);
 		if (err)
 			goto out_unregister_oom;
-- 
MST



* [PATCH RFC v2 16/18] mm: page_reporting: add flush parameter with page budget
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (14 preceding siblings ...)
  2026-04-20 12:51 ` [PATCH RFC v2 15/18] virtio_balloon: add host_zeroes_pages module parameter Michael S. Tsirkin
@ 2026-04-20 12:51 ` Michael S. Tsirkin
  2026-04-20 12:51 ` [PATCH RFC v2 17/18] mm: add free_frozen_pages_hint and put_page_hint APIs Michael S. Tsirkin
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan

Add a write-only module parameter "flush" that triggers immediate
page reporting.  The value specifies approximately how many pages
(at page_reporting_order) to report.  The flush loops through
reporting cycles until the budget is exhausted, all pages are
reported, or a signal is pending.

  echo 512 > /sys/module/page_reporting/parameters/flush

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 mm/page_reporting.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 5e1e1a924b0c..3560f272ab70 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -354,6 +354,48 @@ static void page_reporting_process(struct work_struct *work)
 static DEFINE_MUTEX(page_reporting_mutex);
 DEFINE_STATIC_KEY_FALSE(page_reporting_enabled);
 
+static int page_reporting_flush_set(const char *val,
+				    const struct kernel_param *kp)
+{
+	struct page_reporting_dev_info *prdev;
+	unsigned int budget;
+	int err;
+
+	err = kstrtouint(val, 0, &budget);
+	if (err)
+		return err;
+	if (!budget)
+		return 0;
+
+	mutex_lock(&page_reporting_mutex);
+	prdev = rcu_dereference_protected(pr_dev_info,
+				lockdep_is_held(&page_reporting_mutex));
+	if (prdev) {
+		unsigned int reported;
+
+		for (reported = 0; reported < budget;
+		     reported += PAGE_REPORTING_CAPACITY) {
+			flush_delayed_work(&prdev->work);
+			__page_reporting_request(prdev);
+			flush_delayed_work(&prdev->work);
+			if (atomic_read(&prdev->state) == PAGE_REPORTING_IDLE)
+				break;
+			if (signal_pending(current))
+				break;
+		}
+	}
+	mutex_unlock(&page_reporting_mutex);
+	return 0;
+}
+
+static const struct kernel_param_ops flush_ops = {
+	.set = page_reporting_flush_set,
+	.get = param_get_uint,
+};
+static unsigned int page_reporting_flush;
+module_param_cb(flush, &flush_ops, &page_reporting_flush, 0200);
+MODULE_PARM_DESC(flush, "Report up to N pages at page_reporting_order");
+
 int page_reporting_register(struct page_reporting_dev_info *prdev)
 {
 	int err = 0;
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC v2 17/18] mm: add free_frozen_pages_hint and put_page_hint APIs
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (15 preceding siblings ...)
  2026-04-20 12:51 ` [PATCH RFC v2 16/18] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
@ 2026-04-20 12:51 ` Michael S. Tsirkin
  2026-04-20 12:51 ` [PATCH RFC v2 18/18] virtio_balloon: mark deflated pages as pre-zeroed Michael S. Tsirkin
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Lorenzo Stoakes,
	Liam R. Howlett, Mike Rapoport, Johannes Weiner, Zi Yan, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu

Add free_frozen_pages_hint(page, order, hints) to free a page
while marking it as pre-zeroed when PGHINT_ZEROED is set.
The PG_zeroed flag is set after __free_pages_prepare so it
survives on the free list.

Add __folio_put_hint(), folio_put_hint(), and put_page_hint()
wrappers for the put_page path.

These APIs are intended for balloon drivers during deflation
when the host has zeroed the pages.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 include/linux/gfp.h |  2 ++
 include/linux/mm.h  | 12 ++++++++++++
 mm/page_alloc.c     | 21 +++++++++++++++------
 mm/swap.c           | 19 +++++++++++++++++++
 4 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 14433a20e60c..b226d5e1930e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -404,6 +404,8 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages_nolock(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
+void free_frozen_pages_hint(struct page *page, unsigned int order,
+			    pghint_t hints);
 
 #define __free_page(page) __free_pages((page), 0)
 #define free_page(addr) free_pages((addr), 0)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index abb4963c1f06..f4e28c55e2c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1640,6 +1640,7 @@ static inline struct folio *virt_to_folio(const void *x)
 }
 
 void __folio_put(struct folio *folio);
+void __folio_put_hint(struct folio *folio, pghint_t hints);
 
 void split_page(struct page *page, unsigned int order);
 void folio_copy(struct folio *dst, struct folio *src);
@@ -1817,6 +1818,17 @@ static inline void folio_put(struct folio *folio)
 		__folio_put(folio);
 }
 
+static inline void folio_put_hint(struct folio *folio, pghint_t hints)
+{
+	if (folio_put_testzero(folio))
+		__folio_put_hint(folio, hints);
+}
+
+static inline void put_page_hint(struct page *page, pghint_t hints)
+{
+	folio_put_hint(page_folio(page), hints);
+}
+
 /**
  * folio_put_refs - Reduce the reference count on a folio.
  * @folio: The folio.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a4cfd645599a..f04813db3015 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3000,7 +3000,7 @@ static bool free_frozen_page_commit(struct zone *zone,
  * Free a pcp page
  */
 static void __free_frozen_pages(struct page *page, unsigned int order,
-				fpi_t fpi_flags)
+				fpi_t fpi_flags, pghint_t hints)
 {
 	unsigned long UP_flags;
 	struct per_cpu_pages *pcp;
@@ -3016,6 +3016,9 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
 	if (!__free_pages_prepare(page, order, fpi_flags))
 		return;
 
+	if (hints & PGHINT_ZEROED)
+		__SetPageZeroed(page);
+
 	/*
 	 * We only track unmovable, reclaimable and movable on pcp lists.
 	 * Place ISOLATE pages on the isolated list because they are being
@@ -3051,12 +3054,18 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
 
 void free_frozen_pages(struct page *page, unsigned int order)
 {
-	__free_frozen_pages(page, order, FPI_NONE);
+	__free_frozen_pages(page, order, FPI_NONE, 0);
+}
+
+void free_frozen_pages_hint(struct page *page, unsigned int order,
+			    pghint_t hints)
+{
+	__free_frozen_pages(page, order, FPI_NONE, hints);
 }
 
 void free_frozen_pages_nolock(struct page *page, unsigned int order)
 {
-	__free_frozen_pages(page, order, FPI_TRYLOCK);
+	__free_frozen_pages(page, order, FPI_TRYLOCK, 0);
 }
 
 /*
@@ -5385,7 +5394,7 @@ static void ___free_pages(struct page *page, unsigned int order,
 	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
-		__free_frozen_pages(page, order, fpi_flags);
+		__free_frozen_pages(page, order, fpi_flags, 0);
 	else if (!head) {
 		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0) {
@@ -5396,7 +5405,7 @@ static void ___free_pages(struct page *page, unsigned int order,
 			 */
 			clear_page_tag_ref(page + (1 << order));
 			__free_frozen_pages(page + (1 << order), order,
-					    fpi_flags);
+					    fpi_flags, 0);
 		}
 	}
 }
@@ -7879,7 +7888,7 @@ struct page *alloc_frozen_pages_nolock_noprof(gfp_t gfp_flags, int nid, unsigned
 
 	if (memcg_kmem_online() && page && (gfp_flags & __GFP_ACCOUNT) &&
 	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
-		__free_frozen_pages(page, order, FPI_TRYLOCK);
+		__free_frozen_pages(page, order, FPI_TRYLOCK, 0);
 		page = NULL;
 	}
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
diff --git a/mm/swap.c b/mm/swap.c
index bb19ccbece46..1dfd232d3944 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -113,6 +113,25 @@ void __folio_put(struct folio *folio)
 }
 EXPORT_SYMBOL(__folio_put);
 
+void __folio_put_hint(struct folio *folio, pghint_t hints)
+{
+	if (unlikely(folio_is_zone_device(folio))) {
+		free_zone_device_folio(folio);
+		return;
+	}
+
+	if (folio_test_hugetlb(folio)) {
+		free_huge_folio(folio);
+		return;
+	}
+
+	page_cache_release(folio);
+	folio_unqueue_deferred_split(folio);
+	mem_cgroup_uncharge(folio);
+	free_frozen_pages_hint(&folio->page, folio_order(folio), hints);
+}
+EXPORT_SYMBOL(__folio_put_hint);
+
 typedef void (*move_fn_t)(struct lruvec *lruvec, struct folio *folio);
 
 static void lru_add(struct lruvec *lruvec, struct folio *folio)
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC v2 18/18] virtio_balloon: mark deflated pages as pre-zeroed
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (16 preceding siblings ...)
  2026-04-20 12:51 ` [PATCH RFC v2 17/18] mm: add free_frozen_pages_hint and put_page_hint APIs Michael S. Tsirkin
@ 2026-04-20 12:51 ` Michael S. Tsirkin
  2026-04-20 18:09 ` [syzbot ci] Re: mm/virtio: skip redundant zeroing of host-zeroed reported pages syzbot ci
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Xuan Zhuo,
	Eugenio Pérez

Mark deflated pages as pre-zeroed when host_zeroes_pages is set.
Use put_page_hint() with PGHINT_ZEROED during deflation so the
freed pages carry PG_zeroed in the buddy allocator, allowing the
next allocation to skip redundant zeroing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
 drivers/virtio/virtio_balloon.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 2e524bf6f934..9b35203f579d 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -299,7 +299,10 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 
 	list_for_each_entry_safe(page, next, pages, lru) {
 		list_del(&page->lru);
-		put_page(page); /* balloon reference */
+		if (host_zeroes_pages)
+			put_page_hint(page, PGHINT_ZEROED);
+		else
+			put_page(page);
 	}
 }
 
-- 
MST


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
@ 2026-04-20 12:51 Michael S. Tsirkin
  2026-04-20 12:50 ` [PATCH RFC v2 01/18] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
                   ` (20 more replies)
  0 siblings, 21 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 12:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization



v2 - this is an attempt to address David Hildenbrand's comments on v1:
stop overloading GFP flags and page->private, and add support for
balloon deflate.

I hope this one is acceptable, API-wise.

I also went ahead and implemented an alternative approach
that David suggested:
using GFP_ZERO to zero userspace pages.
The issue is simple: on some architectures, one has to know the
userspace fault address in order to flush the cache.

So, I had to propagate the fault address everywhere.
That is a lot of churn, and my concern is that if we miss even one
place, silent, subtle data corruption will result, and only on some
arches (x86 will be fine).

Still, you can view that approach here:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero

David, if you still feel I should switch to that approach,
let me know. Personally, I'd rather keep that as a separate
project from this optimization.


Still an RFC as virtio bits need work, but I would very much like
to get a general agreement on mm bits first. Thanks!

Patch 1 is a minor optimization that I am carrying here to avoid
conflicts. It might make sense to merge it straight away.

-------



When a guest reports free pages to the hypervisor via virtio-balloon's
free page reporting, the host typically zeros those pages when reclaiming
their backing memory (e.g., via MADV_DONTNEED on anonymous mappings).
When the guest later reallocates those pages, the kernel zeros them
again -- redundantly.

This series eliminates that double-zeroing by propagating the "host
already zeroed this page" information through the buddy allocator and
into the page fault path.
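
As a purely illustrative userspace toy of the mechanism -- a known-zero
bit that the free path sets and the allocation path consumes, loosely
analogous to PG_zeroed; all names here are made up and this is not
kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy "page": one known-zero bit plus data, standing in for PG_zeroed. */
struct toy_page {
	bool zeroed;
	char data[64];
};

/* Free path: the caller hints that the host already zeroed the backing,
 * loosely like free_frozen_pages_hint(page, order, PGHINT_ZEROED). */
static void toy_free_hint(struct toy_page *p, bool host_zeroed)
{
	p->zeroed = host_zeroed;
}

/* Alloc path: zero only when not already known-zero, then consume the
 * bit, loosely like post_alloc_hook() clearing PG_zeroed.  Returns
 * true if it actually had to zero. */
static bool toy_alloc_zeroed(struct toy_page *p)
{
	bool did_zero = !p->zeroed;

	if (did_zero)
		memset(p->data, 0, sizeof(p->data));
	p->zeroed = false;	/* consumed: covers one allocation only */
	return did_zero;
}
```

The important property, and what makes the hint safe: forgetting to set
the bit only costs an extra memset; it can never skip a zeroing that is
actually needed.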

Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
256MB of anonymous pages:

  metric         baseline        optimized       delta
  task-clock     191 +- 31 ms    60 +- 35 ms     -68%
  cache-misses   1.10M +- 460K   269K +- 31K     -76%
  instructions   4.54M +- 275K   4.10M +- 130K   -10%

With hugetlb surplus pages:

  metric         baseline        optimized       delta
  task-clock     183 +- 24 ms    45 +- 23 ms     -76%
  cache-misses   1.27M +- 544K   270K +- 16K     -79%
  instructions   5.37M +- 254K   4.94M +- 155K   -8%

Notes:
- The virtio_balloon module parameter (15/18) is a testing hack.
  A proper virtio feature flag is needed before merging.
- Patch 16/18 adds a sysfs flush trigger for deterministic testing
  (avoids waiting for the 2-second reporting delay).
- When host_zeroes_pages is set, callers skip folio_zero_user() for
  pages known to be zeroed by the host. This is safe on all
  architectures because the hypervisor invalidates guest cache lines
  when reclaiming page backing (MADV_DONTNEED).
- PG_zeroed is aliased to PG_private. It is excluded from
  PAGE_FLAGS_CHECK_AT_PREP because it must survive on free-list pages
  until post_alloc_hook() consumes and clears it. Is this acceptable,
  or should a different bit be used?
- The optimization is most effective with THP, where entire 2MB
  pages are allocated directly from reported order-9+ buddy pages.
  Without THP, only ~21% of order-0 allocations come from reported
  pages due to low-order fragmentation.
- Persistent hugetlb pool pages are not covered: when freed by
  userspace they return to the hugetlb free pool, not the buddy
  allocator, so they are never reported to the host.  Surplus
  hugetlb pages are allocated from buddy and do benefit.

Test program:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  #ifndef MADV_POPULATE_WRITE
  #define MADV_POPULATE_WRITE 23
  #endif
  #ifndef MAP_HUGETLB
  #define MAP_HUGETLB 0x40000
  #endif

  int main(int argc, char **argv)
  {
      unsigned long size;
      int flags = MAP_PRIVATE | MAP_ANONYMOUS;
      void *p;
      int r;

      if (argc < 2) {
          fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
          return 1;
      }
      size = atol(argv[1]) * 1024UL * 1024;
      if (argc >= 3 && strcmp(argv[2], "huge") == 0)
          flags |= MAP_HUGETLB;
      p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
      if (p == MAP_FAILED) {
          perror("mmap");
          return 1;
      }
      r = madvise(p, size, MADV_POPULATE_WRITE);
      if (r) {
          perror("madvise");
          return 1;
      }
      munmap(p, size);
      return 0;
  }

Test script (bench.sh):

  #!/bin/bash
  # Usage: bench.sh <size_mb> <mode> <iterations> [huge]
  # mode 0 = baseline, mode 1 = skip zeroing
  SZ=${1:-256}; MODE=${2:-0}; ITER=${3:-10}; HUGE=${4:-}
  FLUSH=/sys/module/page_reporting/parameters/flush
  PERF_DATA=/tmp/perf-$MODE.csv
  rmmod virtio_balloon 2>/dev/null
  insmod virtio_balloon.ko host_zeroes_pages=$MODE
  echo 512 > $FLUSH
  [ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages
  rm -f $PERF_DATA
  echo "=== sz=${SZ}MB mode=$MODE iter=$ITER $HUGE ==="
  for i in $(seq 1 $ITER); do
      echo 3 > /proc/sys/vm/drop_caches
      echo 512 > $FLUSH
      perf stat -e task-clock,instructions,cache-misses \
          -x, -o $PERF_DATA --append -- ./alloc_once $SZ $HUGE
  done
  [ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
  rmmod virtio_balloon
  awk -F, '/^#/||/^$/{next}{v=$1+0;e=$3;gsub(/ /,"",e);s[e]+=v;n[e]++}
  END{for(e in s)printf "  %-16s %10.2f (n=%d)\n",e,s[e]/n[e],n[e]}' $PERF_DATA

Compile and run:
  gcc -static -O2 -o alloc_once alloc_once.c
  bash bench.sh 256 0 10          # baseline (regular pages)
  bash bench.sh 256 1 10          # optimized (regular pages)
  bash bench.sh 256 0 10 huge     # baseline (hugetlb surplus)
  bash bench.sh 256 1 10 huge     # optimized (hugetlb surplus)

Changes since v1:
- Replaced __GFP_PREZEROED with PG_zeroed page flag (aliased PG_private)
- Added pghint_t type and vma_alloc_folio_hints() API
- Track PG_zeroed across buddy merges and splits
- Added post_alloc_hook integration (single consume/clear point)
- Added hugetlb support (pool pages + memfd)
- Added page_reporting flush parameter for deterministic testing
- Added free_frozen_pages_hint/put_page_hint for balloon deflate path
- Added try_to_claim_block PG_zeroed preservation
- Updated perf numbers with per-iteration flush methodology

Michael S. Tsirkin (18):
  mm: page_alloc: propagate PageReported flag across buddy splits
  mm: add pghint_t type and vma_alloc_folio_hints API
  mm: add PG_zeroed page flag for known-zero pages
  mm: page_alloc: track PG_zeroed across buddy merges
  mm: page_alloc: preserve PG_zeroed in try_to_claim_block
  mm: page_alloc: thread pghint_t through get_page_from_freelist
  mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t
  mm: hugetlb: thread pghint_t through buddy allocation chain
  mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing
  mm: page_reporting: support host-zeroed reported pages
  mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed
    pages
  mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
  mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
  mm: memfd: skip zeroing for pre-zeroed hugetlb pages
  virtio_balloon: add host_zeroes_pages module parameter
  mm: page_reporting: add flush parameter with page budget
  mm: add free_frozen_pages_hint and put_page_hint APIs
  virtio_balloon: mark deflated pages as pre-zeroed

 drivers/virtio/virtio_balloon.c |  11 ++-
 fs/hugetlbfs/inode.c            |   5 +-
 include/linux/gfp.h             |  17 +++++
 include/linux/highmem.h         |   6 +-
 include/linux/hugetlb.h         |   6 +-
 include/linux/mm.h              |  12 +++
 include/linux/page-flags.h      |  13 +++-
 include/linux/page_reporting.h  |   3 +
 mm/compaction.c                 |   4 +-
 mm/huge_memory.c                |  12 +--
 mm/hugetlb.c                    |  52 +++++++++----
 mm/internal.h                   |   7 +-
 mm/memfd.c                      |  12 +--
 mm/memory.c                     |  14 ++--
 mm/mempolicy.c                  |  85 +++++++++++++++++++++
 mm/page_alloc.c                 | 131 ++++++++++++++++++++++++--------
 mm/page_reporting.c             |  55 +++++++++++++-
 mm/page_reporting.h             |  11 +++
 mm/swap.c                       |  19 +++++
 19 files changed, 392 insertions(+), 83 deletions(-)

-- 
MST


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [syzbot ci] Re: mm/virtio: skip redundant zeroing of host-zeroed reported pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (17 preceding siblings ...)
  2026-04-20 12:51 ` [PATCH RFC v2 18/18] virtio_balloon: mark deflated pages as pre-zeroed Michael S. Tsirkin
@ 2026-04-20 18:09 ` syzbot ci
  2026-04-20 18:20 ` [PATCH RFC v2 00/18] " David Hildenbrand (Arm)
  2026-04-21  2:21 ` Gregory Price
  20 siblings, 0 replies; 25+ messages in thread
From: syzbot ci @ 2026-04-20 18:09 UTC (permalink / raw)
  To: aarcange, akpm, apopple, axelrasmussen, baohua, baolin.wang, bhe,
	byungchul, chrisl, david, dev.jain, eperezma, gourry, hannes,
	hughd, jackmanb, jasowang, joshua.hahnjy, kasong, lance.yang,
	liam.howlett, linux-kernel, linux-mm, ljs, matthew.brost, mhocko,
	mst, muchun.song, npache, nphamcs, osalvador, rakie.kim, rppt,
	ryan.roberts, shikemeng, surenb, vbabka, virtualization, weixugc,
	xuanzhuo, ying.huang, yuanchu, ziy
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v2] mm/virtio: skip redundant zeroing of host-zeroed reported pages
https://lore.kernel.org/all/cover.1776689093.git.mst@redhat.com
* [PATCH RFC v2 01/18] mm: page_alloc: propagate PageReported flag across buddy splits
* [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API
* [PATCH RFC v2 03/18] mm: add PG_zeroed page flag for known-zero pages
* [PATCH RFC v2 04/18] mm: page_alloc: track PG_zeroed across buddy merges
* [PATCH RFC v2 05/18] mm: page_alloc: preserve PG_zeroed in try_to_claim_block
* [PATCH RFC v2 06/18] mm: page_alloc: thread pghint_t through get_page_from_freelist
* [PATCH RFC v2 07/18] mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t
* [PATCH RFC v2 08/18] mm: hugetlb: thread pghint_t through buddy allocation chain
* [PATCH RFC v2 09/18] mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing
* [PATCH RFC v2 10/18] mm: page_reporting: support host-zeroed reported pages
* [PATCH RFC v2 11/18] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages
* [PATCH RFC v2 12/18] mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
* [PATCH RFC v2 13/18] mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
* [PATCH RFC v2 14/18] mm: memfd: skip zeroing for pre-zeroed hugetlb pages
* [PATCH RFC v2 15/18] virtio_balloon: add host_zeroes_pages module parameter
* [PATCH RFC v2 16/18] mm: page_reporting: add flush parameter with page budget
* [PATCH RFC v2 17/18] mm: add free_frozen_pages_hint and put_page_hint APIs
* [PATCH RFC v2 18/18] virtio_balloon: mark deflated pages as pre-zeroed

and found the following issue:
kernel BUG in free_huge_folio

Full report is available here:
https://ci.syzbot.org/series/329d9cff-a0ad-46d2-8ff4-d9f4341a611f

***

kernel BUG in free_huge_folio

tree:      mm-new
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base:      b8a5774cd49996e8ef83b1637a9b547158f18de9
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/60e99a0e-08bf-474f-b034-a8bfd2eb90b0/config
syz repro: https://ci.syzbot.org/findings/2868ce13-1752-4f9f-9aa9-c5ce89f01fc7/syz_repro

 ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
page_owner free stack trace missing
------------[ cut here ]------------
kernel BUG at ./include/linux/page-flags.h:698!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6015 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__ClearPageZeroed include/linux/page-flags.h:698 [inline]
RIP: 0010:free_huge_folio+0xf93/0x12e0 mm/hugetlb.c:1749
Code: c7 c6 a0 64 db 8b e8 5c 9b fe fe 90 0f 0b e8 74 40 9c ff eb 05 e8 6d 40 9c ff 48 89 df 48 c7 c6 e0 63 db 8b e8 3e 9b fe fe 90 <0f> 0b e8 56 40 9c ff 48 89 df 48 c7 c6 40 64 db 8b e8 27 9b fe fe
RSP: 0018:ffffc90003a675b8 EFLAGS: 00010246
RAX: c71fb9abd148e700 RBX: ffffea0005808000 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8defcd3f RDI: 00000000ffffffff
RBP: 1ffffd4000b0101a R08: ffffffff9011ddb7 R09: 1ffffffff2023bb6
R10: dffffc0000000000 R11: fffffbfff2023bb7 R12: ffffea0005808008
R13: ffffea00058080d0 R14: ffffffff9a2e27c0 R15: 0000000000000040
FS:  00007fe11abed6c0(0000) GS:ffff8882a9453000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe119de9f00 CR3: 00000001ba914000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __folio_put+0xfc/0x4f0 mm/swap.c:105
 hugetlb_mfill_atomic_pte+0x130a/0x1730 mm/hugetlb.c:6294
 mfill_atomic_hugetlb mm/userfaultfd.c:601 [inline]
 mfill_atomic mm/userfaultfd.c:773 [inline]
 mfill_atomic_copy+0xe28/0x1420 mm/userfaultfd.c:872
 userfaultfd_copy fs/userfaultfd.c:1642 [inline]
 userfaultfd_ioctl+0x2c17/0x5130 fs/userfaultfd.c:2059
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe119d9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe11abed028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fe11a015fa0 RCX: 00007fe119d9c819
RDX: 00002000000000c0 RSI: 00000000c028aa03 RDI: 0000000000000003
RBP: 00007fe119e32c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe11a016038 R14: 00007fe11a015fa0 R15: 00007ffe7cd6ce28
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__ClearPageZeroed include/linux/page-flags.h:698 [inline]
RIP: 0010:free_huge_folio+0xf93/0x12e0 mm/hugetlb.c:1749
Code: c7 c6 a0 64 db 8b e8 5c 9b fe fe 90 0f 0b e8 74 40 9c ff eb 05 e8 6d 40 9c ff 48 89 df 48 c7 c6 e0 63 db 8b e8 3e 9b fe fe 90 <0f> 0b e8 56 40 9c ff 48 89 df 48 c7 c6 40 64 db 8b e8 27 9b fe fe
RSP: 0018:ffffc90003a675b8 EFLAGS: 00010246
RAX: c71fb9abd148e700 RBX: ffffea0005808000 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8defcd3f RDI: 00000000ffffffff
RBP: 1ffffd4000b0101a R08: ffffffff9011ddb7 R09: 1ffffffff2023bb6
R10: dffffc0000000000 R11: fffffbfff2023bb7 R12: ffffea0005808008
R13: ffffea00058080d0 R14: ffffffff9a2e27c0 R15: 0000000000000040
FS:  00007fe11abed6c0(0000) GS:ffff8882a9453000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe119de9f00 CR3: 00000001ba914000 CR4: 00000000000006f0


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (18 preceding siblings ...)
  2026-04-20 18:09 ` [syzbot ci] Re: mm/virtio: skip redundant zeroing of host-zeroed reported pages syzbot ci
@ 2026-04-20 18:20 ` David Hildenbrand (Arm)
  2026-04-20 23:33   ` Michael S. Tsirkin
  2026-04-21  2:21 ` Gregory Price
  20 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-20 18:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Andrew Morton, Vlastimil Babka, Brendan Jackman, Michal Hocko,
	Suren Baghdasaryan, Jason Wang, Andrea Arcangeli, linux-mm,
	virtualization

On 4/20/26 14:51, Michael S. Tsirkin wrote:
> 

Hi!

> 
> v2 - this is an attempt to address David Hildenbrand's comments:
> overloading GFP and using page->private, support for
> balloon deflate.
> 
> I hope this one is acceptable, API wise.
> 
> I also went ahead and implemented an alternative approach
> that David suggested:
> using GFP_ZERO to zero userspace pages.
> The issue is simple: on some architectures, one has to know the
> userspace fault address in order to flush the cache.
> 
> So, I had to propagate the fault address everywhere.

As I said, that might not be necessary. vma_alloc_folio() is the
interface we mostly care about in that regard.

> A lot of churn, and my concern is, if we miss even one
> place, silent, subtle data corruption will result and only
> on some arches (x86 will be fine).

Which would *already* be the case if you use folio_alloc(GFP_ZERO)
instead of a magical vma_alloc_folio() + folio_zero_user().

I don't really see how vma_alloc_folio_hints() -- that also consumes the
address -- is any better in that regard?

When we just do the right thing with vma_alloc_folio(GFP_ZERO), at least
vma_alloc_folio() users will not accidentally do the wrong thing by
forgetting to use folio_zero_user().

> 
> Still, you can view that approach here:
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero
> 
> David, if you still feel I should switch to that approach,
> let me know. Personally, I'd rather keep that as a separate
> project from this optimization.

I'd prefer if we extend vma_alloc_folio() to just handle GFP_ZERO for us.

But let's hear other opinions first.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
  2026-04-20 18:20 ` [PATCH RFC v2 00/18] " David Hildenbrand (Arm)
@ 2026-04-20 23:33   ` Michael S. Tsirkin
  2026-04-21  2:38     ` Gregory Price
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Tsirkin @ 2026-04-20 23:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Brendan Jackman,
	Michal Hocko, Suren Baghdasaryan, Jason Wang, Andrea Arcangeli,
	linux-mm, virtualization

On Mon, Apr 20, 2026 at 08:20:57PM +0200, David Hildenbrand (Arm) wrote:
> On 4/20/26 14:51, Michael S. Tsirkin wrote:
> > 
> 
> Hi!
> 
> > 
> > v2 - this is an attempt to address David Hildenbrand's comments:
> > overloading GFP and using page->private, support for
> > balloon deflate.
> > 
> > I hope this one is acceptable, API wise.
> > 
> > I also went ahead and implemented an alternative approach
> > that David suggested:
> > using GFP_ZERO to zero userspace pages.
> > The issue is simple: on some architectures, one has to know the
> > userspace fault address in order to flush the cache.
> > 
> > So, I had to propagate the fault address everywhere.
> 
> As I said, that might not be necessary. vma_alloc_folio() is the
> interface we mostly care about in that regard.
>

I'm not sure I follow what "might not be necessary" refers to. We need
the fault address so that zeroing can be effective wrt the cache. Since
you asked that it be done deep in the post-alloc hook, the address has
to propagate all over mm.


> > A lot of churn, and my concern is, if we miss even one
> > place, silent, subtle data corruption will result and only
> > on some arches (x86 will be fine).
> 
> Which would *already* be the case if you use folio_alloc(GFP_ZERO)
> instead of magical vma_alloc_folio() + folio_zero_user().
> 
> I don't really see how vma_alloc_folio_hints() -- that also consumes the
> address -- is any better in that regard?

By itself, it is not. But the issue is propagating the address from
there all over mm. If we miss even one place, we get subtle cache
corruption on non-x86.

Hints are exactly that: if we forget to set them, all that happens
is an extra zeroing. That is all.

> When we just do the right thing with vma_alloc_folio(GFP_ZERO), at least
> vma_alloc_folio() users will not accidentally do the wrong thing by
> forgetting to use folio_zero_user().


Well, it's simply that:
1. if you plain forget folio_zero_user, you get non-zero pages on all arches
2. we *already* have folio_zero_user in place

> > 
> > Still, you can view that approach here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero
> > 
> > David, if you still feel I should switch to that approach,
> > let me know. Personally, I'd rather keep that as a separate
> > project from this optimization.
> I'd prefer if we extend vma_alloc_folio() to just handle GFP_ZERO for us.

Pls take a look at that tree then. What do you think of that approach?
Better? If you want it in the form of patches, I can post them
in private or on list.

Let me know, I don't have a problem with that approach - I tested
it and the performance is the same.  But the issue is that there are a
lot of paths that have to propagate the fault address. It took me a
while to even find them all (assuming I found them all).


I also note that we need a flag on the free path in order to implement
balloon deflate, as you asked. Here, I reused the hints.



> But let's hear other opinions first.
> 
> -- 
> Cheers,
> 
> David



* Re: [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API
  2026-04-20 12:50 ` [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API Michael S. Tsirkin
@ 2026-04-21  0:58   ` Huang, Ying
  0 siblings, 0 replies; 25+ messages in thread
From: Huang, Ying @ 2026-04-21  0:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization, Johannes Weiner,
	Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Alistair Popple

"Michael S. Tsirkin" <mst@redhat.com> writes:

> Add pghint_t, a bitwise type for communicating page allocation hints
> between the allocator and callers.  Define PGHINT_ZEROED to indicate
> that the allocated page contents are known to be zero.
>
> Add _hints variants of the allocation functions that accept a
> pghint_t *hints output parameter:
>
>   vma_alloc_folio_hints()  -> folio_alloc_mpol_hints (internal)
>                            -> __alloc_frozen_pages_hints()
>
> The existing APIs are unchanged and continue to work without hints.
> For now, hints is always initialized to 0.  A subsequent patch will
> set PGHINT_ZEROED when the page was pre-zeroed by the host.

Why do we need this feature?  Is there any performance impact?  If so,
please provide some performance data.

[snip]

---
Best Regards,
Huang, Ying


* Re: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
  2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
                   ` (19 preceding siblings ...)
  2026-04-20 18:20 ` [PATCH RFC v2 00/18] " David Hildenbrand (Arm)
@ 2026-04-21  2:21 ` Gregory Price
  20 siblings, 0 replies; 25+ messages in thread
From: Gregory Price @ 2026-04-21  2:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Brendan Jackman, Michal Hocko, Suren Baghdasaryan, Jason Wang,
	Andrea Arcangeli, linux-mm, virtualization

On Mon, Apr 20, 2026 at 08:51:13AM -0400, Michael S. Tsirkin wrote:
> 
> When a guest reports free pages to the hypervisor via virtio-balloon's
> free page reporting, the host typically zeros those pages when reclaiming
> their backing memory (e.g., via MADV_DONTNEED on anonymous mappings).
> When the guest later reallocates those pages, the kernel zeros them
> again -- redundantly.
>

It took me a second to really wrap my head around what you were saying
here, but if i'm following correctly:

  1) Guest steals a page, reports the free page to the host
  2) Host returns that page to the buddy
  3) Guest wants the page back -> vmexit, alloc()
      a) host gets a page from the buddy via fault path
      b) this memory is "user memory" so host zeroes the page
  4) Guest repeats step 3, re-zeroing the page

So you're adding a step that does:

  1) page_reporting_drain() in guest sets PG_zeroed if host_zeroes_pages=true
  2) on allocation, if PG_zeroed is set, don't zero

In theory this seems ok.  PG_zeroed being a buddy-only flag is nice.

In practice there are obvious concerns about an explicit flag that would
allow a kernel (in this case the guest) to skip zeroing a page destined
for userland mappings - but i'm also paranoid.

In concept this seems reasonable; in implementation I have concerns
about the pghint_t type being added. I'll respond inline in David's
reply thread on that, though, where you already have notes.

~Gregory


* Re: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
  2026-04-20 23:33   ` Michael S. Tsirkin
@ 2026-04-21  2:38     ` Gregory Price
  0 siblings, 0 replies; 25+ messages in thread
From: Gregory Price @ 2026-04-21  2:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Hildenbrand (Arm), linux-kernel, Andrew Morton,
	Vlastimil Babka, Brendan Jackman, Michal Hocko,
	Suren Baghdasaryan, Jason Wang, Andrea Arcangeli, linux-mm,
	virtualization

On Mon, Apr 20, 2026 at 07:33:38PM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 20, 2026 at 08:20:57PM +0200, David Hildenbrand (Arm) wrote:
> > On 4/20/26 14:51, Michael S. Tsirkin wrote:
> 
> > > A lot of churn, and my concern is, if we miss even one
> > > place, silent, subtle data corruption will result and only
> > > on some arches (x86 will be fine).
> > 
> > Which would *already* be the case if you use folio_alloc(GFP_ZERO)
> > instead of magical vma_alloc_folio() + folio_zero_user().
> > 
> > I don't really see how vma_alloc_folio_hints() -- that also consumes the
> > address -- is any better in that regard?
> 
> By itself, it is not. But the issue is propagating the address from
> there all over mm. If we miss even one place, we get subtle cache
> corruption on non-x86.
> 

Why does it need to propagate?

Can we leave folio_zero_user() callers the same, but add a PG_zeroed
check in folio_zero_user() that skips the zeroing (but not the cache
flush) and clears the PG_zeroed bit?

Is this feasible?

You don't eliminate the folio_zero_user(), but maybe we shouldn't?

(a bit naive here - i haven't checked the PG_zeroed lifetime, i did
 see it overloads PG_private - so this might not be feasible)

> 
> I also note that we need a flag for free in order to implement
> balloon deflate as you asked. Here, I reused the hints.
> 

I'd sooner just implement this as

   ___put_folio(folio, gfp_t)

   __put_folio(folio) { ___put_folio(folio, 0); }

And change the free path to take overloaded gfp flags.

Some of the existing ones might even be useful as-is.

It's essentially the same thing, but prevents a bunch of churn and
saves us a new concept.

Optional gfp flags on free seem like a genuinely useful interface for
certain callers (definitely not all).

~Gregory


end of thread, other threads:[~2026-04-21  2:38 UTC | newest]

Thread overview: 25+ messages
-- links below jump to the message on this page --
2026-04-20 12:51 [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 01/18] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 02/18] mm: add pghint_t type and vma_alloc_folio_hints API Michael S. Tsirkin
2026-04-21  0:58   ` Huang, Ying
2026-04-20 12:50 ` [PATCH RFC v2 03/18] mm: add PG_zeroed page flag for known-zero pages Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 04/18] mm: page_alloc: track PG_zeroed across buddy merges Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 05/18] mm: page_alloc: preserve PG_zeroed in try_to_claim_block Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 06/18] mm: page_alloc: thread pghint_t through get_page_from_freelist Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 07/18] mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 08/18] mm: hugetlb: thread pghint_t through buddy allocation chain Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 09/18] mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 10/18] mm: page_reporting: support host-zeroed reported pages Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 11/18] mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed pages Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 12/18] mm: skip zeroing in alloc_anon_folio " Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 13/18] mm: skip zeroing in vma_alloc_anon_folio_pmd " Michael S. Tsirkin
2026-04-20 12:50 ` [PATCH RFC v2 14/18] mm: memfd: skip zeroing for pre-zeroed hugetlb pages Michael S. Tsirkin
2026-04-20 12:51 ` [PATCH RFC v2 15/18] virtio_balloon: add host_zeroes_pages module parameter Michael S. Tsirkin
2026-04-20 12:51 ` [PATCH RFC v2 16/18] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
2026-04-20 12:51 ` [PATCH RFC v2 17/18] mm: add free_frozen_pages_hint and put_page_hint APIs Michael S. Tsirkin
2026-04-20 12:51 ` [PATCH RFC v2 18/18] virtio_balloon: mark deflated pages as pre-zeroed Michael S. Tsirkin
2026-04-20 18:09 ` [syzbot ci] Re: mm/virtio: skip redundant zeroing of host-zeroed reported pages syzbot ci
2026-04-20 18:20 ` [PATCH RFC v2 00/18] " David Hildenbrand (Arm)
2026-04-20 23:33   ` Michael S. Tsirkin
2026-04-21  2:38     ` Gregory Price
2026-04-21  2:21 ` Gregory Price
