From: Lorenzo Stoakes <ljs@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
"David Hildenbrand (Arm)" <david@kernel.org>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Liam R. Howlett" <liam@infradead.org>,
"Vlastimil Babka" <vbabka@kernel.org>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
"Brendan Jackman" <jackmanb@google.com>,
"Johannes Weiner" <hannes@cmpxchg.org>, "Zi Yan" <ziy@nvidia.com>,
"Baolin Wang" <baolin.wang@linux.alibaba.com>,
"Nico Pache" <npache@redhat.com>,
"Ryan Roberts" <ryan.roberts@arm.com>,
"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
"Lance Yang" <lance.yang@linux.dev>,
"Hugh Dickins" <hughd@google.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Joshua Hahn" <joshua.hahnjy@gmail.com>,
"Rakie Kim" <rakie.kim@sk.com>,
"Byungchul Park" <byungchul@sk.com>,
"Gregory Price" <gourry@gourry.net>,
"Ying Huang" <ying.huang@linux.alibaba.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Christoph Lameter" <cl@gentwo.org>,
"David Rientjes" <rientjes@google.com>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Harry Yoo" <harry.yoo@oracle.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
"Chris Li" <chrisl@kernel.org>,
"Kairui Song" <kasong@tencent.com>,
"Kemeng Shi" <shikemeng@huaweicloud.com>,
"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
virtualization@lists.linux.dev, linux-mm@kvack.org,
"Andrea Arcangeli" <aarcange@redhat.com>
Subject: Re: [PATCH v10 00/37] mm/virtio: skip redundant zeroing of host-zeroed pages
Date: Mon, 8 Jun 2026 10:17:34 +0100
Message-ID: <aiaHd3T42XyB3UBn@lucifer>
In-Reply-To: <cover.1780906288.git.mst@redhat.com>
On Mon, Jun 08, 2026 at 04:33:46AM -0400, Michael S. Tsirkin wrote:
> Further, on architectures with aliasing caches, upstream with init_on_alloc
> double-zeros user pages: once via kernel_init_pages() in
> post_alloc_hook, and again via clear_user_highpage() at the
> callsite (because user_alloc_needs_zeroing() returns true).
> This series eliminates that double-zeroing by moving the zeroing
> into the post_alloc_hook + propagating the "host
> already zeroed this page" information through the buddy allocator.
>
> For page reporting, VIRTIO_BALLOON_F_DEVICE_INIT_REPORTED (bit 6)
> is used. For the inflate/deflate path,
> VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE (bit 7) is used.
>
> Virtio spec: https://lore.kernel.org/all/cover.1778140241.git.mst@redhat.com
>
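For readers unfamiliar with virtio feature negotiation, the two bits quoted above can be tested as in the minimal sketch below. The bit numbers (6 and 7) come from the cover letter; the helper feature_negotiated() is hypothetical and only illustrates the bit test. A real driver would normally go through the kernel's virtio_has_feature() on a struct virtio_device instead of a raw mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as stated in the cover letter. */
#define VIRTIO_BALLOON_F_DEVICE_INIT_REPORTED	6
#define VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE	7

/*
 * Hypothetical helper (not a kernel API): test one bit in a negotiated
 * 64-bit feature mask.
 */
static bool feature_negotiated(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return (features >> bit) & 1;
}
```

The two features are independent: a device may offer zeroing for reported pages without offering it on the inflate/deflate path, and vice versa.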
> Based on v7.1-rc6. When applying on mm-unstable, two conflicts
> are expected:
> - kernel_init_pages() was renamed to clear_highpages_kasan_tagged()
> in mm-unstable. Use clear_highpages_kasan_tagged() in the
> post_alloc_hook else branch.
> - FPI_PREPARED uses BIT(3) in mm-unstable. Bump FPI_ZEROED to
> BIT(4).
> Build-tested on mm-unstable at e9dd96806dbc:
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git zero-mm-unstable
>
> Patches 1-5: fixes/cleanups, dependencies of the zeroing patches.
> Patches 6-9: thread user_addr through page allocator, contig API,
> and gigantic hugetlb allocation.
> Patches 10-16: folio_zero_user in post_alloc_hook, vma_alloc_zeroed
> conversion, raw fault address threading.
> Patches 17-24: PG_zeroed flag, aliasing guard, buddy merge/split
> tracking, FPI_ZEROED optimization, folio_put_zeroed.
> Patches 25-27: __GFP_ZERO callsite conversions (alloc_anon_folio,
> vma_alloc_anon_folio_pmd) with memcg charge failure mitigation.
> Patches 28-29: hugetlb __GFP_ZERO + HPG_zeroed.
> Patches 30-35: page reporting zeroing (DEVICE_INIT_REPORTED),
> disable indirect descriptors.
> Patches 36-37: inflate/deflate zeroing (DEVICE_INIT_ON_INFLATE).

This seems far too much for one series.

You're doing a bunch of mm stuff that seems relatively independent, then
putting the virtio stuff on top.

I think this should be broken out into separate series that lay the
foundations, rather than doing it all in one go, which is also difficult to
review.

Adding a new folio flag is also contentious, for instance; we may want to go
bit by bit and ensure that each foundational element is acceptable before
doing the next, rather than having it all as part of one big series.

Looking through the changelog only adds to this feeling! There are huge
numbers of changes, even relatively recently, and I'm not sure all relevant
mm maintainers have had a look through either.

Thanks, Lorenzo

>
> -------
>
> Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
> 256MB of anonymous pages:
>
>   metric         baseline          optimized         delta
>   task-clock     232 +- 20 ms      51 +- 26 ms       -78%
>   cache-misses   1.20M +- 248K     288K +- 102K      -76%
>   instructions   16.3M +- 1.2M     13.8M +- 1.0M     -15%
>
> With hugetlb surplus pages:
>
>   metric         baseline          optimized         delta
>   task-clock     219 +- 23 ms      65 +- 34 ms       -70%
>   cache-misses   1.17M +- 391K     263K +- 36K       -78%
>   instructions   17.9M +- 1.2M     15.1M +- 724K     -16%
>
> Two flags track known-zero pages:
> PG_zeroed (aliased to PG_private) marks buddy allocator pages that
> are known to contain all zeros, either because the host zeroed
> them during page reporting, or because they were freed via the
> balloon deflate path. It lives on free-list pages and is consumed
> by post_alloc_hook() on allocation.
> HPG_zeroed (stored in hugetlb folio->private bits) serves the same
> purpose for hugetlb pool pages, which are kept in a pool and may
> be zeroed long after buddy allocation, so PG_zeroed (consumed at
> allocation time) cannot track their state.
>
> PG_zeroed lifecycle:
>
>   Sets PG_zeroed:
>   - page_reporting_drain: on reported pages when host zeroes them
>   - __free_pages_ok / __free_frozen_pages: when FPI_ZEROED is set
>     (balloon deflate path)
>   - buddy merge: on merged page if both buddies were zeroed
>   - expand(): propagate to split-off buddy sub-pages
>
>   Clears PG_zeroed:
>   - __free_pages_prepare: clears all PAGE_FLAGS_CHECK_AT_PREP flags
>     (PG_zeroed included), preventing PG_private aliasing leaks
>   - rmqueue_buddy / __rmqueue_pcplist: read-then-clear, passes
>     zeroed hint to prep_new_page -> post_alloc_hook
>   - __isolate_free_page: clear (compaction/page_reporting isolation)
>   - compaction, alloc_contig, split_free_frozen: clear before use
>   - buddy merge: clear both pages before merge, then conditionally
>     re-set on merged head if both were zeroed
>
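The merge and split rules in the PG_zeroed lifecycle quoted above can be modelled in a few lines. This is a minimal sketch, not code from the series: struct free_block, merge_buddies() and split_off_buddy() are hypothetical stand-ins for struct page and the buddy allocator paths, and only the "zeroed" hint is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a free buddy block; not the kernel's struct page. */
struct free_block {
	unsigned int order;	/* block covers 2^order pages */
	bool zeroed;		/* models PG_zeroed on the block head */
};

/*
 * Merge two buddies of the same order. Both hints are cleared first, and the
 * merged head is marked zeroed only if both halves were known-zero.
 */
static struct free_block merge_buddies(struct free_block *a, struct free_block *b)
{
	struct free_block merged;
	bool both_zeroed;

	assert(a->order == b->order);
	both_zeroed = a->zeroed && b->zeroed;
	a->zeroed = false;
	b->zeroed = false;

	merged.order = a->order + 1;
	merged.zeroed = both_zeroed;
	return merged;
}

/*
 * Split a block in half, as expand() does when carving out a smaller
 * allocation: the split-off half inherits the parent's hint.
 */
static struct free_block split_off_buddy(struct free_block *parent)
{
	struct free_block buddy;

	assert(parent->order > 0);
	parent->order--;
	buddy.order = parent->order;
	buddy.zeroed = parent->zeroed;
	return buddy;
}
```

The hint is conservative in one direction only: merging a zeroed block with a non-zeroed one drops the hint, so a stale "zeroed" claim should never survive a merge.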
> HPG_zeroed lifecycle (hugetlb pool pages, stored in folio->private):
>
>   Sets HPG_zeroed:
>   - alloc_surplus_hugetlb_folio: after buddy allocation with
>     __GFP_ZERO, mark pool page as known-zero
>
>   Clears HPG_zeroed:
>   - free_huge_folio: page was mapped to userspace, no longer
>     known-zero when it returns to the pool
>   - alloc_hugetlb_folio: cleared unconditionally on output
>   - alloc_hugetlb_folio_reserve: cleared after checking
>
> - The optimization is most effective with THP, where entire 2MB
> pages are allocated directly from reported order-9+ buddy pages.
> Without THP, only ~21% of order-0 allocations come from reported
> pages due to low-order fragmentation.
> - Persistent hugetlb pool pages are not covered: when freed by
> userspace they return to the hugetlb free pool, not the buddy
> allocator, so they are never reported to the host. Surplus
> hugetlb pages are allocated from buddy and do benefit.
>
> - PG_zeroed is aliased to PG_private. __free_pages_prepare() clears it
> (preventing filesystem PG_private from leaking as false PG_zeroed).
> FPI_ZEROED re-sets it after prepare for balloon deflate pages.
> Is aliasing PG_private acceptable, or should a different bit be used?
>
> - With __GFP_ZERO, the folio is zeroed before mem_cgroup_charge().
> If the charge fails (cgroup at limit), the zeroing work is wasted
> and the folio is freed and retried at a smaller order. Previously,
> zeroing was done after a successful charge. This is inherent to
> the __GFP_ZERO approach. Is this acceptable?
>
> - On architectures with aliasing caches, upstream with init_on_alloc
> double-zeros user pages: once via kernel_init_pages() in
> post_alloc_hook, and again via clear_user_highpage() at the
> callsite (because user_alloc_needs_zeroing() returns true).
> Our patches eliminate this by zeroing once via folio_zero_user()
> in post_alloc_hook. Not a critical fix (people who set init_on_alloc
> know they are paying performance) but a nice cleanup anyway.
>
> Test program:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mman.h>
>
> #ifndef MADV_POPULATE_WRITE
> #define MADV_POPULATE_WRITE 23
> #endif
> #ifndef MAP_HUGETLB
> #define MAP_HUGETLB 0x40000
> #endif
>
> int main(int argc, char **argv)
> {
>     unsigned long size;
>     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
>     void *p;
>     int r;
>
>     if (argc < 2) {
>         fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
>         return 1;
>     }
>     size = atol(argv[1]) * 1024UL * 1024;
>     if (argc >= 3 && strcmp(argv[2], "huge") == 0)
>         flags |= MAP_HUGETLB;
>     p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
>     if (p == MAP_FAILED) {
>         perror("mmap");
>         return 1;
>     }
>     r = madvise(p, size, MADV_POPULATE_WRITE);
>     if (r) {
>         perror("madvise");
>         return 1;
>     }
>     munmap(p, size);
>     return 0;
> }
>
> Test script (bench.sh):
>
> #!/bin/bash
> # Usage: bench.sh <size_mb> <iterations> [huge]
> # Feature negotiation (DEVICE_INIT_REPORTED/ON_INFLATE) is
> # handled by QEMU command line flags.
> SZ=${1:-256}; ITER=${2:-10}; HUGE=${3:-}
> FLUSH=/sys/module/page_reporting/parameters/flush
> CSV=/tmp/perf.csv
> rmmod virtio_balloon 2>/dev/null
> insmod /mnt/share/virtio_balloon.ko
> echo 512 > $FLUSH
> [ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages
> rm -f $CSV
> echo "=== sz=${SZ}MB iter=$ITER $HUGE ==="
> for i in $(seq 1 $ITER); do
>     echo 3 > /proc/sys/vm/drop_caches
>     echo 512 > $FLUSH
>     perf stat -e task-clock,instructions,cache-misses \
>         -x, -o $CSV --append -- /mnt/share/alloc_once $SZ $HUGE
> done
> [ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
> rmmod virtio_balloon
> awk -F, '/^#/||/^$/{next}{v=$1+0;e=$3;gsub(/ /,"",e);s[e]+=v;ss[e]+=v*v;n[e]++}
> END{for(e in s){a=s[e]/n[e];d=sqrt(ss[e]/n[e]-a*a);printf " %-16s %10.0f +- %8.0f (n=%d)\n",e,a,d,n[e]}}' $CSV
>
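The awk one-liner in the quoted script accumulates a sum and a sum of squares per event, then prints the mean and the population standard deviation, sqrt(ss/n - mean^2). The same one-pass computation is sketched below in C for readability; struct run_stats and summarize() are hypothetical names, and the sketch stops at the variance so that it needs nothing beyond the C standard headers shown.

```c
#include <assert.h>
#include <stddef.h>

struct run_stats {
	double mean;
	double variance;	/* population variance; bench.sh prints its square root */
};

/* One-pass mean and population variance from sum and sum of squares. */
static struct run_stats summarize(const double *samples, size_t n)
{
	struct run_stats out;
	double sum = 0.0, sum_sq = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		sum += samples[i];
		sum_sq += samples[i] * samples[i];
	}
	out.mean = sum / n;
	out.variance = sum_sq / n - out.mean * out.mean;
	return out;
}
```

Note that this is the population form (divide by n, not n - 1), and that the sum-of-squares formula can lose precision when the spread is small relative to the mean. With only 10 iterations per run, the reported "+-" figures are best read as rough indicators.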
> Compile and run:
> gcc -static -O2 -o alloc_once alloc_once.c
> bash bench.sh 256 10 # regular pages
> bash bench.sh 256 10 huge # hugetlb surplus
>
> Note about Sashiko (sashiko.dev) false positives:
> Sashiko's mm-alloc guideline says "Any optimization replacing
> clear_user_highpage() with __GFP_ZERO is wrong on [cache-aliasing]
> architectures". This is correct for mainline but not for this
> series, which threads user_addr through the allocator so that
> post_alloc_hook() calls folio_zero_user() with the dcache flush.
> Suggested guideline update: add "unless the caller passes a
> valid user address (i.e. not USER_ADDR_NONE) to vma_alloc_folio(),
> alloc_contig_frozen_pages_user() etc., which reaches
> post_alloc_hook() for the dcache flush".
>
> Pre-existing bugs found during review (not fixed, not made worse):
> - do_swap_page() returns VM_FAULT_OOM on large-folio swapin race
> instead of retrying.
> - free_huge_folio() called with refcount==1 on
> mem_cgroup_charge_hugetlb failure.
> - memfd_alloc_folio() double-decrements resv_huge_pages on error.
> - wait_event in virtballoon_free_page_report hangs on broken
> virtqueue (pre-existing, same as old single-buffer code).
> - tell_host() GFP_KERNEL under balloon_lock risks OOM deadlock.
>
> Changes since v9:
> - Fix W=1 kerneldoc warning on alloc_contig_frozen_pages_user_noprof.
> - Fix link error on !MMU configs (m68k, arm allnoconfig): move
> folio_zero_user stub to new mm/folio_zero.h header.
> - Reorder patches: move PG_zeroed tracking and folio_put_zeroed
> before __GFP_ZERO conversions, allowing folio_put_zeroed to
> handle memcg charge failures.
> - Better handle memcg charge failures.
>
> Changes since v8 (address Sashiko v8 review findings):
> - Fix mempolicy interleave: combine vm_pgoff and VMA offset into
> a single expression before shifting, fixing carry loss for
> file-backed VMAs with unaligned vm_pgoff.
> - Fix memory-failure: wrap ClearPageHWPoison in retry path with
> zone->lock (same race as TestSetPageHWPoison).
> - Fix stale comment: "folio_zero_user writes" -> "page zeroing"
> in huge_memory.c __folio_mark_uptodate comment.
> - Drop rounddown_pow_of_two for page reporting capacity (no-op
> for compiler optimization, halves batch size for non-power-of-2).
> - Reorder: move "mm: balloon: use put_page_zeroed" before
> "virtio_balloon: implement DEVICE_INIT_ON_INFLATE" so the
> ClearPageZeroed handling is in place before any page gets
> the flag set.
> - Various commit log improvements (PowerPC note in aliasing
> patch, memory-failure note about other HWPoison calls,
> wording fixes).
>
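The "carry loss" mentioned in the mempolicy interleave item above can be shown with plain integer arithmetic. The sketch below is an assumption based on the changelog wording, not the expression from the patch, which is not shown in this message: ilx_split() shifts vm_pgoff and the in-VMA offset separately before adding them, while ilx_combined() adds the two page offsets first and shifts once. Both function names are hypothetical, and PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Shifts each term separately, then adds: low bits of both terms are dropped. */
static unsigned long ilx_split(unsigned long vm_pgoff, unsigned long vm_start,
			       unsigned long addr, unsigned int order)
{
	assert(addr >= vm_start);
	return (vm_pgoff >> order) +
	       ((addr - vm_start) >> (PAGE_SHIFT + order));
}

/* Adds the page offsets first, then shifts once: the carry is kept. */
static unsigned long ilx_combined(unsigned long vm_pgoff, unsigned long vm_start,
				  unsigned long addr, unsigned int order)
{
	assert(addr >= vm_start);
	return (vm_pgoff + ((addr - vm_start) >> PAGE_SHIFT)) >> order;
}
```

With vm_pgoff = 1, an address one page into the VMA, and order = 1, the file offset is page 2, which belongs to the second order-1 chunk (index 1). The split form yields 0 because each term loses its low bit before the addition; the combined form yields 1. When vm_pgoff is aligned to the order, the two forms agree.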
> Changes since v7 (address Sashiko AI review findings):
> - Fix dcache flush on VIPT aliasing architectures: add
> user_alloc_needs_zeroing() guard in post_alloc_hook to force
> folio_zero_user for user pages when cache aliasing requires it.
> Host-zeroed pages excluded (!zeroed). Optimization preserved.
> - Fix folio_zero_user stub: replace macro with non-inline function
> in mm/memory.c to avoid double-evaluation and missing include.
> - Fix C89 declaration-after-statement in free_huge_folio.
> - Fix CMA __GFP_ZERO: pass through to cma_alloc_frozen_compound
> so HPG_zeroed accurately reflects whether page was zeroed.
> - Fix big-endian bitmap: use test_bit_le() for inflate_bitmap.
> - Fix migratepage: clear PageZeroed on old page before deflation.
> - Fix page_reporting flush: overflow-safe loop, add -EINTR on
> signal, add code comment explaining double flush_delayed_work.
> - Add atomic ClearPageZeroed (CLEARPAGEFLAG) for balloon migration
> path where zone->lock is not held.
> - Add VM_WARN_ON_ONCE for order>0 without __GFP_COMP in
> post_alloc_hook (folio_zero_user requires compound metadata).
> - Add _noprof pattern for vma_alloc_zeroed_movable_folio to
> preserve memory allocation profiling attribution.
> - Add PageReported propagation in split_large_buddy (was missing
> from patch 2).
> - Add FPI_ZEROED guard: skip PageZeroed when page_poisoning
> enabled and init_on_free disabled (poison overwrites zeroes).
> - Add DMA alignment comment for inflate_bitmap (ACCESS_PLATFORM
> cleared, so not needed now).
> - Restore tell_host comment explaining vq buffer assumption.
> - Various code comments documenting design decisions.
> - Drop __GFP_ZERO from gather_surplus_pages: avoid shifting
> zeroing from fault time to reservation time (mmap/fallocate).
> Pool pages are zeroed at fault time via alloc_hugetlb_folio.
> Fresh surplus allocations at fault time still benefit from
> __GFP_ZERO + HPG_zeroed.
> - New patch: add alloc_contig_frozen_pages_user API with user_addr
> for cache-friendly zeroing in the contiguous allocation path.
> - New patch: thread user_addr through gigantic hugetlb allocation
> via alloc_contig_frozen_pages_user.
> - New patch: replace user_alloc_needs_zeroing() with aliasing-only
> checks (cpu_dcache_is_aliasing || cpu_icache_is_aliasing) in the
> post_alloc_hook guard. Avoids redundant re-zero on non-aliasing.
> - New patch: serialize TestSetPageHWPoison with zone->lock in
> memory_failure to fix pre-existing race with non-atomic buddy
> flag operations (e.g. page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP).
> - New patch: disable VIRTIO_RING_F_INDIRECT_DESC in balloon to
> prevent GFP_KERNEL allocation under balloon_lock (OOM deadlock).
> - New patch: skip kernel_init_pages for FPI_ZEROED when page
> poisoning is not enabled (page already zero, skip redundant work).
>
> Also since v7 (address review by Gregory Price):
> - Drop from_pool bool in alloc_hugetlb_folio: use
> folio_test_hugetlb_zeroed directly. HPG_zeroed is set by
> alloc_surplus_hugetlb_folio for fresh allocations, so the
> check handles both pool and fresh pages.
> - Drop bool *zeroed output parameter from alloc_hugetlb_folio:
> sink zeroing inside the function. When __GFP_ZERO is set and
> !folio_test_hugetlb_zeroed, call folio_zero_user internally.
> - Rename addr to user_addr in alloc_hugetlb_folio, align
> internally with huge_page_mask.
> - Add Reviewed-by: Gregory Price tags on reviewed patches.
>
> New patches since v7:
> - mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
> - mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
> - mm: hugetlb: thread user_addr through gigantic page allocation
> - mm: page_alloc: use aliasing checks instead of
> user_alloc_needs_zeroing
> - virtio_balloon: disable indirect descriptors
> - mm: page_alloc: skip kernel_init_pages for FPI_ZEROED when safe
>
> Changes since v6 (address review by Gregory Price):
> - Rework hugetlb: use gfp_t parameter instead of bool zero /
> bool *zeroed. Sink zeroing inside alloc_hugetlb_folio().
> Pass raw fault address (user_addr) for cache-friendly zeroing
> on both pool-page and fresh allocation
> paths. (Suggested by Gregory Price)
> - Reorder compaction_alloc_noprof() to call prep_compound_page
> before post_alloc_hook for consistency.
> (Suggested by Gregory Price)
> - Reorder: interleave fix first, PageReported propagation and
> capacity fix moved to front as dependencies.
> - Add USER_ADDR_NONE comments in mmap.c and internal.h explaining why -1 is
> never a valid userspace address.
> - Fix err uninitialized warning in virtballoon_free_page_report().
> - Lots of commit log tweaks.
>
> Also in v7:
> - Fix hugetlb pool page zeroing to use vmf->real_address
> (the actual faulting subpage) instead of vmf->address
> (hugepage-aligned), preserving cache-friendly zeroing
> locality that upstream had at the callsite.
> - Remove dead/broken alloc_hugetlb_folio !CONFIG_HUGETLB_PAGE
> stub (returned NULL but callers check IS_ERR).
>
> Changes since v5:
> - Rebased onto v7.1-rc2.
> - Split alloc_anon_folio and alloc_swap_folio raw fault address
> changes into separate patches.
> - In virtio, move PAGE_POISON check for DEVICE_INIT_REPORTED
> from probe() to validate(), clearing the feature instead of
> just gating host_zeroes_pages. Same for confidential
> computing check.
> - Fix bisectability: FPI_ZEROED definition and usage now in
> the same patch.
> - Lots of commit log tweaks.
> - Reorder: REPORTED before ON_INFLATE.
> - Kerneldoc fixes.
>
> Changes since v4:
> With virtio spec posted, update to latest spec:
> - Add VIRTIO_BALLOON_F_DEVICE_INIT_REPORTED (bit 6) for reporting.
> - Add VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE (bit 7) for inflate.
> - Per-page virtqueue submission, per-page used_len feedback.
> - Balloon migration preserves PageZeroed hint.
> - Page_reporting capacity bugfix for small virtqueues.
> - PG_zeroed propagation in split_large_buddy.
> - Disable both features for confidential computing guests.
> - Gate host_zeroes_pages on PAGE_POISON/poison_val: when PAGE_POISON
> is negotiated with non-zero poison_val, device fills with poison
> not zeros, so host_zeroes_pages must be false.
> - Disable ON_INFLATE when PAGE_POISON with non-zero poison_val.
> - Bound inflate bitmap reads by used_len from device.
> - Move ON_INFLATE poison_val check to validate() for proper
> feature negotiation.
> - Fix NUMA interleave index for unaligned VMA start (new patch 1).
> - Drop vma_alloc_folio_user_addr: with the ilx fix, callers can
> pass raw fault address to vma_alloc_folio directly.
> - Tested with DEBUG_VM, INIT_ON_ALLOC/FREE enabled.
>
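The gating rules in the v4 changelog above reduce to a small predicate. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, and it only restates the three conditions listed (feature negotiated, not a confidential computing guest, and no non-zero poison value when PAGE_POISON is negotiated). The driver's actual validate() logic is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: may the guest trust that the host hands back
 * zero-filled pages?
 */
static bool host_zeroes_pages(bool init_feature_negotiated,
			      bool confidential_guest,
			      bool page_poison_negotiated,
			      uint32_t poison_val)
{
	if (!init_feature_negotiated || confidential_guest)
		return false;
	/* With a non-zero poison value the device fills with poison, not zeros. */
	if (page_poison_negotiated && poison_val != 0)
		return false;
	return true;
}
```

PAGE_POISON with a poison value of zero remains compatible with the optimization, since the fill pattern is then identical to zeroing.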
> Changes since v3 (address review by Gregory Price and David Hildenbrand):
> - Keep user_addr threading internal: public APIs (__alloc_pages,
> __folio_alloc, folio_alloc_mpol) are unchanged. Only internal
> functions (__alloc_frozen_pages_noprof, __alloc_pages_mpol) carry
> user_addr. This eliminates all API churn for external callers.
> - Add vma_alloc_folio_user_addr() (2/22) to separate NUMA policy
> address from the zeroing hint address. Fixes NUMA interleave
> index corruption when passing unaligned fault address for
> higher-order allocations.
> - Add per-page zeroed_bitmap to page_reporting_dev_info (17/22).
> The driver's report() callback manages the bitmap. Drain
> checks it gated by the host_zeroes_pages static key. This
> matches the proposed virtio balloon extension at
> https://lore.kernel.org/all/cover.1776874126.git.mst@redhat.com/
> - Clear PG_zeroed in __isolate_free_page() to prevent the aliased
> PG_private flag from leaking to compaction/alloc_contig paths.
> - Do not exclude PG_zeroed from PAGE_FLAGS_CHECK_AT_PREP macro.
> Instead, __free_pages_prepare() clears it (preventing filesystem
> PG_private leaking as false PG_zeroed), and FPI_ZEROED sets it
> after prepare. Only buddy merge assertion is relaxed.
> - Initialize alloc_context.user_addr in alloc_pages_bulk_noprof.
> - Deflate and hugetlb changes are much smaller now. Still, the
> patchset can be merged gradually, if desired.
>
> Changes since v2 (address review by Gregory Price and David Hildenbrand):
> - v2 used pghint_t / vma_alloc_folio_hints API. v3 switches to
> threading user_addr through the page allocator and using __GFP_ZERO,
> so post_alloc_hook() can use folio_zero_user() for cache-friendly
> zeroing when the user fault address is known.
> - Use FPI_ZEROED to set PG_zeroed after __free_pages_prepare() instead
> of runtime masking in __free_one_page (further refined in v4).
> - Drop redundant page_poisoning_enabled() check from mm core free
> path, already guarded at feature negotiation time in
> virtio_balloon_validate. The balloon driver keeps its own
> page_poisoning_enabled_static() check as defense in depth.
> - Split free_frozen_pages_zeroed and put_page_zeroed into separate
> patches. David Hildenbrand indicated he intends to rework balloon
> pages to be frozen (no refcount), at which point put_page_zeroed
> (21/22) can be dropped and the balloon can call
> free_frozen_pages_zeroed directly.
> - Use HPG_zeroed flag (in hugetlb folio->private) for hugetlb pool
> pages instead of PG_zeroed, since pool pages are zeroed long after
> buddy allocation and PG_zeroed is consumed at allocation time.
> - syzbot CI found a PF_NO_COMPOUND BUG in the v2 pghint_t approach
> where __ClearPageZeroed was called on compound hugetlb pages in
> free_huge_folio. The v3 HPG_zeroed approach avoids this.
> - Remove redundant arch vma_alloc_zeroed_movable_folio overrides
> on x86, s390, m68k, and alpha (12/22). Suggested by David
> Hildenbrand.
> - Updated benchmarking script to compute per-run avg +- stddev
> via awk on CSV output.
>
> Changes v1->v2:
> - Replaced __GFP_PREZEROED with PG_zeroed page flag (aliased PG_private)
> - Added pghint_t type and vma_alloc_folio_hints() API
> - Track PG_zeroed across buddy merges and splits
> - Added post_alloc_hook integration (single consume/clear point)
> - Added hugetlb support (pool pages + memfd)
> - Added page_reporting flush parameter for deterministic testing
> - Added free_frozen_pages_hint/put_page_hint for balloon deflate path
> - Added try_to_claim_block PG_zeroed preservation
> - Updated perf numbers with per-iteration flush methodology
>
> Written with assistance from Claude (claude-opus-4-6).
> Reviewed by cursor-agent (GPT-5.4-xhigh).
> Everything manually read, patchset split and commit logs edited manually.
>
>
> Michael S. Tsirkin (37):
> mm: mempolicy: fix interleave index calculation
> mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
> mm: page_alloc: propagate PageReported flag across buddy splits
> mm: page_reporting: allow driver to set batch capacity
> mm: hugetlb: remove dead alloc_hugetlb_folio stub
> mm: move vma_alloc_folio_noprof to page_alloc.c
> mm: thread user_addr through page allocator for cache-friendly zeroing
> mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
> mm: hugetlb: thread user_addr through gigantic page allocation
> mm: add folio_zero_user stub for configs without THP/HUGETLBFS
> mm: page_alloc: move prep_compound_page before post_alloc_hook
> mm: use folio_zero_user for user pages in post_alloc_hook
> mm: use __GFP_ZERO in vma_alloc_zeroed_movable_folio
> mm: remove arch vma_alloc_zeroed_movable_folio overrides
> mm: alloc_anon_folio: pass raw fault address to vma_alloc_folio
> mm: alloc_swap_folio: pass raw fault address to vma_alloc_folio
> mm: page_reporting: skip redundant zeroing of host-zeroed reported
> pages
> mm: page_alloc: use aliasing checks instead of
> user_alloc_needs_zeroing
> mm: page_alloc: clear PG_zeroed on buddy merge if not both zero
> mm: page_alloc: preserve PG_zeroed in page_del_and_expand
> mm: page_alloc: propagate PG_zeroed in split_large_buddy
> mm: add free_frozen_pages_zeroed
> mm: page_alloc: skip kernel_init_pages for FPI_ZEROED when safe
> mm: add put_page_zeroed and folio_put_zeroed
> mm: use __GFP_ZERO in alloc_anon_folio
> mm: vma_alloc_anon_folio_pmd: pass raw fault address to
> vma_alloc_folio
> mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd
> mm: hugetlb: add gfp parameter and skip zeroing for zeroed pages
> mm: memfd: skip zeroing for zeroed hugetlb pool pages
> mm: page_reporting: add per-page zeroed bitmap for host feedback
> virtio_balloon: submit reported pages as individual buffers
> virtio_balloon: disable indirect descriptors
> mm: page_reporting: add flush parameter with page budget
> virtio_balloon: skip zeroing for host-zeroed reported pages
> virtio_balloon: disable reporting zeroed optimization for confidential
> guests
> mm: balloon: use put_page_zeroed for zeroed balloon pages
> virtio_balloon: implement VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE
>
> arch/alpha/include/asm/page.h | 3 -
> arch/m68k/include/asm/page_no.h | 3 -
> arch/s390/include/asm/page.h | 3 -
> arch/x86/include/asm/page.h | 3 -
> drivers/virtio/virtio_balloon.c | 177 ++++++++++++++---
> fs/hugetlbfs/inode.c | 3 +-
> include/linux/cma.h | 3 +-
> include/linux/gfp.h | 18 +-
> include/linux/highmem.h | 15 +-
> include/linux/hugetlb.h | 18 +-
> include/linux/mm.h | 13 ++
> include/linux/page-flags.h | 11 ++
> include/linux/page_reporting.h | 13 ++
> include/uapi/linux/virtio_balloon.h | 2 +
> mm/balloon.c | 10 +-
> mm/cma.c | 6 +-
> mm/compaction.c | 9 +-
> mm/folio_zero.h | 18 ++
> mm/huge_memory.c | 16 +-
> mm/hugetlb.c | 138 ++++++++-----
> mm/hugetlb_cma.c | 4 +-
> mm/internal.h | 22 ++-
> mm/memfd.c | 14 +-
> mm/memory-failure.c | 10 +
> mm/memory.c | 19 +-
> mm/mempolicy.c | 75 +++----
> mm/mmap.c | 6 +
> mm/page_alloc.c | 297 +++++++++++++++++++++++-----
> mm/page_reporting.c | 99 ++++++++--
> mm/page_reporting.h | 12 ++
> mm/slub.c | 4 +-
> mm/swap.c | 20 +-
> 32 files changed, 792 insertions(+), 272 deletions(-)
> create mode 100644 mm/folio_zero.h
>
> --
> MST
>
Thread overview: 87+ messages
2026-06-08 8:33 [PATCH v10 00/37] mm/virtio: skip redundant zeroing of host-zeroed pages Michael S. Tsirkin
2026-06-08 8:34 ` [PATCH v10 01/37] mm: mempolicy: fix interleave index calculation Michael S. Tsirkin
2026-06-08 9:43 ` Lorenzo Stoakes
2026-06-08 8:34 ` [PATCH v10 02/37] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock Michael S. Tsirkin
2026-06-08 9:43 ` Lorenzo Stoakes
2026-06-08 13:48 ` Michael S. Tsirkin
2026-06-08 14:14 ` Lorenzo Stoakes
2026-06-08 8:34 ` [PATCH v10 03/37] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
2026-06-08 9:52 ` Lorenzo Stoakes
2026-06-08 12:50 ` Matthew Wilcox
2026-06-08 8:34 ` [PATCH v10 04/37] mm: page_reporting: allow driver to set batch capacity Michael S. Tsirkin
2026-06-08 8:34 ` [PATCH v10 05/37] mm: hugetlb: remove dead alloc_hugetlb_folio stub Michael S. Tsirkin
2026-06-08 9:56 ` Lorenzo Stoakes
2026-06-08 8:35 ` [PATCH v10 06/37] mm: move vma_alloc_folio_noprof to page_alloc.c Michael S. Tsirkin
2026-06-08 10:05 ` Lorenzo Stoakes
2026-06-08 8:35 ` [PATCH v10 07/37] mm: thread user_addr through page allocator for cache-friendly zeroing Michael S. Tsirkin
2026-06-08 10:23 ` Lorenzo Stoakes
2026-06-08 11:06 ` Lorenzo Stoakes
2026-06-08 13:04 ` Matthew Wilcox
2026-06-08 13:09 ` Lorenzo Stoakes
2026-06-08 14:26 ` David Hildenbrand (Arm)
2026-06-08 14:31 ` Matthew Wilcox
2026-06-08 14:37 ` David Hildenbrand (Arm)
2026-06-08 14:44 ` Matthew Wilcox
2026-06-08 14:55 ` David Hildenbrand (Arm)
2026-06-08 11:08 ` David Hildenbrand (Arm)
2026-06-08 15:27 ` Zi Yan
2026-06-08 8:35 ` [PATCH v10 08/37] mm: add alloc_contig_frozen_pages_user " Michael S. Tsirkin
2026-06-08 10:29 ` Lorenzo Stoakes
2026-06-08 8:35 ` [PATCH v10 09/37] mm: hugetlb: thread user_addr through gigantic page allocation Michael S. Tsirkin
2026-06-08 8:36 ` [PATCH v10 10/37] mm: add folio_zero_user stub for configs without THP/HUGETLBFS Michael S. Tsirkin
2026-06-08 9:12 ` Lorenzo Stoakes
2026-06-08 8:36 ` [PATCH v10 11/37] mm: page_alloc: move prep_compound_page before post_alloc_hook Michael S. Tsirkin
2026-06-08 10:33 ` Lorenzo Stoakes
2026-06-08 8:36 ` [PATCH v10 12/37] mm: use folio_zero_user for user pages in post_alloc_hook Michael S. Tsirkin
2026-06-08 11:23 ` Lorenzo Stoakes
2026-06-08 8:36 ` [PATCH v10 13/37] mm: use __GFP_ZERO in vma_alloc_zeroed_movable_folio Michael S. Tsirkin
2026-06-08 10:39 ` Lorenzo Stoakes
2026-06-08 10:55 ` Lorenzo Stoakes
2026-06-08 8:37 ` [PATCH v10 14/37] mm: remove arch vma_alloc_zeroed_movable_folio overrides Michael S. Tsirkin
2026-06-08 11:29 ` Lorenzo Stoakes
2026-06-08 8:37 ` [PATCH v10 15/37] mm: alloc_anon_folio: pass raw fault address to vma_alloc_folio Michael S. Tsirkin
2026-06-08 11:35 ` Lorenzo Stoakes
2026-06-08 8:37 ` [PATCH v10 16/37] mm: alloc_swap_folio: " Michael S. Tsirkin
2026-06-08 11:37 ` Lorenzo Stoakes
2026-06-08 8:37 ` [PATCH v10 17/37] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-06-08 12:00 ` Lorenzo Stoakes
2026-06-08 8:38 ` [PATCH v10 18/37] mm: page_alloc: use aliasing checks instead of user_alloc_needs_zeroing Michael S. Tsirkin
2026-06-08 11:39 ` Lorenzo Stoakes
2026-06-08 8:38 ` [PATCH v10 19/37] mm: page_alloc: clear PG_zeroed on buddy merge if not both zero Michael S. Tsirkin
2026-06-08 11:47 ` Lorenzo Stoakes
2026-06-08 8:38 ` [PATCH v10 20/37] mm: page_alloc: preserve PG_zeroed in page_del_and_expand Michael S. Tsirkin
2026-06-08 8:38 ` [PATCH v10 21/37] mm: page_alloc: propagate PG_zeroed in split_large_buddy Michael S. Tsirkin
2026-06-08 8:38 ` [PATCH v10 22/37] mm: add free_frozen_pages_zeroed Michael S. Tsirkin
2026-06-08 12:06 ` Lorenzo Stoakes
2026-06-08 8:38 ` [PATCH v10 23/37] mm: page_alloc: skip kernel_init_pages for FPI_ZEROED when safe Michael S. Tsirkin
2026-06-08 12:18 ` Lorenzo Stoakes
2026-06-08 8:38 ` [PATCH v10 24/37] mm: add put_page_zeroed and folio_put_zeroed Michael S. Tsirkin
2026-06-08 12:25 ` Lorenzo Stoakes
2026-06-08 12:46 ` David Hildenbrand (Arm)
2026-06-08 14:08 ` Michael S. Tsirkin
2026-06-08 14:28 ` David Hildenbrand (Arm)
2026-06-08 8:39 ` [PATCH v10 25/37] mm: use __GFP_ZERO in alloc_anon_folio Michael S. Tsirkin
2026-06-08 12:29 ` Lorenzo Stoakes
2026-06-08 8:39 ` [PATCH v10 26/37] mm: vma_alloc_anon_folio_pmd: pass raw fault address to vma_alloc_folio Michael S. Tsirkin
2026-06-08 12:30 ` Lorenzo Stoakes
2026-06-08 8:39 ` [PATCH v10 27/37] mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd Michael S. Tsirkin
2026-06-08 12:32 ` Lorenzo Stoakes
2026-06-08 8:39 ` [PATCH v10 28/37] mm: hugetlb: add gfp parameter and skip zeroing for zeroed pages Michael S. Tsirkin
2026-06-08 12:44 ` Lorenzo Stoakes
2026-06-08 8:39 ` [PATCH v10 29/37] mm: memfd: skip zeroing for zeroed hugetlb pool pages Michael S. Tsirkin
2026-06-08 12:47 ` Lorenzo Stoakes
2026-06-08 8:39 ` [PATCH v10 30/37] mm: page_reporting: add per-page zeroed bitmap for host feedback Michael S. Tsirkin
2026-06-08 8:39 ` [PATCH v10 31/37] virtio_balloon: submit reported pages as individual buffers Michael S. Tsirkin
2026-06-08 8:40 ` [PATCH v10 32/37] virtio_balloon: disable indirect descriptors Michael S. Tsirkin
2026-06-08 8:40 ` [PATCH v10 33/37] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
2026-06-08 8:40 ` [PATCH v10 34/37] virtio_balloon: skip zeroing for host-zeroed reported pages Michael S. Tsirkin
2026-06-08 8:40 ` [PATCH v10 35/37] virtio_balloon: disable reporting zeroed optimization for confidential guests Michael S. Tsirkin
2026-06-08 8:40 ` [PATCH v10 36/37] mm: balloon: use put_page_zeroed for zeroed balloon pages Michael S. Tsirkin
2026-06-08 11:10 ` David Hildenbrand (Arm)
2026-06-08 8:40 ` [PATCH v10 37/37] virtio_balloon: implement VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE Michael S. Tsirkin
2026-06-08 9:17 ` Lorenzo Stoakes [this message]
2026-06-08 12:52 ` [PATCH v10 00/37] mm/virtio: skip redundant zeroing of host-zeroed pages Lorenzo Stoakes
2026-06-08 11:02 ` Vlastimil Babka (SUSE)
2026-06-08 11:13 ` Vlastimil Babka (SUSE)
2026-06-08 15:45 ` Gregory Price
2026-06-08 14:21 ` Matthew Wilcox