Re: [PATCH v10 00/37] mm/virtio: skip redundant zeroing of host-zeroed pages

From: Lorenzo Stoakes <ljs@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	"David Hildenbrand (Arm)" <david@kernel.org>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	"Vlastimil Babka" <vbabka@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Brendan Jackman" <jackmanb@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>, "Zi Yan" <ziy@nvidia.com>,
	"Baolin Wang" <baolin.wang@linux.alibaba.com>,
	"Nico Pache" <npache@redhat.com>,
	"Ryan Roberts" <ryan.roberts@arm.com>,
	"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
	"Lance Yang" <lance.yang@linux.dev>,
	"Hugh Dickins" <hughd@google.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Joshua Hahn" <joshua.hahnjy@gmail.com>,
	"Rakie Kim" <rakie.kim@sk.com>,
	"Byungchul Park" <byungchul@sk.com>,
	"Gregory Price" <gourry@gourry.net>,
	"Ying Huang" <ying.huang@linux.alibaba.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Christoph Lameter" <cl@gentwo.org>,
	"David Rientjes" <rientjes@google.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Harry Yoo" <harry.yoo@oracle.com>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
	"Chris Li" <chrisl@kernel.org>,
	"Kairui Song" <kasong@tencent.com>,
	"Kemeng Shi" <shikemeng@huaweicloud.com>,
	"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
	virtualization@lists.linux.dev, linux-mm@kvack.org,
	"Andrea Arcangeli" <aarcange@redhat.com>
Subject: Re: [PATCH v10 00/37] mm/virtio: skip redundant zeroing of host-zeroed pages
Date: Mon, 8 Jun 2026 10:17:34 +0100	[thread overview]
Message-ID: <aiaHd3T42XyB3UBn@lucifer> (raw)
In-Reply-To: <cover.1780906288.git.mst@redhat.com>

On Mon, Jun 08, 2026 at 04:33:46AM -0400, Michael S. Tsirkin wrote:
> Further, on architectures with aliasing caches, upstream with init_on_alloc
> double-zeros user pages: once via kernel_init_pages() in
> post_alloc_hook, and again via clear_user_highpage() at the
> callsite (because user_alloc_needs_zeroing() returns true).
> This series eliminates that double-zeroing by moving the zeroing
> into the post_alloc_hook + propagating the "host
> already zeroed this page" information through the buddy allocator.
>
> For page reporting, VIRTIO_BALLOON_F_DEVICE_INIT_REPORTED (bit 6)
> is used. For the inflate/deflate path,
> VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE (bit 7) is used.
>
> Virtio spec: https://lore.kernel.org/all/cover.1778140241.git.mst@redhat.com
>
> Based on v7.1-rc6.  When applying on mm-unstable, two conflicts
> are expected:
> - kernel_init_pages() was renamed to clear_highpages_kasan_tagged()
>   in mm-unstable.  Use clear_highpages_kasan_tagged() in the
>   post_alloc_hook else branch.
> - FPI_PREPARED uses BIT(3) in mm-unstable.  Bump FPI_ZEROED to
>   BIT(4).
> Build-tested on mm-unstable at e9dd96806dbc:
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git zero-mm-unstable
>
> Patches 1-5: fixes/cleanups, dependencies of the zeroing patches.
> Patches 6-9: thread user_addr through page allocator, contig API,
>   and gigantic hugetlb allocation.
> Patches 10-16: folio_zero_user in post_alloc_hook, vma_alloc_zeroed
>   conversion, raw fault address threading.
> Patches 17-24: PG_zeroed flag, aliasing guard, buddy merge/split
>   tracking, FPI_ZEROED optimization, folio_put_zeroed.
> Patches 25-27: __GFP_ZERO callsite conversions (alloc_anon_folio,
>   vma_alloc_anon_folio_pmd) with memcg charge failure mitigation.
> Patches 28-29: hugetlb __GFP_ZERO + HPG_zeroed.
> Patches 30-35: page reporting zeroing (DEVICE_INIT_REPORTED),
>   disable indirect descriptors.
> Patches 36-37: inflate/deflate zeroing (DEVICE_INIT_ON_INFLATE).

This seems far too much for one series.

You're doing a bunch of mm stuff that seems relatively independent, then
putting the virtio stuff on top.

I think this should be broken out into separate series laying foundations
rather than doing it all in one go, which is also difficult for review
purposes.

Adding a new folio flag is also contentious, for instance. We may want to
go bit-by-bit and ensure that each foundational element is acceptable
before doing the next bit, rather than having it as part of a big series.

Looking through the changelog only adds to this feeling! Huge numbers of
changes, even relatively recently, and I'm not sure all relevant mm
maintainers have had a look through either.

Thanks, Lorenzo

>
> -------
>
> Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
> 256MB of anonymous pages:
>
>   metric         baseline            optimized           delta
>   task-clock     232 +- 20 ms        51 +- 26 ms         -78%
>   cache-misses   1.20M +- 248K       288K +- 102K        -76%
>   instructions   16.3M +- 1.2M       13.8M +- 1.0M       -15%
>
> With hugetlb surplus pages:
>
>   metric         baseline            optimized           delta
>   task-clock     219 +- 23 ms        65 +- 34 ms         -70%
>   cache-misses   1.17M +- 391K       263K +- 36K         -78%
>   instructions   17.9M +- 1.2M       15.1M +- 724K       -16%
>
> Two flags track known-zero pages:
>   PG_zeroed (aliased to PG_private) marks buddy allocator pages that
>   are known to contain all zeros, either because the host zeroed
>   them during page reporting, or because they were freed via the
>   balloon deflate path.  It lives on free-list pages and is consumed
>   by post_alloc_hook() on allocation.
>   HPG_zeroed (stored in hugetlb folio->private bits) serves the same
>   purpose for hugetlb pool pages, which are kept in a pool and may
>   be zeroed long after buddy allocation, so PG_zeroed (consumed at
>   allocation time) cannot track their state.
>
> PG_zeroed lifecycle:
>
>   Sets PG_zeroed:
>   - page_reporting_drain: on reported pages when host zeroes them
>   - __free_pages_ok / __free_frozen_pages: when FPI_ZEROED is set
>     (balloon deflate path)
>   - buddy merge: on merged page if both buddies were zeroed
>   - expand(): propagate to split-off buddy sub-pages
>
>   Clears PG_zeroed:
>   - __free_pages_prepare: clears all PAGE_FLAGS_CHECK_AT_PREP flags
>     (PG_zeroed included), preventing PG_private aliasing leaks
>   - rmqueue_buddy / __rmqueue_pcplist: read-then-clear, passes
>     zeroed hint to prep_new_page -> post_alloc_hook
>   - __isolate_free_page: clear (compaction/page_reporting isolation)
>   - compaction, alloc_contig, split_free_frozen: clear before use
>   - buddy merge: clear both pages before merge, then conditionally
>     re-set on merged head if both were zeroed
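
The merge, split and consume rules in the lifecycle above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct toy_block`, `toy_merge()`, `toy_split()` and `toy_take_zeroed()` are hypothetical names invented for illustration and are not the kernel's data structures or the code in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a free buddy block; not a kernel type. */
struct toy_block {
    unsigned int order;  /* block covers 2^order pages */
    bool zeroed;         /* stands in for PG_zeroed on the head page */
};

/* Merge two free buddies of equal order into one block of order + 1. */
static struct toy_block toy_merge(struct toy_block a, struct toy_block b)
{
    struct toy_block merged;

    assert(a.order == b.order);
    merged.order = a.order + 1;
    /* The hint survives only if every page in the result is known-zero. */
    merged.zeroed = a.zeroed && b.zeroed;
    return merged;
}

/* Split a block in half; both halves inherit the parent's hint. */
static void toy_split(struct toy_block parent,
                      struct toy_block *lo, struct toy_block *hi)
{
    assert(parent.order > 0);
    lo->order = hi->order = parent.order - 1;
    lo->zeroed = hi->zeroed = parent.zeroed;
}

/* Allocation consumes the hint: read it, then clear it on the block. */
static bool toy_take_zeroed(struct toy_block *blk)
{
    bool was_zeroed = blk->zeroed;

    blk->zeroed = false;
    return was_zeroed;
}
```

The point of the model is that the hint is conservative: one non-zeroed buddy is enough to drop it on merge, and reading it on allocation always clears it.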
>
> HPG_zeroed lifecycle (hugetlb pool pages, stored in folio->private):
>
>   Sets HPG_zeroed:
>   - alloc_surplus_hugetlb_folio: after buddy allocation with
>     __GFP_ZERO, mark pool page as known-zero
>
>   Clears HPG_zeroed:
>   - free_huge_folio: page was mapped to userspace, no longer
>     known-zero when it returns to the pool
>   - alloc_hugetlb_folio: cleared unconditionally on output
>   - alloc_hugetlb_folio_reserve: cleared after checking
>
> - The optimization is most effective with THP, where entire 2MB
>   pages are allocated directly from reported order-9+ buddy pages.
>   Without THP, only ~21% of order-0 allocations come from reported
>   pages due to low-order fragmentation.
> - Persistent hugetlb pool pages are not covered: when freed by
>   userspace they return to the hugetlb free pool, not the buddy
>   allocator, so they are never reported to the host.  Surplus
>   hugetlb pages are allocated from buddy and do benefit.
>
> - PG_zeroed is aliased to PG_private.  __free_pages_prepare() clears it
>   (preventing filesystem PG_private from leaking as false PG_zeroed).
>   FPI_ZEROED re-sets it after prepare for balloon deflate pages.
>   Is aliasing PG_private acceptable, or should a different bit be used?
>
> - With __GFP_ZERO, the folio is zeroed before mem_cgroup_charge().
>   If the charge fails (cgroup at limit), the zeroing work is wasted
>   and the folio is freed and retried at a smaller order.  Previously,
>   zeroing was done after a successful charge.  This is inherent to
>   the __GFP_ZERO approach.  Is this acceptable?
>
> - On architectures with aliasing caches, upstream with init_on_alloc
>   double-zeros user pages: once via kernel_init_pages() in
>   post_alloc_hook, and again via clear_user_highpage() at the
>   callsite (because user_alloc_needs_zeroing() returns true).
>   Our patches eliminate this by zeroing once via folio_zero_user()
>   in post_alloc_hook.  Not a critical fix (people who set init_on_alloc
>   know they are paying performance) but a nice cleanup anyway.
>
> Test program:
>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <sys/mman.h>
>
>   #ifndef MADV_POPULATE_WRITE
>   #define MADV_POPULATE_WRITE 23
>   #endif
>   #ifndef MAP_HUGETLB
>   #define MAP_HUGETLB 0x40000
>   #endif
>
>   int main(int argc, char **argv)
>   {
>       unsigned long size;
>       int flags = MAP_PRIVATE | MAP_ANONYMOUS;
>       void *p;
>       int r;
>
>       if (argc < 2) {
>           fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
>           return 1;
>       }
>       size = atol(argv[1]) * 1024UL * 1024;
>       if (argc >= 3 && strcmp(argv[2], "huge") == 0)
>           flags |= MAP_HUGETLB;
>       p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
>       if (p == MAP_FAILED) {
>           perror("mmap");
>           return 1;
>       }
>       r = madvise(p, size, MADV_POPULATE_WRITE);
>       if (r) {
>           perror("madvise");
>           return 1;
>       }
>       munmap(p, size);
>       return 0;
>   }
>
> Test script (bench.sh):
>
>   #!/bin/bash
>   # Usage: bench.sh <size_mb> <iterations> [huge]
>   # Feature negotiation (DEVICE_INIT_REPORTED/ON_INFLATE) is
>   # handled by QEMU command line flags.
>   SZ=${1:-256}; ITER=${2:-10}; HUGE=${3:-}
>   FLUSH=/sys/module/page_reporting/parameters/flush
>   CSV=/tmp/perf.csv
>   rmmod virtio_balloon 2>/dev/null
>   insmod /mnt/share/virtio_balloon.ko
>   echo 512 > $FLUSH
>   [ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages
>   rm -f $CSV
>   echo "=== sz=${SZ}MB iter=$ITER $HUGE ==="
>   for i in $(seq 1 $ITER); do
>       echo 3 > /proc/sys/vm/drop_caches
>       echo 512 > $FLUSH
>       perf stat -e task-clock,instructions,cache-misses \
>           -x, -o $CSV --append -- /mnt/share/alloc_once $SZ $HUGE
>   done
>   [ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
>   rmmod virtio_balloon
>   awk -F, '/^#/||/^$/{next}{v=$1+0;e=$3;gsub(/ /,"",e);s[e]+=v;ss[e]+=v*v;n[e]++}
>   END{for(e in s){a=s[e]/n[e];d=sqrt(ss[e]/n[e]-a*a);printf "  %-16s %10.0f +- %8.0f (n=%d)\n",e,a,d,n[e]}}' $CSV
>
> Compile and run:
>   gcc -static -O2 -o alloc_once alloc_once.c
>   bash bench.sh 256 10            # regular pages
>   bash bench.sh 256 10 huge       # hugetlb surplus
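
The awk one-liner in bench.sh keeps a running sum and sum of squares per event and prints the mean and the population standard deviation (it divides by n, not n - 1). A minimal C sketch of the same arithmetic follows; `struct run_stats` and the `stats_*` helpers are hypothetical names, and the sketch stops at the variance so that it needs no math library.

```c
#include <stddef.h>

/* Hypothetical accumulator mirroring the awk arrays s[e], ss[e], n[e]. */
struct run_stats {
    double sum;     /* sum of samples */
    double sum_sq;  /* sum of squared samples */
    size_t n;       /* number of samples */
};

/* Fold one sample into the running sums. */
static void stats_add(struct run_stats *s, double v)
{
    s->sum += v;
    s->sum_sq += v * v;
    s->n++;
}

/* Arithmetic mean; the caller must have added at least one sample. */
static double stats_mean(const struct run_stats *s)
{
    return s->sum / (double)s->n;
}

/* Population variance: E[x^2] - (E[x])^2, clamped at zero against rounding. */
static double stats_variance(const struct run_stats *s)
{
    double mean = stats_mean(s);
    double var = s->sum_sq / (double)s->n - mean * mean;

    return var < 0.0 ? 0.0 : var;
}
```

The `+-` column printed by the script corresponds to the square root of `stats_variance()`.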
>
> Note about Sashiko (sashiko.dev) false positives:
>   Sashiko's mm-alloc guideline says "Any optimization replacing
>   clear_user_highpage() with __GFP_ZERO is wrong on [cache-aliasing]
>   architectures". This is correct for mainline but not for this
>   series, which threads user_addr through the allocator so that
>   post_alloc_hook() calls folio_zero_user() with the dcache flush.
>   Suggested guideline update: add "unless the caller passes a
>   valid user address (i.e. not USER_ADDR_NONE) to vma_alloc_folio(),
>   alloc_contig_frozen_pages_user() etc., which reaches
>   post_alloc_hook() for the dcache flush".
>
> Pre-existing bugs found during review (not fixed, not made worse):
>   - do_swap_page() returns VM_FAULT_OOM on large-folio swapin race
>     instead of retrying.
>   - free_huge_folio() called with refcount==1 on
>     mem_cgroup_charge_hugetlb failure.
>   - memfd_alloc_folio() double-decrements resv_huge_pages on error.
>   - wait_event in virtballoon_free_page_report hangs on broken
>     virtqueue (pre-existing, same as old single-buffer code).
>   - tell_host() GFP_KERNEL under balloon_lock risks OOM deadlock.
>
> Changes since v9:
> - Fix W=1 kerneldoc warning on alloc_contig_frozen_pages_user_noprof.
> - Fix link error on !MMU configs (m68k, arm allnoconfig): move
>   folio_zero_user stub to new mm/folio_zero.h header.
> - Reorder patches: move PG_zeroed tracking and folio_put_zeroed
>   before __GFP_ZERO conversions, allowing folio_put_zeroed to
>   handle memcg charge failures.
> - Better handle memcg charge failures.
>
> Changes since v8 (address Sashiko v8 review findings):
> - Fix mempolicy interleave: combine vm_pgoff and VMA offset into
>   a single expression before shifting, fixing carry loss for
>   file-backed VMAs with unaligned vm_pgoff.
> - Fix memory-failure: wrap ClearPageHWPoison in retry path with
>   zone->lock (same race as TestSetPageHWPoison).
> - Fix stale comment: "folio_zero_user writes" -> "page zeroing"
>   in huge_memory.c __folio_mark_uptodate comment.
> - Drop rounddown_pow_of_two for page reporting capacity (no-op
>   for compiler optimization, halves batch size for non-power-of-2).
> - Reorder: move "mm: balloon: use put_page_zeroed" before
>   "virtio_balloon: implement DEVICE_INIT_ON_INFLATE" so the
>   ClearPageZeroed handling is in place before any page gets
>   the flag set.
> - Various commit log improvements (PowerPC note in aliasing
>   patch, memory-failure note about other HWPoison calls,
>   wording fixes).
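
The carry loss mentioned in the mempolicy interleave item can be shown with plain integer arithmetic. The sketch below is an illustration only, assuming 4 KiB pages; `ilx_split()` and `ilx_combined()` are hypothetical helpers and not the functions touched by the patch.

```c
#define TOY_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Shift each term separately, then add. The low bits of both terms are
 * discarded before they can carry into the index.
 */
static unsigned long ilx_split(unsigned long vm_pgoff, unsigned long vm_start,
                               unsigned long addr, unsigned int order)
{
    return (vm_pgoff >> order) +
           ((addr - vm_start) >> (TOY_PAGE_SHIFT + order));
}

/* Add the page offsets first, then shift once, so carries are kept. */
static unsigned long ilx_combined(unsigned long vm_pgoff, unsigned long vm_start,
                                  unsigned long addr, unsigned int order)
{
    unsigned long pgoff = vm_pgoff + ((addr - vm_start) >> TOY_PAGE_SHIFT);

    return pgoff >> order;
}
```

With `vm_pgoff = 3`, an address one page into the VMA and `order = 2`, the file offset is page 4, which belongs to index 1. The split form yields 0 because both terms round down independently, while the combined form yields 1. When `vm_pgoff` is aligned to the order, the two forms agree.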
>
> Changes since v7 (address Sashiko AI review findings):
> - Fix dcache flush on VIPT aliasing architectures: add
>   user_alloc_needs_zeroing() guard in post_alloc_hook to force
>   folio_zero_user for user pages when cache aliasing requires it.
>   Host-zeroed pages excluded (!zeroed).  Optimization preserved.
> - Fix folio_zero_user stub: replace macro with non-inline function
>   in mm/memory.c to avoid double-evaluation and missing include.
> - Fix C89 declaration-after-statement in free_huge_folio.
> - Fix CMA __GFP_ZERO: pass through to cma_alloc_frozen_compound
>   so HPG_zeroed accurately reflects whether page was zeroed.
> - Fix big-endian bitmap: use test_bit_le() for inflate_bitmap.
> - Fix migratepage: clear PageZeroed on old page before deflation.
> - Fix page_reporting flush: overflow-safe loop, add -EINTR on
>   signal, add code comment explaining double flush_delayed_work.
> - Add atomic ClearPageZeroed (CLEARPAGEFLAG) for balloon migration
>   path where zone->lock is not held.
> - Add VM_WARN_ON_ONCE for order>0 without __GFP_COMP in
>   post_alloc_hook (folio_zero_user requires compound metadata).
> - Add _noprof pattern for vma_alloc_zeroed_movable_folio to
>   preserve memory allocation profiling attribution.
> - Add PageReported propagation in split_large_buddy (was missing
>   from patch 2).
> - Add FPI_ZEROED guard: skip PageZeroed when page_poisoning
>   enabled and init_on_free disabled (poison overwrites zeroes).
> - Add DMA alignment comment for inflate_bitmap (ACCESS_PLATFORM
>   cleared, so not needed now).
> - Restore tell_host comment explaining vq buffer assumption.
> - Various code comments documenting design decisions.
> - Drop __GFP_ZERO from gather_surplus_pages: avoid shifting
>   zeroing from fault time to reservation time (mmap/fallocate).
>   Pool pages are zeroed at fault time via alloc_hugetlb_folio.
>   Fresh surplus allocations at fault time still benefit from
>   __GFP_ZERO + HPG_zeroed.
> - New patch: add alloc_contig_frozen_pages_user API with user_addr
>   for cache-friendly zeroing in the contiguous allocation path.
> - New patch: thread user_addr through gigantic hugetlb allocation
>   via alloc_contig_frozen_pages_user.
> - New patch: replace user_alloc_needs_zeroing() with aliasing-only
>   checks (cpu_dcache_is_aliasing || cpu_icache_is_aliasing) in the
>   post_alloc_hook guard.  Avoids redundant re-zero on non-aliasing.
> - New patch: serialize TestSetPageHWPoison with zone->lock in
>   memory_failure to fix pre-existing race with non-atomic buddy
>   flag operations (e.g. page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP).
> - New patch: disable VIRTIO_RING_F_INDIRECT_DESC in balloon to
>   prevent GFP_KERNEL allocation under balloon_lock (OOM deadlock).
> - New patch: skip kernel_init_pages for FPI_ZEROED when page
>   poisoning is not enabled (page already zero, skip redundant work).
>
> Also since v7 (address review by Gregory Price):
> - Drop from_pool bool in alloc_hugetlb_folio: use
>   folio_test_hugetlb_zeroed directly.  HPG_zeroed is set by
>   alloc_surplus_hugetlb_folio for fresh allocations, so the
>   check handles both pool and fresh pages.
> - Drop bool *zeroed output parameter from alloc_hugetlb_folio:
>   sink zeroing inside the function.  When __GFP_ZERO is set and
>   !folio_test_hugetlb_zeroed, call folio_zero_user internally.
> - Rename addr to user_addr in alloc_hugetlb_folio, align
>   internally with huge_page_mask.
> - Add Reviewed-by: Gregory Price tags on reviewed patches.
>
> New patches since v7:
> - mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
> - mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
> - mm: hugetlb: thread user_addr through gigantic page allocation
> - mm: page_alloc: use aliasing checks instead of
>   user_alloc_needs_zeroing
> - virtio_balloon: disable indirect descriptors
> - mm: page_alloc: skip kernel_init_pages for FPI_ZEROED when safe
>
> Changes since v6 (address review by Gregory Price):
> - Rework hugetlb: use gfp_t parameter instead of bool zero /
>   bool *zeroed.  Sink zeroing inside alloc_hugetlb_folio().
>   Pass raw fault address (user_addr) for cache-friendly zeroing
>   on both pool-page and fresh allocation
>   paths.  (Suggested by Gregory Price)
> - Reorder compaction_alloc_noprof() to call prep_compound_page
>   before post_alloc_hook for consistency.
>   (Suggested by Gregory Price)
> - Reorder: interleave fix first, PageReported propagation and
>   capacity fix moved to front as dependencies.
> - Add USER_ADDR_NONE comments in mmap.c and internal.h explaining why -1 is
>   never a valid userspace address.
> - Fix err uninitialized warning in virtballoon_free_page_report().
> - Lots of commit log tweaks.
>
> Also in v7:
> - Fix hugetlb pool page zeroing to use vmf->real_address
>   (the actual faulting subpage) instead of vmf->address
>   (hugepage-aligned), preserving cache-friendly zeroing
>   locality that upstream had at the callsite.
> - Remove dead/broken alloc_hugetlb_folio !CONFIG_HUGETLB_PAGE
>   stub (returned NULL but callers check IS_ERR).
>
> Changes since v5:
> - Rebased onto v7.1-rc2.
> - Split alloc_anon_folio and alloc_swap_folio raw fault address
>   changes into separate patches.
> - In virtio, move PAGE_POISON check for DEVICE_INIT_REPORTED
>   from probe() to validate(), clearing the feature instead of
>   just gating host_zeroes_pages.  Same for confidential
>   computing check.
> - Fix bisectability: FPI_ZEROED definition and usage now in
>   the same patch.
> - Lots of commit log tweaks.
> - Reorder: REPORTED before ON_INFLATE.
> - Kerneldoc fixes.
>
> Changes since v4:
> With virtio spec posted, update to latest spec:
> - Add VIRTIO_BALLOON_F_DEVICE_INIT_REPORTED (bit 6) for reporting.
> - Add VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE (bit 7) for inflate.
> - Per-page virtqueue submission, per-page used_len feedback.
> - Balloon migration preserves PageZeroed hint.
> - Page_reporting capacity bugfix for small virtqueues.
> - PG_zeroed propagation in split_large_buddy.
> - Disable both features for confidential computing guests.
> - Gate host_zeroes_pages on PAGE_POISON/poison_val: when PAGE_POISON
>   is negotiated with non-zero poison_val, device fills with poison
>   not zeros, so host_zeroes_pages must be false.
> - Disable ON_INFLATE when PAGE_POISON with non-zero poison_val.
> - Bound inflate bitmap reads by used_len from device.
> - Move ON_INFLATE poison_val check to validate() for proper
>   feature negotiation.
> - Fix NUMA interleave index for unaligned VMA start (new patch 1).
> - Drop vma_alloc_folio_user_addr: with the ilx fix, callers can
>   pass raw fault address to vma_alloc_folio directly.
> - Tested with DEBUG_VM, INIT_ON_ALLOC/FREE enabled.
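
The gating rules listed above (confidential guests, and PAGE_POISON with a non-zero poison_val) amount to masking feature bits at validation time. The following is a rough sketch, not the driver's validate() callback: `toy_validate()` and the `TOY_F_*` names are hypothetical, bits 6 and 7 are taken from the cover letter, and bit 4 is assumed to match VIRTIO_BALLOON_F_PAGE_POISON.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F_PAGE_POISON            4  /* assumed VIRTIO_BALLOON_F_PAGE_POISON */
#define TOY_F_DEVICE_INIT_REPORTED   6  /* bit number from the cover letter */
#define TOY_F_DEVICE_INIT_ON_INFLATE 7  /* bit number from the cover letter */

/* Return the feature set with the device-init bits cleared when unsafe. */
static uint64_t toy_validate(uint64_t features, uint32_t poison_val,
                             bool confidential_guest)
{
    const uint64_t init_bits = (1ULL << TOY_F_DEVICE_INIT_REPORTED) |
                               (1ULL << TOY_F_DEVICE_INIT_ON_INFLATE);
    bool poison = (features & (1ULL << TOY_F_PAGE_POISON)) != 0;

    /* Confidential guests do not trust the host's claim of zeroing. */
    if (confidential_guest)
        features &= ~init_bits;

    /* A non-zero poison value means pages come back poisoned, not zeroed. */
    if (poison && poison_val != 0)
        features &= ~init_bits;

    return features;
}
```

Clearing the bits during validation, rather than only ignoring them later, keeps the negotiated feature set consistent with what the guest will actually rely on.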
>
> Changes since v3 (address review by Gregory Price and David Hildenbrand):
> - Keep user_addr threading internal: public APIs (__alloc_pages,
>   __folio_alloc, folio_alloc_mpol) are unchanged.  Only internal
>   functions (__alloc_frozen_pages_noprof, __alloc_pages_mpol) carry
>   user_addr.  This eliminates all API churn for external callers.
> - Add vma_alloc_folio_user_addr() (2/22) to separate NUMA policy
>   address from the zeroing hint address.  Fixes NUMA interleave
>   index corruption when passing unaligned fault address for
>   higher-order allocations.
> - Add per-page zeroed_bitmap to page_reporting_dev_info (17/22).
>   The driver's report() callback manages the bitmap.  Drain
>   checks it gated by the host_zeroes_pages static key.  This
>   matches the proposed virtio balloon extension at
>   https://lore.kernel.org/all/cover.1776874126.git.mst@redhat.com/
> - Clear PG_zeroed in __isolate_free_page() to prevent the aliased
>   PG_private flag from leaking to compaction/alloc_contig paths.
> - Do not exclude PG_zeroed from PAGE_FLAGS_CHECK_AT_PREP macro.
>   Instead, __free_pages_prepare() clears it (preventing filesystem
>   PG_private leaking as false PG_zeroed), and FPI_ZEROED sets it
>   after prepare.  Only buddy merge assertion is relaxed.
> - Initialize alloc_context.user_addr in alloc_pages_bulk_noprof.
> - Deflate and hugetlb changes are much smaller now.  Still, the
>   patchset can be merged gradually, if desired.
>
> Changes since v2 (address review by Gregory Price and David Hildenbrand):
> - v2 used pghint_t / vma_alloc_folio_hints API.  v3 switches to
>   threading user_addr through the page allocator and using __GFP_ZERO,
>   so post_alloc_hook() can use folio_zero_user() for cache-friendly
>   zeroing when the user fault address is known.
> - Use FPI_ZEROED to set PG_zeroed after __free_pages_prepare() instead
>   of runtime masking in __free_one_page (further refined in v4).
> - Drop redundant page_poisoning_enabled() check from mm core free
>   path, already guarded at feature negotiation time in
>   virtio_balloon_validate.  The balloon driver keeps its own
>   page_poisoning_enabled_static() check as defense in depth.
> - Split free_frozen_pages_zeroed and put_page_zeroed into separate
>   patches.  David Hildenbrand indicated he intends to rework balloon
>   pages to be frozen (no refcount), at which point put_page_zeroed
>   (21/22) can be dropped and the balloon can call
>   free_frozen_pages_zeroed directly.
> - Use HPG_zeroed flag (in hugetlb folio->private) for hugetlb pool
>   pages instead of PG_zeroed, since pool pages are zeroed long after
>   buddy allocation and PG_zeroed is consumed at allocation time.
> - syzbot CI found a PF_NO_COMPOUND BUG in the v2 pghint_t approach
>   where __ClearPageZeroed was called on compound hugetlb pages in
>   free_huge_folio.  The v3 HPG_zeroed approach avoids this.
> - Remove redundant arch vma_alloc_zeroed_movable_folio overrides
>   on x86, s390, m68k, and alpha (12/22). Suggested by David
>   Hildenbrand.
> - Updated benchmarking script to compute per-run avg +- stddev
>   via awk on CSV output.
>
> Changes v1->v2:
> - Replaced __GFP_PREZEROED with PG_zeroed page flag (aliased PG_private)
> - Added pghint_t type and vma_alloc_folio_hints() API
> - Track PG_zeroed across buddy merges and splits
> - Added post_alloc_hook integration (single consume/clear point)
> - Added hugetlb support (pool pages + memfd)
> - Added page_reporting flush parameter for deterministic testing
> - Added free_frozen_pages_hint/put_page_hint for balloon deflate path
> - Added try_to_claim_block PG_zeroed preservation
> - Updated perf numbers with per-iteration flush methodology
>
> Written with assistance from Claude (claude-opus-4-6).
> Reviewed by cursor-agent (GPT-5.4-xhigh).
> Everything manually read, patchset split and commit logs edited manually.
>
>
> Michael S. Tsirkin (37):
>   mm: mempolicy: fix interleave index calculation
>   mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
>   mm: page_alloc: propagate PageReported flag across buddy splits
>   mm: page_reporting: allow driver to set batch capacity
>   mm: hugetlb: remove dead alloc_hugetlb_folio stub
>   mm: move vma_alloc_folio_noprof to page_alloc.c
>   mm: thread user_addr through page allocator for cache-friendly zeroing
>   mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
>   mm: hugetlb: thread user_addr through gigantic page allocation
>   mm: add folio_zero_user stub for configs without THP/HUGETLBFS
>   mm: page_alloc: move prep_compound_page before post_alloc_hook
>   mm: use folio_zero_user for user pages in post_alloc_hook
>   mm: use __GFP_ZERO in vma_alloc_zeroed_movable_folio
>   mm: remove arch vma_alloc_zeroed_movable_folio overrides
>   mm: alloc_anon_folio: pass raw fault address to vma_alloc_folio
>   mm: alloc_swap_folio: pass raw fault address to vma_alloc_folio
>   mm: page_reporting: skip redundant zeroing of host-zeroed reported
>     pages
>   mm: page_alloc: use aliasing checks instead of
>     user_alloc_needs_zeroing
>   mm: page_alloc: clear PG_zeroed on buddy merge if not both zero
>   mm: page_alloc: preserve PG_zeroed in page_del_and_expand
>   mm: page_alloc: propagate PG_zeroed in split_large_buddy
>   mm: add free_frozen_pages_zeroed
>   mm: page_alloc: skip kernel_init_pages for FPI_ZEROED when safe
>   mm: add put_page_zeroed and folio_put_zeroed
>   mm: use __GFP_ZERO in alloc_anon_folio
>   mm: vma_alloc_anon_folio_pmd: pass raw fault address to
>     vma_alloc_folio
>   mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd
>   mm: hugetlb: add gfp parameter and skip zeroing for zeroed pages
>   mm: memfd: skip zeroing for zeroed hugetlb pool pages
>   mm: page_reporting: add per-page zeroed bitmap for host feedback
>   virtio_balloon: submit reported pages as individual buffers
>   virtio_balloon: disable indirect descriptors
>   mm: page_reporting: add flush parameter with page budget
>   virtio_balloon: skip zeroing for host-zeroed reported pages
>   virtio_balloon: disable reporting zeroed optimization for confidential
>     guests
>   mm: balloon: use put_page_zeroed for zeroed balloon pages
>   virtio_balloon: implement VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE
>
>  arch/alpha/include/asm/page.h       |   3 -
>  arch/m68k/include/asm/page_no.h     |   3 -
>  arch/s390/include/asm/page.h        |   3 -
>  arch/x86/include/asm/page.h         |   3 -
>  drivers/virtio/virtio_balloon.c     | 177 ++++++++++++++---
>  fs/hugetlbfs/inode.c                |   3 +-
>  include/linux/cma.h                 |   3 +-
>  include/linux/gfp.h                 |  18 +-
>  include/linux/highmem.h             |  15 +-
>  include/linux/hugetlb.h             |  18 +-
>  include/linux/mm.h                  |  13 ++
>  include/linux/page-flags.h          |  11 ++
>  include/linux/page_reporting.h      |  13 ++
>  include/uapi/linux/virtio_balloon.h |   2 +
>  mm/balloon.c                        |  10 +-
>  mm/cma.c                            |   6 +-
>  mm/compaction.c                     |   9 +-
>  mm/folio_zero.h                     |  18 ++
>  mm/huge_memory.c                    |  16 +-
>  mm/hugetlb.c                        | 138 ++++++++-----
>  mm/hugetlb_cma.c                    |   4 +-
>  mm/internal.h                       |  22 ++-
>  mm/memfd.c                          |  14 +-
>  mm/memory-failure.c                 |  10 +
>  mm/memory.c                         |  19 +-
>  mm/mempolicy.c                      |  75 +++----
>  mm/mmap.c                           |   6 +
>  mm/page_alloc.c                     | 297 +++++++++++++++++++++++-----
>  mm/page_reporting.c                 |  99 ++++++++--
>  mm/page_reporting.h                 |  12 ++
>  mm/slub.c                           |   4 +-
>  mm/swap.c                           |  20 +-
>  32 files changed, 792 insertions(+), 272 deletions(-)
>  create mode 100644 mm/folio_zero.h
>
> --
> MST
>

2026-06-08  8:39 ` [PATCH v10 31/37] virtio_balloon: submit reported pages as individual buffers Michael S. Tsirkin
2026-06-08  8:40 ` [PATCH v10 32/37] virtio_balloon: disable indirect descriptors Michael S. Tsirkin
2026-06-08  8:40 ` [PATCH v10 33/37] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
2026-06-08  8:40 ` [PATCH v10 34/37] virtio_balloon: skip zeroing for host-zeroed reported pages Michael S. Tsirkin
2026-06-08  8:40 ` [PATCH v10 35/37] virtio_balloon: disable reporting zeroed optimization for confidential guests Michael S. Tsirkin
2026-06-08  8:40 ` [PATCH v10 36/37] mm: balloon: use put_page_zeroed for zeroed balloon pages Michael S. Tsirkin
2026-06-08 11:10   ` David Hildenbrand (Arm)
2026-06-08  8:40 ` [PATCH v10 37/37] virtio_balloon: implement VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE Michael S. Tsirkin
2026-06-08  9:17 ` Lorenzo Stoakes [this message]
2026-06-08 12:52   ` [PATCH v10 00/37] mm/virtio: skip redundant zeroing of host-zeroed pages Lorenzo Stoakes
2026-06-08 11:02 ` Vlastimil Babka (SUSE)
2026-06-08 11:13   ` Vlastimil Babka (SUSE)
2026-06-08 15:45     ` Gregory Price
2026-06-08 14:21 ` Matthew Wilcox
