From: "Michael S. Tsirkin" <mst@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>,
Brendan Jackman <jackmanb@google.com>,
Michal Hocko <mhocko@suse.com>,
Suren Baghdasaryan <surenb@google.com>,
Jason Wang <jasowang@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, virtualization@lists.linux.dev
Subject: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
Date: Mon, 20 Apr 2026 08:51:13 -0400
Message-ID: <cover.1776689093.git.mst@redhat.com>
v2: this is an attempt to address David Hildenbrand's comments on v1:
the overloading of GFP flags, the use of page->private, and the missing
support for balloon deflate.
I hope this one is acceptable, API-wise.
I also went ahead and implemented the alternative approach that David
suggested: using __GFP_ZERO to zero userspace pages. The catch is that
on some architectures one has to know the userspace fault address in
order to flush the cache, so I had to propagate the fault address
everywhere. That is a lot of churn, and my concern is that if we miss
even one place, the result will be silent, subtle data corruption, and
only on some arches (x86 will be fine).
Still, you can view that approach here:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero
David, if you still feel I should switch to that approach,
let me know. Personally, I'd rather keep that as a separate
project from this optimization.
Still an RFC as virtio bits need work, but I would very much like
to get a general agreement on mm bits first. Thanks!
Patch 1 is a minor optimization that I am carrying here to avoid
conflicts. It might make sense to merge it straight away.
-------
When a guest reports free pages to the hypervisor via virtio-balloon's
free page reporting, the host typically zeros those pages when reclaiming
their backing memory (e.g., via MADV_DONTNEED on anonymous mappings).
When the guest later reallocates those pages, the kernel zeros them
again -- redundantly.
This series eliminates that double-zeroing by propagating the "host
already zeroed this page" information through the buddy allocator and
into the page fault path.
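
To make the idea concrete, here is a minimal userspace sketch of "consume
the hint once at allocation time". It is a toy model, not the kernel code:
struct toy_page, toy_host_reclaim() and toy_alloc_zeroed() are hypothetical
names invented for illustration, and the real series keeps this state in
page flags rather than in a bool.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page and its contents; not the kernel layout. */
struct toy_page {
	bool zeroed;				/* models the "host already zeroed" hint */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Models the host reclaiming a reported page: its contents become zero. */
static void toy_host_reclaim(struct toy_page *p)
{
	memset(p->data, 0, sizeof(p->data));
	p->zeroed = true;
}

/*
 * Models a single consume/clear point on the allocation path.
 * Returns true if the redundant zeroing was skipped.
 */
static bool toy_alloc_zeroed(struct toy_page *p)
{
	bool skip = p->zeroed;

	p->zeroed = false;			/* the hint is consumed exactly once */
	if (!skip)
		memset(p->data, 0, sizeof(p->data));
	return skip;
}
```

In this model a page that was reclaimed by the host is handed out without a
second memset(), while any page that has been written to since then is
zeroed as usual because the hint is gone.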
Performance with THP enabled on a 2GB VM, 1 vCPU, allocating
256MB of anonymous pages:
metric         baseline          optimized        delta
task-clock     191 +- 31 ms      60 +- 35 ms      -68%
cache-misses   1.10M +- 460K     269K +- 31K      -76%
instructions   4.54M +- 275K     4.10M +- 130K    -10%
With hugetlb surplus pages:
metric         baseline          optimized        delta
task-clock     183 +- 24 ms      45 +- 23 ms      -76%
cache-misses   1.27M +- 544K     270K +- 16K      -79%
instructions   5.37M +- 254K     4.94M +- 155K    -8%
Notes:
- The virtio_balloon module parameter (15/18) is a testing hack.
A proper virtio feature flag is needed before merging.
- Patch 16/18 adds a sysfs flush trigger for deterministic testing
(avoids waiting for the 2-second reporting delay).
- When host_zeroes_pages is set, callers skip folio_zero_user() for
pages known to be zeroed by the host. This is safe on all
architectures because the hypervisor invalidates guest cache lines
when reclaiming page backing (MADV_DONTNEED).
- PG_zeroed is aliased to PG_private. It is excluded from
PAGE_FLAGS_CHECK_AT_PREP because it must survive on free-list pages
until post_alloc_hook() consumes and clears it. Is this acceptable,
or should a different bit be used?
- The optimization is most effective with THP, where entire 2MB
pages are allocated directly from reported order-9+ buddy pages.
Without THP, only ~21% of order-0 allocations come from reported
pages due to low-order fragmentation.
- Persistent hugetlb pool pages are not covered: when freed by
userspace they return to the hugetlb free pool, not the buddy
allocator, so they are never reported to the host. Surplus
hugetlb pages are allocated from buddy and do benefit.
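
As a rough illustration of the flag tracking and of the order-9 arithmetic
in the notes above, here is a small userspace sketch. It assumes the natural
rule that a merged block is known-zero only if both halves are, and that
both halves of a split inherit the parent's state; the series may implement
this differently. struct toy_block and the toy_* helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical free-block descriptor; the kernel keeps this in struct page. */
struct toy_block {
	unsigned int order;	/* block spans 2^order base pages */
	bool zeroed;		/* every page in the block is known to be zero */
};

/* Size in bytes of a block of the given order. */
static size_t toy_block_bytes(unsigned int order, size_t base_page_size)
{
	return base_page_size << order;
}

/*
 * Merge two buddies of equal order (the caller guarantees equal order).
 * Assumed rule: the result is known-zero only if both halves are.
 */
static struct toy_block toy_merge(struct toy_block a, struct toy_block b)
{
	struct toy_block merged = { a.order + 1, a.zeroed && b.zeroed };

	return merged;
}

/* Split a block in two. Assumed rule: both halves inherit the state. */
static void toy_split(struct toy_block parent,
		      struct toy_block *lo, struct toy_block *hi)
{
	lo->order = hi->order = parent.order - 1;
	lo->zeroed = hi->zeroed = parent.zeroed;
}
```

With 4KB base pages, toy_block_bytes(9, 4096) is 2MB, which is why a THP
fault can be served whole from a single reported order-9 block, whereas an
order-0 allocation only benefits if the block it was split from still
carried the hint.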
Test program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif

int main(int argc, char **argv)
{
	unsigned long size;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *p;
	int r;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <size_mb> [huge]\n", argv[0]);
		return 1;
	}
	size = atol(argv[1]) * 1024UL * 1024;
	if (argc >= 3 && strcmp(argv[2], "huge") == 0)
		flags |= MAP_HUGETLB;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	r = madvise(p, size, MADV_POPULATE_WRITE);
	if (r) {
		perror("madvise");
		return 1;
	}

	munmap(p, size);
	return 0;
}
Test script (bench.sh):
#!/bin/bash
# Usage: bench.sh <size_mb> <mode> <iterations> [huge]
# mode 0 = baseline, mode 1 = skip zeroing

SZ=${1:-256}; MODE=${2:-0}; ITER=${3:-10}; HUGE=${4:-}
FLUSH=/sys/module/page_reporting/parameters/flush
PERF_DATA=/tmp/perf-$MODE.csv

rmmod virtio_balloon 2>/dev/null
insmod virtio_balloon.ko host_zeroes_pages=$MODE
echo 512 > $FLUSH
[ "$HUGE" = "huge" ] && echo $((SZ/2)) > /proc/sys/vm/nr_overcommit_hugepages

rm -f $PERF_DATA
echo "=== sz=${SZ}MB mode=$MODE iter=$ITER $HUGE ==="
for i in $(seq 1 $ITER); do
	echo 3 > /proc/sys/vm/drop_caches
	echo 512 > $FLUSH
	perf stat -e task-clock,instructions,cache-misses \
		-x, -o $PERF_DATA --append -- ./alloc_once $SZ $HUGE
done

[ "$HUGE" = "huge" ] && echo 0 > /proc/sys/vm/nr_overcommit_hugepages
rmmod virtio_balloon

awk -F, '/^#/||/^$/{next}{v=$1+0;e=$3;gsub(/ /,"",e);s[e]+=v;n[e]++}
END{for(e in s)printf " %-16s %10.2f (n=%d)\n",e,s[e]/n[e],n[e]}' $PERF_DATA
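
For readers who find the awk one-liner dense, the per-line parsing it does
can be sketched in C as below. This only mirrors what the script itself
does (skip comment and blank lines, take field 1 as the value and field 3
as the event name with spaces removed); parse_perf_csv_line() is a
hypothetical helper and not part of the series. Unlike the awk script, it
also rejects lines with fewer than three fields.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one comma-separated line the way the awk script does:
 * field 1 is the counter value, field 3 is the event name.
 * Returns false for comment lines, blank lines, and lines with fewer
 * than three fields. event_len must be at least 1.
 */
static bool parse_perf_csv_line(const char *line, double *value,
				char *event, size_t event_len)
{
	const char *f2, *f3, *end;
	size_t n = 0;

	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return false;

	f2 = strchr(line, ',');			/* end of field 1 */
	if (!f2)
		return false;
	f3 = strchr(f2 + 1, ',');		/* end of field 2 */
	if (!f3)
		return false;
	f3++;					/* start of field 3 */
	end = f3 + strcspn(f3, ",\n");

	*value = strtod(line, NULL);		/* like awk's $1+0 */

	/* Copy the event name, dropping spaces like gsub(/ /,"",e). */
	for (; f3 < end && n + 1 < event_len; f3++)
		if (*f3 != ' ')
			event[n++] = *f3;
	event[n] = '\0';
	return true;
}
```

A caller would read the CSV file line by line, accumulate a sum and a count
per event name, and print sum/count at the end, which is what the END block
of the awk script does.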
Compile and run:
gcc -static -O2 -o alloc_once alloc_once.c
bash bench.sh 256 0 10 # baseline (regular pages)
bash bench.sh 256 1 10 # optimized (regular pages)
bash bench.sh 256 0 10 huge # baseline (hugetlb surplus)
bash bench.sh 256 1 10 huge # optimized (hugetlb surplus)
Changes since v1:
- Replaced __GFP_PREZEROED with PG_zeroed page flag (aliased PG_private)
- Added pghint_t type and vma_alloc_folio_hints() API
- Track PG_zeroed across buddy merges and splits
- Added post_alloc_hook integration (single consume/clear point)
- Added hugetlb support (pool pages + memfd)
- Added page_reporting flush parameter for deterministic testing
- Added free_frozen_pages_hint/put_page_hint for balloon deflate path
- Added try_to_claim_block PG_zeroed preservation
- Updated perf numbers with per-iteration flush methodology
Michael S. Tsirkin (18):
mm: page_alloc: propagate PageReported flag across buddy splits
mm: add pghint_t type and vma_alloc_folio_hints API
mm: add PG_zeroed page flag for known-zero pages
mm: page_alloc: track PG_zeroed across buddy merges
mm: page_alloc: preserve PG_zeroed in try_to_claim_block
mm: page_alloc: thread pghint_t through get_page_from_freelist
mm: post_alloc_hook: use PG_zeroed to skip zeroing, return pghint_t
mm: hugetlb: thread pghint_t through buddy allocation chain
mm: hugetlb: use PG_zeroed for pool pages, skip redundant zeroing
mm: page_reporting: support host-zeroed reported pages
mm: skip zeroing in vma_alloc_zeroed_movable_folio for pre-zeroed
pages
mm: skip zeroing in alloc_anon_folio for pre-zeroed pages
mm: skip zeroing in vma_alloc_anon_folio_pmd for pre-zeroed pages
mm: memfd: skip zeroing for pre-zeroed hugetlb pages
virtio_balloon: add host_zeroes_pages module parameter
mm: page_reporting: add flush parameter with page budget
mm: add free_frozen_pages_hint and put_page_hint APIs
virtio_balloon: mark deflated pages as pre-zeroed
drivers/virtio/virtio_balloon.c | 11 ++-
fs/hugetlbfs/inode.c | 5 +-
include/linux/gfp.h | 17 +++++
include/linux/highmem.h | 6 +-
include/linux/hugetlb.h | 6 +-
include/linux/mm.h | 12 +++
include/linux/page-flags.h | 13 +++-
include/linux/page_reporting.h | 3 +
mm/compaction.c | 4 +-
mm/huge_memory.c | 12 +--
mm/hugetlb.c | 52 +++++++++----
mm/internal.h | 7 +-
mm/memfd.c | 12 +--
mm/memory.c | 14 ++--
mm/mempolicy.c | 85 +++++++++++++++++++++
mm/page_alloc.c | 131 ++++++++++++++++++++++++--------
mm/page_reporting.c | 55 +++++++++++++-
mm/page_reporting.h | 11 +++
mm/swap.c | 19 +++++
19 files changed, 392 insertions(+), 83 deletions(-)
--
MST