From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gregory Price <gourry@gourry.net>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>,
Brendan Jackman <jackmanb@google.com>,
Michal Hocko <mhocko@suse.com>,
Suren Baghdasaryan <surenb@google.com>,
Jason Wang <jasowang@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, virtualization@lists.linux.dev,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Matthew Brost <matthew.brost@intel.com>,
Joshua Hahn <joshua.hahnjy@gmail.com>,
Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
Ying Huang <ying.huang@linux.alibaba.com>,
Alistair Popple <apopple@nvidia.com>,
Hugh Dickins <hughd@google.com>,
Christoph Lameter <cl@gentwo.org>,
David Rientjes <rientjes@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Harry Yoo <harry.yoo@oracle.com>, Chris Li <chrisl@kernel.org>,
Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC v3 01/19] mm: thread user_addr through page allocator for cache-friendly zeroing
Date: Wed, 22 Apr 2026 17:20:27 -0400
Message-ID: <20260422171315-mutt-send-email-mst@kernel.org>
In-Reply-To: <aekluxNPxF8CN8fL@gourry-fedora-PF4VCD3F>
On Wed, Apr 22, 2026 at 03:47:07PM -0400, Gregory Price wrote:
> On Tue, Apr 21, 2026 at 06:01:10PM -0400, Michael S. Tsirkin wrote:
> > Thread a user virtual address from vma_alloc_folio() down through
> > the page allocator to post_alloc_hook(). This is plumbing preparation
> > for a subsequent patch that will use user_addr to call folio_zero_user()
> > for cache-friendly zeroing of user pages.
> >
> > The user_addr is stored in struct alloc_context and flows through:
> > vma_alloc_folio -> folio_alloc_mpol -> __alloc_pages_mpol ->
> > __alloc_frozen_pages -> get_page_from_freelist -> prep_new_page ->
> > post_alloc_hook
> >
> > Public APIs (__alloc_pages, __folio_alloc, folio_alloc_mpol) gain a
> > user_addr parameter directly. Callers that do not need user_addr
> > pass USER_ADDR_NONE ((unsigned long)-1), since
> > address 0 is a valid user mapping.
> >
>
> Question: rather than churning the entirety of the existing interfaces,
> is there a possibility of adding an explicit interface for this
> interaction that amounts to:
>
> __alloc_user_pages(..., gfp_t gfp, user_addr)
> {
> 	BUG_ON(!(gfp & __GFP_ZERO));
>
> 	/* post_alloc_hook implements the already-zeroed skip */
> 	page = alloc_page(..., gfp, ...); /* existing interface */
>
> 	/* Do the cacheline stuff here instead of in the core */
> 	cacheline_nonsense(page, user_addr);
>
> 	return page; /* user doesn't need to do explicit zeroing */
> }
>
> Then rather than leaking information out of the buddy, we just need to
> get the zeroed information *into* the buddy.
>
> the users that want zeroing but need the explicit user_addr step just
> defer the zeroing to outside post_alloc_hook().
>
> That's just my immediate gut reaction to all this churn on the existing
> interfaces.
>
> Existing users can continue using the buddy as-is, and enlightened users
> can optimize for this specific kind of __GFP_ZERO interaction.
>
> ~Gregory
Hmm. Maybe I misunderstand what you propose, but this seems pretty close
to what v2 did - each callsite checked whether the page was pre-zeroed
and called folio_zero_user() itself. The feedback (from both you and David)
was that threading it through the allocator is better.
With a wrapper approach, looks like we'd need something like
__GFP_SKIP_ZERO so post_alloc_hook doesn't zero sequentially, then the
wrapper re-zeros with folio_zero_user(). But then the wrapper needs to
know whether the page was pre-zeroed (PG_zeroed), which is cleared by
post_alloc_hook before return. So the information doesn't survive to
the wrapper.
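To make the lost-hint problem concrete, here is a minimal userspace sketch.
It is a toy model, not kernel code: struct toy_page, toy_post_alloc_hook(),
toy_alloc_user_page() and the skip_zero argument (standing in for the
hypothetical __GFP_SKIP_ZERO) are all made up for illustration, and memset()
stands in for folio_zero_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct toy_page {
	bool zeroed;			/* models PG_zeroed */
	unsigned char data[TOY_PAGE_SIZE];
};

/*
 * Models post_alloc_hook() as described above: the pre-zeroed hint is
 * consumed and cleared here. skip_zero models a hypothetical
 * __GFP_SKIP_ZERO flag.
 */
void toy_post_alloc_hook(struct toy_page *page, bool skip_zero)
{
	if (!page->zeroed && !skip_zero)
		memset(page->data, 0, sizeof(page->data));
	page->zeroed = false;		/* the hint does not survive the hook */
}

/*
 * Models the wrapper: it asks the hook not to zero, then has to decide
 * for itself. Returns true if the wrapper zeroed the page.
 */
bool toy_alloc_user_page(struct toy_page *page)
{
	toy_post_alloc_hook(page, true);
	if (page->zeroed)		/* never true: already cleared */
		return false;
	/* stands in for folio_zero_user() */
	memset(page->data, 0, sizeof(page->data));
	return true;
}
```

In this model a page that entered the allocator already zeroed is still
zeroed again by the wrapper, because the only record of its state was
cleared one call earlier.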
We could return the zeroed hint via an output parameter, but that's
what v2's pghint_t was, and it was disliked.
The user_addr threading through the allocator does add API churn,
but it's all mechanical (adding one parameter, callers pass
USER_ADDR_NONE), and any mistakes are just build errors.
And it makes the zeroing path closer to being correct by
construction: every allocation either explicitly
says no address or has a user_addr - and then gets
cache-friendly zeroing or skip-if-prezeroed, with no possibility
of a callsite forgetting to handle it.
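As a rough illustration of that decision, the sketch below models it as a
pure function in userspace C. Only USER_ADDR_NONE and its value come from
the patch description; the enum, the function name toy_pick_zeroing() and
the exact ordering of the checks are assumptions made for this example and
may not match what the series actually does in post_alloc_hook().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * From the patch description: address 0 is a valid user mapping, so the
 * "no address" sentinel is all-ones.
 */
#define USER_ADDR_NONE ((unsigned long)-1)

/* Hypothetical labels for the three outcomes; not kernel identifiers. */
enum toy_zero_action {
	TOY_ZERO_SKIP,		/* already zeroed, or zeroing not requested */
	TOY_ZERO_SEQUENTIAL,	/* plain front-to-back clear */
	TOY_ZERO_USER_ORDER,	/* stands in for folio_zero_user() */
};

/*
 * Assumed decision order: skip if nothing to do, fall back to sequential
 * zeroing when the caller passed USER_ADDR_NONE, otherwise zero in an
 * order chosen around user_addr.
 */
enum toy_zero_action toy_pick_zeroing(bool want_zero, bool prezeroed,
				      unsigned long user_addr)
{
	if (!want_zero || prezeroed)
		return TOY_ZERO_SKIP;
	if (user_addr == USER_ADDR_NONE)
		return TOY_ZERO_SEQUENTIAL;
	return TOY_ZERO_USER_ORDER;
}
```

Because the choice depends only on values the allocator already holds, no
caller has to repeat it; address 0 is treated as a real user address and
only the all-ones sentinel selects the sequential path.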
Fundamentally, David told me I need to move folio_zero_user into
post_alloc_hook as a prerequisite to the optimization, so I did that -
let's stick to it then, shall we?
This approach also fixes a pre-existing double-zeroing on architectures with
aliasing data caches + init_on_alloc, where current code zeros once
via kernel_init_pages() then again via clear_user_highpage() at
the callsite. I don't see how that would be possible with the wrapper.
--
MST