Re: [PATCH resend v6 03/30] mm: thread user_addr through page allocator for cache-friendly zeroing

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gregory Price <gourry@gourry.net>
Cc: linux-kernel@vger.kernel.org,
	"David Hildenbrand (Arm)" <david@kernel.org>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Lorenzo Stoakes" <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Vlastimil Babka" <vbabka@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Brendan Jackman" <jackmanb@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>, "Zi Yan" <ziy@nvidia.com>,
	"Baolin Wang" <baolin.wang@linux.alibaba.com>,
	"Nico Pache" <npache@redhat.com>,
	"Ryan Roberts" <ryan.roberts@arm.com>,
	"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
	"Lance Yang" <lance.yang@linux.dev>,
	"Hugh Dickins" <hughd@google.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Joshua Hahn" <joshua.hahnjy@gmail.com>,
	"Rakie Kim" <rakie.kim@sk.com>,
	"Byungchul Park" <byungchul@sk.com>,
	"Ying Huang" <ying.huang@linux.alibaba.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Christoph Lameter" <cl@gentwo.org>,
	"David Rientjes" <rientjes@google.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Harry Yoo" <harry.yoo@oracle.com>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
	"Chris Li" <chrisl@kernel.org>,
	"Kairui Song" <kasong@tencent.com>,
	"Kemeng Shi" <shikemeng@huaweicloud.com>,
	"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
	virtualization@lists.linux.dev, linux-mm@kvack.org,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Liam R. Howlett" <liam@infradead.org>,
	"Harry Yoo" <harry@kernel.org>, "Hao Li" <hao.li@linux.dev>
Subject: Re: [PATCH resend v6 03/30] mm: thread user_addr through page allocator for cache-friendly zeroing
Date: Mon, 11 May 2026 11:55:40 -0400	[thread overview]
Message-ID: <20260511114853-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <agH3wceKcPNmixCj@gourry-fedora-PF4VCD3F>

On Mon, May 11, 2026 at 11:37:37AM -0400, Gregory Price wrote:
> On Mon, May 11, 2026 at 05:01:55AM -0400, Michael S. Tsirkin wrote:
> > Thread a user virtual address from vma_alloc_folio() down through
> > the page allocator to post_alloc_hook(). This is plumbing
> > preparation for a subsequent patch that will use user_addr to
> > call folio_zero_user() for cache-friendly zeroing of user pages.
> > 
> > The user_addr is stored in struct alloc_context and flows through:
> >   vma_alloc_folio -> folio_alloc_mpol -> __alloc_pages_mpol ->
> >   __alloc_frozen_pages -> get_page_from_freelist -> prep_new_page ->
> >   post_alloc_hook
> 
> This is the nitty-est of all nits, but when doing this can we please
> prefer stack style?
> 
>   vma_alloc_folio 
>     folio_alloc_mpol 
>       __alloc_pages_mpol 
>         __alloc_frozen_pages 
>           get_page_from_freelist 
>             prep_new_page 
>               post_alloc_hook
> 
> Claude has a bad habit of writing changelog changes this way, and it's
> painful for a human to try to read.

Sure.

> > 
> > USER_ADDR_NONE ((unsigned long)-1) is used for non-user
> > allocations, since address 0 is a valid userspace mapping.
> > 
> 
> > +/*
> > + * Sentinel for user_addr: indicates a non-user allocation.
> > + * Cannot use 0 because address 0 is a valid userspace mapping.
> > + */
> > +#define USER_ADDR_NONE	((unsigned long)-1)
> 
> Ehm, hm.  Does -1 hold as a non-user address across all architectures?
> 
> What about in linear addressing / no VM mode?


this is used on a fault. I don't think there are any faults then?
But maybe FAULT_ADDR_NONE would be clearer.

> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 7ccbda35b9ad..ee35c5367abc 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -337,7 +337,7 @@ static inline struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
> >  static inline struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
> >  		struct mempolicy *mpol, pgoff_t ilx, int nid)
> >  {
> > -	return folio_alloc_noprof(gfp, order);
> > +	return __folio_alloc_noprof(gfp, order, numa_node_id(), NULL);
> >  }
> >  #endif
> >  
> 
> This change seems out of place unless i'm missing something.
> 

Don't remember. Could be from a change I reverted. I'll look.

> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index f24bf49be047..a999f3ead852 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1806,7 +1806,8 @@ struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio)
> >  }
> >  
> >  static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
> > -		int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry)
> > +		int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry,
> > +		unsigned long addr)
> 
>                        user_addr?  uaddr?

ok


> > @@ -1823,7 +1824,7 @@ static struct folio *alloc_buddy_frozen_folio(int order, gfp_t gfp_mask,
> >  	if (alloc_try_hard)
> >  		gfp_mask |= __GFP_RETRY_MAYFAIL;
> >  
> > -	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask);
> > +	folio = (struct folio *)__alloc_frozen_pages(gfp_mask, order, nid, nmask, addr);
> 
> Not on you, but the changes in hugetlb.c as a whole are :[
> 
> We do all of this to pass USER_ADDR_NONE all over the place, but the
> alternative is having a separate function specifically for user-land
> bound allocations.
> 
> So the trade off is:
>    a) churn the current interface for everyone
>    b) add a user_ variant and know people will just get it wrong

I was also explicitly asked not to proliferate too many new APIs.


> IIRC you said the consequence of getting wrong here is subtle corruption
> if a caller got it wrong, and this was related to cache flushing for the
> provided user address?

Yes.

> Stupid question: Does this not apply to kernel allocations as well? Or
> is it simply a matter of the cache having stale data that could leak,
> and therefore it's not a concern in-kernel?
> 
> ~Gregory


Not a concern since we zero through the kernel address.

-- 
MST

next prev parent reply	other threads:[~2026-05-11 15:55 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11  9:01 [PATCH resend v6 00/30] mm/virtio: skip redundant zeroing of host-zeroed pages Michael S. Tsirkin
2026-05-11  9:01 ` [PATCH resend v6 01/30] mm: move vma_alloc_folio_noprof to page_alloc.c Michael S. Tsirkin
2026-05-11 14:47   ` Gregory Price
2026-05-11 15:09     ` Michael S. Tsirkin
2026-05-11  9:01 ` [PATCH resend v6 02/30] mm: mempolicy: fix interleave index for unaligned VMA start Michael S. Tsirkin
2026-05-11 14:59   ` Gregory Price
2026-05-11 15:15     ` Michael S. Tsirkin
2026-05-11 15:26     ` Michael S. Tsirkin
2026-05-11  9:01 ` [PATCH resend v6 03/30] mm: thread user_addr through page allocator for cache-friendly zeroing Michael S. Tsirkin
2026-05-11 15:37   ` Gregory Price
2026-05-11 15:55     ` Michael S. Tsirkin [this message]
2026-05-11 16:52       ` Gregory Price
2026-05-11 21:59         ` Michael S. Tsirkin
2026-05-12  2:50           ` Gregory Price
2026-05-12 18:06             ` Michael S. Tsirkin
2026-05-11  9:02 ` [PATCH resend v6 04/30] mm: add folio_zero_user stub for configs without THP/HUGETLBFS Michael S. Tsirkin
2026-05-11  9:02 ` [PATCH resend v6 05/30] mm: page_alloc: move prep_compound_page before post_alloc_hook Michael S. Tsirkin
2026-05-11 15:54   ` Gregory Price
2026-05-11  9:02 ` [PATCH resend v6 06/30] mm: use folio_zero_user for user pages in post_alloc_hook Michael S. Tsirkin
2026-05-11  9:02 ` [PATCH resend v6 07/30] mm: use __GFP_ZERO in vma_alloc_zeroed_movable_folio Michael S. Tsirkin
2026-05-11  9:02 ` [PATCH resend v6 08/30] mm: remove arch vma_alloc_zeroed_movable_folio overrides Michael S. Tsirkin
2026-05-11  9:02 ` [PATCH resend v6 09/30] mm: alloc_anon_folio: pass raw fault address to vma_alloc_folio Michael S. Tsirkin
2026-05-11 16:03   ` Gregory Price
2026-05-11  9:02 ` [PATCH resend v6 10/30] mm: alloc_swap_folio: " Michael S. Tsirkin
2026-05-11 16:05   ` Gregory Price
2026-05-11 21:41     ` Michael S. Tsirkin
2026-05-11 21:52       ` Gregory Price
2026-05-11 22:02         ` Michael S. Tsirkin
2026-05-11  9:02 ` [PATCH resend v6 11/30] mm: use __GFP_ZERO in alloc_anon_folio Michael S. Tsirkin
2026-05-11 16:15   ` Gregory Price
2026-05-11  9:02 ` [PATCH resend v6 12/30] mm: vma_alloc_anon_folio_pmd: pass raw fault address to vma_alloc_folio Michael S. Tsirkin
2026-05-11 16:17   ` Gregory Price
2026-05-11  9:02 ` [PATCH resend v6 13/30] mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd Michael S. Tsirkin
2026-05-11 16:26   ` Gregory Price
2026-05-11  9:03 ` [PATCH resend v6 14/30] mm: hugetlb: use __GFP_ZERO and skip zeroing for zeroed pages Michael S. Tsirkin
2026-05-11 16:36   ` Gregory Price
2026-05-11 21:47     ` Michael S. Tsirkin
2026-05-12  2:49       ` Gregory Price
2026-05-12 16:35         ` Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 15/30] mm: memfd: skip zeroing for zeroed hugetlb pool pages Michael S. Tsirkin
2026-05-11 16:39   ` Gregory Price
2026-05-11  9:03 ` [PATCH resend v6 16/30] mm: page_reporting: allow driver to set batch capacity Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 17/30] mm: page_alloc: propagate PageReported flag across buddy splits Michael S. Tsirkin
2026-05-12 15:40   ` Gregory Price
2026-05-12 15:46     ` Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 18/30] mm: page_reporting: skip redundant zeroing of host-zeroed reported pages Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 19/30] mm: page_reporting: add per-page zeroed bitmap for host feedback Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 20/30] mm: page_alloc: clear PG_zeroed on buddy merge if not both zero Michael S. Tsirkin
2026-05-12 15:56   ` Gregory Price
2026-05-12 15:58     ` Michael S. Tsirkin
2026-05-12 16:32       ` Gregory Price
2026-05-12 21:06         ` Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 21/30] mm: page_alloc: preserve PG_zeroed in page_del_and_expand Michael S. Tsirkin
2026-05-12 15:58   ` Gregory Price
2026-05-11  9:03 ` [PATCH resend v6 22/30] virtio_balloon: submit reported pages as individual buffers Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 23/30] mm: page_reporting: add flush parameter with page budget Michael S. Tsirkin
2026-05-11  9:03 ` [PATCH resend v6 24/30] mm: page_alloc: propagate PG_zeroed in split_large_buddy Michael S. Tsirkin
2026-05-11  9:04 ` [PATCH resend v6 25/30] virtio_balloon: skip zeroing for host-zeroed reported pages Michael S. Tsirkin
2026-05-11  9:04 ` [PATCH resend v6 26/30] virtio_balloon: disable reporting zeroed optimization for confidential guests Michael S. Tsirkin
2026-05-11  9:04 ` [PATCH resend v6 27/30] mm: add free_frozen_pages_zeroed Michael S. Tsirkin
2026-05-11  9:04 ` [PATCH resend v6 28/30] mm: add put_page_zeroed and folio_put_zeroed Michael S. Tsirkin
2026-05-12 16:05   ` Gregory Price
2026-05-11  9:04 ` [PATCH resend v6 29/30] virtio_balloon: implement VIRTIO_BALLOON_F_DEVICE_INIT_ON_INFLATE Michael S. Tsirkin
2026-05-11  9:04 ` [PATCH resend v6 30/30] mm: balloon: use put_page_zeroed for zeroed balloon pages Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260511114853-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bhe@redhat.com \
    --cc=byungchul@sk.com \
    --cc=chrisl@kernel.org \
    --cc=cl@gentwo.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=eperezma@redhat.com \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=hao.li@linux.dev \
    --cc=harry.yoo@oracle.com \
    --cc=harry@kernel.org \
    --cc=hughd@google.com \
    --cc=jackmanb@google.com \
    --cc=jasowang@redhat.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kasong@tencent.com \
    --cc=lance.yang@linux.dev \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=npache@redhat.com \
    --cc=nphamcs@gmail.com \
    --cc=osalvador@suse.de \
    --cc=rakie.kim@sk.com \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=virtualization@lists.linux.dev \
    --cc=weixugc@google.com \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuanchu@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.