public inbox for linux-kernel@vger.kernel.org
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Brendan Jackman <jackmanb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Jason Wang <jasowang@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, virtualization@lists.linux.dev
Subject: Re: [PATCH RFC v2 00/18] mm/virtio: skip redundant zeroing of host-zeroed reported pages
Date: Tue, 21 Apr 2026 12:50:11 +0200
Message-ID: <4bdc66f2-1469-4b91-9935-74c3d3ca0ed9@kernel.org>
In-Reply-To: <20260420192037-mutt-send-email-mst@kernel.org>

On 4/21/26 01:33, Michael S. Tsirkin wrote:
> On Mon, Apr 20, 2026 at 08:20:57PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/20/26 14:51, Michael S. Tsirkin wrote:
>>>
>>
>> Hi!
>>
>>>
>>> v2 - this is an attempt to address David Hildenbrand's comments:
>>> overloading GFP and using page->private, support for
>>> balloon deflate.
>>>
>>> I hope this one is acceptable, API wise.
>>>
>>> I also went ahead and implemented an alternative approach
>>> that David suggested:
>>> using GFP_ZERO to zero userspace pages.
>>> The issue is simple: on some architectures, one has to know the
>>> userspace fault address in order to flush the cache.
>>>
>>> So, I had to propagate the fault address everywhere.
>>
>> As I said, that might not be necessary. vma_alloc_folio() is the
>> interface we mostly care about in that regard.
>>
> 
> I'm not sure I follow what "might not be necessary". We need a fault
> address so zeroing can be effective wrt cache. Since you asked that it's
> done deep in post alloc hook, the address has to propagate all over mm.

Let's look at who ends up using user_alloc_needs_zeroing() or
folio_zero_user().

Three folio_zero_user() users are in hugetlb, which might get pages from
another allocator. In particular, mm/memfd.c even passes 0, as it doesn't
have an address at all.

I don't think we particularly care about speeding up hugetlb zeroing at
this point when we already don't even care about optimizing for
user_alloc_needs_zeroing(). But it could be reworked later to optimize
zeroing in a similar way when actually allocating a folio from the buddy.

Now, for callers we care more about:

* vma_alloc_anon_folio_pmd() calls
  vma_alloc_folio()+user_alloc_needs_zeroing()+folio_zero_user()
* alloc_anon_folio() calls
  vma_alloc_folio()+user_alloc_needs_zeroing()+folio_zero_user()
* vma_alloc_zeroed_movable_folio() calls
  vma_alloc_folio()+user_alloc_needs_zeroing()+
  clear_user_highpage().


Other vma_alloc_folio() users neither specify __GFP_ZERO nor use
folio_zero_user(), as they will be overwriting the data either way. Like
KSM when unsharing, for example.

I am saying we move "user_alloc_needs_zeroing()+folio_zero_user()" into
vma_alloc_folio(), by teaching vma_alloc_folio() to respect __GFP_ZERO.

user_alloc_needs_zeroing() will effectively go away as the buddy will
just handle that.
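
To make that concrete, here is a rough userspace mock of the idea. It is
only a sketch under loud assumptions: every *_model name, MODEL_GFP_ZERO
and the "prezeroed" field are made up for illustration and are not kernel
definitions, and it ignores aliasing caches entirely. The point is just
that the caller passes the zero flag plus the address, and the allocator
decides whether any zeroing is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace mock only; none of these are the kernel's definitions. */
#define MODEL_GFP_ZERO  0x1u
#define MODEL_FOLIO_SZ  4096u

struct folio_model {
	unsigned char data[MODEL_FOLIO_SZ];
	bool prezeroed;           /* stand-in for "the buddy knows it is zero" */
	unsigned long zeroed_for; /* user address the zeroing was done for */
	int zero_calls;           /* how often we actually cleared the folio */
};

/* Stand-in for folio_zero_user(): zeroing that is told the user address. */
static void folio_zero_user_model(struct folio_model *f, unsigned long addr)
{
	memset(f->data, 0, sizeof(f->data));
	f->zeroed_for = addr;
	f->zero_calls++;
}

/* Stand-in for the buddy: returns a dirty or a known-zero folio. */
static struct folio_model *buddy_alloc_model(bool prezeroed)
{
	struct folio_model *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	memset(f->data, prezeroed ? 0x00 : 0xaa, sizeof(f->data));
	f->prezeroed = prezeroed;
	f->zeroed_for = 0;
	f->zero_calls = 0;
	return f;
}

/* The allocator honours the zero flag itself; callers no longer open-code it. */
static struct folio_model *vma_alloc_folio_model(unsigned int gfp,
						 unsigned long addr,
						 bool buddy_has_zeroed)
{
	struct folio_model *f = buddy_alloc_model(buddy_has_zeroed);

	if (f && (gfp & MODEL_GFP_ZERO) && !f->prezeroed)
		folio_zero_user_model(f, addr);
	return f;
}

/* Usage: a fault-path caller just passes the flag and the address. */
static void vma_alloc_folio_model_selftest(void)
{
	struct folio_model *f = vma_alloc_folio_model(MODEL_GFP_ZERO, 0x1000UL, false);

	assert(f && f->data[0] == 0 && f->zero_calls == 1);
	free(f);
}
```

With something along these lines, a caller that will overwrite the data
anyway simply does not pass the zero flag and pays nothing.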


All of the above is what you do on gfp_zero branch already, so I think
you understood what I meant regarding this interface.


Anybody in the tree using another folio_alloc() (or page allocator)
interface with __GFP_ZERO *would already be broken on other
architectures* where folio_zero_user() is actually required, as they are
not calling folio_zero_user() today.

But we don't really need the user address in many cases, like when
allocating a folio for the pagecache where we don't even have an address.

> 
> 
>>> A lot of churn, and my concern is, if we miss even one
>>> place, silent, subtle data corruption will result and only
>>> on some arches (x86 will be fine).
>>
>> Which would *already* be the case if you use folio_alloc(GFP_ZERO)
>> instead of magical vma_alloc_folio() + folio_zero_user().
>>
>> I don't really see how vma_alloc_folio_hints() -- that also consumes the
>> address -- is any better in that regard?
> 
> By itself, it is not. But the issue is propagating the address from
> there all over mm. If we miss even one place - we get a subtle cache
> corruption on non x86.

Yes. Like someone not using folio_zero_user() as of today.

[...]

> 
>>>
>>> Still, you can view that approach here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero
>>>
>>> David, if you still feel I should switch to that approach,
>>> let me know. Personally, I'd rather keep that as a separate
>>> project from this optimization.
>> I'd prefer if we extend vma_alloc_folio() to just handle GFP_ZERO for us.
> 
> Pls take a look at that tree then. What do you think of that approach?
> Better? 

I primarily wonder whether we can limit the impact in patch #1 by
focusing on the vma_alloc_folio() path only.

For example, I don't think converting all folio_alloc_mpol() users to
consume USER_ADDR_NONE at this point is really reasonable.

(a) Focus on vma_alloc_folio(), where we already have an address.

(b) To implement vma_alloc_folio() that way, we might need some internal
    interfaces that consume an address.

For example, instead of changing all callers of post_alloc_hook() to
pass USER_ADDR_NONE, can we make post_alloc_hook() a simple wrapper
around a variant that consumes an address?

So isn't there a way we can just keep the changes mostly to mm/page_alloc.c?
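
Roughly what I have in mind for the wrapper, again as a userspace mock
and not as actual mm/page_alloc.c code: the *_model names, the flag value
and in particular the value chosen for MODEL_USER_ADDR_NONE are all
assumptions for illustration. Existing callers keep the old signature;
only the vma_alloc_folio() path would call the address-consuming variant.

```c
#include <assert.h>
#include <string.h>

/* Userspace mock only; names and values are made up for illustration. */
#define MODEL_GFP_ZERO        0x1u
#define MODEL_USER_ADDR_NONE  (~0UL)  /* hypothetical "no user address" sentinel */

struct page_model {
	unsigned char data[64];
	unsigned long zeroed_for_user; /* set when zeroing knew a user address */
	int zeroed_plain;              /* set when zeroing had no user address */
};

/* Internal variant: the only function that knows about the address. */
static void post_alloc_hook_addr_model(struct page_model *p, unsigned int gfp,
				       unsigned long user_addr)
{
	if (!(gfp & MODEL_GFP_ZERO))
		return;

	memset(p->data, 0, sizeof(p->data));
	if (user_addr != MODEL_USER_ADDR_NONE)
		p->zeroed_for_user = user_addr; /* user-address-aware path */
	else
		p->zeroed_plain = 1;            /* plain kernel-side zeroing */
}

/* Existing entry point keeps its signature; callers stay untouched. */
static void post_alloc_hook_model(struct page_model *p, unsigned int gfp)
{
	post_alloc_hook_addr_model(p, gfp, MODEL_USER_ADDR_NONE);
}

/* Usage: an old-style caller and a vma_alloc_folio()-style caller. */
static void post_alloc_hook_model_selftest(void)
{
	struct page_model a = { .data = { 0xaa } };
	struct page_model b = { .data = { 0xaa } };

	post_alloc_hook_model(&a, MODEL_GFP_ZERO);
	post_alloc_hook_addr_model(&b, MODEL_GFP_ZERO, 0x1000UL);

	assert(a.data[0] == 0 && a.zeroed_plain == 1);
	assert(b.data[0] == 0 && b.zeroed_for_user == 0x1000UL);
}
```

That way the sentinel never leaks out of the file that defines it.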


> 
> I also note that we need a flag for free in order to implement
> balloon deflate as you asked. Here, I reused the hints.

Yes, but on the allocation path we do have a flag: __GFP_ZERO.

-- 
Cheers,

David
