From: Ackerley Tng <ackerleytng@google.com>
To: Michael Roth <michael.roth@amd.com>
Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
david@redhat.com, tabba@google.com, vannapurve@google.com,
ira.weiny@intel.com, thomas.lendacky@amd.com,
pbonzini@redhat.com, seanjc@google.com, vbabka@suse.cz,
joro@8bytes.org, pratikrajesh.sampat@amd.com,
liam.merwick@oracle.com, yan.y.zhao@intel.com, aik@amd.com
Subject: Re: [PATCH RFC v1 1/5] KVM: guest_memfd: Remove preparation tracking
Date: Thu, 18 Sep 2025 06:31:01 +0000 [thread overview]
Message-ID: <diqzo6r8p90a.fsf@google.com> (raw)
In-Reply-To: <20250916233335.wv2lf4fiejlw53o2@amd.com>
Michael Roth <michael.roth@amd.com> writes:
> On Mon, Aug 25, 2025 at 04:08:19PM -0700, Ackerley Tng wrote:
>> Michael Roth <michael.roth@amd.com> writes:
>>
>> > guest_memfd currently uses the folio uptodate flag to track:
>> >
>> > 1) whether or not a page had been cleared before initial usage
>> > 2) whether or not the architecture hooks have been issued to put the
>> > page in a private state as defined by the architecture
>> >
>> > In practice, 2) is only actually being tracked for SEV-SNP VMs, and
>> > there do not seem to be any plans/reasons that would suggest this will
>> > change in the future, so this additional tracking/complexity is not
>> > really providing any general benefit to guest_memfd users. Future plans
>> > around in-place conversion and hugepage support, where the per-folio
>> > uptodate flag is planned to be used purely to track the initial clearing
>> > of folios, whereas conversion operations could trigger multiple
>> > transitions between 'prepared' and 'unprepared' and thus need separate
>> > tracking, will make the burden of tracking this information within
>> > guest_memfd even more complex, since preparation generally happens
>> > during fault time, on the "read-side" of any global locks that might
>> > protect state tracked by guest_memfd, and so may require more complex
>> > locking schemes to allow for concurrent handling of page faults for
>> > multiple vCPUs where the "preparedness" state tracked by guest_memfd
>> > might need to be updated as part of handling the fault.
>> >
>> > Instead of keeping this current/future complexity within guest_memfd for
>> > what is essentially just SEV-SNP, just drop the tracking for 2) and have
>> > the arch-specific preparation hooks get triggered unconditionally on
>> > every fault so the arch-specific hooks can check the preparation state
>> > directly and decide whether or not a folio still needs additional
>> > preparation. In the case of SEV-SNP, the preparation state is already
>> > checked again via the preparation hooks to avoid double-preparation, so
>> > nothing extra needs to be done to update the handling of things there.
>> >
>> > Signed-off-by: Michael Roth <michael.roth@amd.com>
>> > ---
>> > virt/kvm/guest_memfd.c | 47 ++++++++++++++----------------------------
>> > 1 file changed, 15 insertions(+), 32 deletions(-)
>> >
>> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> > index 35f94a288e52..cc93c502b5d8 100644
>> > --- a/virt/kvm/guest_memfd.c
>> > +++ b/virt/kvm/guest_memfd.c
>> > @@ -421,11 +421,6 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
>> > return 0;
>> > }
>> >
>> > -static inline void kvm_gmem_mark_prepared(struct folio *folio)
>> > -{
>> > - folio_mark_uptodate(folio);
>> > -}
>> > -
>> > /*
>> > * Process @folio, which contains @gfn, so that the guest can use it.
>> > * The folio must be locked and the gfn must be contained in @slot.
>> > @@ -435,13 +430,7 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
>> > static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>> > gfn_t gfn, struct folio *folio)
>> > {
>> > - unsigned long nr_pages, i;
>> > pgoff_t index;
>> > - int r;
>> > -
>> > - nr_pages = folio_nr_pages(folio);
>> > - for (i = 0; i < nr_pages; i++)
>> > - clear_highpage(folio_page(folio, i));
>> >
>> > /*
>> > * Preparing huge folios should always be safe, since it should
>> > @@ -459,11 +448,8 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>>
>> While working on HugeTLB support for guest_memfd, I added a test that
>> tries to map a non-huge-page-aligned gmem.pgoff to a huge-page aligned
>> gfn.
>>
>> I understand that config would destroy the performance advantages of
>> huge pages, but I think the test is necessary since Yan brought up the
>> use case here [1].
>>
>> The conclusion in that thread, I believe, was to allow binding of
>> unaligned GFNs to offsets, but disallow large pages in that case. The
>> next series for guest_memfd HugeTLB support will include a fix similar
>> to this [2].
>>
>> While testing, I hit this WARN_ON with a non-huge-page-aligned
>> gmem.pgoff.
>>
>> > WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
>>
>> Do you all think this WARN_ON can be removed?
>
> I think so.. I actually ended up dropping this WARN_ON() for a similar
> reason:
>
Thanks for confirming!
> https://github.com/AMDESE/linux/commit/c654cd144ad0d823f4db8793ebf9b43a3e8a7c48
>
> but in that case it was to deal with memslots where most of the GPA
> ranges are huge-page aligned to the gmemfd, and it's just that the start/end
> GPA ranges have been split up and associated with other memslots. In that case
> I still try to allow hugepages but force order 0 in kvm_gmem_get_pfn()
> for the start/end ranges.
>
> I haven't really considered the case where entire GPA range is misaligned
> with gmemfd hugepage offsets but the proposed handling seems reasonable
> to me... I need to take a closer look at whether the above-mentioned
> logic is at odds with what is/will be implemented in
> kvm_alloc_memslot_metadata() however as that seems a bit more restrictive.
>
Does this help? [1] (from a WIP patch series).
KVM already checks that the guest base address (base_gfn) and the
userspace virtual address (userspace_addr) are aligned relative to each
other for each large page level. If they are not, large pages are
disabled for the entire memory slot.
[1] extends that same check for slot->base_gfn and
slot->gmem.pgoff. Hence, guest_memfd is letting KVM manage the
mapping. guest_memfd reports max_order based on what it knows (folio
size, and folio size is also determined by shareability), and KVM
manages the mapping after taking account lpage_info in addition to
max_order.
[1] https://github.com/googleprodkernel/linux-cc/commit/371ed9281e0c9ba41cfdc20b48a6c5566f61a7df
> Thanks,
>
> Mike
>
>>
>> Also, do you think kvm_gmem_prepare_folio()s interface should perhaps be
>> changed to take pfn, gfn, nr_pages (PAGE_SIZE pages) and level?
>>
>> I think taking a folio is kind of awkward since we're not really setting
>> up the folio, we're setting up something mapping-related for the
>> folio. Also, kvm_gmem_invalidate() doesn't take folios, which is more
>> aligned with invalidating mappings rather than something folio-related.
>>
>> [1] https://lore.kernel.org/all/aA7UXI0NB7oQQrL2@yzhao56-desk.sh.intel.com/
>> [2] https://github.com/googleprodkernel/linux-cc/commit/371ed9281e0c9ba41cfdc20b48a6c5566f61a7df
>>
>> > index = gfn - slot->base_gfn + slot->gmem.pgoff;
>> > index = ALIGN_DOWN(index, 1 << folio_order(folio));
>> > - r = __kvm_gmem_prepare_folio(kvm, slot, index, folio);
>> > - if (!r)
>> > - kvm_gmem_mark_prepared(folio);
>> >
>> > - return r;
>> > + return __kvm_gmem_prepare_folio(kvm, slot, index, folio);
>> > }
>> >
>> >
>> > [...snip...]
>> >
>>
next prev parent reply other threads:[~2025-09-18 6:31 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 0:53 [PATCH RFC v1 0/5] KVM: guest_memfd: Support in-place conversion for CoCo VMs Michael Roth
2025-06-13 0:53 ` [PATCH RFC v1 1/5] KVM: guest_memfd: Remove preparation tracking Michael Roth
2025-07-15 12:47 ` Vishal Annapurve
2025-07-15 22:55 ` Michael Roth
2025-08-25 23:08 ` Ackerley Tng
2025-09-16 23:33 ` Michael Roth
2025-09-18 6:31 ` Ackerley Tng [this message]
2025-09-18 7:38 ` Ackerley Tng
2025-09-18 9:29 ` Ackerley Tng
2025-11-07 13:05 ` Yan Zhao
2025-06-13 0:53 ` [PATCH RFC v1 2/5] KVM: guest_memfd: Only access KVM memory attributes when appropriate Michael Roth
2025-06-13 0:53 ` [PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared Michael Roth
2025-07-15 13:20 ` Vishal Annapurve
2025-07-15 22:48 ` Michael Roth
2025-07-16 13:04 ` Vishal Annapurve
2025-06-13 0:53 ` [PATCH RFC v1 4/5] KVM: guest_memfd: Don't prepare shared folios Michael Roth
2025-06-13 0:54 ` [PATCH RFC v1 5/5] KVM: SEV: Make SNP_LAUNCH_UPDATE ignore 'uaddr' if guest_memfd is shareable Michael Roth
2025-06-13 7:36 ` [PATCH RFC v1 0/5] KVM: guest_memfd: Support in-place conversion for CoCo VMs David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=diqzo6r8p90a.fsf@google.com \
--to=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=david@redhat.com \
--cc=ira.weiny@intel.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=liam.merwick@oracle.com \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=pratikrajesh.sampat@amd.com \
--cc=seanjc@google.com \
--cc=tabba@google.com \
--cc=thomas.lendacky@amd.com \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.