From: Ackerley Tng <ackerleytng@google.com>
To: Michael Roth <michael.roth@amd.com>, kvm@vger.kernel.org
Cc: linux-coco@lists.linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, david@redhat.com,
tabba@google.com, vannapurve@google.com, ira.weiny@intel.com,
thomas.lendacky@amd.com, pbonzini@redhat.com, seanjc@google.com,
vbabka@suse.cz, joro@8bytes.org, pratikrajesh.sampat@amd.com,
liam.merwick@oracle.com, yan.y.zhao@intel.com, aik@amd.com
Subject: Re: [PATCH RFC v1 1/5] KVM: guest_memfd: Remove preparation tracking
Date: Mon, 25 Aug 2025 16:08:19 -0700 [thread overview]
Message-ID: <diqztt1vf198.fsf@google.com> (raw)
In-Reply-To: <20250613005400.3694904-2-michael.roth@amd.com>
Michael Roth <michael.roth@amd.com> writes:
> guest_memfd currently uses the folio uptodate flag to track:
>
> 1) whether or not a page had been cleared before initial usage
> 2) whether or not the architecture hooks have been issued to put the
> page in a private state as defined by the architecture
>
> In practice, 2) is only actually being tracked for SEV-SNP VMs, and
> there do not seem to be any plans/reasons that would suggest this will
> change in the future, so this additional tracking/complexity is not
> really providing any general benefit to guest_memfd users. Future plans
> around in-place conversion and hugepage support, where the per-folio
> uptodate flag is planned to be used purely to track the initial clearing
> of folios, whereas conversion operations could trigger multiple
> transitions between 'prepared' and 'unprepared' and thus need separate
> tracking, will make the burden of tracking this information within
> guest_memfd even more complex, since preparation generally happens
> during fault time, on the "read-side" of any global locks that might
> protect state tracked by guest_memfd, and so may require more complex
> locking schemes to allow for concurrent handling of page faults for
> multiple vCPUs where the "preparedness" state tracked by guest_memfd
> might need to be updated as part of handling the fault.
>
> Instead of keeping this current/future complexity within guest_memfd for
> what is essentially just SEV-SNP, just drop the tracking for 2) and have
> the arch-specific preparation hooks get triggered unconditionally on
> every fault so the arch-specific hooks can check the preparation state
> directly and decide whether or not a folio still needs additional
> preparation. In the case of SEV-SNP, the preparation state is already
> checked again via the preparation hooks to avoid double-preparation, so
> nothing extra needs to be done to update the handling of things there.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> virt/kvm/guest_memfd.c | 47 ++++++++++++++----------------------------
> 1 file changed, 15 insertions(+), 32 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 35f94a288e52..cc93c502b5d8 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -421,11 +421,6 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
> return 0;
> }
>
> -static inline void kvm_gmem_mark_prepared(struct folio *folio)
> -{
> - folio_mark_uptodate(folio);
> -}
> -
> /*
> * Process @folio, which contains @gfn, so that the guest can use it.
> * The folio must be locked and the gfn must be contained in @slot.
> @@ -435,13 +430,7 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
> static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
> gfn_t gfn, struct folio *folio)
> {
> - unsigned long nr_pages, i;
> pgoff_t index;
> - int r;
> -
> - nr_pages = folio_nr_pages(folio);
> - for (i = 0; i < nr_pages; i++)
> - clear_highpage(folio_page(folio, i));
>
> /*
> * Preparing huge folios should always be safe, since it should
> @@ -459,11 +448,8 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
While working on HugeTLB support for guest_memfd, I added a test that
tries to map a non-huge-page-aligned gmem.pgoff to a huge-page aligned
gfn.
I understand that config would destroy the performance advantages of
huge pages, but I think the test is necessary since Yan brought up the
use case here [1].
The conclusion in that thread, I believe, was to allow binding of
unaligned GFNs to offsets, but disallow large pages in that case. The
next series for guest_memfd HugeTLB support will include a fix similar
to this [2].
While testing, I hit this WARN_ON with a non-huge-page-aligned
gmem.pgoff.
> WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
Do you all think this WARN_ON can be removed?
Also, do you think kvm_gmem_prepare_folio()s interface should perhaps be
changed to take pfn, gfn, nr_pages (PAGE_SIZE pages) and level?
I think taking a folio is kind of awkward since we're not really setting
up the folio, we're setting up something mapping-related for the
folio. Also, kvm_gmem_invalidate() doesn't take folios, which is more
aligned with invalidating mappings rather than something folio-related.
[1] https://lore.kernel.org/all/aA7UXI0NB7oQQrL2@yzhao56-desk.sh.intel.com/
[2] https://github.com/googleprodkernel/linux-cc/commit/371ed9281e0c9ba41cfdc20b48a6c5566f61a7df
> index = gfn - slot->base_gfn + slot->gmem.pgoff;
> index = ALIGN_DOWN(index, 1 << folio_order(folio));
> - r = __kvm_gmem_prepare_folio(kvm, slot, index, folio);
> - if (!r)
> - kvm_gmem_mark_prepared(folio);
>
> - return r;
> + return __kvm_gmem_prepare_folio(kvm, slot, index, folio);
> }
>
>
> [...snip...]
>
next prev parent reply other threads:[~2025-08-25 23:08 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 0:53 [PATCH RFC v1 0/5] KVM: guest_memfd: Support in-place conversion for CoCo VMs Michael Roth
2025-06-13 0:53 ` [PATCH RFC v1 1/5] KVM: guest_memfd: Remove preparation tracking Michael Roth
2025-07-15 12:47 ` Vishal Annapurve
2025-07-15 22:55 ` Michael Roth
2025-08-25 23:08 ` Ackerley Tng [this message]
2025-09-16 23:33 ` Michael Roth
2025-09-18 6:31 ` Ackerley Tng
2025-09-18 7:38 ` Ackerley Tng
2025-09-18 9:29 ` Ackerley Tng
2025-11-07 13:05 ` Yan Zhao
2025-06-13 0:53 ` [PATCH RFC v1 2/5] KVM: guest_memfd: Only access KVM memory attributes when appropriate Michael Roth
2025-06-13 0:53 ` [PATCH RFC v1 3/5] KVM: guest_memfd: Call arch invalidation hooks when converting to shared Michael Roth
2025-07-15 13:20 ` Vishal Annapurve
2025-07-15 22:48 ` Michael Roth
2025-07-16 13:04 ` Vishal Annapurve
2025-06-13 0:53 ` [PATCH RFC v1 4/5] KVM: guest_memfd: Don't prepare shared folios Michael Roth
2025-06-13 0:54 ` [PATCH RFC v1 5/5] KVM: SEV: Make SNP_LAUNCH_UPDATE ignore 'uaddr' if guest_memfd is shareable Michael Roth
2025-06-13 7:36 ` [PATCH RFC v1 0/5] KVM: guest_memfd: Support in-place conversion for CoCo VMs David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=diqztt1vf198.fsf@google.com \
--to=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=david@redhat.com \
--cc=ira.weiny@intel.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=liam.merwick@oracle.com \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=pratikrajesh.sampat@amd.com \
--cc=seanjc@google.com \
--cc=tabba@google.com \
--cc=thomas.lendacky@amd.com \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.