Re: [PATCH 06/13] KVM: Disallow hugepages for incompatible gmem bindings, but let 'em succeed

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Michael Roth <michael.roth@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Binbin Wu <binbin.wu@linux.intel.com>
Subject: Re: [PATCH 06/13] KVM: Disallow hugepages for incompatible gmem bindings, but let 'em succeed
Date: Thu, 28 Sep 2023 11:31:51 -0700	[thread overview]
Message-ID: <ZRXGl44g8oD-FtNy@google.com> (raw)
In-Reply-To: <20230922224210.6klwbphnsk5j2wft@amd.com>

On Fri, Sep 22, 2023, Michael Roth wrote:
> On Thu, Sep 21, 2023 at 01:33:23PM -0700, Sean Christopherson wrote:
> > +	/*
> > +	 * For simplicity, allow mapping a hugepage if and only if the entire
> > +	 * binding is compatible, i.e. don't bother supporting mapping interior
> > +	 * sub-ranges with hugepages (unless userspace comes up with a *really*
> > +	 * strong use case for needing hugepages within unaligned bindings).
> > +	 */
> > +	if (!IS_ALIGNED(slot->gmem.pgoff, 1ull << *max_order) ||
> > +	    !IS_ALIGNED(slot->npages, 1ull << *max_order))
> > +		*max_order = 0;
> 
> Thanks for working this in. Unfortunately on x86 the bulk of guest memory
> ends up getting slotted directly above legacy regions at GFN 0x100, 

Can you provide an example?  I'm struggling to understand what the layout actually
is.  I don't think it changes the story for the kernel, but it sounds like there
might be room for optimization in QEMU?  Or more likely, I just don't understand
what you're saying :-)

> so the associated slot still ends failing these alignment checks if it tries
> to match the gmemfd offsets up with the shared RAM/memfd offsets.
> 
> I tried to work around it in userspace by padding the gmemfd offset of
> each slot to the next 2M boundary, but that also requires dynamically
> growing the gmemfd inode to account for the padding of each new slot and
> it gets ugly enough that I'm not sure it's any better than your
> suggested alternative of using a unique gmemfd for each slot.
> 
> But what if we relax the check to simply make sure that any large folio
> must is fully-contained by the range of the slot is bound to? It *seems*
> like that would still avoid stuff like mapping 2M pages in the NPT (or
> setting up 2M RMP table entries) that aren't fully contained by a slot
> while still allowing the bulk of guest memory to get mapped as 2M. Are
> there other edge cases to consider?
> 
> The following seems to work for a basic 16GB SNP guest at least:
> 
> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> index 9109bf5751ee..e73128d4ebc2 100644
> --- a/virt/kvm/guest_mem.c
> +++ b/virt/kvm/guest_mem.c
> @@ -618,6 +618,7 @@ int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>                        gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep)
>  {
>         pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
> +       pgoff_t huge_index;
>         struct kvm_gmem *gmem;
>         struct folio *folio;
>         struct page *page;
> @@ -662,9 +663,12 @@ int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>          * sub-ranges with hugepages (unless userspace comes up with a *really*
>          * strong use case for needing hugepages within unaligned bindings).
>          */
> -       if (!IS_ALIGNED(slot->gmem.pgoff, 1ull << *max_order) ||
> -           !IS_ALIGNED(slot->npages, 1ull << *max_order))
> +       huge_index = round_down(index, 1ull << *max_order);

Why not use ALIGN() here?  The size is obviously a power-of-2.  Or is my math
even worse than I thought?

> +       if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
> +           huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages) {

Argh, I keep forgetting that the MMU is responsible for handling misaligned gfns.
Yeah, this looks right.

Can you post this as a proper patch, on top of my fixes?  And without the pr_debug().
That'll be the easiest for me to apply+squash when the time comes.

Thanks much!

next prev parent reply	other threads:[~2023-09-28 18:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-21 20:33 [PATCH 00/13] KVM: guest_memfd fixes Sean Christopherson
2023-09-21 20:33 ` [PATCH 01/13] KVM: Assert that mmu_invalidate_in_progress *never* goes negative Sean Christopherson
2023-09-21 20:33 ` [PATCH 02/13] KVM: Actually truncate the inode when doing PUNCH_HOLE for guest_memfd Sean Christopherson
2023-09-21 20:33 ` [PATCH 03/13] KVM: WARN if *any* MMU invalidation sequence doesn't add a range Sean Christopherson
2023-09-21 20:33 ` [PATCH 04/13] KVM: WARN if there are danging MMU invalidations at VM destruction Sean Christopherson
2023-09-27  3:16   ` Binbin Wu
2023-09-28 18:11     ` Sean Christopherson
2023-09-21 20:33 ` [PATCH 05/13] KVM: Fix MMU invalidation bookkeeping in guest_memfd Sean Christopherson
2023-09-21 20:33 ` [PATCH 06/13] KVM: Disallow hugepages for incompatible gmem bindings, but let 'em succeed Sean Christopherson
2023-09-22 22:42   ` Michael Roth
2023-09-28 18:31     ` Sean Christopherson [this message]
2023-10-02 15:53       ` Michael Roth
2023-10-02 16:49         ` Sean Christopherson
2023-09-21 20:33 ` [PATCH 07/13] KVM: x86/mmu: Track PRIVATE impact on hugepage mappings for all memslots Sean Christopherson
2023-09-27  6:01   ` Binbin Wu
2023-09-27 14:25     ` Sean Christopherson
2023-09-21 20:33 ` [PATCH 08/13] KVM: x86/mmu: Zap shared-only memslots when private attribute changes Sean Christopherson
2023-09-21 20:33 ` [PATCH 09/13] KVM: Always add relevant ranges to invalidation set when changing attributes Sean Christopherson
2023-09-21 20:33 ` [PATCH 10/13] KVM: x86/mmu: Drop repeated add() of to-be-invalidated range Sean Christopherson
2023-09-21 20:33 ` [PATCH 11/13] KVM: selftests: Refactor private mem conversions to prep for punch_hole test Sean Christopherson
2023-09-21 20:33 ` [PATCH 12/13] KVM: selftests: Add a "pure" PUNCH_HOLE on guest_memfd testcase Sean Christopherson
2023-09-21 20:33 ` [PATCH 13/13] KVM: Rename guest_mem.c to guest_memfd.c Sean Christopherson
2023-09-29  2:22 ` [PATCH 00/13] KVM: guest_memfd fixes Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRXGl44g8oD-FtNy@google.com \
    --to=seanjc@google.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.