Re: [PATCH 06/13] KVM: Disallow hugepages for incompatible gmem bindings, but let 'em succeed

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Michael Roth <michael.roth@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Binbin Wu <binbin.wu@linux.intel.com>
Subject: Re: [PATCH 06/13] KVM: Disallow hugepages for incompatible gmem bindings, but let 'em succeed
Date: Mon, 2 Oct 2023 09:49:45 -0700	[thread overview]
Message-ID: <ZRr0qUeSVxZfK-w2@google.com> (raw)
In-Reply-To: <20231002155330.lguyhqgy64rhko4p@amd.com>

On Mon, Oct 02, 2023, Michael Roth wrote:
> On Thu, Sep 28, 2023 at 11:31:51AM -0700, Sean Christopherson wrote:
> > On Fri, Sep 22, 2023, Michael Roth wrote:
> > > On Thu, Sep 21, 2023 at 01:33:23PM -0700, Sean Christopherson wrote:
> > > > +	/*
> > > > +	 * For simplicity, allow mapping a hugepage if and only if the entire
> > > > +	 * binding is compatible, i.e. don't bother supporting mapping interior
> > > > +	 * sub-ranges with hugepages (unless userspace comes up with a *really*
> > > > +	 * strong use case for needing hugepages within unaligned bindings).
> > > > +	 */
> > > > +	if (!IS_ALIGNED(slot->gmem.pgoff, 1ull << *max_order) ||
> > > > +	    !IS_ALIGNED(slot->npages, 1ull << *max_order))
> > > > +		*max_order = 0;
> > > 
> > > Thanks for working this in. Unfortunately on x86 the bulk of guest memory
> > > ends up getting slotted directly above legacy regions at GFN 0x100, 
> > 
> > Can you provide an example?  I'm struggling to understand what the layout actually
> > is.  I don't think it changes the story for the kernel, but it sounds like there
> > might be room for optimization in QEMU?  Or more likely, I just don't understand
> > what you're saying :-)
> 
> Here's one example, which seems to be fairly normal for an x86 boot:
> 
>   kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000 ua=0x7f24afc00000 ret=0 restricted_fd=19 restricted_offset=0x0
>   ^ QEMU creates Slot 0 for all of main guest RAM
>   kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0 ua=0x7f24afc00000 ret=0 restricted_fd=19 restricted_offset=0x0
>   kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000 ua=0x7f24afc00000 ret=0 restricted_fd=19 restricted_offset=0x0
>   kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x6 gpa=0xc0000 size=0x20000 ua=0x7f2575000000 ret=0 restricted_fd=33 restricted_offset=0x0
>   kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x6 gpa=0xe0000 size=0x20000 ua=0x7f2575400000 ret=0 restricted_fd=31 restricted_offset=0x0
>   ^ legacy regions are created and mapped on top of GPA ranges [0xc0000:0xe0000) and [0xe0000:0x100000)
>   kvm_set_user_memory AddrSpace#0 Slot#5 flags=0x4 gpa=0x100000 size=0x7ff00000 ua=0x7f24afd00000 ret=0 restricted_fd=19 restricted_offset=0x100000
>   ^ QEMU divides Slot 0 into Slot 0 at [0x0:0xc0000) and Slot 5 at [0x100000:0x80000000)
>     Both Slots still share the same backing memory allocation, so same gmem
>     fd 19 is used,but Slot 5 is assigned to offset 0x100000, whih is not
>     2M-aligned
> 
> I tried messing with QEMU handling to pad out guest_memfd offsets to 2MB
> boundaries but then the inode size needs to be enlarged to account for it
> and things get a bit messy. Not sure if there are alternative approaches
> that can be taken from userspace, but with normal malloc()'d or mmap()'d
> backing memory the kernel can still allocate a 2MB backing page for the
> [0x0:0x200000) range and I think KVM still handles that when setting up
> NPT of sub-ranges so there might not be much room for further optimization
> there.

Oooh, duh.  QEMU intentionally creates a gap for the VGA and/or BIOS holes, and
so the lower DRAM chunk that goes from the end of the system reserved chunk to
to TOLUD is started at an unaligned offset, even though 99% of the slot is properly
aligned.

Yeah, KVM definitely needs to support that.  Requiring userspace to align based
on the hugepage size could work, e.g. QEMU could divide slot 5 into N slots, to
end up with a series of slots to get from 4KiB aligned => 2MiB aligned => 1GiB
aligned.  But pushing for that would be beyond stubborn.

Thanks for being patient :-)

next prev parent reply	other threads:[~2023-10-02 16:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-21 20:33 [PATCH 00/13] KVM: guest_memfd fixes Sean Christopherson
2023-09-21 20:33 ` [PATCH 01/13] KVM: Assert that mmu_invalidate_in_progress *never* goes negative Sean Christopherson
2023-09-21 20:33 ` [PATCH 02/13] KVM: Actually truncate the inode when doing PUNCH_HOLE for guest_memfd Sean Christopherson
2023-09-21 20:33 ` [PATCH 03/13] KVM: WARN if *any* MMU invalidation sequence doesn't add a range Sean Christopherson
2023-09-21 20:33 ` [PATCH 04/13] KVM: WARN if there are danging MMU invalidations at VM destruction Sean Christopherson
2023-09-27  3:16   ` Binbin Wu
2023-09-28 18:11     ` Sean Christopherson
2023-09-21 20:33 ` [PATCH 05/13] KVM: Fix MMU invalidation bookkeeping in guest_memfd Sean Christopherson
2023-09-21 20:33 ` [PATCH 06/13] KVM: Disallow hugepages for incompatible gmem bindings, but let 'em succeed Sean Christopherson
2023-09-22 22:42   ` Michael Roth
2023-09-28 18:31     ` Sean Christopherson
2023-10-02 15:53       ` Michael Roth
2023-10-02 16:49         ` Sean Christopherson [this message]
2023-09-21 20:33 ` [PATCH 07/13] KVM: x86/mmu: Track PRIVATE impact on hugepage mappings for all memslots Sean Christopherson
2023-09-27  6:01   ` Binbin Wu
2023-09-27 14:25     ` Sean Christopherson
2023-09-21 20:33 ` [PATCH 08/13] KVM: x86/mmu: Zap shared-only memslots when private attribute changes Sean Christopherson
2023-09-21 20:33 ` [PATCH 09/13] KVM: Always add relevant ranges to invalidation set when changing attributes Sean Christopherson
2023-09-21 20:33 ` [PATCH 10/13] KVM: x86/mmu: Drop repeated add() of to-be-invalidated range Sean Christopherson
2023-09-21 20:33 ` [PATCH 11/13] KVM: selftests: Refactor private mem conversions to prep for punch_hole test Sean Christopherson
2023-09-21 20:33 ` [PATCH 12/13] KVM: selftests: Add a "pure" PUNCH_HOLE on guest_memfd testcase Sean Christopherson
2023-09-21 20:33 ` [PATCH 13/13] KVM: Rename guest_mem.c to guest_memfd.c Sean Christopherson
2023-09-29  2:22 ` [PATCH 00/13] KVM: guest_memfd fixes Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRr0qUeSVxZfK-w2@google.com \
    --to=seanjc@google.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.