From: Sean Christopherson <seanjc@google.com>
To: Michael Roth <michael.roth@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Huacai Chen <chenhuacai@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Anup Patel <anup@brainfault.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Paul Moore <paul@paul-moore.com>,
James Morris <jmorris@namei.org>,
"Serge E. Hallyn" <serge@hallyn.com>,
kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-security-module@vger.kernel.org,
linux-kernel@vger.kernel.org,
Chao Peng <chao.p.peng@linux.intel.com>,
Fuad Tabba <tabba@google.com>,
Jarkko Sakkinen <jarkko@kernel.org>,
Anish Moorthy <amoorthy@google.com>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
Isaku Yamahata <isaku.yamahata@intel.com>,
Xu Yilun <yilun.xu@intel.com>, Vlastimil Babka <vbabka@suse.cz>,
Vishal Annapurve <vannapurve@google.com>,
Ackerley Tng <ackerleytng@google.com>,
Maciej Szmigiero <mail@maciej.szmigiero.name>,
David Hildenbrand <david@redhat.com>,
Quentin Perret <qperret@google.com>, Wang <wei.w.wang@intel.com>,
Liam Merwick <liam.merwick@oracle.com>,
Isaku Yamahata <isaku.yamahata@gmail.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC PATCH v12 14/33] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Date: Wed, 20 Sep 2023 16:44:47 -0700 [thread overview]
Message-ID: <ZQuD77vlBiSU/PE4@google.com> (raw)
In-Reply-To: <20230918163647.m6bjgwusc7ww5tyu@amd.com>
On Mon, Sep 18, 2023, Michael Roth wrote:
> > +static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > +{
> > + struct list_head *gmem_list = &inode->i_mapping->private_list;
> > + pgoff_t start = offset >> PAGE_SHIFT;
> > + pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > + struct kvm_gmem *gmem;
> > +
> > + /*
> > + * Bindings must stable across invalidation to ensure the start+end
> > + * are balanced.
> > + */
> > + filemap_invalidate_lock(inode->i_mapping);
> > +
> > + list_for_each_entry(gmem, gmem_list, entry) {
> > + kvm_gmem_invalidate_begin(gmem, start, end);
>
> In v11 we used to call truncate_inode_pages_range() here to drop filemap's
> reference on the folio. AFAICT the folios are only getting free'd upon
> guest shutdown without this. Was this on purpose?
Nope, I just spotted this too. And then after scratching my head for a few minutes,
wondering if I was having an -ENOCOFFEE moment, I finally read your mail. *sigh*
Looking at my reflog history, I'm pretty sure I deleted the wrong line when
removing the truncation from kvm_gmem_error_page().
> > + kvm_gmem_invalidate_end(gmem, start, end);
> > + }
> > +
> > + filemap_invalidate_unlock(inode->i_mapping);
> > +
> > + return 0;
> > +}
> > +
> > +static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
> > +{
> > + struct address_space *mapping = inode->i_mapping;
> > + pgoff_t start, index, end;
> > + int r;
> > +
> > + /* Dedicated guest is immutable by default. */
> > + if (offset + len > i_size_read(inode))
> > + return -EINVAL;
> > +
> > + filemap_invalidate_lock_shared(mapping);
>
> We take the filemap lock here, but not for
> kvm_gmem_get_pfn()->kvm_gmem_get_folio(). Is it needed there as well?
No, we specifically do not want to take a rwsem when faulting in guest memory.
Callers of kvm_gmem_get_pfn() *must* guard against concurrent invalidations via
mmu_invalidate_seq and friends.
> > + /*
> > + * For simplicity, require the offset into the file and the size of the
> > + * memslot to be aligned to the largest possible page size used to back
> > + * the file (same as the size of the file itself).
> > + */
> > + if (!kvm_gmem_is_valid_size(offset, flags) ||
> > + !kvm_gmem_is_valid_size(size, flags))
> > + goto err;
>
> I needed to relax this check for SNP. KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
> applies to entire gmem inode, so it makes sense for userspace to enable
> hugepages if start/end are hugepage-aligned, but QEMU will do things
> like map overlapping regions for ROMs and other things on top of the
> GPA range that the gmem inode was originally allocated for. For
> instance:
>
> 692500@1689108688.696338:kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000 ua=0x7fbf5be00000 ret=0 restricted_fd=19 restricted_offset=0x0
> 692500@1689108688.699802:kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000 size=0x380000000 ua=0x7fbfdbe00000 ret=0 restricted_fd=19 restricted_offset=0x80000000
> 692500@1689108688.795412:kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0 ua=0x7fbf5be00000 ret=0 restricted_fd=19 restricted_offset=0x0
> 692500@1689108688.795550:kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000 ua=0x7fbf5be00000 ret=0 restricted_fd=19 restricted_offset=0x0
> 692500@1689108688.796227:kvm_set_user_memory AddrSpace#0 Slot#6 flags=0x4 gpa=0x100000 size=0x7ff00000 ua=0x7fbf5bf00000 ret=0 restricted_fd=19 restricted_offset=0x100000
>
> Because of that the KVM_SET_USER_MEMORY_REGIONs for non-THP-aligned GPAs
> will fail. Maybe instead it should be allowed, and kvm_gmem_get_folio()
> should handle the alignment checks on a case-by-case and simply force 4k
> for offsets corresponding to unaligned bindings?
Yeah, I wanted to keep the code simple, but disallowing small bindings/memslots
is probably going to be a deal-breaker. Even though I'm skeptical that QEMU
_needs_ to play these games for SNP guests, not playing nice will make it all
but impossible to use guest_memfd for regular VMs.
And the code isn't really any more complex, so long as we punt on allowing
hugepages on interior sub-ranges.
Compile-tested only, but this?
---
virt/kvm/guest_mem.c | 54 ++++++++++++++++++++++----------------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index a819367434e9..dc12e38211df 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -426,20 +426,6 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags,
return err;
}
-static bool kvm_gmem_is_valid_size(loff_t size, u64 flags)
-{
- if (size < 0 || !PAGE_ALIGNED(size))
- return false;
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if ((flags & KVM_GUEST_MEMFD_ALLOW_HUGEPAGE) &&
- !IS_ALIGNED(size, HPAGE_PMD_SIZE))
- return false;
-#endif
-
- return true;
-}
-
int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
{
loff_t size = args->size;
@@ -452,9 +438,15 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
if (flags & ~valid_flags)
return -EINVAL;
- if (!kvm_gmem_is_valid_size(size, flags))
+ if (size < 0 || !PAGE_ALIGNED(size))
return -EINVAL;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if ((flags & KVM_GUEST_MEMFD_ALLOW_HUGEPAGE) &&
+ !IS_ALIGNED(size, HPAGE_PMD_SIZE))
+ return false;
+#endif
+
return __kvm_gmem_create(kvm, size, flags, kvm_gmem_mnt);
}
@@ -462,7 +454,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned int fd, loff_t offset)
{
loff_t size = slot->npages << PAGE_SHIFT;
- unsigned long start, end, flags;
+ unsigned long start, end;
struct kvm_gmem *gmem;
struct inode *inode;
struct file *file;
@@ -481,16 +473,9 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
goto err;
inode = file_inode(file);
- flags = (unsigned long)inode->i_private;
- /*
- * For simplicity, require the offset into the file and the size of the
- * memslot to be aligned to the largest possible page size used to back
- * the file (same as the size of the file itself).
- */
- if (!kvm_gmem_is_valid_size(offset, flags) ||
- !kvm_gmem_is_valid_size(size, flags))
- goto err;
+ if (offset < 0 || !PAGE_ALIGNED(offset))
+ return -EINVAL;
if (offset + size > i_size_read(inode))
goto err;
@@ -591,8 +576,23 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
page = folio_file_page(folio, index);
*pfn = page_to_pfn(page);
- if (max_order)
- *max_order = compound_order(compound_head(page));
+ if (!max_order)
+ goto success;
+
+ *max_order = compound_order(compound_head(page));
+ if (!*max_order)
+ goto success;
+
+ /*
+ * For simplicity, allow mapping a hugepage if and only if the entire
+ * binding is compatible, i.e. don't bother supporting mapping interior
+ * sub-ranges with hugepages (unless userspace comes up with a *really*
+ * strong use case for needing hugepages within unaligned bindings).
+ */
+ if (!IS_ALIGNED(slot->gmem.pgoff, 1ull << *max_order) ||
+ !IS_ALIGNED(slot->npages, 1ull << *max_order))
+ *max_order = 0;
+success:
r = 0;
out_unlock:
base-commit: bc1a54ee393e0574ea422525cf0b2f1e768e38c5
--
next prev parent reply other threads:[~2023-09-20 23:44 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-14 1:54 [RFC PATCH v12 00/33] KVM: guest_memfd() and per-page attributes Sean Christopherson
2023-09-14 1:54 ` [RFC PATCH v12 01/33] KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges Sean Christopherson
2023-09-15 6:47 ` Xiaoyao Li
2023-09-15 21:05 ` Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 02/33] KVM: Use gfn instead of hva for mmu_notifier_retry Sean Christopherson
2023-09-14 3:07 ` Binbin Wu
2023-09-14 14:19 ` Sean Christopherson
2023-09-20 6:07 ` Xu Yilun
2023-09-20 13:55 ` Sean Christopherson
2023-09-21 2:39 ` Xu Yilun
2023-09-21 14:24 ` Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 03/33] KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 04/33] KVM: PPC: Return '1' unconditionally for KVM_CAP_SYNC_MMU Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 05/33] KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER Sean Christopherson
2023-10-09 16:42 ` Anup Patel
2023-09-14 1:55 ` [RFC PATCH v12 06/33] KVM: Introduce KVM_SET_USER_MEMORY_REGION2 Sean Christopherson
2023-09-15 6:59 ` Xiaoyao Li
2023-09-14 1:55 ` [RFC PATCH v12 07/33] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace Sean Christopherson
2023-09-22 6:03 ` Xiaoyao Li
2023-09-22 14:30 ` Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 08/33] KVM: Add a dedicated mmu_notifier flag for reclaiming freed memory Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 09/33] KVM: Drop .on_unlock() mmu_notifier hook Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 10/33] KVM: Set the stage for handling only shared mappings in mmu_notifier events Sean Christopherson
2023-09-18 1:14 ` Binbin Wu
2023-09-18 15:57 ` Sean Christopherson
2023-09-18 18:07 ` Michael Roth
2023-09-19 0:08 ` Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson
2023-09-15 6:32 ` Yan Zhao
2023-09-20 21:00 ` Sean Christopherson
2023-09-21 1:21 ` Yan Zhao
2023-09-25 17:37 ` Sean Christopherson
2023-09-18 7:51 ` Binbin Wu
2023-09-20 21:03 ` Sean Christopherson
2023-09-27 5:19 ` Binbin Wu
2023-10-03 12:47 ` Fuad Tabba
2023-10-03 15:59 ` Sean Christopherson
2023-10-03 18:33 ` Fuad Tabba
2023-10-03 20:51 ` Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 12/33] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 13/33] security: Export security_inode_init_security_anon() for use by KVM Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 14/33] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory Sean Christopherson
2023-09-15 6:11 ` Yan Zhao
2023-09-18 16:36 ` Michael Roth
2023-09-20 23:44 ` Sean Christopherson [this message]
2023-09-19 9:01 ` Binbin Wu
2023-09-20 14:24 ` Sean Christopherson
2023-09-21 5:58 ` Binbin Wu
2023-09-21 19:10 ` Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 15/33] KVM: Add transparent hugepage support for dedicated guest memory Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 16/33] KVM: x86: "Reset" vcpu->run->exit_reason early in KVM_RUN Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 17/33] KVM: x86: Disallow hugepages when memory attributes are mixed Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 18/33] KVM: x86/mmu: Handle page fault for private memory Sean Christopherson
2023-09-15 5:40 ` Yan Zhao
2023-09-15 14:26 ` Sean Christopherson
2023-09-18 0:54 ` Yan Zhao
2023-09-21 14:59 ` Sean Christopherson
2023-09-21 5:51 ` Binbin Wu
2023-09-14 1:55 ` [RFC PATCH v12 19/33] KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 20/33] KVM: Allow arch code to track number of memslot address spaces per VM Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 21/33] KVM: x86: Add support for "protected VMs" that can utilize private memory Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 22/33] KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 23/33] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2 Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 24/33] KVM: selftests: Add support for creating private memslots Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 25/33] KVM: selftests: Add helpers to convert guest memory b/w private and shared Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 26/33] KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86) Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 27/33] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 28/33] KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 29/33] KVM: selftests: Add x86-only selftest for private memory conversions Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 30/33] KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 31/33] KVM: selftests: Expand set_memory_region_test to validate guest_memfd() Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 32/33] KVM: selftests: Add basic selftest for guest_memfd() Sean Christopherson
2023-09-14 1:55 ` [RFC PATCH v12 33/33] KVM: selftests: Test KVM exit behavior for private memory/access Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZQuD77vlBiSU/PE4@google.com \
--to=seanjc@google.com \
--cc=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=amoorthy@google.com \
--cc=anup@brainfault.org \
--cc=aou@eecs.berkeley.edu \
--cc=chao.p.peng@linux.intel.com \
--cc=chenhuacai@kernel.org \
--cc=david@redhat.com \
--cc=isaku.yamahata@gmail.com \
--cc=isaku.yamahata@intel.com \
--cc=jarkko@kernel.org \
--cc=jmorris@namei.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm-riscv@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=liam.merwick@oracle.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linux-security-module@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mail@maciej.szmigiero.name \
--cc=maz@kernel.org \
--cc=michael.roth@amd.com \
--cc=mpe@ellerman.id.au \
--cc=oliver.upton@linux.dev \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=paul@paul-moore.com \
--cc=pbonzini@redhat.com \
--cc=qperret@google.com \
--cc=serge@hallyn.com \
--cc=tabba@google.com \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=wei.w.wang@intel.com \
--cc=willy@infradead.org \
--cc=yilun.xu@intel.com \
--cc=yu.c.zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).