* [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP
@ 2024-07-11 22:27 Paolo Bonzini
2024-07-11 22:27 ` [PATCH 01/12] KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() Paolo Bonzini
` (11 more replies)
0 siblings, 12 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
There are a few issues with the implementation of SEV page "preparation"
and KVM_SEV_SNP_LAUNCH_UPDATE:
- doing fallocate() before KVM_SEV_SNP_LAUNCH_UPDATE will cause the latter to fail.
- checking that only private pages are passed to KVM_SEV_SNP_LAUNCH_UPDATE
is done in the guts of vendor code, as is the check that pages are not
passed twice. This goes against the idea of putting as much common
code as possible in kvm_gmem_populate(); for example, it is the common
code that should return -EINVAL if the page is already assigned.
- clearing the page is unnecessary if the firmware will overwrite it
- the "prepare" bool argument is kinda gross
The important observation here is that the two paths that initialize the
contents of the folio are mutually exclusive: either the folio is populated
with unencrypted data by userspace, or it is zeroed because userspace did
not do anything beyond fallocate(). But in the latter case there's no
need to zero and prepare the page at the time of fallocate(): it can be
done instead when the guest actually uses the page.
Pulling all the zero-and-prepare code into kvm_gmem_get_pfn() separates the
flows more clearly, but how do you recognize folios that haven't been
prepared yet? The answer is to use the up-to-date flag; there is a
limited attempt to use it right now, but it only concerns itself with
the folio having been cleared. Instead, after this series the flag is set
for folios that went through either kvm_gmem_populate() or that have been
mapped into the guest once (and thus went through kvm_arch_gmem_prepare(),
on architectures where that is a thing).
As a bonus, KVM knows exactly which guest is mapping a page, and thus
the preparation code does not have to iterate through all bound
instances of "struct kvm".
There is an easy, vendor-independent way to obtain the previous
behavior if desired, and that is simply to do KVM_PRE_FAULT_MEMORY
after KVM_SEV_SNP_LAUNCH_FINISH.
(Credit for the idea goes to Sean Christopherson, though none of his
tentative implementation is in these patches).
The bulk of the changes is in patches 6, 9 and 12. Everything else
consists of small preparatory changes; many patches in the first half,
for example, switch to struct folio instead of struct page and pfns,
which is useful when the code region before folio_unlock() is finally
extended into the callers of kvm_gmem_get_folio(). There are also a
couple of cleanups in the middle, mostly patches 4 and 8.
Sorry about the delay sending these out. This should probably be in 6.11
but will not necessarily go in during the merge window, depending on how
review goes.
Tested with the SEV-SNP smoke test that was just posted, and by booting
a Linux guest.
Paolo
Paolo Bonzini (12):
KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
KVM: guest_memfd: delay folio_mark_uptodate() until after successful
preparation
KVM: guest_memfd: do not go through struct page
KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is
passed to the guest
KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single
struct kvm
KVM: remove kvm_arch_gmem_prepare_needed()
KVM: guest_memfd: move check for already-populated page to common code
KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
KVM: extend kvm_range_has_memory_attributes() to check subset of
attributes
KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
arch/x86/kvm/Kconfig | 4 +-
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/sev.c | 9 +-
arch/x86/kvm/x86.c | 9 +-
include/linux/kvm_host.h | 9 +-
virt/kvm/Kconfig | 4 +-
virt/kvm/guest_memfd.c | 209 +++++++++++++++++++++++----------------
virt/kvm/kvm_main.c | 73 +++++++-------
8 files changed, 171 insertions(+), 148 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 01/12] KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-15 22:26 ` Michael Roth
2024-07-11 22:27 ` [PATCH 02/12] KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation Paolo Bonzini
` (10 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
Right now this is simply more consistent and avoids use of pfn_to_page()
and put_page(). It will be put to more use in upcoming patches, to
ensure that the up-to-date flag is set at the very end of both the
kvm_gmem_get_pfn() and kvm_gmem_populate() flows.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/guest_memfd.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 1c509c351261..522e1b28e7ae 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -541,34 +541,34 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
fput(file);
}
-static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
+static struct folio *
+__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
{
pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
struct kvm_gmem *gmem = file->private_data;
struct folio *folio;
struct page *page;
- int r;
if (file != slot->gmem.file) {
WARN_ON_ONCE(slot->gmem.file);
- return -EFAULT;
+ return ERR_PTR(-EFAULT);
}
gmem = file->private_data;
if (xa_load(&gmem->bindings, index) != slot) {
WARN_ON_ONCE(xa_load(&gmem->bindings, index));
- return -EIO;
+ return ERR_PTR(-EIO);
}
folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
if (IS_ERR(folio))
- return PTR_ERR(folio);
+ return folio;
if (folio_test_hwpoison(folio)) {
folio_unlock(folio);
folio_put(folio);
- return -EHWPOISON;
+ return ERR_PTR(-EHWPOISON);
}
page = folio_file_page(folio, index);
@@ -577,25 +577,25 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
if (max_order)
*max_order = 0;
- r = 0;
-
folio_unlock(folio);
-
- return r;
+ return folio;
}
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
{
struct file *file = kvm_gmem_get_file(slot);
- int r;
+ struct folio *folio;
if (!file)
return -EFAULT;
- r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
+ folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
fput(file);
- return r;
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);
+
+ return 0;
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
@@ -625,6 +625,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
npages = min_t(ulong, slot->npages - (start_gfn - slot->base_gfn), npages);
for (i = 0; i < npages; i += (1 << max_order)) {
+ struct folio *folio;
gfn_t gfn = start_gfn + i;
kvm_pfn_t pfn;
@@ -633,9 +634,11 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
break;
}
- ret = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order, false);
- if (ret)
+ folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order, false);
+ if (IS_ERR(folio)) {
+ ret = PTR_ERR(folio);
break;
+ }
if (!IS_ALIGNED(gfn, (1 << max_order)) ||
(npages - i) < (1 << max_order))
@@ -644,7 +647,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
p = src ? src + i * PAGE_SIZE : NULL;
ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
- put_page(pfn_to_page(pfn));
+ folio_put(folio);
if (ret)
break;
}
--
2.43.0
* [PATCH 02/12] KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
2024-07-11 22:27 ` [PATCH 01/12] KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-15 22:32 ` Michael Roth
2024-07-11 22:27 ` [PATCH 03/12] KVM: guest_memfd: do not go through struct page Paolo Bonzini
` (9 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
The up-to-date flag, as it is now, is not too useful; it tells guest_memfd not
to overwrite the contents of a folio, but it doesn't say that the page
is ready to be mapped into the guest. For encrypted guests, mapping
a private page requires that the "preparation" phase has succeeded,
and at the same time the same page cannot be prepared twice.
So, ensure that folio_mark_uptodate() is only called on a prepared page. If
kvm_gmem_prepare_folio() or the post_populate callback fail, the folio
will not be marked up-to-date; it's not a problem to call clear_highpage()
again on such a page prior to the next preparation attempt.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/guest_memfd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 522e1b28e7ae..1ea632dbae57 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -73,8 +73,6 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
for (i = 0; i < nr_pages; i++)
clear_highpage(folio_page(folio, i));
-
- folio_mark_uptodate(folio);
}
if (prepare) {
@@ -84,6 +82,8 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
folio_put(folio);
return ERR_PTR(r);
}
+
+ folio_mark_uptodate(folio);
}
/*
@@ -646,6 +646,8 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
p = src ? src + i * PAGE_SIZE : NULL;
ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
+ if (!ret)
+ folio_mark_uptodate(folio);
folio_put(folio);
if (ret)
--
2.43.0
* [PATCH 03/12] KVM: guest_memfd: do not go through struct page
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
2024-07-11 22:27 ` [PATCH 01/12] KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() Paolo Bonzini
2024-07-11 22:27 ` [PATCH 02/12] KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-15 22:36 ` Michael Roth
2024-07-11 22:27 ` [PATCH 04/12] KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_* Paolo Bonzini
` (8 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
We have a perfectly usable folio, use it to retrieve the pfn and order.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/guest_memfd.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 1ea632dbae57..5221b584288f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,6 +13,18 @@ struct kvm_gmem {
struct list_head entry;
};
+/**
+ * folio_file_pfn - like folio_file_page, but return a pfn.
+ * @folio: The folio which contains this index.
+ * @index: The index we want to look up.
+ *
+ * Return: The pfn for this index.
+ */
+static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
+{
+ return folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
+}
+
static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
{
#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
@@ -22,7 +34,6 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
list_for_each_entry(gmem, gmem_list, entry) {
struct kvm_memory_slot *slot;
struct kvm *kvm = gmem->kvm;
- struct page *page;
kvm_pfn_t pfn;
gfn_t gfn;
int rc;
@@ -34,13 +45,12 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
if (!slot)
continue;
- page = folio_file_page(folio, index);
- pfn = page_to_pfn(page);
+ pfn = folio_file_pfn(folio, index);
gfn = slot->base_gfn + index - slot->gmem.pgoff;
- rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
+ rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
if (rc) {
- pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
- index, gfn, pfn, rc);
+ pr_warn_ratelimited("gmem: Failed to prepare folio for GFN %llx PFN %llx error %d.\n",
+ gfn, pfn, rc);
return rc;
}
}
@@ -548,7 +558,6 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
struct kvm_gmem *gmem = file->private_data;
struct folio *folio;
- struct page *page;
if (file != slot->gmem.file) {
WARN_ON_ONCE(slot->gmem.file);
@@ -571,9 +580,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
return ERR_PTR(-EHWPOISON);
}
- page = folio_file_page(folio, index);
-
- *pfn = page_to_pfn(page);
+ *pfn = folio_file_pfn(folio, index);
if (max_order)
*max_order = 0;
--
2.43.0
* [PATCH 04/12] KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (2 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 03/12] KVM: guest_memfd: do not go through struct page Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-15 22:40 ` Michael Roth
2024-07-11 22:27 ` [PATCH 05/12] KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn Paolo Bonzini
` (7 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
Add "ARCH" to the symbols; shortly, the "prepare" phase will include both
the arch-independent step to clear out contents left in the page by the
host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE.
For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/Kconfig | 4 ++--
arch/x86/kvm/x86.c | 4 ++--
include/linux/kvm_host.h | 4 ++--
virt/kvm/Kconfig | 4 ++--
virt/kvm/guest_memfd.c | 6 +++---
5 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 4287a8071a3a..472a1537b7a9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -141,8 +141,8 @@ config KVM_AMD_SEV
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select ARCH_HAS_CC_PLATFORM
select KVM_GENERIC_PRIVATE_MEM
- select HAVE_KVM_GMEM_PREPARE
- select HAVE_KVM_GMEM_INVALIDATE
+ select HAVE_KVM_ARCH_GMEM_PREPARE
+ select HAVE_KVM_ARCH_GMEM_INVALIDATE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6968eadd418..a1c85591f92c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13603,7 +13603,7 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
-#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
{
return kvm->arch.vm_type == KVM_X86_SNP_VM;
@@ -13615,7 +13615,7 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
}
#endif
-#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
{
static_call_cond(kvm_x86_gmem_invalidate)(start, end);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c3c922bf077f..eb8404e9aa03 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2441,7 +2441,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
}
#endif /* CONFIG_KVM_PRIVATE_MEM */
-#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
#endif
@@ -2473,7 +2473,7 @@ typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque);
-#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index b14e14cdbfb9..fd6a3010afa8 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -113,10 +113,10 @@ config KVM_GENERIC_PRIVATE_MEM
select KVM_PRIVATE_MEM
bool
-config HAVE_KVM_GMEM_PREPARE
+config HAVE_KVM_ARCH_GMEM_PREPARE
bool
depends on KVM_PRIVATE_MEM
-config HAVE_KVM_GMEM_INVALIDATE
+config HAVE_KVM_ARCH_GMEM_INVALIDATE
bool
depends on KVM_PRIVATE_MEM
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5221b584288f..76139332f2f3 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -27,7 +27,7 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
{
-#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
struct list_head *gmem_list = &inode->i_mapping->i_private_list;
struct kvm_gmem *gmem;
@@ -353,7 +353,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
return MF_DELAYED;
}
-#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
static void kvm_gmem_free_folio(struct folio *folio)
{
struct page *page = folio_page(folio, 0);
@@ -368,7 +368,7 @@ static const struct address_space_operations kvm_gmem_aops = {
.dirty_folio = noop_dirty_folio,
.migrate_folio = kvm_gmem_migrate_folio,
.error_remove_folio = kvm_gmem_error_folio,
-#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
.free_folio = kvm_gmem_free_folio,
#endif
};
--
2.43.0
* [PATCH 05/12] KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (3 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 04/12] KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_* Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-15 23:55 ` Michael Roth
2024-07-11 22:27 ` [PATCH 06/12] KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest Paolo Bonzini
` (6 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
Allow testing the up-to-date flag in the caller without taking the
lock again.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/guest_memfd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 76139332f2f3..9271aba9b7b3 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -59,6 +59,7 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
return 0;
}
+/* Returns a locked folio on success. */
static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
{
struct folio *folio;
@@ -551,6 +552,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
fput(file);
}
+/* Returns a locked folio on success. */
static struct folio *
__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
@@ -584,7 +586,6 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
if (max_order)
*max_order = 0;
- folio_unlock(folio);
return folio;
}
@@ -602,6 +603,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
if (IS_ERR(folio))
return PTR_ERR(folio);
+ folio_unlock(folio);
return 0;
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
@@ -647,6 +649,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
break;
}
+ folio_unlock(folio);
if (!IS_ALIGNED(gfn, (1 << max_order)) ||
(npages - i) < (1 << max_order))
max_order = 0;
--
2.43.0
* [PATCH 06/12] KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (4 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 05/12] KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-17 21:28 ` Michael Roth
2024-07-11 22:27 ` [PATCH 07/12] KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm Paolo Bonzini
` (5 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
Initializing the contents of the folio on fallocate() is unnecessarily
restrictive. It means that the page is registered with the firmware and
then it cannot be touched anymore. In particular, this loses the
possibility of using fallocate() to pre-allocate the page for SEV-SNP
guests, because kvm_arch_gmem_prepare() then fails.
It's only when the guest actually accesses the page (and therefore
kvm_gmem_get_pfn() is called) that the page must be cleared of any
stale host data and registered with the firmware. The up-to-date flag
is clear if this has to be done (i.e. it is the first access and
kvm_gmem_populate() has not been called).
All in all, there are enough differences between kvm_gmem_get_pfn() and
kvm_gmem_populate() that it's better to separate the two flows completely.
Extract the bulk of kvm_gmem_get_folio(), which takes a folio and ends up
setting its up-to-date flag, into a new function kvm_gmem_prepare_folio();
these steps are now done only by the non-__-prefixed kvm_gmem_get_pfn().
As a bonus, __kvm_gmem_get_pfn() loses its ugly "bool prepare" argument.
The previous semantics, where fallocate() could be used to prepare
the pages in advance of running the guest, can be accessed with
KVM_PRE_FAULT_MEMORY.
For now, accessing a page in one VM will attempt to call
kvm_arch_gmem_prepare() in all of those that have bound the guest_memfd.
Cleaning this up is left to a separate patch.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/guest_memfd.c | 107 ++++++++++++++++++++++++-----------------
1 file changed, 63 insertions(+), 44 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 9271aba9b7b3..f637327ad8e1 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -25,7 +25,7 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
return folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
}
-static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
+static int __kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
{
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
struct list_head *gmem_list = &inode->i_mapping->i_private_list;
@@ -59,49 +59,60 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
return 0;
}
-/* Returns a locked folio on success. */
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
+/* Process @folio, which contains @gfn (which in turn is contained in @slot),
+ * so that the guest can use it. On successful return the guest sees a zero
+ * page so as to avoid leaking host data and the up-to-date flag is set.
+ */
+static int kvm_gmem_prepare_folio(struct file *file, struct kvm_memory_slot *slot,
+ gfn_t gfn, struct folio *folio)
{
- struct folio *folio;
+ unsigned long nr_pages, i;
+ pgoff_t index;
+ int r;
- /* TODO: Support huge pages. */
- folio = filemap_grab_folio(inode->i_mapping, index);
- if (IS_ERR(folio))
- return folio;
+ if (folio_test_uptodate(folio))
+ return 0;
+
+ nr_pages = folio_nr_pages(folio);
+ for (i = 0; i < nr_pages; i++)
+ clear_highpage(folio_page(folio, i));
/*
- * Use the up-to-date flag to track whether or not the memory has been
- * zeroed before being handed off to the guest. There is no backing
- * storage for the memory, so the folio will remain up-to-date until
- * it's removed.
+ * Right now the folio order is always going to be zero,
+ * but the code is ready for huge folios, the only
+ * assumption being that the base pgoff of memslots is
+ * naturally aligned according to the folio's order.
+ * The desired order will be passed when creating the
+ * guest_memfd and checked when creating memslots.
*
- * TODO: Skip clearing pages when trusted firmware will do it when
- * assigning memory to the guest.
+ * By making the base pgoff of the memslot naturally aligned
+ * to the folio's order, huge folios are safe to prepare, no
+ * matter what page size is used to map memory into the guest.
*/
- if (!folio_test_uptodate(folio)) {
- unsigned long nr_pages = folio_nr_pages(folio);
- unsigned long i;
-
- for (i = 0; i < nr_pages; i++)
- clear_highpage(folio_page(folio, i));
- }
-
- if (prepare) {
- int r = kvm_gmem_prepare_folio(inode, index, folio);
- if (r < 0) {
- folio_unlock(folio);
- folio_put(folio);
- return ERR_PTR(r);
- }
+ WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
+ index = gfn - slot->base_gfn + slot->gmem.pgoff;
+ index = ALIGN_DOWN(index, 1 << folio_order(folio));
+ r = __kvm_gmem_prepare_folio(file_inode(file), index, folio);
+ if (!r)
folio_mark_uptodate(folio);
- }
- /*
- * Ignore accessed, referenced, and dirty flags. The memory is
- * unevictable and there is no storage to write back to.
- */
- return folio;
+ return r;
+}
+
+/*
+ * Returns a locked folio on success. The caller is responsible for
+ * setting the up-to-date flag before the memory is mapped into the guest.
+ * There is no backing storage for the memory, so the folio will remain
+ * up-to-date until it's removed.
+ *
+ * Ignore accessed, referenced, and dirty flags. The memory is
+ * unevictable and there is no storage to write back to.
+ */
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+{
+ /* TODO: Support huge pages. */
+ return filemap_grab_folio(inode->i_mapping, index);
}
static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -201,7 +212,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
break;
}
- folio = kvm_gmem_get_folio(inode, index, true);
+ folio = kvm_gmem_get_folio(inode, index);
if (IS_ERR(folio)) {
r = PTR_ERR(folio);
break;
@@ -555,7 +566,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
/* Returns a locked folio on success. */
static struct folio *
__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
+ gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
{
pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
struct kvm_gmem *gmem = file->private_data;
@@ -572,7 +583,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
return ERR_PTR(-EIO);
}
- folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
+ folio = kvm_gmem_get_folio(file_inode(file), index);
if (IS_ERR(folio))
return folio;
@@ -594,17 +605,25 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
{
struct file *file = kvm_gmem_get_file(slot);
struct folio *folio;
+ int r = 0;
if (!file)
return -EFAULT;
- folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
- fput(file);
- if (IS_ERR(folio))
- return PTR_ERR(folio);
+ folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order);
+ if (IS_ERR(folio)) {
+ r = PTR_ERR(folio);
+ goto out;
+ }
+ r = kvm_gmem_prepare_folio(file, slot, gfn, folio);
folio_unlock(folio);
- return 0;
+ if (r < 0)
+ folio_put(folio);
+
+out:
+ fput(file);
+ return r;
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
@@ -643,7 +662,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
break;
}
- folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order, false);
+ folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order);
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
break;
--
2.43.0
* [PATCH 07/12] KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (5 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 06/12] KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-17 21:34 ` Michael Roth
2024-07-11 22:27 ` [PATCH 08/12] KVM: remove kvm_arch_gmem_prepare_needed() Paolo Bonzini
` (4 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
This is now possible because preparation is done by kvm_gmem_get_pfn()
instead of fallocate(). In practice this is not a limitation, because
even though guest_memfd can be bound to multiple struct kvm, for
hardware implementations of confidential computing only one guest
(identified by an ASID on SEV-SNP, or an HKID on TDX) will be able
to access it.
In the case of intra-host migration (not yet implemented for SEV-SNP,
but SEV-ES gives an idea of how it will work), the new struct kvm
inherits the same ASID and preparation need not be repeated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/guest_memfd.c | 47 ++++++++++++++++--------------------------
1 file changed, 18 insertions(+), 29 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f637327ad8e1..f4d82719ec19 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -25,37 +25,27 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
return folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
}
-static int __kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
+static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
+ pgoff_t index, struct folio *folio)
{
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
- struct list_head *gmem_list = &inode->i_mapping->i_private_list;
- struct kvm_gmem *gmem;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+ int rc;
- list_for_each_entry(gmem, gmem_list, entry) {
- struct kvm_memory_slot *slot;
- struct kvm *kvm = gmem->kvm;
- kvm_pfn_t pfn;
- gfn_t gfn;
- int rc;
+ if (!kvm_arch_gmem_prepare_needed(kvm))
+ return 0;
- if (!kvm_arch_gmem_prepare_needed(kvm))
- continue;
-
- slot = xa_load(&gmem->bindings, index);
- if (!slot)
- continue;
-
- pfn = folio_file_pfn(folio, index);
- gfn = slot->base_gfn + index - slot->gmem.pgoff;
- rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
- if (rc) {
- pr_warn_ratelimited("gmem: Failed to prepare folio for GFN %llx PFN %llx error %d.\n",
- gfn, pfn, rc);
- return rc;
- }
+ pfn = folio_file_pfn(folio, index);
+ gfn = slot->base_gfn + index - slot->gmem.pgoff;
+ rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
+ if (rc) {
+ pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+ index, gfn, pfn, rc);
+ return rc;
}
-
#endif
+
return 0;
}
@@ -63,7 +53,7 @@ static int __kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct f
* so that the guest can use it. On successful return the guest sees a zero
* page so as to avoid leaking host data and the up-to-date flag is set.
*/
-static int kvm_gmem_prepare_folio(struct file *file, struct kvm_memory_slot *slot,
+static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, struct folio *folio)
{
unsigned long nr_pages, i;
@@ -92,8 +82,7 @@ static int kvm_gmem_prepare_folio(struct file *file, struct kvm_memory_slot *slo
WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
index = gfn - slot->base_gfn + slot->gmem.pgoff;
index = ALIGN_DOWN(index, 1 << folio_order(folio));
-
- r = __kvm_gmem_prepare_folio(file_inode(file), index, folio);
+ r = __kvm_gmem_prepare_folio(kvm, slot, index, folio);
if (!r)
folio_mark_uptodate(folio);
@@ -616,7 +605,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
goto out;
}
- r = kvm_gmem_prepare_folio(file, slot, gfn, folio);
+ r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
folio_unlock(folio);
if (r < 0)
folio_put(folio);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 08/12] KVM: remove kvm_arch_gmem_prepare_needed()
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (6 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 07/12] KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-17 21:42 ` Michael Roth
2024-07-11 22:27 ` [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code Paolo Bonzini
` (3 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
It is enough to return 0 if a guest need not do any preparation.
This is in fact how sev_gmem_prepare() works for non-SNP guests,
and it extends naturally to Intel hosts: the x86 callback for
gmem_prepare is optional and returns 0 if not defined.
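The "optional hook" convention the patch relies on can be sketched in user space; the struct and names below are illustrative, not KVM's real static-call machinery:

```c
#include <stddef.h>

/* Hypothetical stand-in for the x86 vendor callback table. */
struct toy_x86_ops {
    int (*gmem_prepare)(unsigned long pfn);
};

/* An undefined callback degenerates to "return 0", i.e. success with
 * no work to do, so no separate "is preparation needed?" predicate
 * is required. */
static int toy_gmem_prepare(const struct toy_x86_ops *ops, unsigned long pfn)
{
    if (!ops->gmem_prepare)
        return 0;               /* callback not defined: nothing to prepare */
    return ops->gmem_prepare(pfn);
}

/* Vendor implementation that reports a failure, to show that a defined
 * callback's return value is passed through unchanged. */
static int toy_snp_prepare(unsigned long pfn)
{
    (void)pfn;
    return -5;                  /* stand-in for a firmware error */
}
```

With this convention, callers do not need to distinguish "no callback" from "callback succeeded"; both look like 0.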
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/x86.c | 5 -----
include/linux/kvm_host.h | 1 -
virt/kvm/guest_memfd.c | 13 +++----------
3 files changed, 3 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c85591f92c..4f58423c6148 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13604,11 +13604,6 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
-bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
-{
- return kvm->arch.vm_type == KVM_X86_SNP_VM;
-}
-
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
{
return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eb8404e9aa03..f6e11991442d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2443,7 +2443,6 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
-bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
#endif
/**
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f4d82719ec19..509360eefea5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -29,16 +29,9 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
pgoff_t index, struct folio *folio)
{
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
- kvm_pfn_t pfn;
- gfn_t gfn;
- int rc;
-
- if (!kvm_arch_gmem_prepare_needed(kvm))
- return 0;
-
- pfn = folio_file_pfn(folio, index);
- gfn = slot->base_gfn + index - slot->gmem.pgoff;
- rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
+ kvm_pfn_t pfn = folio_file_pfn(folio, index);
+ gfn_t gfn = slot->base_gfn + index - slot->gmem.pgoff;
+ int rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
if (rc) {
pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
index, gfn, pfn, rc);
--
2.43.0
* [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (7 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 08/12] KVM: remove kvm_arch_gmem_prepare_needed() Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-13 1:28 ` Edgecombe, Rick P
2024-07-17 21:53 ` Michael Roth
2024-07-11 22:27 ` [PATCH 10/12] KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes() Paolo Bonzini
` (2 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
Do not allow populating the same page twice with startup data. In the
case of SEV-SNP, for example, the firmware does not allow it anyway,
since the launch-update operation is only possible on pages that are
still shared in the RMP.
Even if it worked, kvm_gmem_populate()'s callback is meant to have side
effects such as updating launch measurements, and updating the same
page twice is unlikely to have the desired results.
Races between calls to the ioctl are not possible because kvm_gmem_populate()
holds slots_lock and the VM should not be running. But again, even if
double population worked on some other confidential computing technology,
it would not matter to guest_memfd.c whether it was an intentional attempt
to do something fishy or just missing synchronization in userspace: one of
the racers wins, and the page is initialized by either
kvm_gmem_prepare_folio() or kvm_gmem_populate().
Still, out of paranoia, adjust sev_gmem_post_populate() to use
the same errno that kvm_gmem_populate() is using.
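The check being moved into common code boils down to the up-to-date flag acting as a "populated once" latch; a user-space toy model (names hypothetical, only the flag is modeled):

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a gmem folio: the only state tracked is the up-to-date
 * flag that kvm_gmem_populate() now checks in common code. */
struct toy_folio {
    bool uptodate;
};

/* A folio whose contents were already initialized -- whether by an
 * earlier populate call or by lazy zero-and-prepare -- must not be
 * populated with startup data a second time. */
static int toy_populate(struct toy_folio *folio)
{
    if (folio->uptodate)
        return -EEXIST;         /* already initialized once */
    /* ... copy startup data and update the launch measurement ... */
    folio->uptodate = true;
    return 0;
}
```

The second call on the same folio fails with -EEXIST instead of reaching the vendor callback.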
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/svm/sev.c | 2 +-
virt/kvm/guest_memfd.c | 7 +++++++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index df8818759698..397ef9e70182 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2213,7 +2213,7 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pf
if (ret || assigned) {
pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
__func__, gfn, ret, assigned);
- ret = -EINVAL;
+ ret = ret ? -EINVAL : -EEXIST;
goto err;
}
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 509360eefea5..266810bb91c9 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -650,6 +650,13 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
break;
}
+ if (folio_test_uptodate(folio)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ ret = -EEXIST;
+ break;
+ }
+
folio_unlock(folio);
if (!IS_ALIGNED(gfn, (1 << max_order)) ||
(npages - i) < (1 << max_order))
--
2.43.0
* [PATCH 10/12] KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (8 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-17 22:23 ` Michael Roth
2024-07-11 22:27 ` [PATCH 11/12] KVM: extend kvm_range_has_memory_attributes() to check subset of attributes Paolo Bonzini
2024-07-11 22:27 ` [PATCH 12/12] KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns Paolo Bonzini
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
Use a guard to simplify early returns, and add two more easy
shortcuts. If the requested attributes are invalid, the attributes
xarray will never show them as set. And if testing a single page,
kvm_get_memory_attributes() is more efficient.
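The shape of the two shortcuts can be sketched in user space with an array standing in for the xarray; all names below are illustrative:

```c
#include <stdbool.h>

#define TOY_ATTR_PRIVATE (1UL << 3)
#define TOY_SUPPORTED    TOY_ATTR_PRIVATE
#define TOY_NR_GFNS      16

/* Toy attribute store: one word per gfn, standing in for the xarray. */
static unsigned long toy_attrs[TOY_NR_GFNS];

static bool toy_range_has_attrs(unsigned long start, unsigned long end,
                                unsigned long attrs)
{
    /* Shortcut 1: unsupported attribute bits can never all be set,
     * because the store would never show them. */
    if (attrs & ~TOY_SUPPORTED)
        return false;

    /* Shortcut 2: a single page needs no range walk, one direct
     * lookup suffices. */
    if (end == start + 1)
        return toy_attrs[start] == attrs;

    /* Slow path: walk the whole range. */
    for (unsigned long i = start; i < end; i++)
        if (toy_attrs[i] != attrs)
            return false;
    return true;
}
```

The early returns replace the `goto out`/`rcu_read_unlock()` bookkeeping that the guard() conversion removes.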
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/kvm_main.c | 42 ++++++++++++++++++++----------------------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f817ec66c85f..8ab9d8ff7b74 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2392,6 +2392,14 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
#endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+static u64 kvm_supported_mem_attributes(struct kvm *kvm)
+{
+ if (!kvm || kvm_arch_has_private_mem(kvm))
+ return KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
+ return 0;
+}
+
/*
* Returns true if _all_ gfns in the range [@start, @end) have attributes
* matching @attrs.
@@ -2400,40 +2408,30 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
unsigned long attrs)
{
XA_STATE(xas, &kvm->mem_attr_array, start);
+ unsigned long mask = kvm_supported_mem_attributes(kvm);
unsigned long index;
- bool has_attrs;
void *entry;
- rcu_read_lock();
+ if (attrs & ~mask)
+ return false;
- if (!attrs) {
- has_attrs = !xas_find(&xas, end - 1);
- goto out;
- }
+ if (end == start + 1)
+ return kvm_get_memory_attributes(kvm, start) == attrs;
+
+ guard(rcu)();
+ if (!attrs)
+ return !xas_find(&xas, end - 1);
- has_attrs = true;
for (index = start; index < end; index++) {
do {
entry = xas_next(&xas);
} while (xas_retry(&xas, entry));
- if (xas.xa_index != index || xa_to_value(entry) != attrs) {
- has_attrs = false;
- break;
- }
+ if (xas.xa_index != index || xa_to_value(entry) != attrs)
+ return false;
}
-out:
- rcu_read_unlock();
- return has_attrs;
-}
-
-static u64 kvm_supported_mem_attributes(struct kvm *kvm)
-{
- if (!kvm || kvm_arch_has_private_mem(kvm))
- return KVM_MEMORY_ATTRIBUTE_PRIVATE;
-
- return 0;
+ return true;
}
static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
--
2.43.0
* [PATCH 11/12] KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (9 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 10/12] KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes() Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-17 22:32 ` Michael Roth
2024-07-11 22:27 ` [PATCH 12/12] KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns Paolo Bonzini
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
While KVM_MEMORY_ATTRIBUTE_PRIVATE is currently the only attribute, KVM
code such as kvm_mem_is_private() is written to expect that more will exist.
Allow using kvm_range_has_memory_attributes() as a multi-page version of
kvm_mem_is_private(), without it breaking later when more attributes are
introduced.
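The per-page predicate the extended interface reduces to is a masked comparison; a minimal sketch, where TOY_ATTR_FUTURE is a purely hypothetical later attribute:

```c
#include <stdbool.h>

#define TOY_ATTR_PRIVATE (1UL << 3)
#define TOY_ATTR_FUTURE  (1UL << 4)   /* imagined future attribute */

/* Only the bits selected by @mask must match @attrs exactly; any other
 * attribute bits a page happens to have are ignored. */
static bool toy_attrs_match(unsigned long value, unsigned long mask,
                            unsigned long attrs)
{
    return (value & mask) == attrs;
}
```

Passing `mask == ~0` recovers the old exact-match behavior, which is why the existing callers are updated mechanically.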
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 13 +++++++------
3 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e0e9963066f..f6b7391fe438 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7524,7 +7524,7 @@ static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
const unsigned long end = start + KVM_PAGES_PER_HPAGE(level);
if (level == PG_LEVEL_2M)
- return kvm_range_has_memory_attributes(kvm, start, end, attrs);
+ return kvm_range_has_memory_attributes(kvm, start, end, ~0, attrs);
for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
if (hugepage_test_mixed(slot, gfn, level - 1) ||
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f6e11991442d..456dbdf7aaaf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2410,7 +2410,7 @@ static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn
}
bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
- unsigned long attrs);
+ unsigned long mask, unsigned long attrs);
bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8ab9d8ff7b74..62e5d9b0ae83 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2402,21 +2402,21 @@ static u64 kvm_supported_mem_attributes(struct kvm *kvm)
/*
* Returns true if _all_ gfns in the range [@start, @end) have attributes
- * matching @attrs.
+ * such that the bits in @mask match @attrs.
*/
bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
- unsigned long attrs)
+ unsigned long mask, unsigned long attrs)
{
XA_STATE(xas, &kvm->mem_attr_array, start);
- unsigned long mask = kvm_supported_mem_attributes(kvm);
unsigned long index;
void *entry;
+ mask &= kvm_supported_mem_attributes(kvm);
if (attrs & ~mask)
return false;
if (end == start + 1)
- return kvm_get_memory_attributes(kvm, start) == attrs;
+ return (kvm_get_memory_attributes(kvm, start) & mask) == attrs;
guard(rcu)();
if (!attrs)
@@ -2427,7 +2427,8 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
entry = xas_next(&xas);
} while (xas_retry(&xas, entry));
- if (xas.xa_index != index || xa_to_value(entry) != attrs)
+ if (xas.xa_index != index ||
+ (xa_to_value(entry) & mask) != attrs)
return false;
}
@@ -2526,7 +2527,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
mutex_lock(&kvm->slots_lock);
/* Nothing to do if the entire range has the desired attributes. */
- if (kvm_range_has_memory_attributes(kvm, start, end, attributes))
+ if (kvm_range_has_memory_attributes(kvm, start, end, ~0, attributes))
goto out_unlock;
/*
--
2.43.0
* [PATCH 12/12] KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
2024-07-11 22:27 [PATCH 00/12] KVM: guest_memfd: lazy preparation of pages + prefault support for SEV-SNP Paolo Bonzini
` (10 preceding siblings ...)
2024-07-11 22:27 ` [PATCH 11/12] KVM: extend kvm_range_has_memory_attributes() to check subset of attributes Paolo Bonzini
@ 2024-07-11 22:27 ` Paolo Bonzini
2024-07-17 22:49 ` Michael Roth
11 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-11 22:27 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: seanjc, michael.roth
This check is currently performed by sev_gmem_post_populate(), but it
applies to all callers of kvm_gmem_populate(): the point of the function
is that the memory is being encrypted and some work has to be done
on all the gfns in order to encrypt them.
Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior
to invoking the callback, and stop the operation if a shared page
is encountered. Because CONFIG_KVM_PRIVATE_MEM in principle does
not require attributes, this makes kvm_gmem_populate() depend on
CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them).
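The order-reduction logic the patch adds to kvm_gmem_populate() can be modeled in user space (names hypothetical; the real code checks KVM_MEMORY_ATTRIBUTE_PRIVATE via kvm_range_has_memory_attributes() and returns -EINVAL on failure):

```c
#include <stdbool.h>

#define TOY_NR_GFNS 64

/* Toy privacy map: one flag per gfn, standing in for the
 * private memory attribute. */
static bool toy_private[TOY_NR_GFNS];

static bool toy_range_private(unsigned long gfn, unsigned long npages)
{
    for (unsigned long i = 0; i < npages; i++)
        if (!toy_private[gfn + i])
            return false;
    return true;
}

/* Shrink the mapping order until every page under it is private; if
 * even the single base page is shared, give up.  Returns the usable
 * order, or -1 on failure. */
static int toy_usable_order(unsigned long gfn, int max_order)
{
    while (!toy_range_private(gfn, 1UL << max_order)) {
        if (!max_order)
            return -1;
        max_order--;
    }
    return max_order;
}
```

For example, with only gfns 0-3 private, a request at order 3 (8 pages) is reduced to order 2 (4 pages), while a request at a shared gfn fails outright.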
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/svm/sev.c | 7 -------
include/linux/kvm_host.h | 2 ++
virt/kvm/guest_memfd.c | 12 ++++++++++++
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 397ef9e70182..5a93e554cbb6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2202,13 +2202,6 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pf
bool assigned;
int level;
- if (!kvm_mem_is_private(kvm, gfn)) {
- pr_debug("%s: Failed to ensure GFN 0x%llx has private memory attribute set\n",
- __func__, gfn);
- ret = -EINVAL;
- goto err;
- }
-
ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
if (ret || assigned) {
pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 456dbdf7aaaf..f7ba665652f3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2445,6 +2445,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
#endif
+#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
/**
* kvm_gmem_populate() - Populate/prepare a GPA range with guest data
*
@@ -2471,6 +2472,7 @@ typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque);
+#endif
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 266810bb91c9..7e2c9274fd16 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -609,6 +609,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
}
EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
+#ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
kvm_gmem_populate_cb post_populate, void *opaque)
{
@@ -662,11 +663,21 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
(npages - i) < (1 << max_order))
max_order = 0;
+ ret = -EINVAL;
+ while (!kvm_range_has_memory_attributes(kvm, gfn, gfn + (1 << max_order),
+ KVM_MEMORY_ATTRIBUTE_PRIVATE,
+ KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+ if (!max_order)
+ goto put_folio_and_exit;
+ max_order--;
+ }
+
p = src ? src + i * PAGE_SIZE : NULL;
ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
if (!ret)
folio_mark_uptodate(folio);
+put_folio_and_exit:
folio_put(folio);
if (ret)
break;
@@ -678,3 +689,4 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
return ret && !i ? ret : i;
}
EXPORT_SYMBOL_GPL(kvm_gmem_populate);
+#endif
--
2.43.0
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-11 22:27 ` [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code Paolo Bonzini
@ 2024-07-13 1:28 ` Edgecombe, Rick P
2024-07-13 10:10 ` Paolo Bonzini
2024-07-17 21:53 ` Michael Roth
1 sibling, 1 reply; 34+ messages in thread
From: Edgecombe, Rick P @ 2024-07-13 1:28 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: seanjc@google.com, michael.roth@amd.com
On Thu, 2024-07-11 at 18:27 -0400, Paolo Bonzini wrote:
> Do not allow populating the same page twice with startup data. In the
> case of SEV-SNP, for example, the firmware does not allow it anyway,
> since the launch-update operation is only possible on pages that are
> still shared in the RMP.
>
> Even if it worked, kvm_gmem_populate()'s callback is meant to have side
> effects such as updating launch measurements, and updating the same
> page twice is unlikely to have the desired results.
>
> Races between calls to the ioctl are not possible because kvm_gmem_populate()
> holds slots_lock and the VM should not be running. But again, even if
> double population worked on some other confidential computing technology,
> it would not matter to guest_memfd.c whether it was an intentional attempt
> to do something fishy or just missing synchronization in userspace: one of
> the racers wins, and the page is initialized by either
> kvm_gmem_prepare_folio() or kvm_gmem_populate().
>
> Still, out of paranoia, adjust sev_gmem_post_populate() to use
> the same errno that kvm_gmem_populate() is using.
This patch breaks our rebased TDX development tree. First
kvm_gmem_prepare_folio() is called during the KVM_PRE_FAULT_MEMORY operation,
then next kvm_gmem_populate() is called during the KVM_TDX_INIT_MEM_REGION ioctl
to actually populate the memory, which hits the new -EEXIST error path.
Given we are not actually populating during KVM_PRE_FAULT_MEMORY and try to
avoid booting a TD until we've done so, maybe setting folio_mark_uptodate() in
kvm_gmem_prepare_folio() is not appropriate in that case? But it may not be easy
to separate.
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-13 1:28 ` Edgecombe, Rick P
@ 2024-07-13 10:10 ` Paolo Bonzini
2024-07-13 20:25 ` Edgecombe, Rick P
0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-13 10:10 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
seanjc@google.com, michael.roth@amd.com
On Sat, Jul 13, 2024 at 3:28 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> On Thu, 2024-07-11 at 18:27 -0400, Paolo Bonzini wrote:
> > Do not allow populating the same page twice with startup data. In the
> > case of SEV-SNP, for example, the firmware does not allow it anyway,
> > since the launch-update operation is only possible on pages that are
> > still shared in the RMP.
> >
> > Even if it worked, kvm_gmem_populate()'s callback is meant to have side
> > effects such as updating launch measurements, and updating the same
> > page twice is unlikely to have the desired results.
> >
> > Races between calls to the ioctl are not possible because kvm_gmem_populate()
> > holds slots_lock and the VM should not be running. But again, even if
> > double population worked on some other confidential computing technology,
> > it would not matter to guest_memfd.c whether it was an intentional attempt
> > to do something fishy or just missing synchronization in userspace: one of
> > the racers wins, and the page is initialized by either
> > kvm_gmem_prepare_folio() or kvm_gmem_populate().
> >
> > Still, out of paranoia, adjust sev_gmem_post_populate() to use
> > the same errno that kvm_gmem_populate() is using.
>
> This patch breaks our rebased TDX development tree. First
> kvm_gmem_prepare_folio() is called during the KVM_PRE_FAULT_MEMORY operation,
> then next kvm_gmem_populate() is called during the KVM_TDX_INIT_MEM_REGION ioctl
> to actually populate the memory, which hits the new -EEXIST error path.
It's not a problem to only keep patches 1-8 for 6.11, and move the
rest to 6.12 (except for the bit that returns -EEXIST in sev.c).
Could you push a branch for me to take a look? I've never liked that
you have to do the explicit prefault before the VM setup is finished;
it's a TDX-specific detail that is leaking into the API.
> Given we are not actually populating during KVM_PRE_FAULT_MEMORY and try to
> avoid booting a TD until we've done so, maybe setting folio_mark_uptodate() in
> kvm_gmem_prepare_folio() is not appropriate in that case? But it may not be easy
> to separate.
It would be easy (just return a boolean value from
kvm_arch_gmem_prepare() to skip folio_mark_uptodate() before the VM is
ready, and implement it for TDX) but it's ugly. You're also clearing
the memory unnecessarily before overwriting it.
Paolo
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-13 10:10 ` Paolo Bonzini
@ 2024-07-13 20:25 ` Edgecombe, Rick P
2024-07-14 5:32 ` Michael Roth
0 siblings, 1 reply; 34+ messages in thread
From: Edgecombe, Rick P @ 2024-07-13 20:25 UTC (permalink / raw)
To: pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
seanjc@google.com, michael.roth@amd.com
On Sat, 2024-07-13 at 12:10 +0200, Paolo Bonzini wrote:
> >
> > This patch breaks our rebased TDX development tree. First
> > kvm_gmem_prepare_folio() is called during the KVM_PRE_FAULT_MEMORY
> > operation,
> > then next kvm_gmem_populate() is called during the KVM_TDX_INIT_MEM_REGION
> > ioctl
> > to actually populate the memory, which hits the new -EEXIST error path.
>
> It's not a problem to only keep patches 1-8 for 6.11, and move the
> rest to 6.12 (except for the bit that returns -EEXIST in sev.c).
>
> Could you push a branch for me to take a look?
Sure, here it is.
KVM:
https://github.com/rpedgeco/linux/tree/tdx_kvm_dev-2024-07-12-mark_uptodate_issue
Matching QEMU:
https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-wip-2024.06.19-v9.0.0
It is not fully based on kvm-coco-queue because it has the latest v2 of the
zapping quirk swapped in.
> I've never liked that
> you have to do the explicit prefault before the VM setup is finished;
> it's a TDX-specific detail that is leaking into the API.
Well, it's not too late to change direction again. I remember you and Sean were
not fully of one mind on the tradeoffs.
I guess this series is trying to help userspace not mess up the order of things
for SEV, whereas TDX's design was to let userspace hold the pieces from the
beginning. As in, needing to match up the KVM_PRE_FAULT_MEMORY and
KVM_TDX_INIT_MEM_REGION calls, mysteriously return errors in later IOCTLs if
something was missed, etc.
Still, I might lean towards staying the course just because we have gone down
this path for a while and we don't currently have any fundamental issues.
Probably we *really* need to get the next TDX MMU stuff posted so we can start
to add a bit more certainty to that statement.
>
> > Given we are not actually populating during KVM_PRE_FAULT_MEMORY and try to
> > avoid booting a TD until we've done so, maybe setting folio_mark_uptodate()
> > in
> > kvm_gmem_prepare_folio() is not appropriate in that case? But it may not be
> > easy
> > to separate.
>
> It would be easy (just return a boolean value from
> kvm_arch_gmem_prepare() to skip folio_mark_uptodate() before the VM is
> ready, and implement it for TDX) but it's ugly. You're also clearing
> the memory unnecessarily before overwriting it.
Hmm, right. Since kvm_gmem_populate() does folio_mark_uptodate() again despite
testing for it earlier, we can skip folio_mark_uptodate() in
kvm_gmem_prepare_folio() for TDX during the pre-finalization stage and it will
get marked there.
I put a little POC of this suggestion at the end of the branch. Just revert it
to reproduce the issue.
I think in the context of the work to launch a TD, extra clearing of pages is
not too bad. I'm more bothered by how it highlights the general pitfalls of
TDX's special clever behavior for KVM_PRE_FAULT_MEMORY before TD initialization.
If/when we want to skip it, I wonder if we could move the clearing into the
gmem_prepare callbacks.
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-13 20:25 ` Edgecombe, Rick P
@ 2024-07-14 5:32 ` Michael Roth
2024-07-15 16:08 ` Paolo Bonzini
0 siblings, 1 reply; 34+ messages in thread
From: Michael Roth @ 2024-07-14 5:32 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, seanjc@google.com
On Sat, Jul 13, 2024 at 08:25:42PM +0000, Edgecombe, Rick P wrote:
> On Sat, 2024-07-13 at 12:10 +0200, Paolo Bonzini wrote:
> > >
> > > This patch breaks our rebased TDX development tree. First
> > > kvm_gmem_prepare_folio() is called during the KVM_PRE_FAULT_MEMORY
> > > operation,
> > > then next kvm_gmem_populate() is called during the KVM_TDX_INIT_MEM_REGION
> > > ioctl
> > > to actually populate the memory, which hits the new -EEXIST error path.
> >
> > It's not a problem to only keep patches 1-8 for 6.11, and move the
> > rest to 6.12 (except for the bit that returns -EEXIST in sev.c).
> >
> > Could you push a branch for me to take a look?
>
> Sure, here it is.
>
> KVM:
> https://github.com/rpedgeco/linux/tree/tdx_kvm_dev-2024-07-12-mark_uptodate_issue
> Matching QEMU:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-wip-2024.06.19-v9.0.0
>
> It is not fully based on kvm-coco-queue because it has the latest v2 of the
> zapping quirk swapped in.
>
> > I've never liked that
> > you have to do the explicit prefault before the VM setup is finished;
> > it's a TDX-specific detail that is leaking into the API.
>
> Well, it's not too late to change direction again. I remember you and Sean were
> not fully of one mind on the tradeoffs.
>
> I guess this series is trying to help userspace not mess up the order of things
> for SEV, whereas TDX's design was to let userspace hold the pieces from the
> beginning. As in, needing to match up the KVM_PRE_FAULT_MEMORY and
> KVM_TDX_INIT_MEM_REGION calls, mysteriously return errors in later IOCTLs if
> something was missed, etc.
If SNP were to try to call KVM_PRE_FAULT_MEMORY before SNP_LAUNCH_UPDATE
(rough equivalent to KVM_TDX_INIT_MEM_REGION), I think the same issue
would arise, and in that case the uptodate flag you prototyped
wouldn't be enough to address it because SNP_LAUNCH_UPDATE would end up
failing because the gmem_prepare hook previously triggered by
KVM_PRE_FAULT_MEMORY would have put the corresponding RMP entries into
an unexpected state (guest-owned/private).
So for SNP, KVM_PRE_FAULT_MEMORY/SNP_LAUNCH_UPDATE are mutually
exclusive on what GPA ranges they can prep before finalizing launch state.
*After* finalizing launch state however, KVM_PRE_FAULT_MEMORY can be
called for whatever range it likes. If gmem_prepare/gmem_populate was
already called for a GPA, the uptodate flag will be set and KVM only
needs to deal with the mapping.
So I wonder if it would be possible to enforce that KVM_PRE_FAULT_MEMORY
only be used after finalizing the VM in the CoCo case?
I realize that is awkward for TDX, where the KVM_PRE_FAULT_MEMORY is
required to create the sEPT mapping before encrypting, but maybe it
would be possible for TDX to just do that implicitly within
KVM_TDX_INIT_MEM_REGION?
That would free up KVM_PRE_FAULT_MEMORY to be called on any range
post-finalization, and all the edge cases prior to finalization could be
avoided if we have some way to enforce that finalization has already
been done.
One thing I'm not sure of is if KVM_TDX_INIT_MEM_REGION for a 4K page could
maybe lead to a 2M sEPT mapping that overlaps with a GPA range passed to
KVM_PRE_FAULT_MEMORY, which I think could lead to unexpected 'left' return
values unless we can make sure to only map exactly the GPA ranges populated
by KVM_TDX_INIT_MEM_REGION and nothing more.
-Mike
>
> Still, I might lean towards staying the course just because we have gone down
> this path for a while and we don't currently have any fundamental issues.
> Probably we *really* need to get the next TDX MMU stuff posted so we can start
> to add a bit more certainty to that statement.
>
> >
> > > Given we are not actually populating during KVM_PRE_FAULT_MEMORY and try to
> > > avoid booting a TD until we've done so, maybe setting folio_mark_uptodate()
> > > in
> > > kvm_gmem_prepare_folio() is not appropriate in that case? But it may not be
> > > easy
> > > to separate.
> >
> > It would be easy (just return a boolean value from
> > kvm_arch_gmem_prepare() to skip folio_mark_uptodate() before the VM is
> > ready, and implement it for TDX) but it's ugly. You're also clearing
> > the memory unnecessarily before overwriting it.
>
> Hmm, right. Since kvm_gmem_populate() does folio_mark_uptodate() again despite
> testing for it earlier, we can skip folio_mark_uptodate() in
> kvm_gmem_prepare_folio() for TDX during the pre-finalization stage and it will
> get marked there.
>
> I put a little POC of this suggestion at the end of the branch. Just revert it
> to reproduce the issue.
>
> I think in the context of the work to launch a TD, extra clearing of pages is
> not too bad. I'm more bothered by how it highlights the general pitfalls of
> TDX's special clever behavior for KVM_PRE_FAULT_MEMORY before TD initialization.
>
> If/when we want to skip it, I wonder if we could move the clearing into the
> gmem_prepare callbacks.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-14 5:32 ` Michael Roth
@ 2024-07-15 16:08 ` Paolo Bonzini
2024-07-15 21:47 ` Michael Roth
0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-07-15 16:08 UTC (permalink / raw)
To: Michael Roth
Cc: Edgecombe, Rick P, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, seanjc@google.com
On Sun, Jul 14, 2024 at 7:33 AM Michael Roth <michael.roth@amd.com> wrote:
> > I guess this series is trying to help userspace not mess up the order of things
> > for SEV, where as TDX's design was to let userspace hold the pieces from the
> > beginning. As in, needing to match up the KVM_PRE_FAULT_MEMORY and
> > KVM_TDX_INIT_MEM_REGION calls, mysteriously return errors in later IOCTLs if
> > something was missed, etc.
>
> If SNP were to try to call KVM_PRE_FAULT_MEMORY before SNP_LAUNCH_UPDATE
> (rough equivalent to KVM_TDX_INIT_MEM_REGION), I think the same issue
> would arise, and in that case the uptodate flag you prototyped wouldn't
> be enough to address it because SNP_LAUNCH_UPDATE would end up
> failing because the gmem_prepare hook previously triggered by
> KVM_PRE_FAULT_MEMORY would have put the corresponding RMP entries into
> an unexpected state (guest-owned/private).
Indeed, and I'd love for that to be the case for both TDX and SNP.
> So for SNP, KVM_PRE_FAULT_MEMORY/SNP_LAUNCH_UPDATE are mutually
> exclusive on what GPA ranges they can prep before finalizing launch state.
Not a problem; is KVM_PRE_FAULT_MEMORY before finalization the same as
zeroing memory?
> I realize that is awkward for TDX, where the KVM_PRE_FAULT_MEMORY is
> required to create the sEPT mapping before encrypting, but maybe it
> would be possible for TDX to just do that implicitly within
> KVM_TDX_INIT_MEM_REGION?
Yes, and it's what the TDX API used to be like a while ago.
Locking-wise, Rick confirmed offlist that there's no problem in
calling kvm_arch_vcpu_pre_fault_memory() from tdx_gmem_post_populate()
(my fault that it went offlist - email from the phone is hard...).
To be clear, I have no problem at all reusing the prefaulting code,
that's better than TDX having to do its own thing. But forcing
userspace to do two passes is not great (it's already not great that
TDX_INIT_MEM_REGION has to be a VCPU operation, but
that's unfortunately unavoidable).
Paolo
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-15 16:08 ` Paolo Bonzini
@ 2024-07-15 21:47 ` Michael Roth
2024-07-15 22:57 ` Edgecombe, Rick P
0 siblings, 1 reply; 34+ messages in thread
From: Michael Roth @ 2024-07-15 21:47 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Edgecombe, Rick P, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, seanjc@google.com
On Mon, Jul 15, 2024 at 06:08:51PM +0200, Paolo Bonzini wrote:
> On Sun, Jul 14, 2024 at 7:33 AM Michael Roth <michael.roth@amd.com> wrote:
> > > I guess this series is trying to help userspace not mess up the order of things
> > > for SEV, where as TDX's design was to let userspace hold the pieces from the
> > > beginning. As in, needing to match up the KVM_PRE_FAULT_MEMORY and
> > > KVM_TDX_INIT_MEM_REGION calls, mysteriously return errors in later IOCTLs if
> > > something was missed, etc.
> >
> > If SNP were to try to call KVM_PRE_FAULT_MEMORY before SNP_LAUNCH_UPDATE
> > (rough equivalent to KVM_TDX_INIT_MEM_REGION), I think the same issue
> > would arise, and in that case the uptodate flag you prototyped wouldn't
> > be enough to address it because SNP_LAUNCH_UPDATE would end up
> > failing because the gmem_prepare hook previously triggered by
> > KVM_PRE_FAULT_MEMORY would have put the corresponding RMP entries into
> > an unexpected state (guest-owned/private).
>
> Indeed, and I'd love for that to be the case for both TDX and SNP.
>
> > So for SNP, KVM_PRE_FAULT_MEMORY/SNP_LAUNCH_UPDATE are mutually
> > exclusive on what GPA ranges they can prep before finalizing launch state.
>
> Not a problem; is KVM_PRE_FAULT_MEMORY before finalization the same as
> zeroing memory?
Sort of: you get a page that's zero'd by
gmem->kvm_gmem_prepare_folio()->clear_highpage(), and then that page is
mapped in the guest as a private page. But since no encrypted data was
written the guest will effectively see ciphertext until the guest actually
initializes the page after accepting it.
But you can sort of treat that whole sequence as being similar to calling
sev_gmem_post_populate() with page type KVM_SEV_SNP_PAGE_TYPE_ZERO (or
rather, a theoretical KVM_SEV_SNP_PAGE_TYPE_SCRAMBLE in this case), just
that in that case the page won't be pre-validated and it won't be
measured into the launch digest. But still if you look at it that way it's
a bit clearer why pre-fault shouldn't be performed on any ranges that will
later be populated again (unless the gmem_populate callback itself handles it
and is thus aware of the restrictions).
>
> > I realize that is awkward for TDX, where the KVM_PRE_FAULT_MEMORY is
> > required to create the sEPT mapping before encrypting, but maybe it
> > would be possible for TDX to just do that implicitly within
> > KVM_TDX_INIT_MEM_REGION?
>
> Yes, and it's what the TDX API used to be like a while ago.
> Locking-wise, Rick confirmed offlist that there's no problem in
> calling kvm_arch_vcpu_pre_fault_memory() from tdx_gmem_post_populate()
> (my fault that it went offlist - email from the phone is hard...).
That's promising, if we go that route it probably makes sense for SNP to
do that as well, even though it's only an optimization in that case.
>
> To be clear, I have no problem at all reusing the prefaulting code,
> that's better than TDX having to do its own thing. But forcing
> userspace to do two passes is not great (it's already not great that
> TDX_INIT_MEM_REGION has to be a VCPU operation, but
> that's unfortunately unavoidable).
Makes sense.
If we document mutual exclusion between ranges touched by
gmem_populate() vs ranges touched by actual userspace issuance of
KVM_PRE_FAULT_MEMORY (and make sure nothing too crazy happens if users
don't abide by the documentation), then I think most problems go away...
But there is still at least one awkward constraint for SNP:
KVM_PRE_FAULT_MEMORY cannot be called for private GPA ranges until after
SNP_LAUNCH_START is called. This is true even if the GPA range is not
one of the ranges that will get passed to
gmem_populate()/SNP_LAUNCH_UPDATE. The reason for this is that when
binding the ASID to the SNP context as part of SNP_LAUNCH_START, firmware
will perform checks to make sure that ASID is not already being used in
the RMP table, and that check will fail if KVM_PRE_FAULT_MEMORY triggered
for a private page before calling SNP_LAUNCH_START.
So we have this constraint that KVM_PRE_FAULT_MEMORY can't be issued
before SNP_LAUNCH_START. So it makes me wonder if we should just broaden
that constraint and for now just disallow KVM_PRE_FAULT_MEMORY prior to
finalizing a guest, since it'll be easier to lift that restriction later
versus discovering some other sort of edge case and needing to
retroactively place restrictions.
I've taken Isaku's original pre_fault_memory_test and added a new
x86-specific coco_pre_fault_memory_test to try to better document and
exercise these corner cases for SEV and SNP, but I'm hoping it could
also be useful for TDX (hence the generic name). These are based on
Pratik's initial SNP selftests (which are in turn based on kvm/queue +
these patches):
https://github.com/mdroth/linux/commits/snp-uptodate0-kst/
https://github.com/mdroth/linux/commit/dd7d4b1983ceeb653132cfd54ad63f597db85f74
It could be interesting to also add some coverage interleaving of
fallocate()/FALLOC_FL_PUNCH_HOLE as well. I'll look into that more,
and more negative testing depending on the final semantics we end up
with.
Thanks,
Mike
>
> Paolo
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 01/12] KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
2024-07-11 22:27 ` [PATCH 01/12] KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() Paolo Bonzini
@ 2024-07-15 22:26 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-15 22:26 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:44PM -0400, Paolo Bonzini wrote:
> Right now this is simply more consistent and avoids use of pfn_to_page()
> and put_page(). It will be put to more use in upcoming patches, to
> ensure that the up-to-date flag is set at the very end of both the
> kvm_gmem_get_pfn() and kvm_gmem_populate() flows.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 02/12] KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
2024-07-11 22:27 ` [PATCH 02/12] KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation Paolo Bonzini
@ 2024-07-15 22:32 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-15 22:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:45PM -0400, Paolo Bonzini wrote:
> The up-to-date flag as is now is not too useful; it tells guest_memfd not
> to overwrite the contents of a folio, but it doesn't say that the page
> is ready to be mapped into the guest. For encrypted guests, mapping
> a private page requires that the "preparation" phase has succeeded,
> and at the same time the same page cannot be prepared twice.
>
> So, ensure that folio_mark_uptodate() is only called on a prepared page. If
> kvm_gmem_prepare_folio() or the post_populate callback fail, the folio
> will not be marked up-to-date; it's not a problem to call clear_highpage()
> again on such a page prior to the next preparation attempt.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 03/12] KVM: guest_memfd: do not go through struct page
2024-07-11 22:27 ` [PATCH 03/12] KVM: guest_memfd: do not go through struct page Paolo Bonzini
@ 2024-07-15 22:36 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-15 22:36 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:46PM -0400, Paolo Bonzini wrote:
> We have a perfectly usable folio, use it to retrieve the pfn and order.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 04/12] KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
2024-07-11 22:27 ` [PATCH 04/12] KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_* Paolo Bonzini
@ 2024-07-15 22:40 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-15 22:40 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:47PM -0400, Paolo Bonzini wrote:
> Add "ARCH" to the symbols; shortly, the "prepare" phase will include both
> the arch-independent step to clear out contents left in the page by the
> host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE.
> For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-15 21:47 ` Michael Roth
@ 2024-07-15 22:57 ` Edgecombe, Rick P
2024-07-16 0:13 ` Michael Roth
2024-07-17 6:42 ` Michael Roth
0 siblings, 2 replies; 34+ messages in thread
From: Edgecombe, Rick P @ 2024-07-15 22:57 UTC (permalink / raw)
To: pbonzini@redhat.com, michael.roth@amd.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
seanjc@google.com
On Mon, 2024-07-15 at 16:47 -0500, Michael Roth wrote:
> Makes sense.
>
> If we document mutual exclusion between ranges touched by
> gmem_populate() vs ranges touched by actual userspace issuance of
> KVM_PRE_FAULT_MEMORY (and make sure nothing too crazy happens if users
> don't abide by the documentation), then I think most problems go away...
>
> But there is still at least one awkward constraint for SNP:
> KVM_PRE_FAULT_MEMORY cannot be called for private GPA ranges until after
> SNP_LAUNCH_START is called. This is true even if the GPA range is not
> one of the ranges that will get passed to
> gmem_populate()/SNP_LAUNCH_UPDATE. The reason for this is that when
> binding the ASID to the SNP context as part of SNP_LAUNCH_START, firmware
> will perform checks to make sure that ASID is not already being used in
> the RMP table, and that check will fail if KVM_PRE_FAULT_MEMORY triggered
> for a private page before calling SNP_LAUNCH_START.
>
> So we have this constraint that KVM_PRE_FAULT_MEMORY can't be issued
> before SNP_LAUNCH_START. So it makes me wonder if we should just broaden
> that constraint and for now just disallow KVM_PRE_FAULT_MEMORY prior to
> finalizing a guest, since it'll be easier to lift that restriction later
> versus discovering some other sort of edge case and need to
> retroactively place restrictions.
>
> I've taken Isaku's original pre_fault_memory_test and added a new
> x86-specific coco_pre_fault_memory_test to try to better document and
> exercise these corner cases for SEV and SNP, but I'm hoping it could
> also be useful for TDX (hence the generic name). These are based on
> Pratik's initial SNP selftests (which are in turn based on kvm/queue +
> these patches):
>
> https://github.com/mdroth/linux/commits/snp-uptodate0-kst/
>
> https://github.com/mdroth/linux/commit/dd7d4b1983ceeb653132cfd54ad63f597db85f74
>
>
From the TDX side it wouldn't be horrible to not have to worry about userspace
mucking around with the mirrored page tables in unexpected ways during the
special period. TDX already has its own "finalized" state in kvm_tdx, is there
something similar on the SEV side we could unify with?
I looked at moving from kvm_arch_vcpu_pre_fault_memory() to directly calling
kvm_tdp_map_page(), so we could potentially put whatever check in
kvm_arch_vcpu_pre_fault_memory(). It required a couple exports:
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 03737f3aaeeb..9004ac597a85 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -277,6 +277,7 @@ extern bool tdp_mmu_enabled;
 #endif

 int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa, kvm_pfn_t *pfn);
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level);

 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7bb6b17b455f..4a3e471ec9fe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4721,8 +4721,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }

-static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
-			    u8 *level)
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level)
 {
 	int r;

@@ -4759,6 +4758,7 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
 		return -EIO;
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_tdp_map_page);

 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range)
@@ -5770,6 +5770,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 out:
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_load);

 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9ac0821eb44b..7161ef68f3da 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2809,11 +2809,13 @@ struct tdx_gmem_post_populate_arg {
 static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 				  void __user *src, int order, void *_arg)
 {
+	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	struct tdx_gmem_post_populate_arg *arg = _arg;
 	struct kvm_vcpu *vcpu = arg->vcpu;
 	struct kvm_memory_slot *slot;
 	gpa_t gpa = gfn_to_gpa(gfn);
+	u8 level = PG_LEVEL_4K;
 	struct page *page;
 	kvm_pfn_t mmu_pfn;
 	int ret, i;
@@ -2832,6 +2834,10 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 		goto out_put_page;
 	}

+	ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
+	if (ret < 0)
+		goto out_put_page;
+
 	read_lock(&kvm->mmu_lock);

 	ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
@@ -2910,6 +2916,7 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *c
 	mutex_lock(&kvm->slots_lock);
 	idx = srcu_read_lock(&kvm->srcu);

+	kvm_mmu_reload(vcpu);
 	ret = 0;
 	while (region.nr_pages) {
 		if (signal_pending(current)) {
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 05/12] KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
2024-07-11 22:27 ` [PATCH 05/12] KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn Paolo Bonzini
@ 2024-07-15 23:55 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-15 23:55 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:48PM -0400, Paolo Bonzini wrote:
> Allow testing the up-to-date flag in the caller without taking the
> lock again.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-15 22:57 ` Edgecombe, Rick P
@ 2024-07-16 0:13 ` Michael Roth
2024-07-17 6:42 ` Michael Roth
1 sibling, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-16 0:13 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, seanjc@google.com
On Mon, Jul 15, 2024 at 10:57:35PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2024-07-15 at 16:47 -0500, Michael Roth wrote:
> > Makes sense.
> >
> > If we document mutual exclusion between ranges touched by
> > gmem_populate() vs ranges touched by actual userspace issuance of
> > KVM_PRE_FAULT_MEMORY (and make sure nothing too crazy happens if users
> > don't abide by the documentation), then I think most problems go away...
> >
> > But there is still at least one awkward constraint for SNP:
> > KVM_PRE_FAULT_MEMORY cannot be called for private GPA ranges until after
> > SNP_LAUNCH_START is called. This is true even if the GPA range is not
> > one of the ranges that will get passed to
> > gmem_populate()/SNP_LAUNCH_UPDATE. The reason for this is that when
> > binding the ASID to the SNP context as part of SNP_LAUNCH_START, firmware
> > will perform checks to make sure that ASID is not already being used in
> > the RMP table, and that check will fail if KVM_PRE_FAULT_MEMORY triggered
> > for a private page before calling SNP_LAUNCH_START.
> >
> > So we have this constraint that KVM_PRE_FAULT_MEMORY can't be issued
> > before SNP_LAUNCH_START. So it makes me wonder if we should just broaden
> > that constraint and for now just disallow KVM_PRE_FAULT_MEMORY prior to
> > finalizing a guest, since it'll be easier to lift that restriction later
> > versus discovering some other sort of edge case and need to
> > retroactively place restrictions.
> >
> > I've taken Isaku's original pre_fault_memory_test and added a new
> > x86-specific coco_pre_fault_memory_test to try to better document and
> > exercise these corner cases for SEV and SNP, but I'm hoping it could
> > also be useful for TDX (hence the generic name). These are based on
> > Pratik's initial SNP selftests (which are in turn based on kvm/queue +
> > these patches):
> >
> > https://github.com/mdroth/linux/commits/snp-uptodate0-kst/
> >
> > https://github.com/mdroth/linux/commit/dd7d4b1983ceeb653132cfd54ad63f597db85f74
> >
> >
>
> From the TDX side it wouldn't be horrible to not have to worry about userspace
> mucking around with the mirrored page tables in unexpected ways during the
> special period. TDX already has its own "finalized" state in kvm_tdx, is there
> something similar on the SEV side we could unify with?
Unfortunately there isn't currently anything in place like that for SNP,
but if we had a common 'finalized' field somewhere it could easily be set in
snp_launch_finish() as well.
>
> I looked at moving from kvm_arch_vcpu_pre_fault_memory() to directly calling
> kvm_tdp_map_page(), so we could potentially put whatever check in
> kvm_arch_vcpu_pre_fault_memory(). It required a couple exports:
Thanks. I'll give this a spin for SNP as well.
-Mike
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 03737f3aaeeb..9004ac597a85 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -277,6 +277,7 @@ extern bool tdp_mmu_enabled;
> #endif
>
> int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa, kvm_pfn_t
> *pfn);
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8
> *level);
>
> static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 7bb6b17b455f..4a3e471ec9fe 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4721,8 +4721,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
> return direct_page_fault(vcpu, fault);
> }
>
> -static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> - u8 *level)
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8
> *level)
> {
> int r;
>
> @@ -4759,6 +4758,7 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t
> gpa, u64 error_code,
> return -EIO;
> }
> }
> +EXPORT_SYMBOL_GPL(kvm_tdp_map_page);
>
> long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> struct kvm_pre_fault_memory *range)
> @@ -5770,6 +5770,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
> out:
> return r;
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_load);
>
> void kvm_mmu_unload(struct kvm_vcpu *vcpu)
> {
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 9ac0821eb44b..7161ef68f3da 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2809,11 +2809,13 @@ struct tdx_gmem_post_populate_arg {
> static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> void __user *src, int order, void *_arg)
> {
> + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> struct tdx_gmem_post_populate_arg *arg = _arg;
> struct kvm_vcpu *vcpu = arg->vcpu;
> struct kvm_memory_slot *slot;
> gpa_t gpa = gfn_to_gpa(gfn);
> + u8 level = PG_LEVEL_4K;
> struct page *page;
> kvm_pfn_t mmu_pfn;
> int ret, i;
> @@ -2832,6 +2834,10 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t
> gfn, kvm_pfn_t pfn,
> goto out_put_page;
> }
>
> + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
> + if (ret < 0)
> + goto out_put_page;
> +
> read_lock(&kvm->mmu_lock);
>
> ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
> @@ -2910,6 +2916,7 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu,
> struct kvm_tdx_cmd *c
> mutex_lock(&kvm->slots_lock);
> idx = srcu_read_lock(&kvm->srcu);
>
> + kvm_mmu_reload(vcpu);
> ret = 0;
> while (region.nr_pages) {
> if (signal_pending(current)) {
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-15 22:57 ` Edgecombe, Rick P
2024-07-16 0:13 ` Michael Roth
@ 2024-07-17 6:42 ` Michael Roth
1 sibling, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 6:42 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, seanjc@google.com
On Mon, Jul 15, 2024 at 10:57:35PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2024-07-15 at 16:47 -0500, Michael Roth wrote:
> > Makes sense.
> >
> > If we document mutual exclusion between ranges touched by
> > gmem_populate() vs ranges touched by actual userspace issuance of
> > KVM_PRE_FAULT_MEMORY (and make sure nothing too crazy happens if users
> > don't abide by the documentation), then I think most problems go away...
> >
> > But there is still at least one awkward constraint for SNP:
> > KVM_PRE_FAULT_MEMORY cannot be called for private GPA ranges until after
> > SNP_LAUNCH_START is called. This is true even if the GPA range is not
> > one of the ranges that will get passed to
> > gmem_populate()/SNP_LAUNCH_UPDATE. The reason for this is that when
> > binding the ASID to the SNP context as part of SNP_LAUNCH_START, firmware
> > will perform checks to make sure that ASID is not already being used in
> > the RMP table, and that check will fail if KVM_PRE_FAULT_MEMORY triggered
> > for a private page before calling SNP_LAUNCH_START.
> >
> > So we have this constraint that KVM_PRE_FAULT_MEMORY can't be issued
> > before SNP_LAUNCH_START. So it makes me wonder if we should just broaden
> > that constraint and for now just disallow KVM_PRE_FAULT_MEMORY prior to
> > finalizing a guest, since it'll be easier to lift that restriction later
> > versus discovering some other sort of edge case and need to
> > retroactively place restrictions.
> >
> > I've taken Isaku's original pre_fault_memory_test and added a new
> > x86-specific coco_pre_fault_memory_test to try to better document and
> > exercise these corner cases for SEV and SNP, but I'm hoping it could
> > also be useful for TDX (hence the generic name). These are based on
> > Pratik's initial SNP selftests (which are in turn based on kvm/queue +
> > these patches):
> >
> > https://github.com/mdroth/linux/commits/snp-uptodate0-kst/
> >
> > https://github.com/mdroth/linux/commit/dd7d4b1983ceeb653132cfd54ad63f597db85f74
> >
> >
>
> From the TDX side it wouldn't be horrible to not have to worry about userspace
> mucking around with the mirrored page tables in unexpected ways during the
> special period. TDX already has its own "finalized" state in kvm_tdx, is there
> something similar on the SEV side we could unify with?
While trying to do pre-mapping in sev_gmem_post_populate() like you've
done below for TDX, I hit another issue that I think would be avoided by
enforcing finalization (or otherwise needs some alternative fix within
this series).
I'd already mentioned the issue with gmem_prepare() putting pages in an
unexpected RMP state if called prior to initializing the same GPA range
in gmem_populate(). This gets handled gracefully, however, when the
firmware call is issued to encrypt/measure the pages and it re-checks
the page's RMP state.
However, if another thread is issuing KVM_PRE_FAULT_MEMORY and triggers
a gmem_prepare() just after gmem_populate() places the pages in a
private RMP state, then gmem_prepare()->kvm_gmem_get_pfn() will trigger
the clear_highpage() call in kvm_gmem_prepare_folio() because the
uptodate flag isn't set until after sev_gmem_post_populate() returns.
So I ended up just calling kvm_arch_vcpu_pre_fault_memory() on the full
GPA range after the post-populate callback returns:
https://github.com/mdroth/linux/commit/da4e7465ced1a708ff1c5e9ab27c570de7e8974e
...
But a much larger concern is that even without this series, but just
plain kvm/queue, KVM_PRE_FAULT_MEMORY can still race with
sev_gmem_post_populate() in bad ways, so I think for 6.11 we should
consider locking this finalization requirement in and applying
something like the below fix:
https://github.com/mdroth/linux/commit/7095ba6ee8f050af11620daaf6c2219fd0bbb1c3
Or if that's potentially too big a change to decide right now, we could
just make KVM_PRE_FAULT_MEMORY return EOPNOTSUPP for
kvm_arch_has_private_memory() until we've had more time to work out the
missing bits for CoCo there.
-Mike
>
> I looked at moving from kvm_arch_vcpu_pre_fault_memory() to directly calling
> kvm_tdp_map_page(), so we could potentially put whatever check in
> kvm_arch_vcpu_pre_fault_memory(). It required a couple exports:
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 03737f3aaeeb..9004ac597a85 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -277,6 +277,7 @@ extern bool tdp_mmu_enabled;
> #endif
>
> int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa, kvm_pfn_t
> *pfn);
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8
> *level);
>
> static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 7bb6b17b455f..4a3e471ec9fe 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4721,8 +4721,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
> return direct_page_fault(vcpu, fault);
> }
>
> -static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> - u8 *level)
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8
> *level)
> {
> int r;
>
> @@ -4759,6 +4758,7 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t
> gpa, u64 error_code,
> return -EIO;
> }
> }
> +EXPORT_SYMBOL_GPL(kvm_tdp_map_page);
>
> long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> struct kvm_pre_fault_memory *range)
> @@ -5770,6 +5770,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
> out:
> return r;
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_load);
>
> void kvm_mmu_unload(struct kvm_vcpu *vcpu)
> {
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 9ac0821eb44b..7161ef68f3da 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2809,11 +2809,13 @@ struct tdx_gmem_post_populate_arg {
> static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> void __user *src, int order, void *_arg)
> {
> + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
> struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> struct tdx_gmem_post_populate_arg *arg = _arg;
> struct kvm_vcpu *vcpu = arg->vcpu;
> struct kvm_memory_slot *slot;
> gpa_t gpa = gfn_to_gpa(gfn);
> + u8 level = PG_LEVEL_4K;
> struct page *page;
> kvm_pfn_t mmu_pfn;
> int ret, i;
> @@ -2832,6 +2834,10 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t
> gfn, kvm_pfn_t pfn,
> goto out_put_page;
> }
>
> + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
> + if (ret < 0)
> + goto out_put_page;
> +
> read_lock(&kvm->mmu_lock);
>
> ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
> @@ -2910,6 +2916,7 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu,
> struct kvm_tdx_cmd *c
> mutex_lock(&kvm->slots_lock);
> idx = srcu_read_lock(&kvm->srcu);
>
> + kvm_mmu_reload(vcpu);
> ret = 0;
> while (region.nr_pages) {
> if (signal_pending(current)) {
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
2024-07-11 22:27 ` [PATCH 06/12] KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest Paolo Bonzini
@ 2024-07-17 21:28 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 21:28 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:49PM -0400, Paolo Bonzini wrote:
> Initializing the contents of the folio on fallocate() is unnecessarily
> restrictive. It means that the page is registered with the firmware and
> then it cannot be touched anymore. In particular, this loses the
> possibility of using fallocate() to pre-allocate the page for SEV-SNP
> guests, because kvm_arch_gmem_prepare() then fails.
>
> It's only when the guest actually accesses the page (and therefore
> kvm_gmem_get_pfn() is called) that the page must be cleared of any
> stale host data and registered with the firmware. The up-to-date flag
> is clear if this has to be done (i.e. it is the first access and
> kvm_gmem_populate() has not been called).
>
> All in all, there are enough differences between kvm_gmem_get_pfn() and
> kvm_gmem_populate() that it's better to separate the two flows completely.
> Extract the bulk of kvm_gmem_get_folio(), which takes a folio and ends up
> setting its up-to-date flag, into a new function kvm_gmem_prepare_folio();
> these steps are now performed only by the non-__-prefixed kvm_gmem_get_pfn().
> As a bonus, __kvm_gmem_get_pfn() loses its ugly "bool prepare" argument.
>
> The previous semantics, where fallocate() could be used to prepare
> the pages in advance of running the guest, can be accessed with
> KVM_PRE_FAULT_MEMORY.
>
> For now, accessing a page in one VM will attempt to call
> kvm_arch_gmem_prepare() in all of those that have bound the guest_memfd.
> Cleaning this up is left to a separate patch.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> virt/kvm/guest_memfd.c | 107 ++++++++++++++++++++++++-----------------
> 1 file changed, 63 insertions(+), 44 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9271aba9b7b3..f637327ad8e1 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -25,7 +25,7 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
> return folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
> }
>
> -static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
> +static int __kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
> {
> #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
> struct list_head *gmem_list = &inode->i_mapping->i_private_list;
> @@ -59,49 +59,60 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
> return 0;
> }
>
> -/* Returns a locked folio on success. */
> -static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
> +/* Process @folio, which contains @gfn (which in turn is contained in @slot),
Need a line break after "/*"
> + * so that the guest can use it. On successful return the guest sees a zero
> + * page so as to avoid leaking host data and the up-to-date flag is set.
Might be worth noting that the folio must be locked so that the
up-to-date flag can be read/set. Or maybe WARN_ONCE() just to avoid any
subtle races slipping in.
> + */
> +static int kvm_gmem_prepare_folio(struct file *file, struct kvm_memory_slot *slot,
> + gfn_t gfn, struct folio *folio)
> {
> - struct folio *folio;
> + unsigned long nr_pages, i;
> + pgoff_t index;
> + int r;
>
> - /* TODO: Support huge pages. */
> - folio = filemap_grab_folio(inode->i_mapping, index);
> - if (IS_ERR(folio))
> - return folio;
> + if (folio_test_uptodate(folio))
> + return 0;
> +
> + nr_pages = folio_nr_pages(folio);
> + for (i = 0; i < nr_pages; i++)
> + clear_highpage(folio_page(folio, i));
>
> /*
> - * Use the up-to-date flag to track whether or not the memory has been
> - * zeroed before being handed off to the guest. There is no backing
> - * storage for the memory, so the folio will remain up-to-date until
> - * it's removed.
> + * Right now the folio order is always going to be zero,
> + * but the code is ready for huge folios, the only
> + * assumption being that the base pgoff of memslots is
> + * naturally aligned according to the folio's order.
> + * The desired order will be passed when creating the
> + * guest_memfd and checked when creating memslots.
> *
> - * TODO: Skip clearing pages when trusted firmware will do it when
> - * assigning memory to the guest.
> + * By making the base pgoff of the memslot naturally aligned
> + * to the folio's order, huge folios are safe to prepare, no
> + * matter what page size is used to map memory into the guest.
> */
> - if (!folio_test_uptodate(folio)) {
> - unsigned long nr_pages = folio_nr_pages(folio);
> - unsigned long i;
> -
> - for (i = 0; i < nr_pages; i++)
> - clear_highpage(folio_page(folio, i));
> - }
> -
> - if (prepare) {
> - int r = kvm_gmem_prepare_folio(inode, index, folio);
> - if (r < 0) {
> - folio_unlock(folio);
> - folio_put(folio);
> - return ERR_PTR(r);
> - }
> + WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
> + index = gfn - slot->base_gfn + slot->gmem.pgoff;
> + index = ALIGN_DOWN(index, 1 << folio_order(folio));
I think, when huge folio support is added, we'd also need to make sure
that the huge folio is completely contained within the GPA range
covered by the memslot. For instance, Sean's original THP patch does
this to ensure the order that KVM MMU uses does not result in it
trying to map GFNs beyond the end of the memslot:
@@ -533,9 +577,24 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
page = folio_file_page(folio, index);
*pfn = page_to_pfn(page);
- if (max_order)
+ if (!max_order)
+ goto success;
+
+ *max_order = compound_order(compound_head(page));
+ if (!*max_order)
+ goto success;
+
+ /*
+ * The folio can be mapped with a hugepage if and only if the folio is
+ * fully contained by the range the memslot is bound to. Note, the
+ * caller is responsible for handling gfn alignment, this only deals
+ * with the file binding.
+ */
+ huge_index = ALIGN(index, 1ull << *max_order);
+ if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+ huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
*max_order = 0;
https://lore.kernel.org/kvm/20231027182217.3615211-1-seanjc@google.com/T/#mccbd3e8bf9897f0ddbf864e6318d6f2f208b269c
so maybe we'd want to similarly limit the order that
kvm_gmem_prepare_folio() ends up using to match up with that...
...but maybe it's not necessary. I don't think this causes any
problems currently: if a 2nd memslot, e.g.:
________________________
|Slot2 |Slot1 |
|------------|-----------|
GPA |4MB |2.5MB |0MB
+------------+-----------+
overlaps with a folio allocated for another memslot using that same
gmemfd, and the pages are already in a prepared state, then it's
fair to skip re-preparing the range. As long as KVM MMU does not
create mappings for GFN ranges beyond the end of the memslot (e.g.
via the above snippet from Sean's patch), then the guest will
naturally end up with 4K mappings for the range in question, which
will trigger an RMP #NPF that gets quickly resolved with a
PSMASH to split the 2MB RMP entry if some other slot triggers the
preparation of the folio.
With mapping_large_folio_support() not being set yet, these conditions
are always met, but when it gets enabled we'd need the above logic
to limit the max_order passed to KVM MMU, so it might make sense to
add a comment on that.
Looks good otherwise.
-Mike
>
> + r = __kvm_gmem_prepare_folio(file_inode(file), index, folio);
> + if (!r)
> folio_mark_uptodate(folio);
> - }
>
> - /*
> - * Ignore accessed, referenced, and dirty flags. The memory is
> - * unevictable and there is no storage to write back to.
> - */
> - return folio;
> + return r;
> +}
> +
> +/*
> + * Returns a locked folio on success. The caller is responsible for
> + * setting the up-to-date flag before the memory is mapped into the guest.
> + * There is no backing storage for the memory, so the folio will remain
> + * up-to-date until it's removed.
> + *
> + * Ignore accessed, referenced, and dirty flags. The memory is
> + * unevictable and there is no storage to write back to.
> + */
> +static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
> +{
> + /* TODO: Support huge pages. */
> + return filemap_grab_folio(inode->i_mapping, index);
> }
>
> static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> @@ -201,7 +212,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
> break;
> }
>
> - folio = kvm_gmem_get_folio(inode, index, true);
> + folio = kvm_gmem_get_folio(inode, index);
> if (IS_ERR(folio)) {
> r = PTR_ERR(folio);
> break;
> @@ -555,7 +566,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
> /* Returns a locked folio on success. */
> static struct folio *
> __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
> - gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
> + gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
> {
> pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
> struct kvm_gmem *gmem = file->private_data;
> @@ -572,7 +583,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
> return ERR_PTR(-EIO);
> }
>
> - folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
> + folio = kvm_gmem_get_folio(file_inode(file), index);
> if (IS_ERR(folio))
> return folio;
>
> @@ -594,17 +605,25 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> {
> struct file *file = kvm_gmem_get_file(slot);
> struct folio *folio;
> + int r = 0;
>
> if (!file)
> return -EFAULT;
>
> - folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
> - fput(file);
> - if (IS_ERR(folio))
> - return PTR_ERR(folio);
> + folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order);
> + if (IS_ERR(folio)) {
> + r = PTR_ERR(folio);
> + goto out;
> + }
>
> + r = kvm_gmem_prepare_folio(file, slot, gfn, folio);
> folio_unlock(folio);
> - return 0;
> + if (r < 0)
> + folio_put(folio);
> +
> +out:
> + fput(file);
> + return r;
> }
> EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
>
> @@ -643,7 +662,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
> break;
> }
>
> - folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order, false);
> + folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order);
> if (IS_ERR(folio)) {
> ret = PTR_ERR(folio);
> break;
> --
> 2.43.0
>
>
>
* Re: [PATCH 07/12] KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
2024-07-11 22:27 ` [PATCH 07/12] KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm Paolo Bonzini
@ 2024-07-17 21:34 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 21:34 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:50PM -0400, Paolo Bonzini wrote:
> This is now possible because preparation is done by kvm_gmem_get_pfn()
> instead of fallocate(). In practice this is not a limitation, because
> even though guest_memfd can be bound to multiple struct kvm, for
> hardware implementations of confidential computing only one guest
> (identified by an ASID on SEV-SNP, or an HKID on TDX) will be able
> to access it.
>
> In the case of intra-host migration (not implemented yet for SEV-SNP,
> but we can use SEV-ES as an idea of how it will work), the new struct
> kvm inherits the same ASID and preparation need not be repeated.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
* Re: [PATCH 08/12] KVM: remove kvm_arch_gmem_prepare_needed()
2024-07-11 22:27 ` [PATCH 08/12] KVM: remove kvm_arch_gmem_prepare_needed() Paolo Bonzini
@ 2024-07-17 21:42 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 21:42 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:51PM -0400, Paolo Bonzini wrote:
> It is enough to return 0 if a guest need not do any preparation.
> This is in fact how sev_gmem_prepare() works for non-SNP guests,
> and it extends naturally to Intel hosts: the x86 callback for
> gmem_prepare is optional and returns 0 if not defined.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/x86.c | 5 -----
> include/linux/kvm_host.h | 1 -
> virt/kvm/guest_memfd.c | 13 +++----------
> 3 files changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a1c85591f92c..4f58423c6148 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13604,11 +13604,6 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
> EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
>
> #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
> -bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
> -{
> - return kvm->arch.vm_type == KVM_X86_SNP_VM;
> -}
> -
> int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
> {
> return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index eb8404e9aa03..f6e11991442d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2443,7 +2443,6 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>
> #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
> int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
> -bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
> #endif
>
> /**
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index f4d82719ec19..509360eefea5 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -29,16 +29,9 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
> pgoff_t index, struct folio *folio)
> {
> #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
> - kvm_pfn_t pfn;
> - gfn_t gfn;
> - int rc;
> -
> - if (!kvm_arch_gmem_prepare_needed(kvm))
> - return 0;
> -
> - pfn = folio_file_pfn(folio, index);
> - gfn = slot->base_gfn + index - slot->gmem.pgoff;
> - rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
> + kvm_pfn_t pfn = folio_file_pfn(folio, index);
> + gfn_t gfn = slot->base_gfn + index - slot->gmem.pgoff;
> + int rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
Looks like this hunk was meant to be part of a different patch.
Otherwise:
Reviewed-by: Michael Roth <michael.roth@amd.com>
-Mike
> if (rc) {
> pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
> index, gfn, pfn, rc);
> --
> 2.43.0
>
>
>
* Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code
2024-07-11 22:27 ` [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code Paolo Bonzini
2024-07-13 1:28 ` Edgecombe, Rick P
@ 2024-07-17 21:53 ` Michael Roth
1 sibling, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 21:53 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:52PM -0400, Paolo Bonzini wrote:
> Do not allow populating the same page twice with startup data. In the
> case of SEV-SNP, for example, the firmware does not allow it anyway,
> since the launch-update operation is only possible on pages that are
> still shared in the RMP.
>
> Even if it worked, kvm_gmem_populate()'s callback is meant to have side
> effects such as updating launch measurements, and updating the same
> page twice is unlikely to have the desired results.
>
> Races between calls to the ioctl are not possible because kvm_gmem_populate()
> holds slots_lock and the VM should not be running. But again, even if
> this worked on other confidential computing technology, it doesn't matter
> to guest_memfd.c whether this is an intentional attempt to do something
> fishy, or missing synchronization in userspace, or even something
> intentional. One of the racers wins, and the page is initialized by
I think one of those "intentional"s was meant to be an "unintentional".
> either kvm_gmem_prepare_folio() or kvm_gmem_populate().
>
> Anyway, out of paranoia, adjust sev_gmem_post_populate() anyway to use
> the same errno that kvm_gmem_populate() is using.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Assuming, based on the discussion here, that there will be some logic added
to enforce that KVM_PRE_FAULT_MEMORY can only happen after finalization, I
think the new checks make sense.
Reviewed-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 2 +-
> virt/kvm/guest_memfd.c | 7 +++++++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index df8818759698..397ef9e70182 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2213,7 +2213,7 @@ static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pf
> if (ret || assigned) {
> pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
> __func__, gfn, ret, assigned);
> - ret = -EINVAL;
> + ret = ret ? -EINVAL : -EEXIST;
> goto err;
> }
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 509360eefea5..266810bb91c9 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -650,6 +650,13 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
> break;
> }
>
> + if (folio_test_uptodate(folio)) {
> + folio_unlock(folio);
> + folio_put(folio);
> + ret = -EEXIST;
> + break;
> + }
> +
> folio_unlock(folio);
> if (!IS_ALIGNED(gfn, (1 << max_order)) ||
> (npages - i) < (1 << max_order))
> --
> 2.43.0
>
>
>
* Re: [PATCH 10/12] KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
2024-07-11 22:27 ` [PATCH 10/12] KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes() Paolo Bonzini
@ 2024-07-17 22:23 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 22:23 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:53PM -0400, Paolo Bonzini wrote:
> Use a guard to simplify early returns, and add two more easy
> shortcuts. If the requested attributes are invalid, the attributes
> xarray will never show them as set. And if testing a single page,
> kvm_get_memory_attributes() is more efficient.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> virt/kvm/kvm_main.c | 42 ++++++++++++++++++++----------------------
> 1 file changed, 20 insertions(+), 22 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f817ec66c85f..8ab9d8ff7b74 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2392,6 +2392,14 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
> #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> +{
> + if (!kvm || kvm_arch_has_private_mem(kvm))
> + return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +
> + return 0;
> +}
> +
> /*
> * Returns true if _all_ gfns in the range [@start, @end) have attributes
> * matching @attrs.
> @@ -2400,40 +2408,30 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> unsigned long attrs)
> {
> XA_STATE(xas, &kvm->mem_attr_array, start);
> + unsigned long mask = kvm_supported_mem_attributes(kvm);;
Extra semicolon.
Otherwise:
Reviewed-by: Michael Roth <michael.roth@amd.com>
* Re: [PATCH 11/12] KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
2024-07-11 22:27 ` [PATCH 11/12] KVM: extend kvm_range_has_memory_attributes() to check subset of attributes Paolo Bonzini
@ 2024-07-17 22:32 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 22:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:54PM -0400, Paolo Bonzini wrote:
> While currently there is no other attribute than KVM_MEMORY_ATTRIBUTE_PRIVATE,
> KVM code such as kvm_mem_is_private() is written to expect their existence.
> Allow using kvm_range_has_memory_attributes() as a multi-page version of
> kvm_mem_is_private(), without it breaking later when more attributes are
> introduced.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
* Re: [PATCH 12/12] KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
2024-07-11 22:27 ` [PATCH 12/12] KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns Paolo Bonzini
@ 2024-07-17 22:49 ` Michael Roth
0 siblings, 0 replies; 34+ messages in thread
From: Michael Roth @ 2024-07-17 22:49 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc
On Thu, Jul 11, 2024 at 06:27:55PM -0400, Paolo Bonzini wrote:
> This check is currently performed by sev_gmem_post_populate(), but it
> applies to all callers of kvm_gmem_populate(): the point of the function
> is that the memory is being encrypted and some work has to be done
> on all the gfns in order to encrypt them.
>
> Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior
> to invoking the callback, and stop the operation if a shared page
> is encountered. Because CONFIG_KVM_PRIVATE_MEM in principle does
> not require attributes, this makes kvm_gmem_populate() depend on
> CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them).
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>