* [PATCH v13 01/85] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error
From: Sean Christopherson @ 2024-10-10 18:23 UTC
In reply to: [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Remove KVM_ERR_PTR_BAD_PAGE and instead return NULL, as "bad page" is just
a leftover bit of weirdness from days of old when KVM stuffed a "bad" page
into the guest instead of actually handling missing pages. See commit
cea7bb21280e ("KVM: MMU: Make gfn_to_page() always safe").
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
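With this change the calling convention is a plain NULL check. A minimal
sketch of a caller under the new scheme (the error value and the surrounding
flow are illustrative, not taken from any specific call site):

	struct page *page;

	page = gfn_to_page(kvm, gfn);
	if (!page)
		return -EFAULT;	/* no backing refcounted page */

	/* ... use the page ... */

	kvm_release_page_clean(page);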
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_xive_native.c | 2 +-
arch/s390/kvm/vsie.c | 2 +-
arch/x86/kvm/lapic.c | 2 +-
include/linux/kvm_host.h | 7 -------
virt/kvm/kvm_main.c | 15 ++++++---------
6 files changed, 10 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 7b8ae509328f..d7721297b9b6 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -645,7 +645,7 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
int i;
hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
- if (is_error_page(hpage))
+ if (!hpage)
return;
hpage_offset = pte->raddr & ~PAGE_MASK;
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 6e2ebbd8aaac..d9bf1bc3ff61 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -654,7 +654,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
}
page = gfn_to_page(kvm, gfn);
- if (is_error_page(page)) {
+ if (!page) {
srcu_read_unlock(&kvm->srcu, srcu_idx);
pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
return -EINVAL;
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 89cafea4c41f..763a070f5955 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -661,7 +661,7 @@ static int pin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t *hpa)
struct page *page;
page = gfn_to_page(kvm, gpa_to_gfn(gpa));
- if (is_error_page(page))
+ if (!page)
return -EINVAL;
*hpa = (hpa_t)page_to_phys(page) + (gpa & ~PAGE_MASK);
return 0;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2098dc689088..20526e4d6c62 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2664,7 +2664,7 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
}
page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
- if (is_error_page(page)) {
+ if (!page) {
ret = -EFAULT;
goto out;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index db567d26f7b9..ee186a1fbaad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -153,13 +153,6 @@ static inline bool kvm_is_error_gpa(gpa_t gpa)
return gpa == INVALID_GPA;
}
-#define KVM_ERR_PTR_BAD_PAGE (ERR_PTR(-ENOENT))
-
-static inline bool is_error_page(struct page *page)
-{
- return IS_ERR(page);
-}
-
#define KVM_REQUEST_MASK GENMASK(7,0)
#define KVM_REQUEST_NO_WAKEUP BIT(8)
#define KVM_REQUEST_WAIT BIT(9)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 05cbb2548d99..4b659a649dfa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3078,19 +3078,14 @@ EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
*/
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
- struct page *page;
kvm_pfn_t pfn;
pfn = gfn_to_pfn(kvm, gfn);
if (is_error_noslot_pfn(pfn))
- return KVM_ERR_PTR_BAD_PAGE;
+ return NULL;
- page = kvm_pfn_to_refcounted_page(pfn);
- if (!page)
- return KVM_ERR_PTR_BAD_PAGE;
-
- return page;
+ return kvm_pfn_to_refcounted_page(pfn);
}
EXPORT_SYMBOL_GPL(gfn_to_page);
@@ -3184,7 +3179,8 @@ static void kvm_set_page_accessed(struct page *page)
void kvm_release_page_clean(struct page *page)
{
- WARN_ON(is_error_page(page));
+ if (WARN_ON(!page))
+ return;
kvm_set_page_accessed(page);
put_page(page);
@@ -3208,7 +3204,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
void kvm_release_page_dirty(struct page *page)
{
- WARN_ON(is_error_page(page));
+ if (WARN_ON(!page))
+ return;
kvm_set_page_dirty(page);
kvm_release_page_clean(page);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 02/85] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Allow passing a NULL @page to kvm_release_page_{clean,dirty}(); there's no
tangible benefit to forcing callers to pre-check @page, and doing so ends up
generating a lot of duplicate boilerplate code.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
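Roughly, this is the boilerplate being removed from callers (illustrative
sketch, not a specific call site):

	/* Before: every caller had to guard the release. */
	if (page)
		kvm_release_page_dirty(page);

	/* After: a NULL page is simply a no-op. */
	kvm_release_page_dirty(page);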
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b659a649dfa..2032292df0b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3179,7 +3179,7 @@ static void kvm_set_page_accessed(struct page *page)
void kvm_release_page_clean(struct page *page)
{
- if (WARN_ON(!page))
+ if (!page)
return;
kvm_set_page_accessed(page);
@@ -3204,7 +3204,7 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
void kvm_release_page_dirty(struct page *page)
{
- if (WARN_ON(!page))
+ if (!page)
return;
kvm_set_page_dirty(page);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 03/85] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Add an API to release an unused page, i.e. to put a page without marking
it accessed or dirty. The API will be used when KVM faults-in a page but
bails before installing the guest mapping (and other similar flows).
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
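A sketch of the intended usage, assuming a hypothetical fault path that grabs
a page and then bails before installing the guest mapping (the "retry"
condition is a stand-in, e.g. for an mmu_notifier invalidation racing in):

	page = gfn_to_page(kvm, gfn);
	if (!page)
		return -EFAULT;

	if (retry) {
		/* KVM never consumed the page: skip accessed/dirty updates. */
		kvm_release_page_unused(page);
		return -EAGAIN;
	}

	/* Normal path: the mapping was installed and may have been written. */
	kvm_release_page_dirty(page);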
include/linux/kvm_host.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ee186a1fbaad..ab4485b2bddc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1216,6 +1216,15 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
bool *writable);
+
+static inline void kvm_release_page_unused(struct page *page)
+{
+ if (!page)
+ return;
+
+ put_page(page);
+}
+
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 04/85] KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Apply make_spte()'s optimization to skip trying to unsync shadow pages if
and only if the old SPTE was a leaf SPTE, as non-leaf SPTEs in direct MMUs
are always writable, i.e. could trigger a false positive and incorrectly
lead to KVM creating a SPTE without write-protecting or marking shadow
pages unsync.
This bug only affects the TDP MMU, as the shadow MMU only overwrites a
shadow-present SPTE when synchronizing SPTEs (and only 4KiB SPTEs can be
unsync). Specifically, mmu_set_spte() drops any non-leaf SPTEs *before*
calling make_spte(), whereas the TDP MMU can do a direct replacement of a
page table with the leaf SPTE.
Opportunistically update the comment to explain why skipping the unsync
stuff is safe, as opposed to simply saying "it's someone else's problem".
Cc: stable@vger.kernel.org
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/spte.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 8f7eb3ad88fc..5521608077ec 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -226,12 +226,20 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;
/*
- * Optimization: for pte sync, if spte was writable the hash
- * lookup is unnecessary (and expensive). Write protection
- * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
- * Same reasoning can be applied to dirty page accounting.
+ * When overwriting an existing leaf SPTE, and the old SPTE was
+ * writable, skip trying to unsync shadow pages as any relevant
+ * shadow pages must already be unsync, i.e. the hash lookup is
+ * unnecessary (and expensive).
+ *
+ * The same reasoning applies to dirty page/folio accounting;
+ * KVM will mark the folio dirty using the old SPTE, thus
+ * there's no need to immediately mark the new SPTE as dirty.
+ *
+ * Note, both cases rely on KVM not changing PFNs without first
+ * zapping the old SPTE, which is guaranteed by both the shadow
+ * MMU and the TDP MMU.
*/
- if (is_writable_pte(old_spte))
+ if (is_last_spte(old_spte, level) && is_writable_pte(old_spte))
goto out;
/*
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 05/85] KVM: x86/mmu: Don't overwrite shadow-present MMU SPTEs when prefaulting
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Treat attempts to prefetch/prefault MMU SPTEs as spurious if there's an
existing shadow-present SPTE, as overwriting a SPTE that may have been
create by a "real" fault is at best confusing, and at worst potentially
harmful. E.g. mmu_try_to_unsync_pages() doesn't unsync when prefetching,
which creates a scenario where KVM could try to replace a Writable SPTE
with a !Writable SPTE, as sp->unsync is checked prior to acquiring
mmu_unsync_pages_lock.
Note, this applies to three of the four flavors of "prefetch" in KVM:
- KVM_PRE_FAULT_MEMORY
- Async #PF (host or PV)
- Prefetching
The fourth flavor, SPTE synchronization, i.e. FNAME(sync_spte), _only_
overwrites shadow-present SPTEs when calling make_spte(). But SPTE
synchronization specifically uses mmu_spte_update(), and so naturally
avoids the @prefetch check in mmu_set_spte().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/mmu/tdp_mmu.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a9a23e058555..a8c64069aa89 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,6 +2919,9 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
}
if (is_shadow_present_pte(*sptep)) {
+ if (prefetch)
+ return RET_PF_SPURIOUS;
+
/*
* If we overwrite a PTE page pointer with a 2MB PMD, unlink
* the parent of the now unreachable PTE.
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3b996c1fdaab..3c6583468742 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1026,6 +1026,9 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
if (WARN_ON_ONCE(sp->role.level != fault->goal_level))
return RET_PF_RETRY;
+ if (fault->prefetch && is_shadow_present_pte(iter->old_spte))
+ return RET_PF_SPURIOUS;
+
if (unlikely(!fault->slot))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 06/85] KVM: x86/mmu: Invert @can_unsync and renamed to @synchronizing
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Invert the polarity of "can_unsync" and rename the parameter to
"synchronizing" to allow a future change to set the Accessed bit if KVM
is synchronizing an existing SPTE. Querying "can_unsync" in that case is
nonsensical, as the fact that KVM can't unsync SPTEs doesn't provide any
justification for setting the Accessed bit.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 12 ++++++------
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/mmu/spte.c | 4 ++--
arch/x86/kvm/mmu/spte.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 4 ++--
6 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8c64069aa89..0f21d6f76cab 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2795,7 +2795,7 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
* be write-protected.
*/
int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
- gfn_t gfn, bool can_unsync, bool prefetch)
+ gfn_t gfn, bool synchronizing, bool prefetch)
{
struct kvm_mmu_page *sp;
bool locked = false;
@@ -2810,12 +2810,12 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
/*
* The page is not write-tracked, mark existing shadow pages unsync
- * unless KVM is synchronizing an unsync SP (can_unsync = false). In
- * that case, KVM must complete emulation of the guest TLB flush before
- * allowing shadow pages to become unsync (writable by the guest).
+ * unless KVM is synchronizing an unsync SP. In that case, KVM must
+ * complete emulation of the guest TLB flush before allowing shadow
+ * pages to become unsync (writable by the guest).
*/
for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
- if (!can_unsync)
+ if (synchronizing)
return -EPERM;
if (sp->unsync)
@@ -2941,7 +2941,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
}
wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
- true, host_writable, &spte);
+ false, host_writable, &spte);
if (*sptep == spte) {
ret = RET_PF_SPURIOUS;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index c98827840e07..4da83544c4e1 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -164,7 +164,7 @@ static inline gfn_t gfn_round_for_level(gfn_t gfn, int level)
}
int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
- gfn_t gfn, bool can_unsync, bool prefetch);
+ gfn_t gfn, bool synchronizing, bool prefetch);
void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ae7d39ff2d07..6e7bd8921c6f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -963,7 +963,7 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
host_writable = spte & shadow_host_writable_mask;
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
make_spte(vcpu, sp, slot, pte_access, gfn,
- spte_to_pfn(spte), spte, true, false,
+ spte_to_pfn(spte), spte, true, true,
host_writable, &spte);
return mmu_spte_update(sptep, spte);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 5521608077ec..0e47fea1a2d9 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -157,7 +157,7 @@ bool spte_has_volatile_bits(u64 spte)
bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
const struct kvm_memory_slot *slot,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
- u64 old_spte, bool prefetch, bool can_unsync,
+ u64 old_spte, bool prefetch, bool synchronizing,
bool host_writable, u64 *new_spte)
{
int level = sp->role.level;
@@ -248,7 +248,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* e.g. it's write-tracked (upper-level SPs) or has one or more
* shadow pages and unsync'ing pages is not allowed.
*/
- if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, can_unsync, prefetch)) {
+ if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, synchronizing, prefetch)) {
wrprot = true;
pte_access &= ~ACC_WRITE_MASK;
spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 2cb816ea2430..c81cac9358e0 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -499,7 +499,7 @@ bool spte_has_volatile_bits(u64 spte);
bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
const struct kvm_memory_slot *slot,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
- u64 old_spte, bool prefetch, bool can_unsync,
+ u64 old_spte, bool prefetch, bool synchronizing,
bool host_writable, u64 *new_spte);
u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte,
union kvm_mmu_page_role role, int index);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3c6583468742..76bca7a726c1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1033,8 +1033,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
- fault->pfn, iter->old_spte, fault->prefetch, true,
- fault->map_writable, &new_spte);
+ fault->pfn, iter->old_spte, fault->prefetch,
+ false, fault->map_writable, &new_spte);
if (new_spte == iter->old_spte)
ret = RET_PF_SPURIOUS;
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 07/85] KVM: x86/mmu: Mark new SPTE as Accessed when synchronizing existing SPTE
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Set the Accessed bit when making a "new" SPTE during SPTE synchronization,
as _clearing_ the Accessed bit is counter-productive, and even if the
Accessed bit wasn't set in the old SPTE, odds are very good the guest will
access the page in the near future, as the most common case where KVM
synchronizes a shadow-present SPTE is when the guest is making the gPTE
read-only for Copy-on-Write (CoW).
Preserving the Accessed bit will allow dropping the logic that propagates
the Accessed bit to the underlying struct page when overwriting an existing
SPTE, without undue risk of regressing page aging.
Note, KVM's current behavior is very deliberate, as SPTE synchronization
was the only "speculative" access type as of commit 947da5383069 ("KVM:
MMU: Set the accessed bit on non-speculative shadow ptes").
But, much has changed since 2008, and more changes are on the horizon.
Spurious clearing of the Accessed (and Dirty) bits was mitigated by commit
e6722d9211b2 ("KVM: x86/mmu: Reduce the update to the spte in
FNAME(sync_spte)"), which changed FNAME(sync_spte) to only overwrite SPTEs
if the protections are actually changing. I.e. KVM is already preserving
Accessed information for SPTEs that aren't dropping protections.
And with the aforementioned future change to NOT mark the page/folio as
accessed, KVM's SPTEs will become the "source of truth" so to speak, in
which case clearing the Accessed bit outside of page aging becomes very
undesirable.
Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/spte.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0e47fea1a2d9..618059b30b8b 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -178,7 +178,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
spte |= SPTE_TDP_AD_WRPROT_ONLY;
spte |= shadow_present_mask;
- if (!prefetch)
+ if (!prefetch || synchronizing)
spte |= spte_shadow_accessed_mask(spte);
/*
@@ -259,7 +259,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
spte |= spte_shadow_dirty_mask(spte);
out:
- if (prefetch)
+ if (prefetch && !synchronizing)
spte = mark_spte_for_access_track(spte);
WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 08/85] KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Mark pages/folios dirty when creating SPTEs to map PFNs into the guest,
not when zapping or modifying SPTEs, as marking folios dirty when zapping
or modifying SPTEs can be extremely inefficient. E.g. when KVM is zapping
collapsible SPTEs to reconstitute a hugepage after disabling dirty logging,
KVM will mark every 4KiB pfn as dirty, even though _at least_ 512 pfns are
guaranteed to be in a single folio (the SPTE couldn't potentially be huge
if that weren't the case). The problem only becomes worse for 1GiB
HugeTLB pages, as KVM can mark a single folio dirty 512*512 times.
Marking a folio dirty when mapping is functionally safe as KVM drops all
relevant SPTEs in response to an mmu_notifier invalidation, i.e. ensures
that the guest can't dirty a folio after access has been removed.
And because KVM already marks folios dirty when zapping/modifying SPTEs
for KVM reasons, i.e. not in response to an mmu_notifier invalidation,
there is no danger of "prematurely" marking a folio dirty. E.g. if a
filesystem cleans a folio without first removing write access, then there
already exist races where KVM could mark a folio dirty before remote TLBs
are flushed, i.e. before guest writes are guaranteed to stop. Furthermore,
x86 is literally the only architecture that marks folios dirty on the
backend; every other KVM architecture marks folios dirty at map time.
x86's unique behavior likely stems from the fact that x86's MMU predates
mmu_notifiers. Long, long ago, before mmu_notifiers were added, marking
pages dirty when zapping SPTEs was logical, and perhaps even necessary, as
KVM held references to pages, i.e. kept a page's refcount elevated while
the page was mapped into the guest. At the time, KVM's rmap_remove()
simply did:
	if (is_writeble_pte(*spte))
		kvm_release_pfn_dirty(pfn);
	else
		kvm_release_pfn_clean(pfn);
i.e. dropped the refcount and marked the page dirty at the same time.
After mmu_notifiers were introduced, commit acb66dd051d0 ("KVM: MMU:
don't hold pagecount reference for mapped sptes pages") removed the
refcount logic, but kept the dirty logic, i.e. converted the above to:
	if (is_writeble_pte(*spte))
		kvm_release_pfn_dirty(pfn);
And for KVM x86, that's essentially how things have stayed over the last
~15 years, without anyone revisiting *why* KVM marks pages/folios dirty at
zap/modification time, e.g. the behavior was blindly carried forward to
the TDP MMU.
Practically speaking, the only downside to marking a folio dirty during
mapping is that KVM could trigger writeback of memory that was never
actually written. Except that can't actually happen if KVM marks folios
dirty if and only if a writable SPTE is created (as done here), because
KVM always marks writable SPTEs as dirty during make_spte(). See commit
9b51a63024bd ("KVM: MMU: Explicitly set D-bit for writable spte."), circa
2015.
Note, KVM's access tracking logic for prefetched SPTEs is a bit odd. If a
guest PTE is dirty and writable, KVM will create a writable SPTE, but then
mark the SPTE for access tracking. Which isn't wrong, just a bit odd, as
it results in _more_ precise dirty tracking for MMUs _without_ A/D bits.
To keep things simple, mark the folio dirty before access tracking comes
into play, as an access-tracked SPTE can be restored in the fast page
fault path, i.e. without holding mmu_lock. While writing SPTEs and
accessing memslots outside of mmu_lock is safe, marking a folio dirty is
not. E.g. if the fast path gets interrupted _just_ after setting a SPTE,
the primary MMU could theoretically invalidate and free a folio before KVM
marks it dirty. Unlike the shadow MMU, which waits for CPUs to respond to
an IPI, the TDP MMU only guarantees the page tables themselves won't be
freed (via RCU).
Opportunistically update a few stale comments.
Cc: David Matlack <dmatlack@google.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
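For readers following the hunks below, the net ordering at the tail of
make_spte() after this patch is roughly the following (condensed sketch
combining this patch with the earlier @synchronizing change, not a verbatim
copy of the resulting code):

	out:
		if (prefetch && !synchronizing)
			spte = mark_spte_for_access_track(spte);

		/* Memslot dirty logging is safe outside mmu_lock. */
		if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot))
			mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);

		/* Mark the folio dirty at map time, iff the mapping is host-writable. */
		if (host_writable)
			kvm_set_pfn_dirty(pfn);

		*new_spte = spte;
		return wrprot;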
arch/x86/kvm/mmu/mmu.c | 30 ++++--------------------------
arch/x86/kvm/mmu/paging_tmpl.h | 6 +++---
arch/x86/kvm/mmu/spte.c | 20 ++++++++++++++++++--
arch/x86/kvm/mmu/tdp_mmu.c | 12 ------------
4 files changed, 25 insertions(+), 43 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0f21d6f76cab..1ae823ebd12b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -547,10 +547,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
- if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
+ if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
flush = true;
- kvm_set_pfn_dirty(spte_to_pfn(old_spte));
- }
return flush;
}
@@ -593,9 +591,6 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
if (is_accessed_spte(old_spte))
kvm_set_pfn_accessed(pfn);
- if (is_dirty_spte(old_spte))
- kvm_set_pfn_dirty(pfn);
-
return old_spte;
}
@@ -1250,16 +1245,6 @@ static bool spte_clear_dirty(u64 *sptep)
return mmu_spte_update(sptep, spte);
}
-static bool spte_wrprot_for_clear_dirty(u64 *sptep)
-{
- bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
- (unsigned long *)sptep);
- if (was_writable && !spte_ad_enabled(*sptep))
- kvm_set_pfn_dirty(spte_to_pfn(*sptep));
-
- return was_writable;
-}
-
/*
* Gets the GFN ready for another round of dirty logging by clearing the
* - D bit on ad-enabled SPTEs, and
@@ -1275,7 +1260,8 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
for_each_rmap_spte(rmap_head, &iter, sptep)
if (spte_ad_need_write_protect(*sptep))
- flush |= spte_wrprot_for_clear_dirty(sptep);
+ flush |= test_and_clear_bit(PT_WRITABLE_SHIFT,
+ (unsigned long *)sptep);
else
flush |= spte_clear_dirty(sptep);
@@ -1628,14 +1614,6 @@ static bool kvm_rmap_age_gfn_range(struct kvm *kvm,
clear_bit((ffs(shadow_accessed_mask) - 1),
(unsigned long *)sptep);
} else {
- /*
- * Capture the dirty status of the page, so that
- * it doesn't get lost when the SPTE is marked
- * for access tracking.
- */
- if (is_writable_pte(spte))
- kvm_set_pfn_dirty(spte_to_pfn(spte));
-
spte = mark_spte_for_access_track(spte);
mmu_spte_update_no_track(sptep, spte);
}
@@ -3415,7 +3393,7 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
* harm. This also avoids the TLB flush needed after setting dirty bit
* so non-PML cases won't be impacted.
*
- * Compare with set_spte where instead shadow_dirty_mask is set.
+ * Compare with make_spte() where instead shadow_dirty_mask is set.
*/
if (!try_cmpxchg64(sptep, &old_spte, new_spte))
return false;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 6e7bd8921c6f..fbaae040218b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -892,9 +892,9 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
/*
* Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
- * safe because:
- * - The spte has a reference to the struct page, so the pfn for a given gfn
- * can't change unless all sptes pointing to it are nuked first.
+ * safe because SPTEs are protected by mmu_notifiers and memslot generations, so
+ * the pfn for a given gfn can't change unless all SPTEs pointing to the gfn are
+ * nuked first.
*
* Returns
* < 0: failed to sync spte
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 618059b30b8b..8e8d6ee79c8b 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -232,8 +232,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* unnecessary (and expensive).
*
* The same reasoning applies to dirty page/folio accounting;
- * KVM will mark the folio dirty using the old SPTE, thus
- * there's no need to immediately mark the new SPTE as dirty.
+ * KVM marked the folio dirty when the old SPTE was created,
+ * thus there's no need to mark the folio dirty again.
*
* Note, both cases rely on KVM not changing PFNs without first
* zapping the old SPTE, which is guaranteed by both the shadow
@@ -266,12 +266,28 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
"spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+ /*
+ * Mark the memslot dirty *after* modifying it for access tracking.
+ * Unlike folios, memslots can be safely marked dirty out of mmu_lock,
+ * i.e. in the fast page fault handler.
+ */
if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
/* Enforced by kvm_mmu_hugepage_adjust. */
WARN_ON_ONCE(level > PG_LEVEL_4K);
mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
}
+ /*
+ * If the page that KVM got from the primary MMU is writable, i.e. if
+ * it's host-writable, mark the page/folio dirty. As alluded to above,
+ * folios can't be safely marked dirty in the fast page fault handler,
+ * and so KVM must (somewhat) speculatively mark the folio dirty even
+ * though it isn't guaranteed to be written as KVM won't mark the folio
+ * dirty if/when the SPTE is made writable.
+ */
+ if (host_writable)
+ kvm_set_pfn_dirty(pfn);
+
*new_spte = spte;
return wrprot;
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 76bca7a726c1..517b384473c1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -511,10 +511,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (is_leaf != was_leaf)
kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
- if (was_leaf && is_dirty_spte(old_spte) &&
- (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
- kvm_set_pfn_dirty(spte_to_pfn(old_spte));
-
/*
* Recursively handle child PTs if the change removed a subtree from
* the paging structure. Note the WARN on the PFN changing without the
@@ -1249,13 +1245,6 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
iter->level);
new_spte = iter->old_spte & ~shadow_accessed_mask;
} else {
- /*
- * Capture the dirty status of the page, so that it doesn't get
- * lost when the SPTE is marked for access tracking.
- */
- if (is_writable_pte(iter->old_spte))
- kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
-
new_spte = mark_spte_for_access_track(iter->old_spte);
iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
iter->old_spte, new_spte,
@@ -1596,7 +1585,6 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
iter.old_spte,
iter.old_spte & ~dbit);
- kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
}
rcu_read_unlock();
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 09/85] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Now that KVM doesn't clobber Accessed bits of shadow-present SPTEs,
e.g. when prefetching, mark folios as accessed only when zapping leaf
SPTEs, which is a rough heuristic for "only in response to an mmu_notifier
invalidation". Page aging and LRUs are tolerant of false negatives, i.e.
KVM doesn't need to be precise for correctness, and re-marking folios as
accessed when zapping entire roots or when zapping collapsible SPTEs is
expensive and adds very little value.
E.g. when a VM is dying, all of its memory is being freed; marking folios
accessed at that time provides no known value. Similarly, because KVM
marks folios as accessed when creating SPTEs, marking all folios as
accessed when userspace happens to delete a memslot doesn't add value.
The folio was marked accessed when the old SPTE was created, and will be
marked accessed yet again if a vCPU accesses the pfn again after reloading
a new root. Zapping collapsible SPTEs is a similar story; marking folios
accessed just because userspace disables dirty logging is a side effect of
KVM behavior, not a deliberate goal.
As an intermediate step, a.k.a. bisection point, towards *never* marking
folios accessed when dropping SPTEs, mark folios accessed when the primary
MMU might be invalidating mappings, as such zappings are not KVM initiated,
i.e. might actually be related to page aging and LRU activity.
Note, x86 is the only KVM architecture that "double dips"; every other
arch marks pfns as accessed only when mapping into the guest, not when
mapping into the guest _and_ when removing from the guest.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/locking.rst | 76 +++++++++++++++---------------
arch/x86/kvm/mmu/mmu.c | 4 +-
arch/x86/kvm/mmu/tdp_mmu.c | 7 ++-
3 files changed, 43 insertions(+), 44 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 20a9a37d1cdd..3d8bf40ca448 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -147,49 +147,51 @@ Then, we can ensure the dirty bitmaps is correctly set for a gfn.
2) Dirty bit tracking
-In the origin code, the spte can be fast updated (non-atomically) if the
+In the original code, the spte can be fast updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set since the
Accessed bit and Dirty bit can not be lost.
But it is not true after fast page fault since the spte can be marked
writable between reading spte and updating spte. Like below case:
-+------------------------------------------------------------------------+
-| At the beginning:: |
-| |
-| spte.W = 0 |
-| spte.Accessed = 1 |
-+------------------------------------+-----------------------------------+
-| CPU 0: | CPU 1: |
-+------------------------------------+-----------------------------------+
-| In mmu_spte_clear_track_bits():: | |
-| | |
-| old_spte = *spte; | |
-| | |
-| | |
-| /* 'if' condition is satisfied. */| |
-| if (old_spte.Accessed == 1 && | |
-| old_spte.W == 0) | |
-| spte = 0ull; | |
-+------------------------------------+-----------------------------------+
-| | on fast page fault path:: |
-| | |
-| | spte.W = 1 |
-| | |
-| | memory write on the spte:: |
-| | |
-| | spte.Dirty = 1 |
-+------------------------------------+-----------------------------------+
-| :: | |
-| | |
-| else | |
-| old_spte = xchg(spte, 0ull) | |
-| if (old_spte.Accessed == 1) | |
-| kvm_set_pfn_accessed(spte.pfn);| |
-| if (old_spte.Dirty == 1) | |
-| kvm_set_pfn_dirty(spte.pfn); | |
-| OOPS!!! | |
-+------------------------------------+-----------------------------------+
++-------------------------------------------------------------------------+
+| At the beginning:: |
+| |
+| spte.W = 0 |
+| spte.Accessed = 1 |
++-------------------------------------+-----------------------------------+
+| CPU 0: | CPU 1: |
++-------------------------------------+-----------------------------------+
+| In mmu_spte_update():: | |
+| | |
+| old_spte = *spte; | |
+| | |
+| | |
+| /* 'if' condition is satisfied. */ | |
+| if (old_spte.Accessed == 1 && | |
+| old_spte.W == 0) | |
+| spte = new_spte; | |
++-------------------------------------+-----------------------------------+
+| | on fast page fault path:: |
+| | |
+| | spte.W = 1 |
+| | |
+| | memory write on the spte:: |
+| | |
+| | spte.Dirty = 1 |
++-------------------------------------+-----------------------------------+
+| :: | |
+| | |
+| else | |
+| old_spte = xchg(spte, new_spte);| |
+| if (old_spte.Accessed && | |
+| !new_spte.Accessed) | |
+| flush = true; | |
+| if (old_spte.Dirty && | |
+| !new_spte.Dirty) | |
+| flush = true; | |
+| OOPS!!! | |
++-------------------------------------+-----------------------------------+
The Dirty bit is lost in this case.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1ae823ebd12b..04228a7da69a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -542,10 +542,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
* to guarantee consistency between TLB and page tables.
*/
- if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
+ if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte))
flush = true;
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
- }
if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
flush = true;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 517b384473c1..8aa0d7a7602b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -520,10 +520,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (was_present && !was_leaf &&
(is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-
- if (was_leaf && is_accessed_spte(old_spte) &&
- (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
static inline int __must_check __tdp_mmu_set_spte_atomic(struct tdp_iter *iter,
@@ -865,6 +861,9 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
+ if (is_accessed_spte(iter.old_spte))
+ kvm_set_pfn_accessed(spte_to_pfn(iter.old_spte));
+
/*
* Zappings SPTEs in invalid roots doesn't require a TLB flush,
* see kvm_tdp_mmu_zap_invalidated_roots() for details.
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 10/85] KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect PTEs
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Use gfn_to_page_many_atomic() instead of gfn_to_pfn_memslot_atomic() when
prefetching indirect PTEs (direct_pte_prefetch_many() already uses the
"to page" APIS). Functionally, the two are subtly equivalent, as the "to
pfn" API short-circuits hva_to_pfn() if hva_to_pfn_fast() fails, i.e. is
just a wrapper for get_user_page_fast_only()/get_user_pages_fast_only().
Switching to the "to page" API will allow dropping the @atomic parameter
from the entire hva_to_pfn() callchain.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index fbaae040218b..36b2607280f0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -535,8 +535,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
{
struct kvm_memory_slot *slot;
unsigned pte_access;
+ struct page *page;
gfn_t gfn;
- kvm_pfn_t pfn;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
return false;
@@ -549,12 +549,11 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (!slot)
return false;
- pfn = gfn_to_pfn_memslot_atomic(slot, gfn);
- if (is_error_pfn(pfn))
+ if (gfn_to_page_many_atomic(slot, gfn, &page, 1) != 1)
return false;
- mmu_set_spte(vcpu, slot, spte, pte_access, gfn, pfn, NULL);
- kvm_release_pfn_clean(pfn);
+ mmu_set_spte(vcpu, slot, spte, pte_access, gfn, page_to_pfn(page), NULL);
+ kvm_release_page_clean(page);
return true;
}
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 11/85] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages()
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() to try and
communicate its true purpose, as the "atomic" aspect is essentially a
side effect of the fact that x86 uses the API while holding mmu_lock.
E.g. even if mmu_lock weren't held, KVM wouldn't want to fault-in pages,
as the goal is to opportunistically grab surrounding pages that have
already been accessed and/or dirtied by the host, and to do so quickly.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
include/linux/kvm_host.h | 4 ++--
virt/kvm/kvm_main.c | 6 +++---
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 04228a7da69a..5fe45ab0e818 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2958,7 +2958,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
if (!slot)
return -1;
- ret = gfn_to_page_many_atomic(slot, gfn, pages, end - start);
+ ret = kvm_prefetch_pages(slot, gfn, pages, end - start);
if (ret <= 0)
return -1;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 36b2607280f0..143b7e9f26dc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -549,7 +549,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (!slot)
return false;
- if (gfn_to_page_many_atomic(slot, gfn, &page, 1) != 1)
+ if (kvm_prefetch_pages(slot, gfn, &page, 1) != 1)
return false;
mmu_set_spte(vcpu, slot, spte, pte_access, gfn, page_to_pfn(page), NULL);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab4485b2bddc..56e7cde8c8b8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1207,8 +1207,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm);
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot);
-int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
- struct page **pages, int nr_pages);
+int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
+ struct page **pages, int nr_pages);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2032292df0b0..957b4a6c9254 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3053,8 +3053,8 @@ kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
-int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
- struct page **pages, int nr_pages)
+int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
+ struct page **pages, int nr_pages)
{
unsigned long addr;
gfn_t entry = 0;
@@ -3068,7 +3068,7 @@ int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
return get_user_pages_fast_only(addr, nr_pages, FOLL_WRITE, pages);
}
-EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
+EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
/*
* Do not use this helper unless you are absolutely certain the gfn _must_ be
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 12/85] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
From: Sean Christopherson @ 2024-10-10 18:23 UTC
Drop @atomic from the myriad "to_pfn" APIs now that all callers pass
"false", and remove a comment blurb about KVM running only the "GUP fast"
part in atomic context.
No functional change intended.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/locking.rst | 4 +--
arch/arm64/kvm/mmu.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 12 ++++----
include/linux/kvm_host.h | 4 +--
virt/kvm/kvm_main.c | 39 ++++++--------------------
virt/kvm/kvm_mm.h | 4 +--
virt/kvm/pfncache.c | 2 +-
9 files changed, 23 insertions(+), 48 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 3d8bf40ca448..f463ac42ac7a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -135,8 +135,8 @@ We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.
For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, we disabled fast page fault for simplicity.
-A solution for indirect sp could be to pin the gfn, for example via
-kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:
+A solution for indirect sp could be to pin the gfn before the cmpxchg. After
+the pinning:
- We have held the refcount of pfn; that means the pfn can not be freed and
be reused for another gfn.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a509b63bd4dd..a6e62cc9015c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1569,7 +1569,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
+ pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
write_fault, &writable, NULL);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1b51b1c4713b..8cd02ca4b1b8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -613,7 +613,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
write_ok = true;
} else {
/* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
+ pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
writing, &write_ok, NULL);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 408d98f8a514..26a969e935e3 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -852,7 +852,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long pfn;
/* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
+ pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
writing, upgrade_p, NULL);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5fe45ab0e818..0e235f276ee5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4380,9 +4380,9 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return kvm_faultin_pfn_private(vcpu, fault);
async = false;
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, false,
- &async, fault->write,
- &fault->map_writable, &fault->hva);
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, &async,
+ fault->write, &fault->map_writable,
+ &fault->hva);
if (!async)
return RET_PF_CONTINUE; /* *pfn has correct page already */
@@ -4402,9 +4402,9 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* to wait for IO. Note, gup always bails if it is unable to quickly
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
- NULL, fault->write,
- &fault->map_writable, &fault->hva);
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, NULL,
+ fault->write, &fault->map_writable,
+ &fault->hva);
return RET_PF_CONTINUE;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 56e7cde8c8b8..2faafc7a56ae 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1232,9 +1232,8 @@ kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool atomic, bool interruptible, bool *async,
+ bool interruptible, bool *async,
bool write_fault, bool *writable, hva_t *hva);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
@@ -1315,7 +1314,6 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 957b4a6c9254..0bc077213d3e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2756,8 +2756,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
/*
* The fast path to get the writable pfn which will be stored in @pfn,
- * true indicates success, otherwise false is returned. It's also the
- * only part that runs if we can in atomic context.
+ * true indicates success, otherwise false is returned.
*/
static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
bool *writable, kvm_pfn_t *pfn)
@@ -2922,7 +2921,6 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
/*
* Pin guest page in memory and return its pfn.
* @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function is forbidden from sleeping
* @interruptible: whether the process can be interrupted by non-fatal signals
* @async: whether this function need to wait IO complete if the
* host page is not in the memory
@@ -2934,22 +2932,16 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* 2): @write_fault = false && @writable, @writable will tell the caller
* whether the mapping is writable.
*/
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
- bool *async, bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+ bool write_fault, bool *writable)
{
struct vm_area_struct *vma;
kvm_pfn_t pfn;
int npages, r;
- /* we can do it either atomically or asynchronously, not both */
- BUG_ON(atomic && async);
-
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
return pfn;
- if (atomic)
- return KVM_PFN_ERR_FAULT;
-
npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
writable, &pfn);
if (npages == 1)
@@ -2986,7 +2978,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
}
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool atomic, bool interruptible, bool *async,
+ bool interruptible, bool *async,
bool write_fault, bool *writable, hva_t *hva)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
@@ -3008,39 +3000,24 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
writable = NULL;
}
- return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
- writable);
+ return hva_to_pfn(addr, interruptible, async, write_fault, writable);
}
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
- NULL, write_fault, writable, NULL);
+ return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
+ write_fault, writable, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
- NULL, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
-{
- return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
- NULL, NULL);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
-
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
- return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
-}
-EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
-
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{
return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 715f19669d01..a3fa86f60d6c 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,8 +20,8 @@
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
- bool *async, bool write_fault, bool *writable);
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+ bool write_fault, bool *writable);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f0039efb9e1e..58c706a610e5 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -198,7 +198,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
}
/* We always request a writeable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
+ new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 13/85] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (11 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 12/85] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 14/85] KVM: Return ERR_SIGPENDING from hva_to_pfn() if GUP returns -EAGAIN Sean Christopherson
` (73 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Now that hva_to_pfn() no longer supports being called in atomic context,
move the might_sleep() annotation from hva_to_pfn_slow() to hva_to_pfn().
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0bc077213d3e..17acc75990a5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2804,8 +2804,6 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
struct page *page;
int npages;
- might_sleep();
-
if (writable)
*writable = write_fault;
@@ -2939,6 +2937,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
kvm_pfn_t pfn;
int npages, r;
+ might_sleep();
+
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
return pfn;
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 14/85] KVM: Return ERR_SIGPENDING from hva_to_pfn() if GUP returns -EAGAIN
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (12 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 13/85] KVM: Annotate that all paths in hva_to_pfn() might sleep Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 15/85] KVM: Drop extra GUP (via check_user_page_hwpoison()) to detect poisoned page Sean Christopherson
` (72 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Treat an -EAGAIN return from GUP the same as -EINTR and immediately report
to the caller that a signal is pending. GUP only returns -EAGAIN if
the _initial_ mmap_read_lock_killable() fails, which in turn only fails
if a signal is pending.
Note, rwsem_down_read_slowpath() actually returns -EINTR, so GUP is really
just making life harder than it needs to be. And the call to
mmap_read_lock_killable() in the retry path returns its -errno verbatim,
i.e. GUP (and thus KVM) is already handling locking failure this way, but
only some of the time.
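For reference, the GUP behavior described above boils down to the following shape (a paraphrase of the locking pattern being described, not verbatim mm/gup.c):
  if (mmap_read_lock_killable(mm))
          return -EAGAIN;         /* initial attempt: fails only if a
                                   * signal is pending */
  ...
  ret = mmap_read_lock_killable(mm);
  if (ret)
          return ret;             /* retry path: the lock's -EINTR is
                                   * returned verbatim */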
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 17acc75990a5..ebba5d22db2d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2946,7 +2946,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
writable, &pfn);
if (npages == 1)
return pfn;
- if (npages == -EINTR)
+ if (npages == -EINTR || npages == -EAGAIN)
return KVM_PFN_ERR_SIGPENDING;
mmap_read_lock(current->mm);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 15/85] KVM: Drop extra GUP (via check_user_page_hwpoison()) to detect poisoned page
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (13 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 14/85] KVM: Return ERR_SIGPENDING from hva_to_pfn() if GUP returns -EAGAIN Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 16/85] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code Sean Christopherson
` (71 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Remove check_user_page_hwpoison() as it's effectively dead code. Prior to
commit 234b239bea39 ("kvm: Faults which trigger IO release the mmap_sem"),
hva_to_pfn_slow() wasn't actually a slow path in all cases, i.e. would do
get_user_pages_fast() without ever doing slow GUP with FOLL_HWPOISON.
Now that hva_to_pfn_slow() is a straight shot to get_user_pages_unlocked(),
and unconditionally passes FOLL_HWPOISON, it is impossible for hva_to_pfn()
to get an -errno that needs to be morphed to -EHWPOISON.
There are essentially four cases in KVM:
- npages == 0, then FOLL_NOWAIT, a.k.a. @async, must be true, and thus
check_user_page_hwpoison() will not be called
- npages == 1 || npages == -EHWPOISON, all good
- npages == -EINTR || npages == -EAGAIN, bail early, all good
- everything else, including -EFAULT, can go down the vma_lookup() path,
as npages < 0 means KVM went through hva_to_pfn_slow() which passes
FOLL_HWPOISON
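Condensing the diff below, the post-patch triage in hva_to_pfn() that covers those four cases is simply:
  npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
                           writable, &pfn);
  if (npages == 1)
          return pfn;
  if (npages == -EINTR || npages == -EAGAIN)
          return KVM_PFN_ERR_SIGPENDING;
  if (npages == -EHWPOISON)
          return KVM_PFN_ERR_HWPOISON;
  /* everything else, including npages == 0 (FOLL_NOWAIT) and -EFAULT,
   * falls through to the vma_lookup() path */
  mmap_read_lock(current->mm);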
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ebba5d22db2d..87f81e74cbc0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2746,14 +2746,6 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
return gfn_to_hva_memslot_prot(slot, gfn, writable);
}
-static inline int check_user_page_hwpoison(unsigned long addr)
-{
- int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
-
- rc = get_user_pages(addr, 1, flags, NULL);
- return rc == -EHWPOISON;
-}
-
/*
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned.
@@ -2948,14 +2940,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
return pfn;
if (npages == -EINTR || npages == -EAGAIN)
return KVM_PFN_ERR_SIGPENDING;
+ if (npages == -EHWPOISON)
+ return KVM_PFN_ERR_HWPOISON;
mmap_read_lock(current->mm);
- if (npages == -EHWPOISON ||
- (!async && check_user_page_hwpoison(addr))) {
- pfn = KVM_PFN_ERR_HWPOISON;
- goto exit;
- }
-
retry:
vma = vma_lookup(current->mm, addr);
@@ -2972,7 +2960,6 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
*async = true;
pfn = KVM_PFN_ERR_FAULT;
}
-exit:
mmap_read_unlock(current->mm);
return pfn;
}
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 16/85] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (14 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 15/85] KVM: Drop extra GUP (via check_user_page_hwpoison()) to detect poisoned page Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 17/85] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva Sean Christopherson
` (70 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
From: David Stevens <stevensd@chromium.org>
Add a pfn error code to communicate that hva_to_pfn() failed because I/O
was needed and disallowed, and convert @async to a constant @no_wait
boolean. This will allow eliminating the @no_wait param by having callers
pass in FOLL_NOWAIT along with other FOLL_* flags.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Stevens <stevensd@chromium.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 18 +++++++++++-------
include/linux/kvm_host.h | 3 ++-
virt/kvm/kvm_main.c | 27 ++++++++++++++-------------
virt/kvm/kvm_mm.h | 2 +-
virt/kvm/pfncache.c | 4 ++--
5 files changed, 30 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0e235f276ee5..fa8f3fb7c14b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4374,17 +4374,21 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool async;
-
if (fault->is_private)
return kvm_faultin_pfn_private(vcpu, fault);
- async = false;
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, &async,
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
fault->write, &fault->map_writable,
&fault->hva);
- if (!async)
- return RET_PF_CONTINUE; /* *pfn has correct page already */
+
+ /*
+ * If resolving the page failed because I/O is needed to fault-in the
+ * page, then either set up an asynchronous #PF to do the I/O, or if
+ * doing an async #PF isn't possible, retry with I/O allowed. All
+ * other failures are terminal, i.e. retrying won't help.
+ */
+ if (fault->pfn != KVM_PFN_ERR_NEEDS_IO)
+ return RET_PF_CONTINUE;
if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(fault->addr, fault->gfn);
@@ -4402,7 +4406,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* to wait for IO. Note, gup always bails if it is unable to quickly
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, NULL,
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, true,
fault->write, &fault->map_writable,
&fault->hva);
return RET_PF_CONTINUE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2faafc7a56ae..071a0a1f1c60 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -97,6 +97,7 @@
#define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1)
#define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
#define KVM_PFN_ERR_SIGPENDING (KVM_PFN_ERR_MASK + 3)
+#define KVM_PFN_ERR_NEEDS_IO (KVM_PFN_ERR_MASK + 4)
/*
* error pfns indicate that the gfn is in slot but faild to
@@ -1233,7 +1234,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool *async,
+ bool interruptible, bool no_wait,
bool write_fault, bool *writable, hva_t *hva);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 87f81e74cbc0..dd5839abef6c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2778,7 +2778,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
* The slow path to get the pfn of the specified host virtual address,
* 1 indicates success, -errno is returned if error is detected.
*/
-static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
+static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
bool interruptible, bool *writable, kvm_pfn_t *pfn)
{
/*
@@ -2801,7 +2801,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
if (write_fault)
flags |= FOLL_WRITE;
- if (async)
+ if (no_wait)
flags |= FOLL_NOWAIT;
if (interruptible)
flags |= FOLL_INTERRUPTIBLE;
@@ -2912,8 +2912,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* Pin guest page in memory and return its pfn.
* @addr: host virtual address which maps memory to the guest
* @interruptible: whether the process can be interrupted by non-fatal signals
- * @async: whether this function need to wait IO complete if the
- * host page is not in the memory
+ * @no_wait: whether or not this function need to wait IO complete if the
+ * host page is not in the memory
* @write_fault: whether we should get a writable host page
* @writable: whether it allows to map a writable host page for !@write_fault
*
@@ -2922,7 +2922,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* 2): @write_fault = false && @writable, @writable will tell the caller
* whether the mapping is writable.
*/
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
bool write_fault, bool *writable)
{
struct vm_area_struct *vma;
@@ -2934,7 +2934,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
return pfn;
- npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
+ npages = hva_to_pfn_slow(addr, no_wait, write_fault, interruptible,
writable, &pfn);
if (npages == 1)
return pfn;
@@ -2956,16 +2956,17 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
if (r < 0)
pfn = KVM_PFN_ERR_FAULT;
} else {
- if (async && vma_is_valid(vma, write_fault))
- *async = true;
- pfn = KVM_PFN_ERR_FAULT;
+ if (no_wait && vma_is_valid(vma, write_fault))
+ pfn = KVM_PFN_ERR_NEEDS_IO;
+ else
+ pfn = KVM_PFN_ERR_FAULT;
}
mmap_read_unlock(current->mm);
return pfn;
}
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool *async,
+ bool interruptible, bool no_wait,
bool write_fault, bool *writable, hva_t *hva)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
@@ -2987,21 +2988,21 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
writable = NULL;
}
- return hva_to_pfn(addr, interruptible, async, write_fault, writable);
+ return hva_to_pfn(addr, interruptible, no_wait, write_fault, writable);
}
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
+ return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
write_fault, writable, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index a3fa86f60d6c..51f3fee4ca3f 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,7 +20,7 @@
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
bool write_fault, bool *writable);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 58c706a610e5..32dc61f48c81 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -197,8 +197,8 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
cond_resched();
}
- /* We always request a writeable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
+ /* We always request a writable mapping */
+ new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 17/85] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (15 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 16/85] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 18/85] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot() Sean Christopherson
` (69 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Remove kvm_page_fault.hva as it is never read, only written. This will
allow removing the @hva param from __gfn_to_pfn_memslot().
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 5 ++---
arch/x86/kvm/mmu/mmu_internal.h | 2 --
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa8f3fb7c14b..c67228b46bd5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3294,7 +3294,6 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
fault->slot = NULL;
fault->pfn = KVM_PFN_NOSLOT;
fault->map_writable = false;
- fault->hva = KVM_HVA_ERR_BAD;
/*
* If MMIO caching is disabled, emulate immediately without
@@ -4379,7 +4378,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
fault->write, &fault->map_writable,
- &fault->hva);
+ NULL);
/*
* If resolving the page failed because I/O is needed to fault-in the
@@ -4408,7 +4407,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
*/
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, true,
fault->write, &fault->map_writable,
- &fault->hva);
+ NULL);
return RET_PF_CONTINUE;
}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 4da83544c4e1..633aedec3c2e 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -238,7 +238,6 @@ struct kvm_page_fault {
/* Outputs of kvm_faultin_pfn. */
unsigned long mmu_seq;
kvm_pfn_t pfn;
- hva_t hva;
bool map_writable;
/*
@@ -313,7 +312,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.is_private = err & PFERR_PRIVATE_ACCESS,
.pfn = KVM_PFN_ERR_FAULT,
- .hva = KVM_HVA_ERR_BAD,
};
int r;
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 18/85] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (16 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 17/85] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs Sean Christopherson
` (68 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Drop @hva from __gfn_to_pfn_memslot() now that all callers pass NULL.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/mmu.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 6 ++----
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 9 +++------
6 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a6e62cc9015c..dd221587fcca 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1570,7 +1570,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmap_read_unlock(current->mm);
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- write_fault, &writable, NULL);
+ write_fault, &writable);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8cd02ca4b1b8..2f1d58984b41 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -614,7 +614,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
} else {
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, &write_ok, NULL);
+ writing, &write_ok);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 26a969e935e3..8304b6f8fe45 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -853,7 +853,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, upgrade_p, NULL);
+ writing, upgrade_p);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c67228b46bd5..28f2b842d6ca 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4377,8 +4377,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return kvm_faultin_pfn_private(vcpu, fault);
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
- fault->write, &fault->map_writable,
- NULL);
+ fault->write, &fault->map_writable);
/*
* If resolving the page failed because I/O is needed to fault-in the
@@ -4406,8 +4405,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, true,
- fault->write, &fault->map_writable,
- NULL);
+ fault->write, &fault->map_writable);
return RET_PF_CONTINUE;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 071a0a1f1c60..cbc7b9c04c14 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1235,7 +1235,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool interruptible, bool no_wait,
- bool write_fault, bool *writable, hva_t *hva);
+ bool write_fault, bool *writable);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
void kvm_release_pfn_dirty(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd5839abef6c..10071f31b2ca 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2967,13 +2967,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool interruptible, bool no_wait,
- bool write_fault, bool *writable, hva_t *hva)
+ bool write_fault, bool *writable)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
- if (hva)
- *hva = addr;
-
if (kvm_is_error_hva(addr)) {
if (writable)
*writable = false;
@@ -2996,13 +2993,13 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
- write_fault, writable, NULL);
+ write_fault, writable);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (17 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 18/85] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-21 8:49 ` Yan Zhao
2024-10-10 18:23 ` [PATCH v13 20/85] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map() Sean Christopherson
` (67 subsequent siblings)
86 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
From: David Stevens <stevensd@chromium.org>
Introduce kvm_follow_pfn() to eventually supplant the various "gfn_to_pfn"
APIs, albeit by adding more wrappers. The primary motivation of the new
helper is to pass a structure instead of an ever-changing set of parameters,
e.g. so that tweaking the behavior, inputs, and/or outputs of the "to pfn"
helpers doesn't require churning half of KVM.
In the more distant future, the APIs exposed to arch code could also
follow suit, e.g. by adding something akin to x86's "struct kvm_page_fault"
when faulting in guest memory. But for now, the goal is purely to clean
up KVM's "internal" MMU code.
As part of the conversion, replace the write_fault, interruptible, and
no-wait boolean flags with FOLL_WRITE, FOLL_INTERRUPTIBLE, and FOLL_NOWAIT
respectively. Collecting the various FOLL_* flags into a single field
will again ease the pain of passing new flags.
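For a feel of the new calling convention, this is the shape of a converted wrapper (mirroring the gfn_to_pfn_memslot() hunk below; kvm_follow_pfn() itself remains internal to kvm_main.c):
  struct kvm_follow_pfn kfp = {
          .slot = slot,
          .gfn = gfn,
          .flags = FOLL_WRITE,
          .map_writable = NULL,   /* caller doesn't care about writability */
  };

  return kvm_follow_pfn(&kfp);
Adding a new knob then means adding a field (or a FOLL_* flag) and touching only the callers that care, instead of rewriting every prototype in the chain.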
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Stevens <stevensd@chromium.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 162 +++++++++++++++++++++++---------------------
virt/kvm/kvm_mm.h | 20 +++++-
virt/kvm/pfncache.c | 9 ++-
3 files changed, 109 insertions(+), 82 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10071f31b2ca..52629ac26119 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2750,8 +2750,7 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned.
*/
-static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
- bool *writable, kvm_pfn_t *pfn)
+static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
struct page *page[1];
@@ -2760,14 +2759,13 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
* or the caller allows to map a writable pfn for a read fault
* request.
*/
- if (!(write_fault || writable))
+ if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
return false;
- if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+ if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, page)) {
*pfn = page_to_pfn(page[0]);
-
- if (writable)
- *writable = true;
+ if (kfp->map_writable)
+ *kfp->map_writable = true;
return true;
}
@@ -2778,8 +2776,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
* The slow path to get the pfn of the specified host virtual address,
* 1 indicates success, -errno is returned if error is detected.
*/
-static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
- bool interruptible, bool *writable, kvm_pfn_t *pfn)
+static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
/*
* When a VCPU accesses a page that is not mapped into the secondary
@@ -2792,34 +2789,30 @@ static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
* Note that get_user_page_fast_only() and FOLL_WRITE for now
* implicitly honor NUMA hinting faults and don't need this flag.
*/
- unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
- struct page *page;
+ unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT | kfp->flags;
+ struct page *page, *wpage;
int npages;
- if (writable)
- *writable = write_fault;
-
- if (write_fault)
- flags |= FOLL_WRITE;
- if (no_wait)
- flags |= FOLL_NOWAIT;
- if (interruptible)
- flags |= FOLL_INTERRUPTIBLE;
-
- npages = get_user_pages_unlocked(addr, 1, &page, flags);
+ npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
if (npages != 1)
return npages;
+ if (!kfp->map_writable)
+ goto out;
+
+ if (kfp->flags & FOLL_WRITE) {
+ *kfp->map_writable = true;
+ goto out;
+ }
+
/* map read fault as writable if possible */
- if (unlikely(!write_fault) && writable) {
- struct page *wpage;
-
- if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
- *writable = true;
- put_page(page);
- page = wpage;
- }
+ if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
+ *kfp->map_writable = true;
+ put_page(page);
+ page = wpage;
}
+
+out:
*pfn = page_to_pfn(page);
return npages;
}
@@ -2846,10 +2839,10 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
}
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
- unsigned long addr, bool write_fault,
- bool *writable, kvm_pfn_t *p_pfn)
+ struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
{
- struct follow_pfnmap_args args = { .vma = vma, .address = addr };
+ struct follow_pfnmap_args args = { .vma = vma, .address = kfp->hva };
+ bool write_fault = kfp->flags & FOLL_WRITE;
kvm_pfn_t pfn;
int r;
@@ -2860,7 +2853,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* not call the fault handler, so do it here.
*/
bool unlocked = false;
- r = fixup_user_fault(current->mm, addr,
+ r = fixup_user_fault(current->mm, kfp->hva,
(write_fault ? FAULT_FLAG_WRITE : 0),
&unlocked);
if (unlocked)
@@ -2878,8 +2871,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
goto out;
}
- if (writable)
- *writable = args.writable;
+ if (kfp->map_writable)
+ *kfp->map_writable = args.writable;
pfn = args.pfn;
/*
@@ -2908,22 +2901,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
return r;
}
-/*
- * Pin guest page in memory and return its pfn.
- * @addr: host virtual address which maps memory to the guest
- * @interruptible: whether the process can be interrupted by non-fatal signals
- * @no_wait: whether or not this function need to wait IO complete if the
- * host page is not in the memory
- * @write_fault: whether we should get a writable host page
- * @writable: whether it allows to map a writable host page for !@write_fault
- *
- * The function will map a writable host page for these two cases:
- * 1): @write_fault = true
- * 2): @write_fault = false && @writable, @writable will tell the caller
- * whether the mapping is writable.
- */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
- bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
{
struct vm_area_struct *vma;
kvm_pfn_t pfn;
@@ -2931,11 +2909,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
might_sleep();
- if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
+ if (hva_to_pfn_fast(kfp, &pfn))
return pfn;
- npages = hva_to_pfn_slow(addr, no_wait, write_fault, interruptible,
- writable, &pfn);
+ npages = hva_to_pfn_slow(kfp, &pfn);
if (npages == 1)
return pfn;
if (npages == -EINTR || npages == -EAGAIN)
@@ -2945,18 +2922,19 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
mmap_read_lock(current->mm);
retry:
- vma = vma_lookup(current->mm, addr);
+ vma = vma_lookup(current->mm, kfp->hva);
if (vma == NULL)
pfn = KVM_PFN_ERR_FAULT;
else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
- r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
+ r = hva_to_pfn_remapped(vma, kfp, &pfn);
if (r == -EAGAIN)
goto retry;
if (r < 0)
pfn = KVM_PFN_ERR_FAULT;
} else {
- if (no_wait && vma_is_valid(vma, write_fault))
+ if ((kfp->flags & FOLL_NOWAIT) &&
+ vma_is_valid(vma, kfp->flags & FOLL_WRITE))
pfn = KVM_PFN_ERR_NEEDS_IO;
else
pfn = KVM_PFN_ERR_FAULT;
@@ -2965,41 +2943,69 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
return pfn;
}
+static kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
+{
+ kfp->hva = __gfn_to_hva_many(kfp->slot, kfp->gfn, NULL,
+ kfp->flags & FOLL_WRITE);
+
+ if (kfp->hva == KVM_HVA_ERR_RO_BAD)
+ return KVM_PFN_ERR_RO_FAULT;
+
+ if (kvm_is_error_hva(kfp->hva))
+ return KVM_PFN_NOSLOT;
+
+ if (memslot_is_readonly(kfp->slot) && kfp->map_writable) {
+ *kfp->map_writable = false;
+ kfp->map_writable = NULL;
+ }
+
+ return hva_to_pfn(kfp);
+}
+
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool interruptible, bool no_wait,
bool write_fault, bool *writable)
{
- unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
-
- if (kvm_is_error_hva(addr)) {
- if (writable)
- *writable = false;
-
- return addr == KVM_HVA_ERR_RO_BAD ? KVM_PFN_ERR_RO_FAULT :
- KVM_PFN_NOSLOT;
- }
-
- /* Do not map writable pfn in the readonly memslot. */
- if (writable && memslot_is_readonly(slot)) {
- *writable = false;
- writable = NULL;
- }
-
- return hva_to_pfn(addr, interruptible, no_wait, write_fault, writable);
+ struct kvm_follow_pfn kfp = {
+ .slot = slot,
+ .gfn = gfn,
+ .map_writable = writable,
+ };
+
+ if (write_fault)
+ kfp.flags |= FOLL_WRITE;
+ if (no_wait)
+ kfp.flags |= FOLL_NOWAIT;
+ if (interruptible)
+ kfp.flags |= FOLL_INTERRUPTIBLE;
+
+ return kvm_follow_pfn(&kfp);
}
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
- write_fault, writable);
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(kvm, gfn),
+ .gfn = gfn,
+ .flags = write_fault ? FOLL_WRITE : 0,
+ .map_writable = writable,
+ };
+
+ return kvm_follow_pfn(&kfp);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL);
+ struct kvm_follow_pfn kfp = {
+ .slot = slot,
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ };
+
+ return kvm_follow_pfn(&kfp);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 51f3fee4ca3f..d5a215958f06 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,8 +20,24 @@
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
- bool write_fault, bool *writable);
+
+struct kvm_follow_pfn {
+ const struct kvm_memory_slot *slot;
+ const gfn_t gfn;
+
+ unsigned long hva;
+
+ /* FOLL_* flags modifying lookup behavior, e.g. FOLL_WRITE. */
+ unsigned int flags;
+
+ /*
+ * If non-NULL, try to get a writable mapping even for a read fault.
+ * Set to true if a writable mapping was obtained.
+ */
+ bool *map_writable;
+};
+
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 32dc61f48c81..067daf9ad6ef 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -159,6 +159,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;
+ struct kvm_follow_pfn kfp = {
+ .slot = gpc->memslot,
+ .gfn = gpa_to_gfn(gpc->gpa),
+ .flags = FOLL_WRITE,
+ .hva = gpc->uhva,
+ };
lockdep_assert_held(&gpc->refresh_lock);
@@ -197,8 +203,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
cond_resched();
}
- /* We always request a writable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL);
+ new_pfn = hva_to_pfn(&kfp);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
--
2.47.0.rc1.288.g06298d1525-goog
* Re: [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs
2024-10-10 18:23 ` [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs Sean Christopherson
@ 2024-10-21 8:49 ` Yan Zhao
2024-10-21 18:08 ` Sean Christopherson
0 siblings, 1 reply; 99+ messages in thread
From: Yan Zhao @ 2024-10-21 8:49 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Thu, Oct 10, 2024 at 11:23:21AM -0700, Sean Christopherson wrote:
> --- a/virt/kvm/pfncache.c
> +++ b/virt/kvm/pfncache.c
> @@ -159,6 +159,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
> kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
> void *new_khva = NULL;
> unsigned long mmu_seq;
> + struct kvm_follow_pfn kfp = {
> + .slot = gpc->memslot,
> + .gfn = gpa_to_gfn(gpc->gpa),
> + .flags = FOLL_WRITE,
> + .hva = gpc->uhva,
> + };
Is .map_writable uninitialized?
>
> lockdep_assert_held(&gpc->refresh_lock);
>
> @@ -197,8 +203,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
> cond_resched();
> }
>
> - /* We always request a writable mapping */
> - new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL);
> + new_pfn = hva_to_pfn(&kfp);
> if (is_error_noslot_pfn(new_pfn))
> goto out_error;
>
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
* Re: [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs
2024-10-21 8:49 ` Yan Zhao
@ 2024-10-21 18:08 ` Sean Christopherson
2024-10-22 1:25 ` Yan Zhao
0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-21 18:08 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Mon, Oct 21, 2024, Yan Zhao wrote:
> On Thu, Oct 10, 2024 at 11:23:21AM -0700, Sean Christopherson wrote:
> > --- a/virt/kvm/pfncache.c
> > +++ b/virt/kvm/pfncache.c
> > @@ -159,6 +159,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
> > kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
> > void *new_khva = NULL;
> > unsigned long mmu_seq;
> > + struct kvm_follow_pfn kfp = {
> > + .slot = gpc->memslot,
> > + .gfn = gpa_to_gfn(gpc->gpa),
> > + .flags = FOLL_WRITE,
> > + .hva = gpc->uhva,
> > + };
> Is .map_writable uninitialized?
Nope, per C99, "subobjects without explicit initializers are initialized to zero",
i.e. map_writable is initialized to "false".
* Re: [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs
2024-10-21 18:08 ` Sean Christopherson
@ 2024-10-22 1:25 ` Yan Zhao
0 siblings, 0 replies; 99+ messages in thread
From: Yan Zhao @ 2024-10-22 1:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Mon, Oct 21, 2024 at 11:08:49AM -0700, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Yan Zhao wrote:
> > On Thu, Oct 10, 2024 at 11:23:21AM -0700, Sean Christopherson wrote:
> > > --- a/virt/kvm/pfncache.c
> > > +++ b/virt/kvm/pfncache.c
> > > @@ -159,6 +159,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
> > > kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
> > > void *new_khva = NULL;
> > > unsigned long mmu_seq;
> > > + struct kvm_follow_pfn kfp = {
> > > + .slot = gpc->memslot,
> > > + .gfn = gpa_to_gfn(gpc->gpa),
> > > + .flags = FOLL_WRITE,
> > > + .hva = gpc->uhva,
> > > + };
> > Is .map_writable uninitialized?
>
> Nope, per C99, "subobjects without explicit initializers are initialized to zero",
> i.e. map_writable is initialized to "false".
Ah, thanks, good to know that!
* [PATCH v13 20/85] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (18 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 19/85] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 21/85] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map() Sean Christopherson
` (66 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Drop kvm_vcpu_{,un}map()'s useless checks on @map being non-NULL. The map
is 100% kernel-controlled; any caller that passes a NULL pointer is broken
and needs to be fixed, i.e. a crash due to a NULL pointer dereference is
desirable (though obviously not as desirable as not having a bug in the
first place).
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 52629ac26119..c7691bc40389 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3071,9 +3071,6 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
void *hva = NULL;
struct page *page = KVM_UNMAPPED_PAGE;
- if (!map)
- return -EINVAL;
-
pfn = gfn_to_pfn(vcpu->kvm, gfn);
if (is_error_noslot_pfn(pfn))
return -EINVAL;
@@ -3101,9 +3098,6 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
{
- if (!map)
- return;
-
if (!map->hva)
return;
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 21/85] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (19 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 20/85] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 22/85] KVM: Use NULL for struct page pointer to indicate mremapped memory Sean Christopherson
` (65 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Explicitly initialize the entire kvm_host_map structure when mapping a
pfn, as some callers declare their struct on the stack, i.e. don't
zero-initialize the struct, which makes the map->hva check in kvm_vcpu_unmap()
*very* suspect.
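A standalone sketch of the hazard (hypothetical names, not kernel code): an automatic struct has indeterminate contents, so any "is this mapped?" test on a field the map routine never wrote is meaningless:
  struct demo_map {
          void *hva;
          unsigned long pfn;
  };

  /* stand-in for a map helper that bails out before writing ->hva */
  static int demo_map(struct demo_map *map)
  {
          (void)map;
          return -1;
  }

  int main(void)
  {
          struct demo_map map;    /* on the stack, not zero-initialized */

          if (demo_map(&map) < 0) {
                  /* map.hva was never written, so it holds indeterminate
                   * stack contents; a later "if (!map.hva)" check in an
                   * unmap path could do anything.  Hence kvm_vcpu_map()
                   * now writes every field before it can fail. */
          }
          return 0;
  }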
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 40 ++++++++++++++++------------------------
1 file changed, 16 insertions(+), 24 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c7691bc40389..f1c9a781315c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3067,32 +3067,24 @@ void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
- kvm_pfn_t pfn;
- void *hva = NULL;
- struct page *page = KVM_UNMAPPED_PAGE;
-
- pfn = gfn_to_pfn(vcpu->kvm, gfn);
- if (is_error_noslot_pfn(pfn))
- return -EINVAL;
-
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- hva = kmap(page);
-#ifdef CONFIG_HAS_IOMEM
- } else {
- hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
- }
-
- if (!hva)
- return -EFAULT;
-
- map->page = page;
- map->hva = hva;
- map->pfn = pfn;
+ map->page = KVM_UNMAPPED_PAGE;
+ map->hva = NULL;
map->gfn = gfn;
- return 0;
+ map->pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ if (is_error_noslot_pfn(map->pfn))
+ return -EINVAL;
+
+ if (pfn_valid(map->pfn)) {
+ map->page = pfn_to_page(map->pfn);
+ map->hva = kmap(map->page);
+#ifdef CONFIG_HAS_IOMEM
+ } else {
+ map->hva = memremap(pfn_to_hpa(map->pfn), PAGE_SIZE, MEMREMAP_WB);
+#endif
+ }
+
+ return map->hva ? 0 : -EFAULT;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_map);
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v13 22/85] KVM: Use NULL for struct page pointer to indicate mremapped memory
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (20 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 21/85] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 23/85] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping Sean Christopherson
` (64 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Drop yet another unnecessary magic page value from KVM, as there's zero
reason to use a poisoned pointer to indicate "no page". If KVM uses a
NULL page pointer, the kernel will explode just as quickly as if KVM uses
a poisoned pointer. Never mind the fact that such usage would be a
blatant and egregious KVM bug.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 4 ----
virt/kvm/kvm_main.c | 4 ++--
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cbc7b9c04c14..e3c01cbbc41a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -273,16 +273,12 @@ enum {
READING_SHADOW_PAGE_TABLES,
};
-#define KVM_UNMAPPED_PAGE ((void *) 0x500 + POISON_POINTER_DELTA)
-
struct kvm_host_map {
/*
* Only valid if the 'pfn' is managed by the host kernel (i.e. There is
* a 'struct page' for it. When using mem= kernel parameter some memory
* can be used as guest memory but they are not managed by host
* kernel).
- * If 'pfn' is not managed by the host kernel, this field is
- * initialized to KVM_UNMAPPED_PAGE.
*/
struct page *page;
void *hva;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f1c9a781315c..7acb1a8af2e4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3067,7 +3067,7 @@ void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
- map->page = KVM_UNMAPPED_PAGE;
+ map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
@@ -3093,7 +3093,7 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
if (!map->hva)
return;
- if (map->page != KVM_UNMAPPED_PAGE)
+ if (map->page)
kunmap(map->page);
#ifdef CONFIG_HAS_IOMEM
else
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
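Pulling the hunks together, the unmap side now distinguishes kmap()'d struct-page memory from memremap()'d memory purely by whether map->page is NULL. Roughly (a reconstruction for illustration; dirty tracking and pfn release elided):

void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
{
	if (!map->hva)			/* map was never established */
		return;

	if (map->page)			/* kmap()'d memory with a struct page */
		kunmap(map->page);
#ifdef CONFIG_HAS_IOMEM
	else				/* memremap()'d memory, no struct page */
		memunmap(map->hva);
#endif

	/* ... mark dirty and release the pfn, as in the existing code ... */
}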
* [PATCH v13 23/85] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (21 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 22/85] KVM: Use NULL for struct page pointer to indicate mremapped memory Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 24/85] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx Sean Christopherson
` (63 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Remove the explicit evmptr12 validity check when deciding whether or not
to unmap the eVMCS pointer, and instead rely on kvm_vcpu_unmap() to play
nice with a NULL map->hva, i.e. to do nothing if the map is invalid.
Note, vmx->nested.hv_evmcs_map is zero-allocated along with the rest of
vcpu_vmx, i.e. the map starts out invalid/NULL.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..e94a25373a59 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -231,11 +231,8 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
- if (nested_vmx_is_evmptr12_valid(vmx)) {
- kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
- vmx->nested.hv_evmcs = NULL;
- }
-
+ kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
+ vmx->nested.hv_evmcs = NULL;
vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
if (hv_vcpu) {
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 24/85] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (22 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 23/85] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 25/85] KVM: nVMX: Add helper to put (unmap) vmcs12 pages Sean Christopherson
` (62 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Remove vcpu_vmx.msr_bitmap_map and instead use an on-stack structure in
the one function that uses the map, nested_vmx_prepare_msr_bitmap().
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 8 ++++----
arch/x86/kvm/vmx/vmx.h | 2 --
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e94a25373a59..fb37658b62c9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -621,7 +621,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
int msr;
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap;
- struct kvm_host_map *map = &vmx->nested.msr_bitmap_map;
+ struct kvm_host_map msr_bitmap_map;
/* Nothing to do if the MSR bitmap is not in use. */
if (!cpu_has_vmx_msr_bitmap() ||
@@ -644,10 +644,10 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
return true;
}
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), map))
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), &msr_bitmap_map))
return false;
- msr_bitmap_l1 = (unsigned long *)map->hva;
+ msr_bitmap_l1 = (unsigned long *)msr_bitmap_map.hva;
/*
* To keep the control flow simple, pay eight 8-byte writes (sixteen
@@ -711,7 +711,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
- kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
+ kvm_vcpu_unmap(vcpu, &msr_bitmap_map, false);
vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 2325f773a20b..40303b43da6c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -200,8 +200,6 @@ struct nested_vmx {
struct kvm_host_map virtual_apic_map;
struct kvm_host_map pi_desc_map;
- struct kvm_host_map msr_bitmap_map;
-
struct pi_desc *pi_desc;
bool pi_pending;
u16 posted_intr_nv;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 25/85] KVM: nVMX: Add helper to put (unmap) vmcs12 pages
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (23 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 24/85] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 26/85] KVM: Use plain "struct page" pointer instead of single-entry array Sean Christopherson
` (61 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Add a helper to dedup unmapping the vmcs12 pages. This will reduce the
amount of churn when a future patch refactors the kvm_vcpu_unmap() API.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fb37658b62c9..81865db18e12 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -314,6 +314,21 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
vcpu->arch.regs_dirty = 0;
}
+static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /*
+ * Unpin physical memory we referred to in the vmcs02. The APIC access
+ * page's backing page (yeah, confusing) shouldn't actually be accessed,
+ * and if it is written, the contents are irrelevant.
+ */
+ kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
+ vmx->nested.pi_desc = NULL;
+}
+
/*
* Free whatever needs to be freed from vmx->nested when L1 goes down, or
* just stops using VMX.
@@ -346,15 +361,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
vmx->nested.cached_vmcs12 = NULL;
kfree(vmx->nested.cached_shadow_vmcs12);
vmx->nested.cached_shadow_vmcs12 = NULL;
- /*
- * Unpin physical memory we referred to in the vmcs02. The APIC access
- * page's backing page (yeah, confusing) shouldn't actually be accessed,
- * and if it is written, the contents are irrelevant.
- */
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
- kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
- vmx->nested.pi_desc = NULL;
+
+ nested_put_vmcs12_pages(vcpu);
kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
@@ -5010,11 +5018,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_update_cpu_dirty_logging(vcpu);
}
- /* Unpin physical memory we referred to in vmcs02 */
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
- kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
- vmx->nested.pi_desc = NULL;
+ nested_put_vmcs12_pages(vcpu);
if (vmx->nested.reload_vmcs01_apic_access_page) {
vmx->nested.reload_vmcs01_apic_access_page = false;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 26/85] KVM: Use plain "struct page" pointer instead of single-entry array
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (24 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 25/85] KVM: nVMX: Add helper to put (unmap) vmcs12 pages Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 27/85] KVM: Provide refcounted page as output field in struct kvm_follow_pfn Sean Christopherson
` (60 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Use a single pointer instead of a single-entry array for the struct page
pointer in hva_to_pfn_fast(). Using an array makes the code unnecessarily
annoying to read and update.
No functional change intended.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7acb1a8af2e4..d3e48fcc4fb0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2752,7 +2752,7 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
*/
static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
- struct page *page[1];
+ struct page *page;
/*
* Fast pin a writable pfn only if it is a write fault request
@@ -2762,8 +2762,8 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
return false;
- if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, page)) {
- *pfn = page_to_pfn(page[0]);
+ if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
+ *pfn = page_to_pfn(page);
if (kfp->map_writable)
*kfp->map_writable = true;
return true;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 27/85] KVM: Provide refcounted page as output field in struct kvm_follow_pfn
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (25 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 26/85] KVM: Use plain "struct page" pointer instead of single-entry array Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 28/85] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c Sean Christopherson
` (59 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Add kvm_follow_pfn.refcounted_page as an output for the "to pfn" APIs to
"return" the struct page that is associated with the returned pfn (if KVM
acquired a reference to the page). This will eventually allow removing
KVM's hacky kvm_pfn_to_refcounted_page() code, which is error prone and
can't detect pfns that are valid, but aren't (currently) refcounted.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 99 +++++++++++++++++++++------------------------
virt/kvm/kvm_mm.h | 9 +++++
2 files changed, 56 insertions(+), 52 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d3e48fcc4fb0..e29f78ed6f48 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2746,6 +2746,46 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
return gfn_to_hva_memslot_prot(slot, gfn, writable);
}
+static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
+ struct follow_pfnmap_args *map, bool writable)
+{
+ kvm_pfn_t pfn;
+
+ WARN_ON_ONCE(!!page == !!map);
+
+ if (kfp->map_writable)
+ *kfp->map_writable = writable;
+
+ /*
+ * FIXME: Remove this once KVM no longer blindly calls put_page() on
+ * every pfn that points at a struct page.
+ *
+ * Get a reference for follow_pte() pfns if they happen to point at a
+ * struct page, as KVM will ultimately call kvm_release_pfn_clean() on
+ * the returned pfn, i.e. KVM expects to have a reference.
+ *
+ * Certain IO or PFNMAP mappings can be backed with valid struct pages,
+ * but be allocated without refcounting, e.g. tail pages of
+ * non-compound higher order allocations. Grabbing and putting a
+ * reference to such pages would cause KVM to prematurely free a page
+ * it doesn't own (KVM gets and puts the one and only reference).
+ * Don't allow those pages until the FIXME is resolved.
+ */
+ if (map) {
+ pfn = map->pfn;
+ page = kvm_pfn_to_refcounted_page(pfn);
+ if (page && !get_page_unless_zero(page))
+ return KVM_PFN_ERR_FAULT;
+ } else {
+ pfn = page_to_pfn(page);
+ }
+
+ if (kfp->refcounted_page)
+ *kfp->refcounted_page = page;
+
+ return pfn;
+}
+
/*
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned.
@@ -2763,9 +2803,7 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
return false;
if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
- *pfn = page_to_pfn(page);
- if (kfp->map_writable)
- *kfp->map_writable = true;
+ *pfn = kvm_resolve_pfn(kfp, page, NULL, true);
return true;
}
@@ -2797,23 +2835,15 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
if (npages != 1)
return npages;
- if (!kfp->map_writable)
- goto out;
-
- if (kfp->flags & FOLL_WRITE) {
- *kfp->map_writable = true;
- goto out;
- }
-
/* map read fault as writable if possible */
- if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
- *kfp->map_writable = true;
+ if (!(flags & FOLL_WRITE) && kfp->map_writable &&
+ get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
put_page(page);
page = wpage;
+ flags |= FOLL_WRITE;
}
-out:
- *pfn = page_to_pfn(page);
+ *pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
return npages;
}
@@ -2828,22 +2858,11 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
return true;
}
-static int kvm_try_get_pfn(kvm_pfn_t pfn)
-{
- struct page *page = kvm_pfn_to_refcounted_page(pfn);
-
- if (!page)
- return 1;
-
- return get_page_unless_zero(page);
-}
-
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
{
struct follow_pfnmap_args args = { .vma = vma, .address = kfp->hva };
bool write_fault = kfp->flags & FOLL_WRITE;
- kvm_pfn_t pfn;
int r;
r = follow_pfnmap_start(&args);
@@ -2867,37 +2886,13 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
}
if (write_fault && !args.writable) {
- pfn = KVM_PFN_ERR_RO_FAULT;
+ *p_pfn = KVM_PFN_ERR_RO_FAULT;
goto out;
}
- if (kfp->map_writable)
- *kfp->map_writable = args.writable;
- pfn = args.pfn;
-
- /*
- * Get a reference here because callers of *hva_to_pfn* and
- * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
- * returned pfn. This is only needed if the VMA has VM_MIXEDMAP
- * set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
- * simply do nothing for reserved pfns.
- *
- * Whoever called remap_pfn_range is also going to call e.g.
- * unmap_mapping_range before the underlying pages are freed,
- * causing a call to our MMU notifier.
- *
- * Certain IO or PFNMAP mappings can be backed with valid
- * struct pages, but be allocated without refcounting e.g.,
- * tail pages of non-compound higher order allocations, which
- * would then underflow the refcount when the caller does the
- * required put_page. Don't allow those pages here.
- */
- if (!kvm_try_get_pfn(pfn))
- r = -EFAULT;
+ *p_pfn = kvm_resolve_pfn(kfp, NULL, &args, args.writable);
out:
follow_pfnmap_end(&args);
- *p_pfn = pfn;
-
return r;
}
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d5a215958f06..d3ac1ba8ba66 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -35,6 +35,15 @@ struct kvm_follow_pfn {
* Set to true if a writable mapping was obtained.
*/
bool *map_writable;
+
+ /*
+ * Optional output. Set to a valid "struct page" if the returned pfn
+ * is for a refcounted or pinned struct page, NULL if the returned pfn
+ * has no struct page or if the struct page is not being refcounted
+ * (e.g. tail pages of non-compound higher order allocations from
+ * IO/PFNMAP mappings).
+ */
+ struct page **refcounted_page;
};
kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
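For illustration, a rough caller-side sketch of how the new output field is meant to be consumed (not code from the patch; "slot" and "gfn" are placeholders, and kvm_follow_pfn() is the lookup helper used elsewhere in this series): the caller supplies a struct page pointer and releases the page only if one was actually reported back.

struct page *refcounted_page = NULL;
struct kvm_follow_pfn kfp = {
	.slot = slot,				/* placeholder memslot */
	.gfn = gfn,				/* placeholder gfn */
	.flags = FOLL_WRITE,
	.refcounted_page = &refcounted_page,
};
kvm_pfn_t pfn;

pfn = kvm_follow_pfn(&kfp);
if (is_error_noslot_pfn(pfn))
	return -EFAULT;

/* ... use the pfn ... */

if (refcounted_page)
	kvm_release_page_dirty(refcounted_page);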
* [PATCH v13 28/85] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (26 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 27/85] KVM: Provide refcounted page as output field in struct kvm_follow_pfn Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 29/85] KVM: pfncache: Precisely track refcounted pages Sean Christopherson
` (58 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Hoist the kvm_{set,release}_page_{clean,dirty}() APIs further up in
kvm_main.c so that they can be used by the kvm_follow_pfn family of APIs.
No functional change intended.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 82 ++++++++++++++++++++++-----------------------
1 file changed, 41 insertions(+), 41 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e29f78ed6f48..6cdbd0516d58 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2746,6 +2746,47 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
return gfn_to_hva_memslot_prot(slot, gfn, writable);
}
+static bool kvm_is_ad_tracked_page(struct page *page)
+{
+ /*
+ * Per page-flags.h, pages tagged PG_reserved "should in general not be
+ * touched (e.g. set dirty) except by its owner".
+ */
+ return !PageReserved(page);
+}
+
+static void kvm_set_page_dirty(struct page *page)
+{
+ if (kvm_is_ad_tracked_page(page))
+ SetPageDirty(page);
+}
+
+static void kvm_set_page_accessed(struct page *page)
+{
+ if (kvm_is_ad_tracked_page(page))
+ mark_page_accessed(page);
+}
+
+void kvm_release_page_clean(struct page *page)
+{
+ if (!page)
+ return;
+
+ kvm_set_page_accessed(page);
+ put_page(page);
+}
+EXPORT_SYMBOL_GPL(kvm_release_page_clean);
+
+void kvm_release_page_dirty(struct page *page)
+{
+ if (!page)
+ return;
+
+ kvm_set_page_dirty(page);
+ kvm_release_page_clean(page);
+}
+EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
+
static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
struct follow_pfnmap_args *map, bool writable)
{
@@ -3105,37 +3146,6 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
-static bool kvm_is_ad_tracked_page(struct page *page)
-{
- /*
- * Per page-flags.h, pages tagged PG_reserved "should in general not be
- * touched (e.g. set dirty) except by its owner".
- */
- return !PageReserved(page);
-}
-
-static void kvm_set_page_dirty(struct page *page)
-{
- if (kvm_is_ad_tracked_page(page))
- SetPageDirty(page);
-}
-
-static void kvm_set_page_accessed(struct page *page)
-{
- if (kvm_is_ad_tracked_page(page))
- mark_page_accessed(page);
-}
-
-void kvm_release_page_clean(struct page *page)
-{
- if (!page)
- return;
-
- kvm_set_page_accessed(page);
- put_page(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_page_clean);
-
void kvm_release_pfn_clean(kvm_pfn_t pfn)
{
struct page *page;
@@ -3151,16 +3161,6 @@ void kvm_release_pfn_clean(kvm_pfn_t pfn)
}
EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
-void kvm_release_page_dirty(struct page *page)
-{
- if (!page)
- return;
-
- kvm_set_page_dirty(page);
- kvm_release_page_clean(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
-
void kvm_release_pfn_dirty(kvm_pfn_t pfn)
{
struct page *page;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 29/85] KVM: pfncache: Precisely track refcounted pages
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (27 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 28/85] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 30/85] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() Sean Christopherson
` (57 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Track refcounted struct page memory using kvm_follow_pfn.refcounted_page
instead of relying on kvm_release_pfn_clean() to correctly detect that the
pfn is associated with a struct page.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/pfncache.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 067daf9ad6ef..728d2c1b488a 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -159,11 +159,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;
+ struct page *page;
+
struct kvm_follow_pfn kfp = {
.slot = gpc->memslot,
.gfn = gpa_to_gfn(gpc->gpa),
.flags = FOLL_WRITE,
.hva = gpc->uhva,
+ .refcounted_page = &page,
};
lockdep_assert_held(&gpc->refresh_lock);
@@ -198,7 +201,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
if (new_khva != old_khva)
gpc_unmap(new_pfn, new_khva);
- kvm_release_pfn_clean(new_pfn);
+ kvm_release_page_unused(page);
cond_resched();
}
@@ -218,7 +221,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
new_khva = gpc_map(new_pfn);
if (!new_khva) {
- kvm_release_pfn_clean(new_pfn);
+ kvm_release_page_unused(page);
goto out_error;
}
@@ -236,11 +239,11 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
gpc->khva = new_khva + offset_in_page(gpc->uhva);
/*
- * Put the reference to the _new_ pfn. The pfn is now tracked by the
+ * Put the reference to the _new_ page. The page is now tracked by the
* cache and can be safely migrated, swapped, etc... as the cache will
* invalidate any mappings in response to relevant mmu_notifier events.
*/
- kvm_release_pfn_clean(new_pfn);
+ kvm_release_page_clean(page);
return 0;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 30/85] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (28 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 29/85] KVM: pfncache: Precisely track refcounted pages Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 31/85] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map() Sean Christopherson
` (56 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
From: David Stevens <stevensd@chromium.org>
Migrate kvm_vcpu_map() to kvm_follow_pfn(), and have it track whether or
not the map holds a refcounted struct page. Precisely tracking struct
page references will eventually allow removing kvm_pfn_to_refcounted_page()
and its various wrappers.
Signed-off-by: David Stevens <stevensd@chromium.org>
[sean: use a pointer instead of a boolean]
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 26 ++++++++++++++++----------
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e3c01cbbc41a..02ab3a657aa6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -280,6 +280,7 @@ struct kvm_host_map {
* can be used as guest memory but they are not managed by host
* kernel).
*/
+ struct page *refcounted_page;
struct page *page;
void *hva;
kvm_pfn_t pfn;
@@ -1238,7 +1239,6 @@ void kvm_release_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_accessed(kvm_pfn_t pfn);
-void kvm_release_pfn(kvm_pfn_t pfn, bool dirty);
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len);
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6cdbd0516d58..b1c1b7e4f33a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3093,21 +3093,21 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_page);
-void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
-{
- if (dirty)
- kvm_release_pfn_dirty(pfn);
- else
- kvm_release_pfn_clean(pfn);
-}
-
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(vcpu->kvm, gfn),
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ .refcounted_page = &map->refcounted_page,
+ };
+
+ map->refcounted_page = NULL;
map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
- map->pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ map->pfn = kvm_follow_pfn(&kfp);
if (is_error_noslot_pfn(map->pfn))
return -EINVAL;
@@ -3139,10 +3139,16 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
if (dirty)
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
- kvm_release_pfn(map->pfn, dirty);
+ if (map->refcounted_page) {
+ if (dirty)
+ kvm_release_page_dirty(map->refcounted_page);
+ else
+ kvm_release_page_clean(map->refcounted_page);
+ }
map->hva = NULL;
map->page = NULL;
+ map->refcounted_page = NULL;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 31/85] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (29 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 30/85] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 32/85] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping Sean Christopherson
` (55 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Pin, as in FOLL_PIN, pages when mapping them for direct access by KVM.
As per Documentation/core-api/pin_user_pages.rst, writing to a page that
was gotten via FOLL_GET is explicitly disallowed.
Correct (uses FOLL_PIN calls):
pin_user_pages()
write to the data within the pages
unpin_user_pages()
INCORRECT (uses FOLL_GET calls):
get_user_pages()
write to the data within the pages
put_page()
Unfortunately, FOLL_PIN is a "private" flag, and so kvm_follow_pfn must
use a one-off bool instead of being able to piggyback the "flags" field.
Link: https://lwn.net/Articles/930667
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 54 +++++++++++++++++++++++++++++-----------
virt/kvm/kvm_mm.h | 7 ++++++
3 files changed, 47 insertions(+), 16 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 02ab3a657aa6..8739b905d85b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -280,7 +280,7 @@ struct kvm_host_map {
* can be used as guest memory but they are not managed by host
* kernel).
*/
- struct page *refcounted_page;
+ struct page *pinned_page;
struct page *page;
void *hva;
kvm_pfn_t pfn;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b1c1b7e4f33a..40a59526d466 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2814,9 +2814,12 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
*/
if (map) {
pfn = map->pfn;
- page = kvm_pfn_to_refcounted_page(pfn);
- if (page && !get_page_unless_zero(page))
- return KVM_PFN_ERR_FAULT;
+
+ if (!kfp->pin) {
+ page = kvm_pfn_to_refcounted_page(pfn);
+ if (page && !get_page_unless_zero(page))
+ return KVM_PFN_ERR_FAULT;
+ }
} else {
pfn = page_to_pfn(page);
}
@@ -2834,16 +2837,24 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
struct page *page;
+ bool r;
/*
- * Fast pin a writable pfn only if it is a write fault request
- * or the caller allows to map a writable pfn for a read fault
- * request.
+ * Try the fast-only path when the caller wants to pin/get the page for
+ * writing. If the caller only wants to read the page, KVM must go
+ * down the full, slow path in order to avoid racing an operation that
+ * breaks Copy-on-Write (CoW), e.g. so that KVM doesn't end up pointing
+ * at the old, read-only page while mm/ points at a new, writable page.
*/
if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
return false;
- if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
+ if (kfp->pin)
+ r = pin_user_pages_fast(kfp->hva, 1, FOLL_WRITE, &page) == 1;
+ else
+ r = get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page);
+
+ if (r) {
*pfn = kvm_resolve_pfn(kfp, page, NULL, true);
return true;
}
@@ -2872,10 +2883,21 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
struct page *page, *wpage;
int npages;
- npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
+ if (kfp->pin)
+ npages = pin_user_pages_unlocked(kfp->hva, 1, &page, flags);
+ else
+ npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
if (npages != 1)
return npages;
+ /*
+ * Pinning is mutually exclusive with opportunistically mapping a read
+ * fault as writable, as KVM should never pin pages when mapping memory
+ * into the guest (pinning is only for direct accesses from KVM).
+ */
+ if (WARN_ON_ONCE(kfp->map_writable && kfp->pin))
+ goto out;
+
/* map read fault as writable if possible */
if (!(flags & FOLL_WRITE) && kfp->map_writable &&
get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
@@ -2884,6 +2906,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
flags |= FOLL_WRITE;
}
+out:
*pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
return npages;
}
@@ -3099,10 +3122,11 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
.slot = gfn_to_memslot(vcpu->kvm, gfn),
.gfn = gfn,
.flags = FOLL_WRITE,
- .refcounted_page = &map->refcounted_page,
+ .refcounted_page = &map->pinned_page,
+ .pin = true,
};
- map->refcounted_page = NULL;
+ map->pinned_page = NULL;
map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
@@ -3139,16 +3163,16 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
if (dirty)
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
- if (map->refcounted_page) {
+ if (map->pinned_page) {
if (dirty)
- kvm_release_page_dirty(map->refcounted_page);
- else
- kvm_release_page_clean(map->refcounted_page);
+ kvm_set_page_dirty(map->pinned_page);
+ kvm_set_page_accessed(map->pinned_page);
+ unpin_user_page(map->pinned_page);
}
map->hva = NULL;
map->page = NULL;
- map->refcounted_page = NULL;
+ map->pinned_page = NULL;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d3ac1ba8ba66..acef3f5c582a 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -30,6 +30,13 @@ struct kvm_follow_pfn {
/* FOLL_* flags modifying lookup behavior, e.g. FOLL_WRITE. */
unsigned int flags;
+ /*
+ * Pin the page (effectively FOLL_PIN, which is an mm/ internal flag).
+ * The page *must* be pinned if KVM will write to the page via a kernel
+ * mapping, e.g. via kmap(), mremap(), etc.
+ */
+ bool pin;
+
/*
* If non-NULL, try to get a writable mapping even for a read fault.
* Set to true if a writable mapping was obtained.
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
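The pin_user_pages.rst rule, expressed as a small standalone sketch (illustration only; "uaddr" is a placeholder userspace address): writes through a kernel mapping must be bracketed by a pin/unpin pair, which is exactly the pattern kvm_vcpu_map() and kvm_vcpu_unmap() now follow.

struct page *page;
void *hva;

if (pin_user_pages_fast(uaddr, 1, FOLL_WRITE, &page) != 1)	/* FOLL_PIN under the hood */
	return -EFAULT;

hva = kmap(page);
/* ... write to the data within the page ... */
kunmap(page);

unpin_user_page(page);		/* pairs with the pin; do NOT use put_page() */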
* [PATCH v13 32/85] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (30 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 31/85] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 33/85] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap() Sean Christopherson
` (54 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark the APIC access page as dirty when unmapping it from KVM. The fact
that the page _shouldn't_ be written doesn't guarantee the page _won't_ be
written. And while the contents are likely irrelevant, the values _are_
visible to the guest, i.e. dropping writes would be visible to the guest
(though obviously highly unlikely to be problematic in practice).
Marking the map dirty will allow specifying the write vs. read-only when
*mapping* the memory, which in turn will allow creating read-only maps.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 81865db18e12..ff83b56fe2fa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -318,12 +318,7 @@ static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- /*
- * Unpin physical memory we referred to in the vmcs02. The APIC access
- * page's backing page (yeah, confusing) shouldn't actually be accessed,
- * and if it is written, the contents are irrelevant.
- */
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, true);
kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
vmx->nested.pi_desc = NULL;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 33/85] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (31 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 32/85] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary Sean Christopherson
` (53 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Now that all kvm_vcpu_{,un}map() users pass "true" for @dirty, have them
instead pass the desired access as a @writable param to kvm_vcpu_map(), and
thus create a read-only mapping when possible.
Note, creating read-only mappings can be theoretically slower, as they
don't play nice with fast GUP due to the need to break CoW before mapping
the underlying PFN. But practically speaking, creating a mapping isn't
a super hot path, and getting a writable mapping for reading is weird and
confusing.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/nested.c | 4 ++--
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/kvm/svm/svm.c | 8 ++++----
arch/x86/kvm/vmx/nested.c | 16 ++++++++--------
include/linux/kvm_host.h | 20 ++++++++++++++++++--
virt/kvm/kvm_main.c | 12 +++++++-----
6 files changed, 40 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d5314cb7dff4..9f9478bdecfc 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -922,7 +922,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_svm_vmexit(svm);
out:
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
return ret;
}
@@ -1126,7 +1126,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
vmcb12->control.exit_int_info_err,
KVM_ISA_SVM);
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
nested_svm_transition_tlb_flush(vcpu);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0b851ef937f2..4557ff3804ae 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3468,7 +3468,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
sev_es_sync_to_ghcb(svm);
- kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map, true);
+ kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map);
svm->sev_es.ghcb = NULL;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..c1e29307826b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2299,7 +2299,7 @@ static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
svm_copy_vmloadsave_state(vmcb12, svm->vmcb);
}
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
return ret;
}
@@ -4714,7 +4714,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
svm_copy_vmrun_state(map_save.hva + 0x400,
&svm->vmcb01.ptr->save);
- kvm_vcpu_unmap(vcpu, &map_save, true);
+ kvm_vcpu_unmap(vcpu, &map_save);
return 0;
}
@@ -4774,9 +4774,9 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
svm->nested.nested_run_pending = 1;
unmap_save:
- kvm_vcpu_unmap(vcpu, &map_save, true);
+ kvm_vcpu_unmap(vcpu, &map_save);
unmap_map:
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
return ret;
}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ff83b56fe2fa..259fe445e695 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -231,7 +231,7 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
- kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map);
vmx->nested.hv_evmcs = NULL;
vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
@@ -318,9 +318,9 @@ static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map);
vmx->nested.pi_desc = NULL;
}
@@ -624,7 +624,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
int msr;
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap;
- struct kvm_host_map msr_bitmap_map;
+ struct kvm_host_map map;
/* Nothing to do if the MSR bitmap is not in use. */
if (!cpu_has_vmx_msr_bitmap() ||
@@ -647,10 +647,10 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
return true;
}
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), &msr_bitmap_map))
+ if (kvm_vcpu_map_readonly(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), &map))
return false;
- msr_bitmap_l1 = (unsigned long *)msr_bitmap_map.hva;
+ msr_bitmap_l1 = (unsigned long *)map.hva;
/*
* To keep the control flow simple, pay eight 8-byte writes (sixteen
@@ -714,7 +714,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
- kvm_vcpu_unmap(vcpu, &msr_bitmap_map, false);
+ kvm_vcpu_unmap(vcpu, &map);
vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8739b905d85b..9263375d0362 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,7 @@ struct kvm_host_map {
void *hva;
kvm_pfn_t pfn;
kvm_pfn_t gfn;
+ bool writable;
};
/*
@@ -1312,8 +1313,23 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
-void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
+
+int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
+ bool writable);
+void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map);
+
+static inline int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa,
+ struct kvm_host_map *map)
+{
+ return __kvm_vcpu_map(vcpu, gpa, map, true);
+}
+
+static inline int kvm_vcpu_map_readonly(struct kvm_vcpu *vcpu, gpa_t gpa,
+ struct kvm_host_map *map)
+{
+ return __kvm_vcpu_map(vcpu, gpa, map, false);
+}
+
unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 40a59526d466..080740f65061 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3116,7 +3116,8 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_page);
-int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
+int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
+ bool writable)
{
struct kvm_follow_pfn kfp = {
.slot = gfn_to_memslot(vcpu->kvm, gfn),
@@ -3130,6 +3131,7 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
+ map->writable = writable;
map->pfn = kvm_follow_pfn(&kfp);
if (is_error_noslot_pfn(map->pfn))
@@ -3146,9 +3148,9 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
return map->hva ? 0 : -EFAULT;
}
-EXPORT_SYMBOL_GPL(kvm_vcpu_map);
+EXPORT_SYMBOL_GPL(__kvm_vcpu_map);
-void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
+void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map)
{
if (!map->hva)
return;
@@ -3160,11 +3162,11 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
memunmap(map->hva);
#endif
- if (dirty)
+ if (map->writable)
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
if (map->pinned_page) {
- if (dirty)
+ if (map->writable)
kvm_set_page_dirty(map->pinned_page);
kvm_set_page_accessed(map->pinned_page);
unpin_user_page(map->pinned_page);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
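In caller terms, the API after this patch looks roughly as follows (a condensed illustration based on the nested VMX hunk above; "gpa" is a placeholder): the write vs. read-only decision is made at map time, and unmap derives dirtying from map->writable.

struct kvm_host_map map;

/* Read-only access, e.g. L1's MSR bitmap: */
if (kvm_vcpu_map_readonly(vcpu, gpa_to_gfn(gpa), &map))
	return false;
/* ... read through map.hva ... */
kvm_vcpu_unmap(vcpu, &map);	/* map.writable == false, nothing marked dirty */

/* Writable access, e.g. vmcb12: */
if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
	return -EINVAL;
/* ... read/write through map.hva ... */
kvm_vcpu_unmap(vcpu, &map);	/* map.writable == true, gfn/page marked dirty */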
* [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (32 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 33/85] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-21 9:25 ` Yan Zhao
2024-10-10 18:23 ` [PATCH v13 35/85] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default Sean Christopherson
` (52 subsequent siblings)
86 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
When creating a memory map for read, don't request a writable pfn from the
primary MMU. While creating read-only mappings can be theoretically slower,
as they don't play nice with fast GUP due to the need to break CoW before
mapping the underlying PFN, practically speaking, creating a mapping isn't
a super hot path, and getting a writable mapping for reading is weird and
confusing.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 080740f65061..b845e9252633 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3122,7 +3122,7 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
struct kvm_follow_pfn kfp = {
.slot = gfn_to_memslot(vcpu->kvm, gfn),
.gfn = gfn,
- .flags = FOLL_WRITE,
+ .flags = writable ? FOLL_WRITE : 0,
.refcounted_page = &map->pinned_page,
.pin = true,
};
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
2024-10-10 18:23 ` [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary Sean Christopherson
@ 2024-10-21 9:25 ` Yan Zhao
2024-10-21 18:13 ` Sean Christopherson
0 siblings, 1 reply; 99+ messages in thread
From: Yan Zhao @ 2024-10-21 9:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Thu, Oct 10, 2024 at 11:23:36AM -0700, Sean Christopherson wrote:
> When creating a memory map for read, don't request a writable pfn from the
> primary MMU. While creating read-only mappings can be theoretically slower,
> as they don't play nice with fast GUP due to the need to break CoW before
> mapping the underlying PFN, practically speaking, creating a mapping isn't
> a super hot path, and getting a writable mapping for reading is weird and
> confusing.
>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/kvm_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 080740f65061..b845e9252633 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3122,7 +3122,7 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
> struct kvm_follow_pfn kfp = {
> .slot = gfn_to_memslot(vcpu->kvm, gfn),
> .gfn = gfn,
> - .flags = FOLL_WRITE,
> + .flags = writable ? FOLL_WRITE : 0,
> .refcounted_page = &map->pinned_page,
> .pin = true,
> };
When writable is false, could we set ".pin = false," ?
Also not sure if ".map_writable = NULL" is missing.
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
2024-10-21 9:25 ` Yan Zhao
@ 2024-10-21 18:13 ` Sean Christopherson
2024-10-22 1:51 ` Yan Zhao
0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-21 18:13 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Mon, Oct 21, 2024, Yan Zhao wrote:
> On Thu, Oct 10, 2024 at 11:23:36AM -0700, Sean Christopherson wrote:
> > When creating a memory map for read, don't request a writable pfn from the
> > primary MMU. While creating read-only mappings can be theoretically slower,
> > as they don't play nice with fast GUP due to the need to break CoW before
> > mapping the underlying PFN, practically speaking, creating a mapping isn't
> > a super hot path, and getting a writable mapping for reading is weird and
> > confusing.
> >
> > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > virt/kvm/kvm_main.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 080740f65061..b845e9252633 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3122,7 +3122,7 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
> > struct kvm_follow_pfn kfp = {
> > .slot = gfn_to_memslot(vcpu->kvm, gfn),
> > .gfn = gfn,
> > - .flags = FOLL_WRITE,
> > + .flags = writable ? FOLL_WRITE : 0,
> > .refcounted_page = &map->pinned_page,
> > .pin = true,
> > };
> When writable is false, could we set ".pin = false," ?
Hmm, maybe? I can't imagine anything would actually break, but unless FOLL_PIN
implies writing, my preference would still be to pin the page so that KVM always
pins when accessing the actual data of a page.
> Also not sure if ".map_writable = NULL" is missing.
Doh, my previous response was slightly wrong: it's implicitly initialized to NULL,
not false. I forgot map_writable is a pointer to a bool.
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
2024-10-21 18:13 ` Sean Christopherson
@ 2024-10-22 1:51 ` Yan Zhao
0 siblings, 0 replies; 99+ messages in thread
From: Yan Zhao @ 2024-10-22 1:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Mon, Oct 21, 2024 at 11:13:08AM -0700, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Yan Zhao wrote:
> > On Thu, Oct 10, 2024 at 11:23:36AM -0700, Sean Christopherson wrote:
> > > When creating a memory map for read, don't request a writable pfn from the
> > > primary MMU. While creating read-only mappings can be theoretically slower,
> > > as they don't play nice with fast GUP due to the need to break CoW before
> > > mapping the underlying PFN, practically speaking, creating a mapping isn't
> > > a super hot path, and getting a writable mapping for reading is weird and
> > > confusing.
> > >
> > > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > > virt/kvm/kvm_main.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 080740f65061..b845e9252633 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3122,7 +3122,7 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
> > > struct kvm_follow_pfn kfp = {
> > > .slot = gfn_to_memslot(vcpu->kvm, gfn),
> > > .gfn = gfn,
> > > - .flags = FOLL_WRITE,
> > > + .flags = writable ? FOLL_WRITE : 0,
> > > .refcounted_page = &map->pinned_page,
> > > .pin = true,
> > > };
> > When writable is false, could we set ".pin = false," ?
>
> Hmm, maybe? I can't imagine anything would actually break, but unless FOLL_PIN
> implies writing, my preference would still be to pin the page so that KVM always
> pins when accessing the actual data of a page.
Ok. So setting .pin = true here is because KVM accesses the page directly,
without checking the mmu notifier's invalidation callbacks.
^ permalink raw reply [flat|nested] 99+ messages in thread
* [PATCH v13 35/85] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (33 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 34/85] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 36/85] KVM: x86: Don't fault-in APIC access page during initial allocation Sean Christopherson
` (51 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Add an off-by-default module param to control whether or not KVM is allowed
to map memory that isn't pinned, i.e. that KVM can't guarantee won't be
freed while it is mapped into KVM and/or the guest. Don't remove the
functionality entirely, as there are use cases where mapping unpinned
memory is safe (as defined by the platform owner), e.g. when memory is
hidden from the kernel and managed by userspace, in which case userspace
is already fully trusted to not muck with guest memory mappings.
But for more typical setups, mapping unpinned memory is wildly unsafe, and
unnecessary. The APIs are used exclusively by x86's nested virtualization
support, and there is no known (or sane) use case for mapping PFN-mapped
memory into a KVM guest _and_ letting the guest use it for virtualization
structures.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
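Not part of the patch: a stand-alone model of the intended default behavior
("model_pin_remapped_page" is a made-up name; the real check lands in
hva_to_pfn_remapped() below). Because the param is registered with 0444
permissions, it can only be set when kvm.ko is loaded, e.g. via
kvm.allow_unsafe_mappings=1 on the kernel command line.

	#include <errno.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Mirrors the off-by-default, read-only-at-runtime module param. */
	static bool allow_unsafe_mappings;

	/* Simplified model of the new guard: pinning a remapped (PFNMAP'd)
	 * page is refused unless the platform owner explicitly opted in. */
	static int model_pin_remapped_page(bool want_pin)
	{
		if (want_pin && !allow_unsafe_mappings)
			return -EINVAL;
		return 0;
	}

	int main(void)
	{
		printf("default:  %d\n", model_pin_remapped_page(true));
		allow_unsafe_mappings = true;
		printf("opted in: %d\n", model_pin_remapped_page(true));
		return 0;
	}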
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b845e9252633..6dcb4f0eed3e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -94,6 +94,13 @@ unsigned int halt_poll_ns_shrink = 2;
module_param(halt_poll_ns_shrink, uint, 0644);
EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
+/*
+ * Allow direct access (from KVM or the CPU) without MMU notifier protection
+ * to unpinned pages.
+ */
+static bool allow_unsafe_mappings;
+module_param(allow_unsafe_mappings, bool, 0444);
+
/*
* Ordering of locks:
*
@@ -2811,6 +2818,9 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
* reference to such pages would cause KVM to prematurely free a page
* it doesn't own (KVM gets and puts the one and only reference).
* Don't allow those pages until the FIXME is resolved.
+ *
+ * Don't grab a reference for pins, callers that pin pages are required
+ * to check refcounted_page, i.e. must not blindly release the pfn.
*/
if (map) {
pfn = map->pfn;
@@ -2929,6 +2939,14 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
bool write_fault = kfp->flags & FOLL_WRITE;
int r;
+ /*
+ * Remapped memory cannot be pinned in any meaningful sense. Bail if
+ * the caller wants to pin the page, i.e. access the page outside of
+ * MMU notifier protection, and unsafe mappings are disallowed.
+ */
+ if (kfp->pin && !allow_unsafe_mappings)
+ return -EINVAL;
+
r = follow_pfnmap_start(&args);
if (r) {
/*
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 36/85] KVM: x86: Don't fault-in APIC access page during initial allocation
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (34 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 35/85] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 37/85] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names Sean Christopherson
` (50 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Drop the gfn_to_page() lookup when installing KVM's internal memslot for
the APIC access page, as KVM doesn't need to immediately fault-in the page
now that the page isn't pinned. In the extremely unlikely event the
kernel can't allocate a 4KiB page, KVM can just as easily return -EFAULT
on the future page fault.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 20526e4d6c62..65412640cfc7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2647,7 +2647,6 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
int kvm_alloc_apic_access_page(struct kvm *kvm)
{
- struct page *page;
void __user *hva;
int ret = 0;
@@ -2663,17 +2662,6 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
goto out;
}
- page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
- if (!page) {
- ret = -EFAULT;
- goto out;
- }
-
- /*
- * Do not pin the page in memory, so that memory hot-unplug
- * is able to migrate it.
- */
- put_page(page);
kvm->arch.apic_access_memslot_enabled = true;
out:
mutex_unlock(&kvm->slots_lock);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 37/85] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (35 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 36/85] KVM: x86: Don't fault-in APIC access page during initial allocation Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 38/85] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean() Sean Christopherson
` (49 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Prefix x86's faultin_pfn helpers with "mmu" so that the mmu-less names can
be used by common KVM for similar APIs.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 19 ++++++++++---------
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 28f2b842d6ca..e451e1b9a55a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4347,8 +4347,8 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
return max_level;
}
-static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
int max_order, r;
@@ -4371,10 +4371,11 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return RET_PF_CONTINUE;
}
-static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
if (fault->is_private)
- return kvm_faultin_pfn_private(vcpu, fault);
+ return kvm_mmu_faultin_pfn_private(vcpu, fault);
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
fault->write, &fault->map_writable);
@@ -4409,8 +4410,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return RET_PF_CONTINUE;
}
-static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
- unsigned int access)
+static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault, unsigned int access)
{
struct kvm_memory_slot *slot = fault->slot;
int ret;
@@ -4493,7 +4494,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
return RET_PF_RETRY;
- ret = __kvm_faultin_pfn(vcpu, fault);
+ ret = __kvm_mmu_faultin_pfn(vcpu, fault);
if (ret != RET_PF_CONTINUE)
return ret;
@@ -4570,7 +4571,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (r)
return r;
- r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
+ r = kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL);
if (r != RET_PF_CONTINUE)
return r;
@@ -4661,7 +4662,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
if (r)
return r;
- r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
+ r = kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL);
if (r != RET_PF_CONTINUE)
return r;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 633aedec3c2e..59e600f6ff9d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -235,7 +235,7 @@ struct kvm_page_fault {
/* The memslot containing gfn. May be NULL. */
struct kvm_memory_slot *slot;
- /* Outputs of kvm_faultin_pfn. */
+ /* Outputs of kvm_mmu_faultin_pfn(). */
unsigned long mmu_seq;
kvm_pfn_t pfn;
bool map_writable;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 143b7e9f26dc..9bd3d6f5db91 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -812,7 +812,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (r)
return r;
- r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
+ r = kvm_mmu_faultin_pfn(vcpu, fault, walker.pte_access);
if (r != RET_PF_CONTINUE)
return r;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 38/85] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (36 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 37/85] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 39/85] KVM: x86/mmu: Add common helper to handle prefetching SPTEs Sean Christopherson
` (48 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Use kvm_release_page_clean() to put prefetched pages instead of calling
put_page() directly. This will allow de-duplicating the prefetch code
between indirect and direct MMUs.
Note, there's a small functional change as kvm_release_page_clean() marks
the page/folio as accessed. While it's not strictly guaranteed that the
guest will access the page, KVM won't intercept guest accesses, i.e. won't
mark the page accessed if it _is_ accessed by the guest (unless A/D bits
are disabled, but running without A/D bits is effectively limited to
pre-HSW Intel CPUs).
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
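As a rough, stand-alone model of the functional change noted above (toy
types and made-up names, not the kernel API): kvm_release_page_clean()
behaves like "mark accessed, then drop the reference", whereas a bare
put_page() only drops the reference.

	#include <stdbool.h>
	#include <stdio.h>

	struct toy_page {
		int refcount;
		bool accessed;
	};

	static void toy_put_page(struct toy_page *p)
	{
		p->refcount--;
	}

	static void toy_release_page_clean(struct toy_page *p)
	{
		p->accessed = true;	/* the extra "mark accessed" step */
		toy_put_page(p);
	}

	int main(void)
	{
		struct toy_page old_way = { .refcount = 1 };
		struct toy_page new_way = { .refcount = 1 };

		toy_put_page(&old_way);
		toy_release_page_clean(&new_way);

		printf("put_page:               ref=%d accessed=%d\n",
		       old_way.refcount, old_way.accessed);
		printf("kvm_release_page_clean: ref=%d accessed=%d\n",
		       new_way.refcount, new_way.accessed);
		return 0;
	}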
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e451e1b9a55a..62924f95a398 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2965,7 +2965,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
for (i = 0; i < ret; i++, gfn++, start++) {
mmu_set_spte(vcpu, slot, start, access, gfn,
page_to_pfn(pages[i]), NULL);
- put_page(pages[i]);
+ kvm_release_page_clean(pages[i]);
}
return 0;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 39/85] KVM: x86/mmu: Add common helper to handle prefetching SPTEs
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (37 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 38/85] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 40/85] KVM: x86/mmu: Add helper to "finish" handling a guest page fault Sean Christopherson
` (47 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Deduplicate the prefetching code for indirect and direct MMUs. The core
logic is the same; the only difference is that indirect MMUs need to
prefetch SPTEs one-at-a-time, as contiguous guest virtual addresses aren't
guaranteed to yield contiguous guest physical addresses.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++-------------
arch/x86/kvm/mmu/paging_tmpl.h | 13 +----------
2 files changed, 26 insertions(+), 27 deletions(-)
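For readers skimming the thread, the calling convention of the new helper, as
used by the two paths in the diff below (fragments paraphrased from the
patch, not additional code):

	/* Direct MMU: prefetch a contiguous run of SPTEs in a single call. */
	kvm_mmu_prefetch_sptes(vcpu, gfn, start, end - start, sp->role.access);

	/* Indirect (shadow) MMU: one SPTE at a time, since contiguous guest
	 * virtual addresses aren't guaranteed to map to contiguous guest
	 * physical addresses. */
	kvm_mmu_prefetch_sptes(vcpu, gfn, spte, 1, pte_access);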
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 62924f95a398..65d3a602eb2c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2943,32 +2943,41 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
return ret;
}
-static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp,
- u64 *start, u64 *end)
+static bool kvm_mmu_prefetch_sptes(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep,
+ int nr_pages, unsigned int access)
{
struct page *pages[PTE_PREFETCH_NUM];
struct kvm_memory_slot *slot;
- unsigned int access = sp->role.access;
- int i, ret;
- gfn_t gfn;
+ int i;
+
+ if (WARN_ON_ONCE(nr_pages > PTE_PREFETCH_NUM))
+ return false;
- gfn = kvm_mmu_page_get_gfn(sp, spte_index(start));
slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK);
if (!slot)
- return -1;
+ return false;
- ret = kvm_prefetch_pages(slot, gfn, pages, end - start);
- if (ret <= 0)
- return -1;
+ nr_pages = kvm_prefetch_pages(slot, gfn, pages, nr_pages);
+ if (nr_pages <= 0)
+ return false;
- for (i = 0; i < ret; i++, gfn++, start++) {
- mmu_set_spte(vcpu, slot, start, access, gfn,
+ for (i = 0; i < nr_pages; i++, gfn++, sptep++) {
+ mmu_set_spte(vcpu, slot, sptep, access, gfn,
page_to_pfn(pages[i]), NULL);
kvm_release_page_clean(pages[i]);
}
- return 0;
+ return true;
+}
+
+static bool direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *sp,
+ u64 *start, u64 *end)
+{
+ gfn_t gfn = kvm_mmu_page_get_gfn(sp, spte_index(start));
+ unsigned int access = sp->role.access;
+
+ return kvm_mmu_prefetch_sptes(vcpu, gfn, start, end - start, access);
}
static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
@@ -2986,8 +2995,9 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
if (is_shadow_present_pte(*spte) || spte == sptep) {
if (!start)
continue;
- if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
+ if (!direct_pte_prefetch_many(vcpu, sp, start, spte))
return;
+
start = NULL;
} else if (!start)
start = spte;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 9bd3d6f5db91..a476a5428017 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -533,9 +533,7 @@ static bool
FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 *spte, pt_element_t gpte)
{
- struct kvm_memory_slot *slot;
unsigned pte_access;
- struct page *page;
gfn_t gfn;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
@@ -545,16 +543,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
pte_access = sp->role.access & FNAME(gpte_access)(gpte);
FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, pte_access & ACC_WRITE_MASK);
- if (!slot)
- return false;
-
- if (kvm_prefetch_pages(slot, gfn, &page, 1) != 1)
- return false;
-
- mmu_set_spte(vcpu, slot, spte, pte_access, gfn, page_to_pfn(page), NULL);
- kvm_release_page_clean(page);
- return true;
+ return kvm_mmu_prefetch_sptes(vcpu, gfn, spte, 1, pte_access);
}
static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 40/85] KVM: x86/mmu: Add helper to "finish" handling a guest page fault
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (38 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 39/85] KVM: x86/mmu: Add common helper to handle prefetching SPTEs Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 41/85] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte() Sean Christopherson
` (46 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Add a helper to finish/complete the handling of a guest page fault, e.g. to
mark the pages accessed and put any held references. In the near
future, this will allow improving the logic without having to copy+paste
changes into all page fault paths. And in the less near future, it will
allow sharing the "finish" API across all architectures.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 12 +++++++++---
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 65d3a602eb2c..31a6ae41a6f4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4357,6 +4357,12 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
return max_level;
}
+static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault, int r)
+{
+ kvm_release_pfn_clean(fault->pfn);
+}
+
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
@@ -4522,7 +4528,7 @@ static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
* mmu_lock is acquired.
*/
if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn)) {
- kvm_release_pfn_clean(fault->pfn);
+ kvm_mmu_finish_page_fault(vcpu, fault, RET_PF_RETRY);
return RET_PF_RETRY;
}
@@ -4598,8 +4604,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = direct_map(vcpu, fault);
out_unlock:
+ kvm_mmu_finish_page_fault(vcpu, fault, r);
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
return r;
}
@@ -4685,8 +4691,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
r = kvm_tdp_mmu_map(vcpu, fault);
out_unlock:
+ kvm_mmu_finish_page_fault(vcpu, fault, r);
read_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
return r;
}
#endif
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index a476a5428017..35d0c3f1a789 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -836,8 +836,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = FNAME(fetch)(vcpu, fault, &walker);
out_unlock:
+ kvm_mmu_finish_page_fault(vcpu, fault, r);
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
return r;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 41/85] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (39 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 40/85] KVM: x86/mmu: Add helper to "finish" handling a guest page fault Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 42/85] KVM: Move declarations of memslot accessors up in kvm_host.h Sean Christopherson
` (45 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Move the marking of folios dirty from make_spte() out to its callers,
which have access to the _struct page_, not just the underlying pfn.
Once all architectures follow suit, this will allow removing KVM's ugly
hack where KVM elevates the refcount of VM_MIXEDMAP pfns that happen to
be struct page memory.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 30 ++++++++++++++++++++++++++++--
arch/x86/kvm/mmu/paging_tmpl.h | 5 +++++
arch/x86/kvm/mmu/spte.c | 11 -----------
3 files changed, 33 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 31a6ae41a6f4..f730870887dd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2964,7 +2964,17 @@ static bool kvm_mmu_prefetch_sptes(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep,
for (i = 0; i < nr_pages; i++, gfn++, sptep++) {
mmu_set_spte(vcpu, slot, sptep, access, gfn,
page_to_pfn(pages[i]), NULL);
- kvm_release_page_clean(pages[i]);
+
+ /*
+ * KVM always prefetches writable pages from the primary MMU,
+ * and KVM can make its SPTE writable in the fast page handler,
+ * without notifying the primary MMU. Mark pages/folios dirty
+ * now to ensure file data is written back if it ends up being
+ * written by the guest. Because KVM's prefetching GUPs
+ * writable PTEs, the probability of unnecessary writeback is
+ * extremely low.
+ */
+ kvm_release_page_dirty(pages[i]);
}
return true;
@@ -4360,7 +4370,23 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault, int r)
{
- kvm_release_pfn_clean(fault->pfn);
+ lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
+ r == RET_PF_RETRY);
+
+ /*
+ * If the page that KVM got from the *primary MMU* is writable, and KVM
+ * installed or reused a SPTE, mark the page/folio dirty. Note, this
+ * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
+ * the GFN is write-protected. Folios can't be safely marked dirty
+ * outside of mmu_lock as doing so could race with writeback on the
+ * folio. As a result, KVM can't mark folios dirty in the fast page
+ * fault handler, and so KVM must (somewhat) speculatively mark the
+ * folio dirty if KVM could locklessly make the SPTE writable.
+ */
+ if (!fault->map_writable || r == RET_PF_RETRY)
+ kvm_release_pfn_clean(fault->pfn);
+ else
+ kvm_release_pfn_dirty(fault->pfn);
}
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 35d0c3f1a789..f4711674c47b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -954,6 +954,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
spte_to_pfn(spte), spte, true, true,
host_writable, &spte);
+ /*
+ * There is no need to mark the pfn dirty, as the new protections must
+ * be a subset of the old protections, i.e. synchronizing a SPTE cannot
+ * change the SPTE from read-only to writable.
+ */
return mmu_spte_update(sptep, spte);
}
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 8e8d6ee79c8b..f1a50a78badb 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -277,17 +277,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
}
- /*
- * If the page that KVM got from the primary MMU is writable, i.e. if
- * it's host-writable, mark the page/folio dirty. As alluded to above,
- * folios can't be safely marked dirty in the fast page fault handler,
- * and so KVM must (somewhat) speculatively mark the folio dirty even
- * though it isn't guaranteed to be written as KVM won't mark the folio
- * dirty if/when the SPTE is made writable.
- */
- if (host_writable)
- kvm_set_pfn_dirty(pfn);
-
*new_spte = spte;
return wrprot;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 42/85] KVM: Move declarations of memslot accessors up in kvm_host.h
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (40 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 41/85] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 43/85] KVM: Add kvm_faultin_pfn() to specifically service guest page faults Sean Christopherson
` (44 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Move the memslot lookup helpers further up in kvm_host.h so that they can
be used by inlined "to pfn" wrappers.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9263375d0362..346bfef14e5a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1168,6 +1168,10 @@ static inline bool kvm_memslot_iter_is_valid(struct kvm_memslot_iter *iter, gfn_
kvm_memslot_iter_is_valid(iter, end); \
kvm_memslot_iter_next(iter))
+struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
+struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
+struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
+
/*
* KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
* - create a new memory slot
@@ -1303,15 +1307,13 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
})
int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
-struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
-struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
+
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 43/85] KVM: Add kvm_faultin_pfn() to specifically service guest page faults
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (41 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 42/85] KVM: Move declarations of memslot accessors up in kvm_host.h Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 44/85] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn() Sean Christopherson
` (43 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Add a new dedicated API, kvm_faultin_pfn(), for servicing guest page
faults, i.e. for getting pages/pfns that will be mapped into the guest via
an mmu_notifier-protected KVM MMU. Keep struct kvm_follow_pfn buried in
internal code, as having __kvm_faultin_pfn() take "out" params is actually
cleaner for several architectures, e.g. it allows the caller to have its
own "page fault" structure without having to marshal data to/from
kvm_follow_pfn.
Long term, common KVM would ideally provide a kvm_page_fault structure, a
la x86's struct of the same name. But all architectures need to be
converted to a common API before that can happen.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 12 ++++++++++++
virt/kvm/kvm_main.c | 22 ++++++++++++++++++++++
2 files changed, 34 insertions(+)
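Not part of the patch: a rough sketch of how an architecture's fault handler
is expected to consume the new API (error handling abbreviated; "write_fault"
stands in for the caller's own bookkeeping, and the concrete x86 conversion
follows later in this series):

	struct page *refcounted_page = NULL;
	bool writable = false;
	kvm_pfn_t pfn;

	pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writable, &refcounted_page);
	if (is_error_noslot_pfn(pfn))
		return -EFAULT;

	/* ... install the mapping under mmu_lock, honoring mmu_notifier retry ... */

	/* Put the backing page once the mapping is installed; a later patch
	 * adds kvm_release_faultin_page() to wrap exactly this step. */
	kvm_release_page_clean(refcounted_page);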
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 346bfef14e5a..3b9afb40e935 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1231,6 +1231,18 @@ static inline void kvm_release_page_unused(struct page *page)
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
+kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
+ unsigned int foll, bool *writable,
+ struct page **refcounted_page);
+
+static inline kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool write, bool *writable,
+ struct page **refcounted_page)
+{
+ return __kvm_faultin_pfn(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn,
+ write ? FOLL_WRITE : 0, writable, refcounted_page);
+}
+
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6dcb4f0eed3e..696d5e429b3e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3098,6 +3098,28 @@ kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
+kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
+ unsigned int foll, bool *writable,
+ struct page **refcounted_page)
+{
+ struct kvm_follow_pfn kfp = {
+ .slot = slot,
+ .gfn = gfn,
+ .flags = foll,
+ .map_writable = writable,
+ .refcounted_page = refcounted_page,
+ };
+
+ if (WARN_ON_ONCE(!writable || !refcounted_page))
+ return KVM_PFN_ERR_FAULT;
+
+ *writable = false;
+ *refcounted_page = NULL;
+
+ return kvm_follow_pfn(&kfp);
+}
+EXPORT_SYMBOL_GPL(__kvm_faultin_pfn);
+
int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
struct page **pages, int nr_pages)
{
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 44/85] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (42 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 43/85] KVM: Add kvm_faultin_pfn() to specifically service guest page faults Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 45/85] KVM: guest_memfd: Pass index, not gfn, to __kvm_gmem_get_pfn() Sean Christopherson
` (42 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert KVM x86 to use the recently introduced __kvm_faultin_pfn().
Opportunistically capture the refcounted_page grabbed by KVM for use in
future changes.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
arch/x86/kvm/mmu/mmu_internal.h | 1 +
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f730870887dd..2e2076287aaf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4416,11 +4416,14 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
+ unsigned int foll = fault->write ? FOLL_WRITE : 0;
+
if (fault->is_private)
return kvm_mmu_faultin_pfn_private(vcpu, fault);
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
- fault->write, &fault->map_writable);
+ foll |= FOLL_NOWAIT;
+ fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
+ &fault->map_writable, &fault->refcounted_page);
/*
* If resolving the page failed because I/O is needed to fault-in the
@@ -4447,8 +4450,11 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
* to wait for IO. Note, gup always bails if it is unable to quickly
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, true,
- fault->write, &fault->map_writable);
+ foll |= FOLL_INTERRUPTIBLE;
+ foll &= ~FOLL_NOWAIT;
+ fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
+ &fault->map_writable, &fault->refcounted_page);
+
return RET_PF_CONTINUE;
}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 59e600f6ff9d..fabbea504a69 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -238,6 +238,7 @@ struct kvm_page_fault {
/* Outputs of kvm_mmu_faultin_pfn(). */
unsigned long mmu_seq;
kvm_pfn_t pfn;
+ struct page *refcounted_page;
bool map_writable;
/*
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 45/85] KVM: guest_memfd: Pass index, not gfn, to __kvm_gmem_get_pfn()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (43 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 44/85] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 46/85] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn() Sean Christopherson
` (41 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Refactor guest_memfd usage of __kvm_gmem_get_pfn() to pass the index into
the guest_memfd file instead of the gfn, i.e. resolve the index based on
the slot+gfn in the caller instead of in __kvm_gmem_get_pfn(). This will
allow kvm_gmem_get_pfn() to retrieve and return the specific "struct page",
which requires the index into the folio, without redoing the index
calculation multiple times (which isn't costly, just hard to follow).
Opportunistically add a kvm_gmem_get_index() helper to make the copy+pasted
code easier to understand.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/guest_memfd.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
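As a quick worked example of the arithmetic behind the new helper (the
numbers are made up): for a memslot with base_gfn = 0x100 whose guest_memfd
binding starts at gmem.pgoff = 0x40, gfn 0x123 resolves to

	index = gfn - slot->base_gfn + slot->gmem.pgoff
	      = 0x123 - 0x100 + 0x40
	      = 0x63

i.e. page 0x63 within the guest_memfd file.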
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8f079a61a56d..8a878e57c5d4 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -302,6 +302,11 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
return get_file_active(&slot->gmem.file);
}
+static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ return gfn - slot->base_gfn + slot->gmem.pgoff;
+}
+
static struct file_operations kvm_gmem_fops = {
.open = generic_file_open,
.release = kvm_gmem_release,
@@ -551,12 +556,11 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
}
/* Returns a locked folio on success. */
-static struct folio *
-__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, bool *is_prepared,
- int *max_order)
+static struct folio *__kvm_gmem_get_pfn(struct file *file,
+ struct kvm_memory_slot *slot,
+ pgoff_t index, kvm_pfn_t *pfn,
+ bool *is_prepared, int *max_order)
{
- pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
struct kvm_gmem *gmem = file->private_data;
struct folio *folio;
@@ -592,6 +596,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
{
+ pgoff_t index = kvm_gmem_get_index(slot, gfn);
struct file *file = kvm_gmem_get_file(slot);
struct folio *folio;
bool is_prepared = false;
@@ -600,7 +605,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
if (!file)
return -EFAULT;
- folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, &is_prepared, max_order);
+ folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &is_prepared, max_order);
if (IS_ERR(folio)) {
r = PTR_ERR(folio);
goto out;
@@ -648,6 +653,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
for (i = 0; i < npages; i += (1 << max_order)) {
struct folio *folio;
gfn_t gfn = start_gfn + i;
+ pgoff_t index = kvm_gmem_get_index(slot, gfn);
bool is_prepared = false;
kvm_pfn_t pfn;
@@ -656,7 +662,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
break;
}
- folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &is_prepared, &max_order);
+ folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &is_prepared, &max_order);
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
break;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 46/85] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (44 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 45/85] KVM: guest_memfd: Pass index, not gfn, to __kvm_gmem_get_pfn() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 47/85] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns Sean Christopherson
` (40 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Provide the "struct page" associated with a guest_memfd pfn as an output
from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can
directly put the page instead of having to rely on
kvm_pfn_to_refcounted_page().
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/sev.c | 10 ++++++----
include/linux/kvm_host.h | 6 ++++--
virt/kvm/guest_memfd.c | 8 ++++++--
4 files changed, 17 insertions(+), 9 deletions(-)
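Not part of the patch: a sketch of the updated calling convention, mirroring
the sev.c hunks below (error handling abbreviated):

	struct page *page;
	kvm_pfn_t pfn;
	int max_order;

	if (kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, &max_order))
		return -EINVAL;

	/* ... use pfn ... */

	/* The caller now has the struct page directly, so it can be put
	 * without going through kvm_pfn_to_refcounted_page(). */
	kvm_release_page_clean(page);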
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e2076287aaf..a038cde74f0d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4400,7 +4400,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
}
r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
- &max_order);
+ &fault->refcounted_page, &max_order);
if (r) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return r;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4557ff3804ae..c6c852485900 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3849,6 +3849,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
struct kvm_memory_slot *slot;
+ struct page *page;
kvm_pfn_t pfn;
slot = gfn_to_memslot(vcpu->kvm, gfn);
@@ -3859,7 +3860,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
* The new VMSA will be private memory guest memory, so
* retrieve the PFN from the gmem backend.
*/
- if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
return -EINVAL;
/*
@@ -3888,7 +3889,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
* changes then care should be taken to ensure
* svm->sev_es.vmsa is pinned through some other means.
*/
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
}
/*
@@ -4688,6 +4689,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
struct kvm_memory_slot *slot;
struct kvm *kvm = vcpu->kvm;
int order, rmp_level, ret;
+ struct page *page;
bool assigned;
kvm_pfn_t pfn;
gfn_t gfn;
@@ -4714,7 +4716,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
return;
}
- ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, &order);
if (ret) {
pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
gpa);
@@ -4772,7 +4774,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
trace_kvm_rmp_fault(vcpu, gpa, pfn, error_code, rmp_level, ret);
out_no_trace:
- put_page(pfn_to_page(pfn));
+ kvm_release_page_unused(page);
}
static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b9afb40e935..504483d35197 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2490,11 +2490,13 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
#ifdef CONFIG_KVM_PRIVATE_MEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+ gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+ int *max_order);
#else
static inline int kvm_gmem_get_pfn(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
- kvm_pfn_t *pfn, int *max_order)
+ kvm_pfn_t *pfn, struct page **page,
+ int *max_order)
{
KVM_BUG_ON(1, kvm);
return -EIO;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8a878e57c5d4..47a9f68f7b24 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -594,7 +594,8 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
}
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+ gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+ int *max_order)
{
pgoff_t index = kvm_gmem_get_index(slot, gfn);
struct file *file = kvm_gmem_get_file(slot);
@@ -615,7 +616,10 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
folio_unlock(folio);
- if (r < 0)
+
+ if (!r)
+ *page = folio_file_page(folio, index);
+ else
folio_put(folio);
out:
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 47/85] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (45 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 46/85] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 48/85] KVM: x86/mmu: Don't mark unused faultin pages as accessed Sean Christopherson
` (39 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Now that all x86 page fault paths precisely track refcounted pages, use
kvm_page_fault.refcounted_page to put references to struct page memory
when finishing page faults. This is a baby step towards eliminating
kvm_pfn_to_refcounted_page().
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a038cde74f0d..f9b7e3a7370f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4373,6 +4373,9 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
r == RET_PF_RETRY);
+ if (!fault->refcounted_page)
+ return;
+
/*
* If the page that KVM got from the *primary MMU* is writable, and KVM
* installed or reused a SPTE, mark the page/folio dirty. Note, this
@@ -4384,9 +4387,9 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
* folio dirty if KVM could locklessly make the SPTE writable.
*/
if (!fault->map_writable || r == RET_PF_RETRY)
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_page_clean(fault->refcounted_page);
else
- kvm_release_pfn_dirty(fault->pfn);
+ kvm_release_page_dirty(fault->refcounted_page);
}
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 48/85] KVM: x86/mmu: Don't mark unused faultin pages as accessed
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (46 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 47/85] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 49/85] KVM: Move x86's API to release a faultin page to common KVM Sean Christopherson
` (38 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
When finishing guest page faults, don't mark pages as accessed if KVM
is resuming the guest _without_ installing a mapping, i.e. if the page
isn't being used. While it's possible that marking the page accessed
could avoid minor thrashing due to reclaiming a page that the guest is
about to access, it's far more likely that the gfn=>pfn mapping was
invalidated, e.g. due to a memslot change, or because the corresponding
VMA is being modified.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f9b7e3a7370f..e14b84d2f55b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4386,7 +4386,9 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
* fault handler, and so KVM must (somewhat) speculatively mark the
* folio dirty if KVM could locklessly make the SPTE writable.
*/
- if (!fault->map_writable || r == RET_PF_RETRY)
+ if (r == RET_PF_RETRY)
+ kvm_release_page_unused(fault->refcounted_page);
+ else if (!fault->map_writable)
kvm_release_page_clean(fault->refcounted_page);
else
kvm_release_page_dirty(fault->refcounted_page);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 49/85] KVM: Move x86's API to release a faultin page to common KVM
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (47 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 48/85] KVM: x86/mmu: Don't mark unused faultin pages as accessed Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 50/85] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page Sean Christopherson
` (37 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Move KVM x86's helper that "finishes" the faultin process to common KVM
so that the logic can be shared across all architectures. Note, not all
architectures implement a fast page fault path, but the gist of the
comment applies to all architectures.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 24 ++----------------------
include/linux/kvm_host.h | 26 ++++++++++++++++++++++++++
2 files changed, 28 insertions(+), 22 deletions(-)
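Not part of the patch: the usage pattern the common helper encodes, mirroring
how the x86 hunk below invokes it ("page", "map_writable" and "r" stand in
for the caller's own bookkeeping):

	out_unlock:
		/*
		 * unused == true when KVM is retrying without installing a
		 * mapping; dirty == true when the primary MMU gave KVM a
		 * writable page.
		 */
		kvm_release_faultin_page(kvm, page, r == RET_PF_RETRY, map_writable);
		write_unlock(&kvm->mmu_lock);
		return r;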
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e14b84d2f55b..5acdaf3b1007 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4370,28 +4370,8 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault, int r)
{
- lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
- r == RET_PF_RETRY);
-
- if (!fault->refcounted_page)
- return;
-
- /*
- * If the page that KVM got from the *primary MMU* is writable, and KVM
- * installed or reused a SPTE, mark the page/folio dirty. Note, this
- * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
- * the GFN is write-protected. Folios can't be safely marked dirty
- * outside of mmu_lock as doing so could race with writeback on the
- * folio. As a result, KVM can't mark folios dirty in the fast page
- * fault handler, and so KVM must (somewhat) speculatively mark the
- * folio dirty if KVM could locklessly make the SPTE writable.
- */
- if (r == RET_PF_RETRY)
- kvm_release_page_unused(fault->refcounted_page);
- else if (!fault->map_writable)
- kvm_release_page_clean(fault->refcounted_page);
- else
- kvm_release_page_dirty(fault->refcounted_page);
+ kvm_release_faultin_page(vcpu->kvm, fault->refcounted_page,
+ r == RET_PF_RETRY, fault->map_writable);
}
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 504483d35197..9f7682ece4a1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1231,6 +1231,32 @@ static inline void kvm_release_page_unused(struct page *page)
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
+static inline void kvm_release_faultin_page(struct kvm *kvm, struct page *page,
+ bool unused, bool dirty)
+{
+ lockdep_assert_once(lockdep_is_held(&kvm->mmu_lock) || unused);
+
+ if (!page)
+ return;
+
+ /*
+ * If the page that KVM got from the *primary MMU* is writable, and KVM
+ * installed or reused a SPTE, mark the page/folio dirty. Note, this
+ * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
+ * the GFN is write-protected. Folios can't be safely marked dirty
+ * outside of mmu_lock as doing so could race with writeback on the
+ * folio. As a result, KVM can't mark folios dirty in the fast page
+ * fault handler, and so KVM must (somewhat) speculatively mark the
+ * folio dirty if KVM could locklessly make the SPTE writable.
+ */
+ if (unused)
+ kvm_release_page_unused(page);
+ else if (dirty)
+ kvm_release_page_dirty(page);
+ else
+ kvm_release_page_clean(page);
+}
+
kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
unsigned int foll, bool *writable,
struct page **refcounted_page);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 50/85] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (48 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 49/85] KVM: Move x86's API to release a faultin page to common KVM Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
` (36 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Hold mmu_lock across kvm_release_pfn_clean() when refreshing the APIC
access page address to ensure that KVM doesn't mark a page/folio as
accessed after it has been unmapped. Practically speaking, marking a folio
accessed is benign in this scenario, as KVM does hold a reference (it's
really just marking folios dirty that is problematic), but there's no
reason not to be paranoid (moving the APIC access page isn't a hot path),
and no reason to be different from other mmu_notifier-protected flows in
KVM.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a4438358c5e..851be0820e04 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6832,25 +6832,22 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
return;
read_lock(&vcpu->kvm->mmu_lock);
- if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
+ if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn))
kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
- read_unlock(&vcpu->kvm->mmu_lock);
- goto out;
- }
+ else
+ vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
- vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
- read_unlock(&vcpu->kvm->mmu_lock);
-
- /*
- * No need for a manual TLB flush at this point, KVM has already done a
- * flush if there were SPTEs pointing at the previous page.
- */
-out:
/*
* Do not pin apic access page in memory, the MMU notifier
* will call us again if it is migrated or swapped out.
*/
kvm_release_pfn_clean(pfn);
+
+ /*
+ * No need for a manual TLB flush at this point, KVM has already done a
+ * flush if there were SPTEs pointing at the previous page.
+ */
+ read_unlock(&vcpu->kvm->mmu_lock);
}
void vmx_hwapic_isr_update(int max_isr)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (49 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 50/85] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-21 10:22 ` Yan Zhao
2024-10-10 18:23 ` [PATCH v13 52/85] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map() Sean Christopherson
` (35 subsequent siblings)
86 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Use __kvm_faultin_page() to get the APIC access page so that KVM can
precisely release the refcounted page, i.e. to remove yet another user
of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
page fault, the semantics are effectively the same; KVM just happens to
be mapping the pfn into a VMCS field instead of a secondary MMU.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 851be0820e04..44cc25dfebba 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6790,8 +6790,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
struct kvm *kvm = vcpu->kvm;
struct kvm_memslots *slots = kvm_memslots(kvm);
struct kvm_memory_slot *slot;
+ struct page *refcounted_page;
unsigned long mmu_seq;
kvm_pfn_t pfn;
+ bool writable;
/* Defer reload until vmcs01 is the current VMCS. */
if (is_guest_mode(vcpu)) {
@@ -6827,7 +6829,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
* controls the APIC-access page memslot, and only deletes the memslot
* if APICv is permanently inhibited, i.e. the memslot won't reappear.
*/
- pfn = gfn_to_pfn_memslot(slot, gfn);
+ pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page);
if (is_error_noslot_pfn(pfn))
return;
@@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
/*
- * Do not pin apic access page in memory, the MMU notifier
- * will call us again if it is migrated or swapped out.
+ * Do not pin the APIC access page in memory so that it can be freely
+ * migrated, the MMU notifier will call us again if it is migrated or
+ * swapped out. KVM backs the memslot with anonymous memory, the pfn
+ * should always point at a refcounted page (if the pfn is valid).
*/
- kvm_release_pfn_clean(pfn);
+ if (!WARN_ON_ONCE(!refcounted_page))
+ kvm_release_page_clean(refcounted_page);
/*
* No need for a manual TLB flush at this point, KVM has already done a
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
2024-10-10 18:23 ` [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
@ 2024-10-21 10:22 ` Yan Zhao
2024-10-21 18:57 ` Sean Christopherson
0 siblings, 1 reply; 99+ messages in thread
From: Yan Zhao @ 2024-10-21 10:22 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> Use __kvm_faultin_page() to get the APIC access page so that KVM can
> precisely release the refcounted page, i.e. to remove yet another user
> of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
> page fault, the semantics are effectively the same; KVM just happens to
> be mapping the pfn into a VMCS field instead of a secondary MMU.
>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 851be0820e04..44cc25dfebba 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6790,8 +6790,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> struct kvm *kvm = vcpu->kvm;
> struct kvm_memslots *slots = kvm_memslots(kvm);
> struct kvm_memory_slot *slot;
> + struct page *refcounted_page;
> unsigned long mmu_seq;
> kvm_pfn_t pfn;
> + bool writable;
>
> /* Defer reload until vmcs01 is the current VMCS. */
> if (is_guest_mode(vcpu)) {
> @@ -6827,7 +6829,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> * controls the APIC-access page memslot, and only deletes the memslot
> * if APICv is permanently inhibited, i.e. the memslot won't reappear.
> */
> - pfn = gfn_to_pfn_memslot(slot, gfn);
> + pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page);
> if (is_error_noslot_pfn(pfn))
> return;
>
> @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
>
> /*
> - * Do not pin apic access page in memory, the MMU notifier
> - * will call us again if it is migrated or swapped out.
> + * Do not pin the APIC access page in memory so that it can be freely
> + * migrated, the MMU notifier will call us again if it is migrated or
> + * swapped out. KVM backs the memslot with anonymous memory, the pfn
> + * should always point at a refcounted page (if the pfn is valid).
> */
> - kvm_release_pfn_clean(pfn);
> + if (!WARN_ON_ONCE(!refcounted_page))
> + kvm_release_page_clean(refcounted_page);
Why it's not
if (!WARN_ON_ONCE(!refcounted_page)) {
if (writable)
kvm_release_page_dirty(refcounted_page)
else
kvm_release_page_clean(refcounted_page)
}
or simply not pass "writable" to __kvm_faultin_pfn() as we know the slot is
not read-only and then set dirty ?
if (!WARN_ON_ONCE(!refcounted_page))
kvm_release_page_dirty(refcounted_page)
>
> /*
> * No need for a manual TLB flush at this point, KVM has already done a
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
2024-10-21 10:22 ` Yan Zhao
@ 2024-10-21 18:57 ` Sean Christopherson
2024-10-22 2:15 ` Yan Zhao
0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-21 18:57 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Mon, Oct 21, 2024, Yan Zhao wrote:
> On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> > Use __kvm_faultin_page() to get the APIC access page so that KVM can
> > precisely release the refcounted page, i.e. to remove yet another user
> > of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
> > page fault, the semantics are effectively the same; KVM just happens to
> > be mapping the pfn into a VMCS field instead of a secondary MMU.
> >
> > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
...
> > @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> > vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> >
> > /*
> > - * Do not pin apic access page in memory, the MMU notifier
> > - * will call us again if it is migrated or swapped out.
> > + * Do not pin the APIC access page in memory so that it can be freely
> > + * migrated, the MMU notifier will call us again if it is migrated or
> > + * swapped out. KVM backs the memslot with anonymous memory, the pfn
> > + * should always point at a refcounted page (if the pfn is valid).
> > */
> > - kvm_release_pfn_clean(pfn);
> > + if (!WARN_ON_ONCE(!refcounted_page))
> > + kvm_release_page_clean(refcounted_page);
> Why it's not
> if (!WARN_ON_ONCE(!refcounted_page)) {
> if (writable)
> kvm_release_page_dirty(refcounted_page)
> else
> kvm_release_page_clean(refcounted_page)
> }
>
> or simply not pass "writable" to __kvm_faultin_pfn() as we know the slot is
> not read-only and then set dirty ?
__kvm_faultin_pfn() requires a non-NULL @writable. The intent is to help ensure
the caller is actually checking whether a readable vs. writable mapping was
acquired. For cases that explicitly pass FOLL_WRITE, it's awkward, but those
should be few and far between.
> if (!WARN_ON_ONCE(!refcounted_page))
> kvm_release_page_dirty(refcounted_page)
Ya, this is probably more correct? Though I would strongly prefer to make any
change in behavior on top of this series. The use of kvm_release_page_clean()
was added by commit 878940b33d76 ("KVM: VMX: Retry APIC-access page reload if
invalidation is in-progress"), and I suspect the only reason it added the
kvm_set_page_accessed() call is because there was no "unused" variant. I.e. there
was no conscious decision to set Accessed but not Dirty.
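For reference, a follow-up along those lines might look something like the
sketch below (not part of this series; it simply pairs the "unused" variant
with the invalidation-retry path and marks the page dirty otherwise):

	bool retry;

	read_lock(&vcpu->kvm->mmu_lock);

	retry = mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn);
	if (retry)
		kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
	else
		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));

	if (!WARN_ON_ONCE(!refcounted_page)) {
		if (retry)
			/* Nothing was installed, don't touch Accessed or Dirty. */
			kvm_release_page_unused(refcounted_page);
		else
			/* The pfn was faulted in with FOLL_WRITE, mark it dirty. */
			kvm_release_page_dirty(refcounted_page);
	}

	read_unlock(&vcpu->kvm->mmu_lock);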
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
2024-10-21 18:57 ` Sean Christopherson
@ 2024-10-22 2:15 ` Yan Zhao
0 siblings, 0 replies; 99+ messages in thread
From: Yan Zhao @ 2024-10-22 2:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, David Matlack, David Stevens, Andrew Jones
On Mon, Oct 21, 2024 at 11:57:42AM -0700, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Yan Zhao wrote:
> > On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> > > Use __kvm_faultin_page() to get the APIC access page so that KVM can
> > > precisely release the refcounted page, i.e. to remove yet another user
> > > of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
> > > page fault, the semantics are effectively the same; KVM just happens to
> > > be mapping the pfn into a VMCS field instead of a secondary MMU.
> > >
> > > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
>
> ...
>
> > > @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> > > vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> > >
> > > /*
> > > - * Do not pin apic access page in memory, the MMU notifier
> > > - * will call us again if it is migrated or swapped out.
> > > + * Do not pin the APIC access page in memory so that it can be freely
> > > + * migrated, the MMU notifier will call us again if it is migrated or
> > > + * swapped out. KVM backs the memslot with anonymous memory, the pfn
> > > + * should always point at a refcounted page (if the pfn is valid).
> > > */
> > > - kvm_release_pfn_clean(pfn);
> > > + if (!WARN_ON_ONCE(!refcounted_page))
> > > + kvm_release_page_clean(refcounted_page);
> > Why it's not
> > if (!WARN_ON_ONCE(!refcounted_page)) {
> > if (writable)
> > kvm_release_page_dirty(refcounted_page)
> > else
> > kvm_release_page_clean(refcounted_page)
> > }
> >
> > or simply not pass "writable" to __kvm_faultin_pfn() as we know the slot is
> > not read-only and then set dirty ?
>
> __kvm_faultin_pfn() requires a non-NULL @writable. The intent is to help ensure
Ah, right.
> the caller is actually checking whether a readable vs. writable mapping was
> acquired. For cases that explicitly pass FOLL_WRITE, it's awkward, but those
> should be few and far between.
Yes, it's a little weird, though there's nothing wrong in this case with passing "writable"
without checking its value back :)
>
> > if (!WARN_ON_ONCE(!refcounted_page))
> > kvm_release_page_dirty(refcounted_page)
>
> Ya, this is probably more correct? Though I would strongly prefer to make any
> change in behavior on top of this series. The use of kvm_release_page_clean()
> was added by commit 878940b33d76 ("KVM: VMX: Retry APIC-access page reload if
> invalidation is in-progress"), and I suspect the only reason it added the
> kvm_set_page_accessed() call is because there was no "unused" variant. I.e. there
> was no conscious decision to set Accessed but not Dirty.
Thanks for the explanation.
I asked that because I noticed that the policies for setting Accessed or Dirty
for the APIC access page are not aligned between L1 (only Accessed) and L2 (Dirty).
^ permalink raw reply [flat|nested] 99+ messages in thread
* [PATCH v13 52/85] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (50 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 53/85] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
` (34 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark the underlying page as dirty in kvmppc_e500_ref_setup()'s sole
caller, kvmppc_e500_shadow_map(), which will allow converting e500 to
__kvm_faultin_pfn() + kvm_release_faultin_page() without having to do
a weird dance between ref_setup() and shadow_map().
Opportunistically drop the redundant kvm_set_pfn_accessed(), as
shadow_map() puts the page via kvm_release_pfn_clean().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index c664fdec75b1..5c2adfd19e12 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -242,7 +242,7 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
return tlbe->mas7_3 & (MAS3_SW|MAS3_UW);
}
-static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
+static inline bool kvmppc_e500_ref_setup(struct tlbe_ref *ref,
struct kvm_book3e_206_tlb_entry *gtlbe,
kvm_pfn_t pfn, unsigned int wimg)
{
@@ -252,11 +252,7 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
/* Use guest supplied MAS2_G and MAS2_E */
ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
- /* Mark the page accessed */
- kvm_set_pfn_accessed(pfn);
-
- if (tlbe_is_writable(gtlbe))
- kvm_set_pfn_dirty(pfn);
+ return tlbe_is_writable(gtlbe);
}
static inline void kvmppc_e500_ref_release(struct tlbe_ref *ref)
@@ -337,6 +333,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
unsigned int wimg = 0;
pgd_t *pgdir;
unsigned long flags;
+ bool writable = false;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
@@ -490,7 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ if (writable)
+ kvm_set_pfn_dirty(pfn);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 53/85] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (51 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 52/85] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map() Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 54/85] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults Sean Christopherson
` (33 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that shadow_map() can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held. Marking pages
accessed outside of mmu_lock is ok (not great, but safe), but marking
pages _dirty_ outside of mmu_lock can make filesystems unhappy.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 5c2adfd19e12..334dd96f8081 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -498,11 +498,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
kvmppc_mmu_flush_icache(pfn);
out:
- spin_unlock(&kvm->mmu_lock);
-
/* Drop refcount on page, so that mmu notifiers can clear it */
kvm_release_pfn_clean(pfn);
-
+ spin_unlock(&kvm->mmu_lock);
return ret;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 54/85] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (52 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 53/85] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 55/85] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock Sean Christopherson
` (32 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert PPC e500 to use __kvm_faultin_pfn()+kvm_release_faultin_page(),
and continue the inexorable march towards the demise of
kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 334dd96f8081..e5a145b578a4 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -322,6 +322,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
{
struct kvm_memory_slot *slot;
unsigned long pfn = 0; /* silence GCC warning */
+ struct page *page = NULL;
unsigned long hva;
int pfnmap = 0;
int tsize = BOOK3E_PAGESZ_4K;
@@ -443,7 +444,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (likely(!pfnmap)) {
tsize_pages = 1UL << (tsize + 10 - PAGE_SHIFT);
- pfn = gfn_to_pfn_memslot(slot, gfn);
+ pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, NULL, &page);
if (is_error_noslot_pfn(pfn)) {
if (printk_ratelimit())
pr_err("%s: real page not found for gfn %lx\n",
@@ -488,8 +489,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
}
}
writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
- if (writable)
- kvm_set_pfn_dirty(pfn);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
@@ -498,8 +497,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
kvmppc_mmu_flush_icache(pfn);
out:
- /* Drop refcount on page, so that mmu notifiers can clear it */
- kvm_release_pfn_clean(pfn);
+ kvm_release_faultin_page(kvm, page, !!ret, writable);
spin_unlock(&kvm->mmu_lock);
return ret;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 55/85] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (53 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 54/85] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 56/85] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts Sean Christopherson
` (31 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
While scary sounding, practically speaking the worst case scenario is that
KVM would trigger this WARN in filemap_unaccount_folio():
/*
* At this point folio must be either written or cleaned by
* truncate. Dirty folio here signals a bug and loss of
* unwritten data - on ordinary filesystems.
*
* But it's harmless on in-memory filesystems like tmpfs; and can
* occur when a driver which did get_user_pages() sets page dirty
* before putting it, while the inode is being finally evicted.
*
* Below fixes dirty accounting after removing the folio entirely
* but leaves the dirty flag set: it has no effect for truncated
* folio and anyway will be cleared before returning folio to
* buddy allocator.
*/
if (WARN_ON_ONCE(folio_test_dirty(folio) &&
mapping_can_writeback(mapping)))
folio_account_cleaned(folio, inode_to_wb(mapping->host));
KVM won't actually write memory because the stage-2 mappings are protected
by the mmu_notifier, i.e. there is no risk of loss of data, even if the
VM were backed by memory that needs writeback.
See the link below for additional details.
This will also allow converting arm64 to kvm_release_faultin_page(), which
requires that mmu_lock be held (for the aforementioned reason).
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/mmu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index dd221587fcca..ecc6c2b56c43 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1692,15 +1692,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
out_unlock:
+ if (writable && !ret)
+ kvm_release_pfn_dirty(pfn);
+ else
+ kvm_release_pfn_clean(pfn);
+
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
- if (writable && !ret) {
- kvm_set_pfn_dirty(pfn);
+ if (writable && !ret)
mark_page_dirty_in_slot(kvm, memslot, gfn);
- }
- kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 56/85] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (54 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 55/85] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:23 ` [PATCH v13 57/85] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
` (30 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert arm64 to use __kvm_faultin_pfn()+kvm_release_faultin_page().
Three down, six to go.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/mmu.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ecc6c2b56c43..4054356c9712 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1439,6 +1439,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
+ struct page *page;
if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1560,7 +1561,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
+ * vma_lookup() or __kvm_faultin_pfn() become stale prior to
* acquiring kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@@ -1569,8 +1570,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- write_fault, &writable);
+ pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
+ &writable, &page);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
@@ -1583,7 +1584,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* If the page was identified as device early by looking at
* the VMA flags, vma_pagesize is already representing the
* largest quantity we can map. If instead it was mapped
- * via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE
+ * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
* and must not be upgraded.
*
* In both cases, we don't let transparent_hugepage_adjust()
@@ -1692,11 +1693,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
out_unlock:
- if (writable && !ret)
- kvm_release_pfn_dirty(pfn);
- else
- kvm_release_pfn_clean(pfn);
-
+ kvm_release_faultin_page(kvm, page, !!ret, writable);
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 57/85] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (55 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 56/85] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts Sean Christopherson
@ 2024-10-10 18:23 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 58/85] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
` (29 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:23 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Don't mark pages dirty if KVM bails from the page fault handler without
installing a stage-2 mapping, i.e. if the page is guaranteed to not be
written by the guest.
In addition to being a (very) minor fix, this paves the way for converting
RISC-V to use kvm_release_faultin_page().
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/riscv/kvm/mmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index b63650f9b966..06aa5a0d056d 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -669,7 +669,6 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
goto out_unlock;
if (writable) {
- kvm_set_pfn_dirty(hfn);
mark_page_dirty(kvm, gfn);
ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
vma_pagesize, false, true);
@@ -682,6 +681,9 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
kvm_err("Failed to map in G-stage\n");
out_unlock:
+ if ((!ret || ret == -EEXIST) && writable)
+ kvm_set_pfn_dirty(hfn);
+
spin_unlock(&kvm->mmu_lock);
kvm_set_pfn_accessed(hfn);
kvm_release_pfn_clean(hfn);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 58/85] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (56 preceding siblings ...)
2024-10-10 18:23 ` [PATCH v13 57/85] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 59/85] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
` (28 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that RISC-V can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held. Marking pages accessed
outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_
outside of mmu_lock can make filesystems unhappy (see the link below).
Do both under mmu_lock to minimize the chances of doing the wrong thing in
the future.
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/riscv/kvm/mmu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 06aa5a0d056d..2e9aee518142 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -682,11 +682,11 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
out_unlock:
if ((!ret || ret == -EEXIST) && writable)
- kvm_set_pfn_dirty(hfn);
+ kvm_release_pfn_dirty(hfn);
+ else
+ kvm_release_pfn_clean(hfn);
spin_unlock(&kvm->mmu_lock);
- kvm_set_pfn_accessed(hfn);
- kvm_release_pfn_clean(hfn);
return ret;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 59/85] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (57 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 58/85] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 60/85] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV Sean Christopherson
` (27 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Opportunistically fix an s/priort/prior typo in the related comment.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/riscv/kvm/mmu.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 2e9aee518142..e11ad1b616f3 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -601,6 +601,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
bool logging = (memslot->dirty_bitmap &&
!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
unsigned long vma_pagesize, mmu_seq;
+ struct page *page;
/* We need minimum second+third level pages */
ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
@@ -631,7 +632,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or gfn_to_pfn_prot() become stale priort to acquiring
+ * vma_lookup() or __kvm_faultin_pfn() become stale prior to acquiring
* kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@@ -647,7 +648,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
return -EFAULT;
}
- hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);
+ hfn = kvm_faultin_pfn(vcpu, gfn, is_write, &writable, &page);
if (hfn == KVM_PFN_ERR_HWPOISON) {
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
vma_pageshift, current);
@@ -681,11 +682,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
kvm_err("Failed to map in G-stage\n");
out_unlock:
- if ((!ret || ret == -EEXIST) && writable)
- kvm_release_pfn_dirty(hfn);
- else
- kvm_release_pfn_clean(hfn);
-
+ kvm_release_faultin_page(kvm, page, ret && ret != -EEXIST, writable);
spin_unlock(&kvm->mmu_lock);
return ret;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 60/85] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (58 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 59/85] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 61/85] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix Sean Christopherson
` (26 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(),
which functionally does pretty much the exact same thing.
Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 25 ++++---------------------
1 file changed, 4 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2f1d58984b41..f305395cf26e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -603,27 +603,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
write_ok = writing;
hva = gfn_to_hva_memslot(memslot, gfn);
- /*
- * Do a fast check first, since __gfn_to_pfn_memslot doesn't
- * do it with !atomic && !async, which is how we call it.
- * We always ask for write permission since the common case
- * is that the page is writable.
- */
- if (get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
- write_ok = true;
- } else {
- /* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, &write_ok);
- if (is_error_noslot_pfn(pfn))
- return -EFAULT;
- page = NULL;
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageReserved(page))
- page = NULL;
- }
- }
+ pfn = __kvm_faultin_pfn(memslot, gfn, writing ? FOLL_WRITE : 0,
+ &write_ok, &page);
+ if (is_error_noslot_pfn(pfn))
+ return -EFAULT;
/*
* Read the PTE from the process' radix tree and use that
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 61/85] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (59 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 60/85] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 62/85] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page() Sean Christopherson
` (25 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Replace Book3s Radix's homebrewed (read: copy+pasted) fault-in logic with
__kvm_faultin_pfn(), which functionally does pretty much the exact same
thing.
Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_radix.c | 29 +++++---------------------
1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 8304b6f8fe45..14891d0a3b73 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -829,40 +829,21 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long mmu_seq;
unsigned long hva, gfn = gpa >> PAGE_SHIFT;
bool upgrade_write = false;
- bool *upgrade_p = &upgrade_write;
pte_t pte, *ptep;
unsigned int shift, level;
int ret;
bool large_enable;
+ kvm_pfn_t pfn;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();
- /*
- * Do a fast check first, since __gfn_to_pfn_memslot doesn't
- * do it with !atomic && !async, which is how we call it.
- * We always ask for write permission since the common case
- * is that the page is writable.
- */
hva = gfn_to_hva_memslot(memslot, gfn);
- if (!kvm_ro && get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
- upgrade_write = true;
- } else {
- unsigned long pfn;
-
- /* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, upgrade_p);
- if (is_error_noslot_pfn(pfn))
- return -EFAULT;
- page = NULL;
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageReserved(page))
- page = NULL;
- }
- }
+ pfn = __kvm_faultin_pfn(memslot, gfn, writing ? FOLL_WRITE : 0,
+ &upgrade_write, &page);
+ if (is_error_noslot_pfn(pfn))
+ return -EFAULT;
/*
* Read the PTE from the process' radix tree and use that
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 62/85] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (60 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 61/85] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 63/85] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE Sean Christopherson
` (24 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Drop @kvm_ro from kvmppc_book3s_instantiate_page() as it is now only
written, and never read.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/include/asm/kvm_book3s.h | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++----
arch/powerpc/kvm/book3s_hv_nested.c | 4 +---
3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 10618622d7ef..3d289dbe3982 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -203,7 +203,7 @@ extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested,
extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
- bool writing, bool kvm_ro,
+ bool writing,
pte_t *inserted_pte, unsigned int *levelp);
extern int kvmppc_init_vm_radix(struct kvm *kvm);
extern void kvmppc_free_radix(struct kvm *kvm);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 14891d0a3b73..b3e6e73d6a08 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -821,7 +821,7 @@ bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested, bool writing,
int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
- bool writing, bool kvm_ro,
+ bool writing,
pte_t *inserted_pte, unsigned int *levelp)
{
struct kvm *kvm = vcpu->kvm;
@@ -931,7 +931,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot;
long ret;
bool writing = !!(dsisr & DSISR_ISSTORE);
- bool kvm_ro = false;
/* Check for unusual errors */
if (dsisr & DSISR_UNSUPP_MMU) {
@@ -984,7 +983,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
ea, DSISR_ISSTORE | DSISR_PROTFAULT);
return RESUME_GUEST;
}
- kvm_ro = true;
}
/* Failed to set the reference/change bits */
@@ -1002,7 +1000,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
/* Try to insert a pte */
ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot, writing,
- kvm_ro, NULL, NULL);
+ NULL, NULL);
if (ret == 0 || ret == -EAGAIN)
ret = RESUME_GUEST;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 05f5220960c6..771173509617 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1527,7 +1527,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
unsigned long n_gpa, gpa, gfn, perm = 0UL;
unsigned int shift, l1_shift, level;
bool writing = !!(dsisr & DSISR_ISSTORE);
- bool kvm_ro = false;
long int ret;
if (!gp->l1_gr_to_hr) {
@@ -1607,7 +1606,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
ea, DSISR_ISSTORE | DSISR_PROTFAULT);
return RESUME_GUEST;
}
- kvm_ro = true;
}
/* 2. Find the host pte for this L1 guest real address */
@@ -1629,7 +1627,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
if (!pte_present(pte) || (writing && !(pte_val(pte) & _PAGE_WRITE))) {
/* No suitable pte found -> try to insert a mapping */
ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot,
- writing, kvm_ro, &pte, &level);
+ writing, &pte, &level);
if (ret == -EAGAIN)
return RESUME_GUEST;
else if (ret)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 63/85] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (61 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 62/85] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page() Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 64/85] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR Sean Christopherson
` (23 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages/folios dirty/accessed after installing a PTE, and more
specifically after acquiring mmu_lock and checking for an mmu_notifier
invalidation. Marking a page/folio dirty after it has been written back
can make some filesystems unhappy (backing KVM guests with such filesystem
files is uncommon, and the race is minuscule, hence the lack of complaints).
See the link below for details.
This will also allow converting Book3S to kvm_release_faultin_page(),
which requires that mmu_lock be held (for the aforementioned reason).
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_host.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index bc6a381b5346..d0e4f7bbdc3d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -121,13 +121,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
vpn = hpt_vpn(orig_pte->eaddr, map->host_vsid, MMU_SEGSIZE_256M);
- kvm_set_pfn_accessed(pfn);
if (!orig_pte->may_write || !writable)
rflags |= PP_RXRX;
- else {
+ else
mark_page_dirty(vcpu->kvm, gfn);
- kvm_set_pfn_dirty(pfn);
- }
if (!orig_pte->may_execute)
rflags |= HPTE_R_N;
@@ -202,8 +199,11 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
}
out_unlock:
+ if (!orig_pte->may_write || !writable)
+ kvm_release_pfn_clean(pfn);
+ else
+ kvm_release_pfn_dirty(pfn);
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
if (cpte)
kvmppc_mmu_hpte_cache_free(cpte);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 64/85] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (62 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 63/85] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 65/85] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
` (22 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert Book3S PR to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/include/asm/kvm_book3s.h | 2 +-
arch/powerpc/kvm/book3s.c | 7 ++++---
arch/powerpc/kvm/book3s_32_mmu_host.c | 7 ++++---
arch/powerpc/kvm/book3s_64_mmu_host.c | 10 +++++-----
4 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 3d289dbe3982..e1ff291ba891 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -235,7 +235,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_vcpu *vcpu);
extern kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa,
- bool writing, bool *writable);
+ bool writing, bool *writable, struct page **page);
extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
extern void kvmppc_update_dirty_map(const struct kvm_memory_slot *memslot,
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index ff6c38373957..d79c5d1098c0 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -422,7 +422,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter);
kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
- bool *writable)
+ bool *writable, struct page **page)
{
ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM;
gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -437,13 +437,14 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
kvm_pfn_t pfn;
pfn = (kvm_pfn_t)virt_to_phys((void*)shared_page) >> PAGE_SHIFT;
- get_page(pfn_to_page(pfn));
+ *page = pfn_to_page(pfn);
+ get_page(*page);
if (writable)
*writable = true;
return pfn;
}
- return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
+ return kvm_faultin_pfn(vcpu, gfn, writing, writable, page);
}
EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 4b3a8d80cfa3..5b7212edbb13 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -130,6 +130,7 @@ extern char etext[];
int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
bool iswrite)
{
+ struct page *page;
kvm_pfn_t hpaddr;
u64 vpn;
u64 vsid;
@@ -145,7 +146,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
bool writable;
/* Get host physical address for gpa */
- hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
+ hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable, &page);
if (is_error_noslot_pfn(hpaddr)) {
printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
orig_pte->raddr);
@@ -232,7 +233,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
pte = kvmppc_mmu_hpte_cache_next(vcpu);
if (!pte) {
- kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
+ kvm_release_page_unused(page);
r = -EAGAIN;
goto out;
}
@@ -250,7 +251,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
kvmppc_mmu_hpte_cache_map(vcpu, pte);
- kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
+ kvm_release_page_clean(page);
out:
return r;
}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index d0e4f7bbdc3d..be20aee6fd7d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -88,13 +88,14 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
struct hpte_cache *cpte;
unsigned long gfn = orig_pte->raddr >> PAGE_SHIFT;
unsigned long pfn;
+ struct page *page;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();
/* Get host physical address for gpa */
- pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
+ pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable, &page);
if (is_error_noslot_pfn(pfn)) {
printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
orig_pte->raddr);
@@ -199,10 +200,9 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
}
out_unlock:
- if (!orig_pte->may_write || !writable)
- kvm_release_pfn_clean(pfn);
- else
- kvm_release_pfn_dirty(pfn);
+ /* FIXME: Don't unconditionally pass unused=false. */
+ kvm_release_faultin_page(kvm, page, false,
+ orig_pte->may_write && writable);
spin_unlock(&kvm->mmu_lock);
if (cpte)
kvmppc_mmu_hpte_cache_free(cpte);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 65/85] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (63 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 64/85] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 66/85] KVM: LoongArch: Mark "struct page" pfns accessed " Sean Christopherson
` (21 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages/folios dirty only in the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
See the link below for details.
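In sketch form, the ordering the slow path moves to is below; the point is that kvm_set_pfn_dirty(), which dirties the folio, runs only while mmu_lock is held and the fault is mmu_notifier-protected, whereas the dirty-bitmap update doesn't touch the folio and can run after the lock is dropped (illustrative only, see the hunks below for the real code):

    spin_lock(&kvm->mmu_lock);
    kvm_set_pte(ptep, new_pte);
    /* Folio dirtying can't race with writeback while mmu_lock is held. */
    if (writeable)
            kvm_set_pfn_dirty(pfn);
    spin_unlock(&kvm->mmu_lock);

    /* The dirty bitmap doesn't reference the folio; no lock needed. */
    if (prot_bits & _PAGE_DIRTY)
            mark_page_dirty_in_slot(kvm, memslot, gfn);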
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 28681dfb4b85..cc2a5f289b14 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
if (kvm_pte_young(changed))
kvm_set_pfn_accessed(pfn);
- if (kvm_pte_dirty(changed)) {
- mark_page_dirty(kvm, gfn);
- kvm_set_pfn_dirty(pfn);
- }
if (page)
put_page(page);
}
+
+ if (kvm_pte_dirty(changed))
+ mark_page_dirty(kvm, gfn);
+
return ret;
out:
spin_unlock(&kvm->mmu_lock);
@@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
else
++kvm->stat.pages;
kvm_set_pte(ptep, new_pte);
- spin_unlock(&kvm->mmu_lock);
- if (prot_bits & _PAGE_DIRTY) {
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ if (writeable)
kvm_set_pfn_dirty(pfn);
- }
+
+ spin_unlock(&kvm->mmu_lock);
+
+ if (prot_bits & _PAGE_DIRTY)
+ mark_page_dirty_in_slot(kvm, memslot, gfn);
kvm_release_pfn_clean(pfn);
out:
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 66/85] KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault path
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (64 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 65/85] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 67/85] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
` (20 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages accessed only in the slow path, before dropping mmu_lock when
faulting in guest memory so that LoongArch can convert to
kvm_release_faultin_page() without tripping its lockdep assertion on
mmu_lock being held.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index cc2a5f289b14..ed43504c5c7e 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -552,12 +552,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
{
int ret = 0;
- kvm_pfn_t pfn = 0;
kvm_pte_t *ptep, changed, new;
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *slot;
- struct page *page;
spin_lock(&kvm->mmu_lock);
@@ -570,8 +568,6 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
/* Track access to pages marked old */
new = kvm_pte_mkyoung(*ptep);
- /* call kvm_set_pfn_accessed() after unlock */
-
if (write && !kvm_pte_dirty(new)) {
if (!kvm_pte_write(new)) {
ret = -EFAULT;
@@ -595,23 +591,11 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
}
changed = new ^ (*ptep);
- if (changed) {
+ if (changed)
kvm_set_pte(ptep, new);
- pfn = kvm_pte_pfn(new);
- page = kvm_pfn_to_refcounted_page(pfn);
- if (page)
- get_page(page);
- }
+
spin_unlock(&kvm->mmu_lock);
- if (changed) {
- if (kvm_pte_young(changed))
- kvm_set_pfn_accessed(pfn);
-
- if (page)
- put_page(page);
- }
-
if (kvm_pte_dirty(changed))
mark_page_dirty(kvm, gfn);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 67/85] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (65 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 66/85] KVM: LoongArch: Mark "struct page" pfns accessed " Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 68/85] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
` (19 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that LoongArch can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index ed43504c5c7e..7066cafcce64 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -902,13 +902,13 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
if (writeable)
kvm_set_pfn_dirty(pfn);
+ kvm_release_pfn_clean(pfn);
spin_unlock(&kvm->mmu_lock);
if (prot_bits & _PAGE_DIRTY)
mark_page_dirty_in_slot(kvm, memslot, gfn);
- kvm_release_pfn_clean(pfn);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
return err;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 68/85] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (66 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 67/85] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 69/85] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
` (18 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert LoongArch to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 7066cafcce64..4d203294767c 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -780,6 +780,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *memslot;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+ struct page *page;
/* Try the fast path to handle old / clean pages */
srcu_idx = srcu_read_lock(&kvm->srcu);
@@ -807,7 +808,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
mmu_seq = kvm->mmu_invalidate_seq;
/*
* Ensure the read of mmu_invalidate_seq isn't reordered with PTE reads in
- * gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
+ * kvm_faultin_pfn() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_invalidate_retry() noticing.
*
@@ -819,7 +820,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
smp_rmb();
/* Slow path - ask KVM core whether we can access this GPA */
- pfn = gfn_to_pfn_prot(kvm, gfn, write, &writeable);
+ pfn = kvm_faultin_pfn(vcpu, gfn, write, &writeable, &page);
if (is_error_noslot_pfn(pfn)) {
err = -EFAULT;
goto out;
@@ -831,10 +832,10 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
/*
* This can happen when mappings are changed asynchronously, but
* also synchronously if a COW is triggered by
- * gfn_to_pfn_prot().
+ * kvm_faultin_pfn().
*/
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
if (retry_no > 100) {
retry_no = 0;
schedule();
@@ -900,10 +901,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
++kvm->stat.pages;
kvm_set_pte(ptep, new_pte);
- if (writeable)
- kvm_set_pfn_dirty(pfn);
- kvm_release_pfn_clean(pfn);
-
+ kvm_release_faultin_page(kvm, page, false, writeable);
spin_unlock(&kvm->mmu_lock);
if (prot_bits & _PAGE_DIRTY)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 69/85] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (67 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 68/85] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 70/85] KVM: MIPS: Mark "struct page" pfns accessed " Sean Christopherson
` (17 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages/folios dirty only in the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
See the link below for details.
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index c17157e700c0..4da9ce4eb54d 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -514,7 +514,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
set_pte(ptep, pte_mkdirty(*ptep));
pfn = pte_pfn(*ptep);
mark_page_dirty(kvm, gfn);
- kvm_set_pfn_dirty(pfn);
}
if (out_entry)
@@ -628,7 +627,6 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (write_fault) {
prot_bits |= __WRITEABLE;
mark_page_dirty(kvm, gfn);
- kvm_set_pfn_dirty(pfn);
}
}
entry = pfn_pte(pfn, __pgprot(prot_bits));
@@ -642,6 +640,9 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (out_buddy)
*out_buddy = *ptep_buddy(ptep);
+ if (writeable)
+ kvm_set_pfn_dirty(pfn);
+
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
kvm_set_pfn_accessed(pfn);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 70/85] KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault path
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (68 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 69/85] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 71/85] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock Sean Christopherson
` (16 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages accessed only in the slow page fault path in order to remove
an unnecessary user of kvm_pfn_to_refcounted_page(). Marking pages
accessed in the primary MMU during KVM page fault handling isn't harmful,
but it's largely pointless and likely a waste of cycles since the
primary MMU will call into KVM via mmu_notifiers when aging pages. I.e.
KVM participates in a "pull" model, so there's no need to also "push"
updates.
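For context, a hedged sketch of the "pull" side: reclaim/aging in the primary MMU raises an mmu_notifier event (e.g. clear_young/test_young), which KVM routes to the architecture hook whose signature is visible in later hunks in this series; the helper name in the body is purely hypothetical:

    bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
    {
            /*
             * Walk the secondary MMU's page tables and report whether any
             * mapping in the range has its accessed bit set (hypothetical
             * helper, shown for illustration only).
             */
            return example_range_is_young(kvm, range->start, range->end);
    }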
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 4da9ce4eb54d..f1e4b618ec6d 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -484,8 +484,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
struct kvm *kvm = vcpu->kvm;
gfn_t gfn = gpa >> PAGE_SHIFT;
pte_t *ptep;
- kvm_pfn_t pfn = 0; /* silence bogus GCC warning */
- bool pfn_valid = false;
int ret = 0;
spin_lock(&kvm->mmu_lock);
@@ -498,12 +496,9 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
}
/* Track access to pages marked old */
- if (!pte_young(*ptep)) {
+ if (!pte_young(*ptep))
set_pte(ptep, pte_mkyoung(*ptep));
- pfn = pte_pfn(*ptep);
- pfn_valid = true;
- /* call kvm_set_pfn_accessed() after unlock */
- }
+
if (write_fault && !pte_dirty(*ptep)) {
if (!pte_write(*ptep)) {
ret = -EFAULT;
@@ -512,7 +507,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
/* Track dirtying of writeable pages */
set_pte(ptep, pte_mkdirty(*ptep));
- pfn = pte_pfn(*ptep);
mark_page_dirty(kvm, gfn);
}
@@ -523,8 +517,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
out:
spin_unlock(&kvm->mmu_lock);
- if (pfn_valid)
- kvm_set_pfn_accessed(pfn);
return ret;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 71/85] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (69 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 70/85] KVM: MIPS: Mark "struct page" pfns accessed " Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 72/85] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
` (15 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that MIPS can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index f1e4b618ec6d..69463ab24d97 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -634,10 +634,9 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (writeable)
kvm_set_pfn_dirty(pfn);
-
- spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
- kvm_set_pfn_accessed(pfn);
+
+ spin_unlock(&kvm->mmu_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
return err;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 72/85] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (70 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 71/85] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 73/85] KVM: PPC: Remove extra get_page() to fix page refcount leak Sean Christopherson
` (14 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert MIPS to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 69463ab24d97..d2c3b6b41f18 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -557,6 +557,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
bool writeable;
unsigned long prot_bits;
unsigned long mmu_seq;
+ struct page *page;
/* Try the fast path to handle old / clean pages */
srcu_idx = srcu_read_lock(&kvm->srcu);
@@ -578,7 +579,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
mmu_seq = kvm->mmu_invalidate_seq;
/*
* Ensure the read of mmu_invalidate_seq isn't reordered with PTE reads
- * in gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
+ * in kvm_faultin_pfn() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_invalidate_retry() noticing.
*
@@ -590,7 +591,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
smp_rmb();
/* Slow path - ask KVM core whether we can access this GPA */
- pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writeable);
+ pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writeable, &page);
if (is_error_noslot_pfn(pfn)) {
err = -EFAULT;
goto out;
@@ -602,10 +603,10 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
/*
* This can happen when mappings are changed asynchronously, but
* also synchronously if a COW is triggered by
- * gfn_to_pfn_prot().
+ * kvm_faultin_pfn().
*/
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
goto retry;
}
@@ -632,10 +633,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (out_buddy)
*out_buddy = *ptep_buddy(ptep);
- if (writeable)
- kvm_set_pfn_dirty(pfn);
- kvm_release_pfn_clean(pfn);
-
+ kvm_release_faultin_page(kvm, page, false, writeable);
spin_unlock(&kvm->mmu_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 73/85] KVM: PPC: Remove extra get_page() to fix page refcount leak
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (71 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 72/85] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 74/85] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions Sean Christopherson
` (13 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Don't manually do get_page() when patching dcbz, as gfn_to_page() gifts
the caller a reference. I.e. doing get_page() will leak the page due to
not putting all references.
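In other words, gfn_to_page() hands the caller exactly one reference, so a single put balances it (sketch only; the actual use of the page is elided):

    struct page *hpage;

    hpage = gfn_to_page(vcpu->kvm, gfn);    /* +1 ref, owned by the caller */
    if (!hpage)
            return;

    /* ... read/write the page contents ... */

    put_page(hpage);                        /* drop the one and only reference */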
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_pr.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d7721297b9b6..cd7ab6d85090 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -652,7 +652,6 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
hpage_offset &= ~0xFFFULL;
hpage_offset /= 4;
- get_page(hpage);
page = kmap_atomic(hpage);
/* patch dcbz into reserved instruction, so we trap */
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 74/85] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (72 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 73/85] KVM: PPC: Remove extra get_page() to fix page refcount leak Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 75/85] KVM: Convert gfn_to_page() to use kvm_follow_pfn() Sean Christopherson
` (12 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Use kvm_vcpu_map() when patching dcbz in guest memory, as a regular GUP
isn't technically sufficient when writing to data in the target pages.
As per Documentation/core-api/pin_user_pages.rst:
  Correct (uses FOLL_PIN calls):
      pin_user_pages()
      write to the data within the pages
      unpin_user_pages()

  INCORRECT (uses FOLL_GET calls):
      get_user_pages()
      write to the data within the pages
      put_page()
As a happy bonus, using kvm_vcpu_{,un}map() takes care of creating a
mapping and marking the page dirty.
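A minimal sketch of the map/unmap pattern, assuming a gpa in hand (the write step is a placeholder; per the above, kvm_vcpu_unmap() also takes care of marking the page dirty):

    struct kvm_host_map map;
    u32 *hva;

    if (kvm_vcpu_map(vcpu, gpa >> PAGE_SHIFT, &map))
            return;                         /* couldn't map the guest page */

    hva = map.hva;
    /* ... modify guest memory through the host mapping ... */

    kvm_vcpu_unmap(vcpu, &map);             /* unmap and mark the page dirty */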
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_pr.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index cd7ab6d85090..83bcdc80ce51 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -639,28 +639,27 @@ static void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
*/
static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
{
- struct page *hpage;
+ struct kvm_host_map map;
u64 hpage_offset;
u32 *page;
- int i;
+ int i, r;
- hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
- if (!hpage)
+ r = kvm_vcpu_map(vcpu, pte->raddr >> PAGE_SHIFT, &map);
+ if (r)
return;
hpage_offset = pte->raddr & ~PAGE_MASK;
hpage_offset &= ~0xFFFULL;
hpage_offset /= 4;
- page = kmap_atomic(hpage);
+ page = map.hva;
/* patch dcbz into reserved instruction, so we trap */
for (i=hpage_offset; i < hpage_offset + (HW_PAGE_SIZE / 4); i++)
if ((be32_to_cpu(page[i]) & 0xff0007ff) == INS_DCBZ)
page[i] &= cpu_to_be32(0xfffffff7);
- kunmap_atomic(page);
- put_page(hpage);
+ kvm_vcpu_unmap(vcpu, &map);
}
static bool kvmppc_visible_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 75/85] KVM: Convert gfn_to_page() to use kvm_follow_pfn()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (73 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 74/85] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 76/85] KVM: Add support for read-only usage of gfn_to_page() Sean Christopherson
` (11 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Convert gfn_to_page() to the new kvm_follow_pfn() internal API, which will
eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page().
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 696d5e429b3e..1782242a4800 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3145,14 +3145,16 @@ EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
*/
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
- kvm_pfn_t pfn;
+ struct page *refcounted_page = NULL;
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(kvm, gfn),
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ .refcounted_page = &refcounted_page,
+ };
- pfn = gfn_to_pfn(kvm, gfn);
-
- if (is_error_noslot_pfn(pfn))
- return NULL;
-
- return kvm_pfn_to_refcounted_page(pfn);
+ (void)kvm_follow_pfn(&kfp);
+ return refcounted_page;
}
EXPORT_SYMBOL_GPL(gfn_to_page);
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 76/85] KVM: Add support for read-only usage of gfn_to_page()
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (74 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 75/85] KVM: Convert gfn_to_page() to use kvm_follow_pfn() Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 77/85] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace Sean Christopherson
` (10 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Rework gfn_to_page() to support read-only accesses so that it can be used
by arm64 to get MTE tags out of guest memory.
Opportunistically rewrite the comment to be even more stern about using
gfn_to_page(), as there are very few scenarios where requiring a struct
page is actually the right thing to do (though there are such scenarios).
Add a FIXME to call out that KVM probably should be pinning pages, not
just getting pages.
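Read-only usage then looks roughly like this (sketch; "write" simply selects FOLL_WRITE as in the hunk below, and gfn_to_page() remains a write-enabled wrapper):

    struct page *page;

    /* Read-only access, e.g. to copy MTE tags out of guest memory. */
    page = __gfn_to_page(kvm, gfn, false);
    if (!page)
            return -EFAULT;

    /* ... read the page contents, e.g. via page_address()/kmap() ... */

    kvm_release_page_clean(page);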
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 7 ++++++-
virt/kvm/kvm_main.c | 15 ++++++++-------
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9f7682ece4a1..af928b59b2ab 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1213,7 +1213,12 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
struct page **pages, int nr_pages);
-struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *__gfn_to_page(struct kvm *kvm, gfn_t gfn, bool write);
+static inline struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+{
+ return __gfn_to_page(kvm, gfn, true);
+}
+
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1782242a4800..8f8b2cd01189 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3138,25 +3138,26 @@ int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
/*
- * Do not use this helper unless you are absolutely certain the gfn _must_ be
- * backed by 'struct page'. A valid example is if the backing memslot is
- * controlled by KVM. Note, if the returned page is valid, it's refcount has
- * been elevated by gfn_to_pfn().
+ * Don't use this API unless you are absolutely, positively certain that KVM
+ * needs to get a struct page, e.g. to pin the page for firmware DMA.
+ *
+ * FIXME: Users of this API likely need to FOLL_PIN the page, not just elevate
+ * its refcount.
*/
-struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+struct page *__gfn_to_page(struct kvm *kvm, gfn_t gfn, bool write)
{
struct page *refcounted_page = NULL;
struct kvm_follow_pfn kfp = {
.slot = gfn_to_memslot(kvm, gfn),
.gfn = gfn,
- .flags = FOLL_WRITE,
+ .flags = write ? FOLL_WRITE : 0,
.refcounted_page = &refcounted_page,
};
(void)kvm_follow_pfn(&kfp);
return refcounted_page;
}
-EXPORT_SYMBOL_GPL(gfn_to_page);
+EXPORT_SYMBOL_GPL(__gfn_to_page);
int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
bool writable)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 77/85] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (75 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 76/85] KVM: Add support for read-only usage of gfn_to_page() Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 78/85] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing Sean Christopherson
` (9 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Use __gfn_to_page() instead when copying MTE tags between guest and
userspace. This will eventually allow removing gfn_to_pfn_prot(),
gfn_to_pfn(), kvm_pfn_to_refcounted_page(), and related APIs.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/guest.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 962f985977c2..4cd7ffa76794 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1051,20 +1051,18 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
}
while (length > 0) {
- kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+ struct page *page = __gfn_to_page(kvm, gfn, write);
void *maddr;
unsigned long num_tags;
- struct page *page;
- if (is_error_noslot_pfn(pfn)) {
- ret = -EFAULT;
- goto out;
- }
-
- page = pfn_to_online_page(pfn);
if (!page) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (!pfn_to_online_page(page_to_pfn(page))) {
/* Reject ZONE_DEVICE memory */
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
ret = -EFAULT;
goto out;
}
@@ -1078,7 +1076,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
/* No tags in memory, so write zeros */
num_tags = MTE_GRANULES_PER_PAGE -
clear_user(tags, MTE_GRANULES_PER_PAGE);
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
} else {
/*
* Only locking to serialise with a concurrent
@@ -1093,8 +1091,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
if (num_tags != MTE_GRANULES_PER_PAGE)
mte_clear_page_tags(maddr);
set_page_mte_tagged(page);
-
- kvm_release_pfn_dirty(pfn);
+ kvm_release_page_dirty(page);
}
if (num_tags != MTE_GRANULES_PER_PAGE) {
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 78/85] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (76 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 77/85] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 79/85] KVM: Drop gfn_to_pfn() APIs now that all users are gone Sean Christopherson
` (8 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Explicitly require "struct page" memory when sharing memory between
guest and host via an Ultravisor. Given the number of pfn_to_page()
calls in the code, it's safe to assume that KVM already requires that the
pfn returned by gfn_to_pfn() is backed by struct page, i.e. this is
likely a bug fix, not a reduction in KVM capabilities.
Switching to gfn_to_page() will eventually allow removing gfn_to_pfn()
and kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_hv_uvmem.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 92f33115144b..3a6592a31a10 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -879,9 +879,8 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
{
int ret = H_PARAMETER;
- struct page *uvmem_page;
+ struct page *page, *uvmem_page;
struct kvmppc_uvmem_page_pvt *pvt;
- unsigned long pfn;
unsigned long gfn = gpa >> page_shift;
int srcu_idx;
unsigned long uvmem_pfn;
@@ -901,8 +900,8 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
retry:
mutex_unlock(&kvm->arch.uvmem_lock);
- pfn = gfn_to_pfn(kvm, gfn);
- if (is_error_noslot_pfn(pfn))
+ page = gfn_to_page(kvm, gfn);
+ if (!page)
goto out;
mutex_lock(&kvm->arch.uvmem_lock);
@@ -911,16 +910,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
pvt = uvmem_page->zone_device_data;
pvt->skip_page_out = true;
pvt->remove_gfn = false; /* it continues to be a valid GFN */
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
goto retry;
}
- if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
+ if (!uv_page_in(kvm->arch.lpid, page_to_pfn(page) << page_shift, gpa, 0,
page_shift)) {
kvmppc_gfn_shared(gfn, kvm);
ret = H_SUCCESS;
}
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
mutex_unlock(&kvm->arch.uvmem_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
@@ -1083,21 +1082,21 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
{
- unsigned long pfn;
+ struct page *page;
int ret = U_SUCCESS;
- pfn = gfn_to_pfn(kvm, gfn);
- if (is_error_noslot_pfn(pfn))
+ page = gfn_to_page(kvm, gfn);
+ if (!page)
return -EFAULT;
mutex_lock(&kvm->arch.uvmem_lock);
if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
goto out;
- ret = uv_page_in(kvm->arch.lpid, pfn << PAGE_SHIFT, gfn << PAGE_SHIFT,
- 0, PAGE_SHIFT);
+ ret = uv_page_in(kvm->arch.lpid, page_to_pfn(page) << PAGE_SHIFT,
+ gfn << PAGE_SHIFT, 0, PAGE_SHIFT);
out:
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
mutex_unlock(&kvm->arch.uvmem_lock);
return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 79/85] KVM: Drop gfn_to_pfn() APIs now that all users are gone
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (77 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 78/85] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 80/85] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory Sean Christopherson
` (7 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Drop gfn_to_pfn() and all its variants now that all users are gone.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 11 --------
virt/kvm/kvm_main.c | 59 ----------------------------------------
2 files changed, 70 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index af928b59b2ab..4a1eaa40a215 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1274,14 +1274,6 @@ static inline kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
write ? FOLL_WRITE : 0, writable, refcounted_page);
}
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
- bool *writable);
-kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool no_wait,
- bool write_fault, bool *writable);
-
void kvm_release_pfn_clean(kvm_pfn_t pfn);
void kvm_release_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_dirty(kvm_pfn_t pfn);
@@ -1356,9 +1348,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
-
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-
int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
bool writable);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8f8b2cd01189..b2c8d429442d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3039,65 +3039,6 @@ static kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
return hva_to_pfn(kfp);
}
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool no_wait,
- bool write_fault, bool *writable)
-{
- struct kvm_follow_pfn kfp = {
- .slot = slot,
- .gfn = gfn,
- .map_writable = writable,
- };
-
- if (write_fault)
- kfp.flags |= FOLL_WRITE;
- if (no_wait)
- kfp.flags |= FOLL_NOWAIT;
- if (interruptible)
- kfp.flags |= FOLL_INTERRUPTIBLE;
-
- return kvm_follow_pfn(&kfp);
-}
-EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
-
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
- bool *writable)
-{
- struct kvm_follow_pfn kfp = {
- .slot = gfn_to_memslot(kvm, gfn),
- .gfn = gfn,
- .flags = write_fault ? FOLL_WRITE : 0,
- .map_writable = writable,
- };
-
- return kvm_follow_pfn(&kfp);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
-
-kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
-{
- struct kvm_follow_pfn kfp = {
- .slot = slot,
- .gfn = gfn,
- .flags = FOLL_WRITE,
- };
-
- return kvm_follow_pfn(&kfp);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
-
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
-{
- return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn);
-
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
- return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
-}
-EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
-
kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
unsigned int foll, bool *writable,
struct page **refcounted_page)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 80/85] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (78 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 79/85] KVM: Drop gfn_to_pfn() APIs now that all users are gone Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 81/85] KVM: Make kvm_follow_pfn.refcounted_page a required field Sean Christopherson
` (6 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Use kvm_release_page_dirty() when unpinning guest pages, as the pfn was
retrieved via pin_guest_page(), i.e. is guaranteed to be backed by struct
page memory. This will allow dropping kvm_release_pfn_dirty() and
friends.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/s390/kvm/vsie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 763a070f5955..e1fdf83879cf 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -670,7 +670,7 @@ static int pin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t *hpa)
/* Unpins a page previously pinned via pin_guest_page, marking it as dirty. */
static void unpin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t hpa)
{
- kvm_release_pfn_dirty(hpa >> PAGE_SHIFT);
+ kvm_release_page_dirty(pfn_to_page(hpa >> PAGE_SHIFT));
/* mark the page always as dirty for migration */
mark_page_dirty(kvm, gpa_to_gfn(gpa));
}
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 81/85] KVM: Make kvm_follow_pfn.refcounted_page a required field
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (79 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 80/85] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 82/85] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs Sean Christopherson
` (5 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Now that the legacy gfn_to_pfn() APIs are gone, and all callers of
hva_to_pfn() pass in a refcounted_page pointer, make it a required field
to ensure all future usage in KVM plays nice.
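I.e. every internal lookup is now expected to look roughly like this (sketch pieced together from the kvm_follow_pfn usage elsewhere in the series; the error handling is illustrative):

    struct page *refcounted_page = NULL;
    struct kvm_follow_pfn kfp = {
            .slot = gfn_to_memslot(kvm, gfn),
            .gfn = gfn,
            .flags = FOLL_WRITE,
            .refcounted_page = &refcounted_page,    /* now mandatory */
    };
    kvm_pfn_t pfn;

    pfn = kvm_follow_pfn(&kfp);
    if (is_error_noslot_pfn(pfn))
            return -EFAULT;

    /* ... use pfn, then release refcounted_page (if non-NULL) when done ... */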
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b2c8d429442d..a483da96f4be 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2834,8 +2834,7 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
pfn = page_to_pfn(page);
}
- if (kfp->refcounted_page)
- *kfp->refcounted_page = page;
+ *kfp->refcounted_page = page;
return pfn;
}
@@ -2986,6 +2985,9 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
might_sleep();
+ if (WARN_ON_ONCE(!kfp->refcounted_page))
+ return KVM_PFN_ERR_FAULT;
+
if (hva_to_pfn_fast(kfp, &pfn))
return pfn;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 82/85] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (80 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 81/85] KVM: Make kvm_follow_pfn.refcounted_page a required field Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 83/85] KVM: arm64: Don't mark "struct page" accessed when making SPTE young Sean Christopherson
` (4 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Don't mark pages/folios as accessed in the primary MMU when zapping SPTEs,
as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking
is unnecessary and wasteful. KVM participates in page aging via
mmu_notifiers, so there's no need to push "accessed" updates to the
primary MMU.
And if KVM zaps a SPTE in response to an mmu_notifier, marking it accessed
_after_ the primary MMU has decided to zap the page is likely to go
unnoticed, i.e. odds are good that, if the page is being zapped for
reclaim, the page will be swapped out regardless of whether or not KVM
marks the page accessed.
Dropping x86's use of kvm_set_pfn_accessed() also paves the way for
removing kvm_pfn_to_refcounted_page() and all its users.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 17 -----------------
arch/x86/kvm/mmu/tdp_mmu.c | 3 ---
2 files changed, 20 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5acdaf3b1007..55eeca931e23 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -559,10 +559,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
*/
static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
{
- kvm_pfn_t pfn;
u64 old_spte = *sptep;
int level = sptep_to_sp(sptep)->role.level;
- struct page *page;
if (!is_shadow_present_pte(old_spte) ||
!spte_has_volatile_bits(old_spte))
@@ -574,21 +572,6 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
return old_spte;
kvm_update_page_stats(kvm, level, -1);
-
- pfn = spte_to_pfn(old_spte);
-
- /*
- * KVM doesn't hold a reference to any pages mapped into the guest, and
- * instead uses the mmu_notifier to ensure that KVM unmaps any pages
- * before they are reclaimed. Sanity check that, if the pfn is backed
- * by a refcounted page, the refcount is elevated.
- */
- page = kvm_pfn_to_refcounted_page(pfn);
- WARN_ON_ONCE(page && !page_count(page));
-
- if (is_accessed_spte(old_spte))
- kvm_set_pfn_accessed(pfn);
-
return old_spte;
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8aa0d7a7602b..91caa73a905b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -861,9 +861,6 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
- if (is_accessed_spte(iter.old_spte))
- kvm_set_pfn_accessed(spte_to_pfn(iter.old_spte));
-
/*
* Zappings SPTEs in invalid roots doesn't require a TLB flush,
* see kvm_tdp_mmu_zap_invalidated_roots() for details.
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 83/85] KVM: arm64: Don't mark "struct page" accessed when making SPTE young
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (81 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 82/85] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 84/85] KVM: Drop APIs that manipulate "struct page" via pfns Sean Christopherson
` (3 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Don't mark pages/folios as accessed in the primary MMU when making a SPTE
young in KVM's secondary MMU, as doing so relies on
kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and
wasteful. KVM participates in page aging via mmu_notifiers, so there's no
need to push "accessed" updates to the primary MMU.
Dropping use of kvm_set_pfn_accessed() also paves the way for removing
kvm_pfn_to_refcounted_page() and all its users.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/include/asm/kvm_pgtable.h | 4 +---
arch/arm64/kvm/hyp/pgtable.c | 7 ++-----
arch/arm64/kvm/mmu.c | 6 +-----
3 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 03f4c3d7839c..aab04097b505 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -674,10 +674,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
*
* If there is a valid, leaf page-table entry used to translate @addr, then
* set the access flag in that entry.
- *
- * Return: The old page-table entry prior to setting the flag, 0 on failure.
*/
-kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
+void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
* kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b11bcebac908..40bd55966540 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1245,19 +1245,16 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
NULL, NULL, 0);
}
-kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
+void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
{
- kvm_pte_t pte = 0;
int ret;
ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
- &pte, NULL,
+ NULL, NULL,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
if (!ret)
dsb(ishst);
-
- return pte;
}
struct stage2_age_data {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4054356c9712..e2ae9005e333 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1706,18 +1706,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/* Resolve the access fault by making the page young again. */
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
- kvm_pte_t pte;
struct kvm_s2_mmu *mmu;
trace_kvm_access_fault(fault_ipa);
read_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
- pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
+ kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
read_unlock(&vcpu->kvm->mmu_lock);
-
- if (kvm_pte_valid(pte))
- kvm_set_pfn_accessed(kvm_pte_to_pfn(pte));
}
/**
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 84/85] KVM: Drop APIs that manipulate "struct page" via pfns
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (82 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 83/85] KVM: arm64: Don't mark "struct page" accessed when making SPTE young Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-10 18:24 ` [PATCH v13 85/85] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page" Sean Christopherson
` (2 subsequent siblings)
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Remove all kvm_{release,set}_pfn_*() APIs now that all users are gone.
No functional change intended.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
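For reference, a hedged sketch of the pattern callers use instead (the
function and parameter names are illustrative, not taken from the series):
keep hold of the struct page returned by the faultin/map APIs and release
that page directly, instead of deriving a page from a raw pfn.
/* Sketch only: release exactly what the map/faultin API handed back. */
static void release_guest_page_sketch(struct page *refcounted_page, bool dirty)
{
	/* A pfn that never resolved to a refcounted page needs no release. */
	if (!refcounted_page)
		return;
	if (dirty)
		kvm_release_page_dirty(refcounted_page);
	else
		kvm_release_page_clean(refcounted_page);
}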
include/linux/kvm_host.h | 5 ----
virt/kvm/kvm_main.c | 55 ----------------------------------------
2 files changed, 60 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4a1eaa40a215..d045f8310a48 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1274,11 +1274,6 @@ static inline kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
write ? FOLL_WRITE : 0, writable, refcounted_page);
}
-void kvm_release_pfn_clean(kvm_pfn_t pfn);
-void kvm_release_pfn_dirty(kvm_pfn_t pfn);
-void kvm_set_pfn_dirty(kvm_pfn_t pfn);
-void kvm_set_pfn_accessed(kvm_pfn_t pfn);
-
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len);
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a483da96f4be..396ca14f18f3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3164,61 +3164,6 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
-void kvm_release_pfn_clean(kvm_pfn_t pfn)
-{
- struct page *page;
-
- if (is_error_noslot_pfn(pfn))
- return;
-
- page = kvm_pfn_to_refcounted_page(pfn);
- if (!page)
- return;
-
- kvm_release_page_clean(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
-
-void kvm_release_pfn_dirty(kvm_pfn_t pfn)
-{
- struct page *page;
-
- if (is_error_noslot_pfn(pfn))
- return;
-
- page = kvm_pfn_to_refcounted_page(pfn);
- if (!page)
- return;
-
- kvm_release_page_dirty(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
-
-/*
- * Note, checking for an error/noslot pfn is the caller's responsibility when
- * directly marking a page dirty/accessed. Unlike the "release" helpers, the
- * "set" helpers are not to be used when the pfn might point at garbage.
- */
-void kvm_set_pfn_dirty(kvm_pfn_t pfn)
-{
- if (WARN_ON(is_error_noslot_pfn(pfn)))
- return;
-
- if (pfn_valid(pfn))
- kvm_set_page_dirty(pfn_to_page(pfn));
-}
-EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
-
-void kvm_set_pfn_accessed(kvm_pfn_t pfn)
-{
- if (WARN_ON(is_error_noslot_pfn(pfn)))
- return;
-
- if (pfn_valid(pfn))
- kvm_set_page_accessed(pfn_to_page(pfn));
-}
-EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
-
static int next_segment(unsigned long len, int offset)
{
if (len > PAGE_SIZE - offset)
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* [PATCH v13 85/85] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (83 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 84/85] KVM: Drop APIs that manipulate "struct page" via pfns Sean Christopherson
@ 2024-10-10 18:24 ` Sean Christopherson
2024-10-17 17:40 ` [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
2024-10-24 3:37 ` Dmitry Osipenko
86 siblings, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2024-10-10 18:24 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
Now that KVM no longer relies on an ugly heuristic to find its struct page
references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
pfns, remove KVM's hack to elevate the refcount for pfns that happen to
have a valid struct page. In addition to removing a long-standing wart
in KVM, this allows KVM to map non-refcounted struct page memory into the
guest, e.g. for exposing GPU TTM buffers to KVM guests.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
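A hedged sketch of the end state this enables (the function below is
illustrative, not one of the real callers, and assumes the kvm_faultin_pfn()
signature visible in the previous patch's context): only a pfn that
explicitly resolved to a refcounted page is released, so non-refcounted
struct page memory, e.g. GPU TTM buffers, can be mapped without KVM ever
touching its refcount.
/* Sketch only: map a guest pfn without guessing whether it's refcounted. */
static int map_gfn_sketch(struct kvm_vcpu *vcpu, gfn_t gfn, bool write)
{
	struct page *refcounted_page = NULL;
	kvm_pfn_t pfn;
	pfn = kvm_faultin_pfn(vcpu, gfn, write, NULL, &refcounted_page);
	if (is_error_noslot_pfn(pfn))
		return -EFAULT;
	/* ... install @pfn into the stage-2 / shadow page tables ... */
	/*
	 * Release only what was actually refcounted; a PFNMAP/MIXEDMAP pfn
	 * without a refcounted page is never passed to put_page().
	 */
	if (refcounted_page)
		kvm_release_page_clean(refcounted_page);
	return 0;
}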
include/linux/kvm_host.h | 3 --
virt/kvm/kvm_main.c | 75 ++--------------------------------------
2 files changed, 2 insertions(+), 76 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d045f8310a48..02f0206fd2dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1730,9 +1730,6 @@ void kvm_arch_sync_events(struct kvm *kvm);
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
-struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
-bool kvm_is_zone_device_page(struct page *page);
-
struct kvm_irq_ack_notifier {
struct hlist_node link;
unsigned gsi;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 396ca14f18f3..b1b10dc408a0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -160,52 +160,6 @@ __weak void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
{
}
-bool kvm_is_zone_device_page(struct page *page)
-{
- /*
- * The metadata used by is_zone_device_page() to determine whether or
- * not a page is ZONE_DEVICE is guaranteed to be valid if and only if
- * the device has been pinned, e.g. by get_user_pages(). WARN if the
- * page_count() is zero to help detect bad usage of this helper.
- */
- if (WARN_ON_ONCE(!page_count(page)))
- return false;
-
- return is_zone_device_page(page);
-}
-
-/*
- * Returns a 'struct page' if the pfn is "valid" and backed by a refcounted
- * page, NULL otherwise. Note, the list of refcounted PG_reserved page types
- * is likely incomplete, it has been compiled purely through people wanting to
- * back guest with a certain type of memory and encountering issues.
- */
-struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)
-{
- struct page *page;
-
- if (!pfn_valid(pfn))
- return NULL;
-
- page = pfn_to_page(pfn);
- if (!PageReserved(page))
- return page;
-
- /* The ZERO_PAGE(s) is marked PG_reserved, but is refcounted. */
- if (is_zero_pfn(pfn))
- return page;
-
- /*
- * ZONE_DEVICE pages currently set PG_reserved, but from a refcounting
- * perspective they are "normal" pages, albeit with slightly different
- * usage rules.
- */
- if (kvm_is_zone_device_page(page))
- return page;
-
- return NULL;
-}
-
/*
* Switches to specified vcpu, until a matching vcpu_put()
*/
@@ -2804,35 +2758,10 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
if (kfp->map_writable)
*kfp->map_writable = writable;
- /*
- * FIXME: Remove this once KVM no longer blindly calls put_page() on
- * every pfn that points at a struct page.
- *
- * Get a reference for follow_pte() pfns if they happen to point at a
- * struct page, as KVM will ultimately call kvm_release_pfn_clean() on
- * the returned pfn, i.e. KVM expects to have a reference.
- *
- * Certain IO or PFNMAP mappings can be backed with valid struct pages,
- * but be allocated without refcounting, e.g. tail pages of
- * non-compound higher order allocations. Grabbing and putting a
- * reference to such pages would cause KVM to prematurely free a page
- * it doesn't own (KVM gets and puts the one and only reference).
- * Don't allow those pages until the FIXME is resolved.
- *
- * Don't grab a reference for pins, callers that pin pages are required
- * to check refcounted_page, i.e. must not blindly release the pfn.
- */
- if (map) {
+ if (map)
pfn = map->pfn;
-
- if (!kfp->pin) {
- page = kvm_pfn_to_refcounted_page(pfn);
- if (page && !get_page_unless_zero(page))
- return KVM_PFN_ERR_FAULT;
- }
- } else {
+ else
pfn = page_to_pfn(page);
- }
*kfp->refcounted_page = page;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (84 preceding siblings ...)
2024-10-10 18:24 ` [PATCH v13 85/85] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page" Sean Christopherson
@ 2024-10-17 17:40 ` Paolo Bonzini
2024-10-22 0:25 ` Sean Christopherson
2024-10-24 3:37 ` Dmitry Osipenko
86 siblings, 1 reply; 99+ messages in thread
From: Paolo Bonzini @ 2024-10-17 17:40 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
On Thu, Oct 10, 2024 at 8:24 PM Sean Christopherson <seanjc@google.com> wrote:
> v13:
> - Rebased onto v6.12-rc2
> - Collect reviews. [Alex and others]
> - Fix a transient bug in arm64 and RISC-V where KVM would leak a page
> refcount. [Oliver]
> - Fix a dangling comment. [Alex]
> - Drop kvm_lookup_pfn(), as the x86 code that "needed" it was stupid and is (was?)
> eliminated in v6.12.
> - Drop check_user_page_hwpoison(). [Paolo]
> - Drop the arm64 MTE fixes that went into 6.12.
> - Slightly redo the guest_memfd interaction to account for 6.12 changes.
Here is my own summary of the changes:
patches removed from v12:
01/02 - already upstream
09 - moved to separate A/D series [1]
34 - not needed due to new patch 36
35 - gone after 620525739521376a65a690df899e1596d56791f8
patches added or substantially changed in v13:
05/06/07 - new, suggested by Yan Zhao
08 - code was folded from mmu_spte_age into kvm_rmap_age_gfn_range
14 - new, suggested by me in reply to 84/84 (yuck)
15 - new, suggested by me in reply to 84/84
19 - somewhat rewritten for new follow_pfnmap API
27 - smaller changes due to new follow_pfnmap API
36 - rewritten, suggested by me
45 - new, cleanup
46 - much simplified due to new patch 45
Looks good to me, thanks and congratulations!! Should we merge it in
kvm/next asap?
Paolo
[1] https://patchew.org/linux/20241011021051.1557902-1-seanjc@google.com/20241011021051.1557902-5-seanjc@google.com/
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages
2024-10-17 17:40 ` [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
@ 2024-10-22 0:25 ` Sean Christopherson
2024-10-25 17:41 ` Paolo Bonzini
0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2024-10-22 0:25 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
On Thu, Oct 17, 2024, Paolo Bonzini wrote:
> On Thu, Oct 10, 2024 at 8:24 PM Sean Christopherson <seanjc@google.com> wrote:
> > v13:
> > - Rebased onto v6.12-rc2
> > - Collect reviews. [Alex and others]
> > - Fix a transient bug in arm64 and RISC-V where KVM would leak a page
> > refcount. [Oliver]
> > - Fix a dangling comment. [Alex]
> > - Drop kvm_lookup_pfn(), as the x86 code that "needed" it was stupid and is (was?)
> > eliminated in v6.12.
> > - Drop check_user_page_hwpoison(). [Paolo]
> > - Drop the arm64 MTE fixes that went into 6.12.
> > - Slightly redo the guest_memfd interaction to account for 6.12 changes.
>
> Here is my own summary of the changes:
Yep, looks right to me.
> patches removed from v12:
> 01/02 - already upstream
> 09 - moved to separate A/D series [1]
> 34 - not needed due to new patch 36
> 35 - gone after 620525739521376a65a690df899e1596d56791f8
>
> patches added or substantially changed in v13:
> 05/06/07 - new, suggested by Yan Zhao
> 08 - code was folded from mmu_spte_age into kvm_rmap_age_gfn_range
> 14 - new, suggested by me in reply to 84/84 (yuck)
> 15 - new, suggested by me in reply to 84/84
> 19 - somewhat rewritten for new follow_pfnmap API
> 27 - smaller changes due to new follow_pfnmap API
> 36 - rewritten, suggested by me
> 45 - new, cleanup
> 46 - much simplified due to new patch 45
>
> Looks good to me, thanks and congratulations!! Should we merge it in
> kvm/next asap?
That has my vote, though I'm obviously extremely biased :-)
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages
2024-10-22 0:25 ` Sean Christopherson
@ 2024-10-25 17:41 ` Paolo Bonzini
0 siblings, 0 replies; 99+ messages in thread
From: Paolo Bonzini @ 2024-10-25 17:41 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
On Tue, Oct 22, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
> > Looks good to me, thanks and congratulations!! Should we merge it in
> > kvm/next asap?
>
> That has my vote, though I'm obvious extremely biased :-)
Your wish is my command... Merged.
Paolo
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages
2024-10-10 18:23 [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (85 preceding siblings ...)
2024-10-17 17:40 ` [PATCH v13 00/85] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
@ 2024-10-24 3:37 ` Dmitry Osipenko
86 siblings, 0 replies; 99+ messages in thread
From: Dmitry Osipenko @ 2024-10-24 3:37 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
Alex Bennée, Yan Zhao, David Matlack, David Stevens,
Andrew Jones
On 10/10/24 21:23, Sean Christopherson wrote:
> TL;DR: Eliminate KVM's long-standing (and heinous) behavior of essentially
> guessing which pfns are refcounted pages (see kvm_pfn_to_refcounted_page()).
>
> Getting there requires "fixing" arch code that isn't obviously broken.
> Specifically, to get rid of kvm_pfn_to_refcounted_page(), KVM needs to
> stop marking pages/folios dirty/accessed based solely on the pfn that's
> stored in KVM's stage-2 page tables.
>
> Instead of tracking which SPTEs correspond to refcounted pages, simply
> remove all of the code that operates on "struct page" based on the pfn
> in stage-2 PTEs. This is the back ~40-50% of the series.
>
> For x86 in particular, which sets accessed/dirty status when that info
> would be "lost", e.g. when SPTEs are zapped or KVM clears the dirty flag
> in a SPTE, foregoing the updates provides very measurable performance
> improvements for related operations. E.g. when clearing dirty bits as
> part of dirty logging, and zapping SPTEs to reconstitute huge pages when
> disabling dirty logging.
>
> The front ~40% of the series is cleanups and prep work, and most of it is
> x86 focused (purely because x86 added the most special cases, *sigh*).
> E.g. several of the inputs to hva_to_pfn() (and its myriad wrappers)
> can be removed by cleaning up and deduplicating x86 code.
>
> v13:
> - Rebased onto v6.12-rc2
> - Collect reviews. [Alex and others]
> - Fix a transient bug in arm64 and RISC-V where KVM would leak a page
> refcount. [Oliver]
> - Fix a dangling comment. [Alex]
> - Drop kvm_lookup_pfn(), as the x86 code that "needed" it was stupid and is (was?)
> eliminated in v6.12.
> - Drop check_user_page_hwpoison(). [Paolo]
> - Drop the arm64 MTE fixes that went into 6.12.
> - Slightly redo the guest_memfd interaction to account for 6.12 changes.
Thanks a lot for working on this patchset! I tested it with native
amdgpu/intel contexts and with venus/virgl on dGPU and iGPU; no
problems spotted. Please merge it soon, as this will unblock lots of
new virtio-gpu features.
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
--
Best regards,
Dmitry
^ permalink raw reply [flat|nested] 99+ messages in thread