* [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages
@ 2024-07-26 23:51 Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
` (85 more replies)
0 siblings, 86 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
arm64 folks, the first two patches are bug fixes, but I have very low
confidence that they are correct and/or desirable. If they are more or
less correct, I can post them separately if that'd make life easier. I
included them here to avoid conflicts, and because I'm pretty sure how
KVM deals with MTE tags vs. dirty logging will impact what APIs KVM needs
to provide to arch code.
On to the series... The TL;DR is that I would like to get input on two
things:
1. Marking folios dirty/accessed only on the initial stage-2 page fault
2. The new APIs for faulting, prefetching, and doing "lookups" on pfns
This is (spiritually) v12 of David Stevens' series to play nice with pfns
that are "valid", i.e. have a struct page, but are not refcounted. Whereas
David's series only (mostly) fixed things for x86, this series goes for
broke and completely eliminates KVM's long-standing (and heinous) behavior
of essentially guessing which pfns are refcounted pages (see
kvm_pfn_to_refcounted_page()).
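For those who haven't stared at that code recently, the guesswork in
question boils down to roughly the following (paraphrased, with a few
special cases elided; see virt/kvm/kvm_main.c for the real thing):

	struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)
	{
		struct page *page;

		if (!pfn_valid(pfn))
			return NULL;

		page = pfn_to_page(pfn);
		if (!PageReserved(page))
			return page;

		/* The ZERO_PAGE(s) are marked PG_reserved, but are refcounted. */
		if (is_zero_pfn(pfn))
			return page;

		/*
		 * A handful of other PG_reserved cases (elided here) are also
		 * treated as refcounted; everything else is assumed not to be.
		 */
		return NULL;
	}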
Getting there requires "fixing" arch code that isn't obviously broken.
Specifically, to get rid of kvm_pfn_to_refcounted_page(), KVM needs to
stop marking pages/folios dirty/accessed based solely on the pfn that's
stored in KVM's stage-2 page tables. In v11, KVM x86 did this by tagging
SPTEs with a flag (using a software-available bit).
But that isn't a viable option for some flavors of x86 (we're out of
software-available bits), and more importantly I've convinced myself[*]
that marking folios _dirty_ after SPTEs have been installed is completely
unnecessary, and that marking folios accessed is likewise unnecessary for
real world workloads (and if this is a sticking point, I have ideas on
how to handle it more gracefully; more at the bottom).
So, instead of tracking which SPTEs correspond to refcounted pages, v12
simply removes all of the code that operates on "struct page" based on
the pfn in stage-2 PTEs. This is the back ~40-50% of the series. Most
of the patches are relatively uninteresting from a code perspective;
it's the concept itself (of not marking folios dirty/accessed from SPTEs)
that needs discussion.
For x86 in particular, which sets accessed/dirty status when that info
would otherwise be "lost", e.g. when SPTEs are zapped or KVM clears the
dirty flag in a SPTE, foregoing the updates provides very measurable
performance improvements for related operations, e.g. when clearing
dirty bits as part of dirty logging, and when zapping SPTEs to
reconstitute huge pages when disabling dirty logging.
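For reference, the pattern being removed is of the form (this is the
mmu_spte_update() code that gets deleted later in the series):

	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
		flush = true;
		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
	}

	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
		flush = true;
		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
	}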
The other big change from v11 is that I opted to go with dedicated,
specific, and hopefully descriptive APIs to wrap kvm_follow_pfn() instead
of exposing the "inner" helper to arch code. In part because I still don't
love kvm_follow_pfn(), and fewer callers means it's easier to change if/when
someone comes up with a better name. But also because I think/hope that
having dedicated APIs will make it easier for arch developers to understand
the right/preferred way to do certain operations, e.g. so that all
architectures use the same core flow for handling stage-2 page faults.
Long term, I would love to standardize that code even more, but this series
is already waaaay too big.
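To give a flavor of the end state, the common guest page fault flow with
the new APIs looks roughly like the below. Treat it as pseudocode; the
exact signatures and parameter lists are introduced later in the series.

	struct page *page;
	bool writable;
	kvm_pfn_t pfn;

	pfn = __kvm_faultin_pfn(slot, gfn, write ? FOLL_WRITE : 0,
				&writable, &page);
	if (is_error_noslot_pfn(pfn))
		return -EFAULT;	/* arch-specific error handling in practice */

	/* ... install the mapping while holding mmu_lock ... */

	/* Marks the page accessed/dirty as appropriate, then puts the ref. */
	kvm_release_faultin_page(kvm, page, false, writable);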
Along the way, I also discovered that several of the inputs to hva_to_pfn()
(and its myriad wrappers) could be removed. E.g. the rather weirdly named
@atomic flag can be removed by deduplicating x86's prefetching code.
As for capturing accessed information on zapped SPTEs, e.g. to prevent
losing accessed information because NUMA balancing mucks with things, my
thought is that arch code can preserve the accessed information in SPTEs
that are unmapped/zapped because protections were modified, e.g. so that
LRU-initiated aging can still collect information. I'm not at all
convinced that this is necessary outside of tests that care about exact
counts, e.g. KVM selftests, but I'll post an RFC KVM x86 series to get
the conversation started.
Note, I'm purposefully not capturing the delta from v11=>v12, because
there is zero chance I will get everything, and while this is a spiritual
successor to David's v11, in practice it's like 98% new code.
[*] https://lore.kernel.org/all/20240320005024.3216282-1-seanjc@google.com
David Stevens (3):
KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code
KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn"
APIs
KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn()
Sean Christopherson (81):
KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits
ZONE_DEVICE
KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty
logging
KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an
error
KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page
pointer
KVM: Add kvm_release_page_unused() API to put pages that KVM never
consumes
KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf
SPTE
KVM: x86/mmu: Mark folio dirty when creating SPTE, not when
zapping/modifying
KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs
KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit
KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect
PTEs
KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages()
KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
KVM: Annotate that all paths in hva_to_pfn() might sleep
KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate
hva
KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot()
KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map()
KVM: Explicitly initialize all fields at the start of kvm_vcpu_map()
KVM: Use NULL for struct page pointer to indicate mremapped memory
KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping
KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx
KVM: nVMX: Add helper to put (unmap) vmcs12 pages
KVM: Use plain "struct page" pointer instead of single-entry array
KVM: Provide refcounted page as output field in struct kvm_follow_pfn
KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in
kvm_main.c
KVM: pfncache: Precisely track refcounted pages
KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map()
KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping
KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap()
KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by
default
KVM: Add a helper to lookup a pfn without grabbing a reference
KVM: x86: Use kvm_lookup_pfn() to check if retrying #PF is useful
KVM: x86: Use kvm_lookup_pfn() to check if APIC access page was
installed
KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic
names
KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean()
KVM: x86/mmu: Add common helper to handle prefetching SPTEs
KVM: x86/mmu: Add helper to "finish" handling a guest page fault
KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()
KVM: Move declarations of memslot accessors up in kvm_host.h
KVM: Add kvm_faultin_pfn() to specifically service guest page faults
KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn()
KVM: guest_memfd: Provide "struct page" as output from
kvm_gmem_get_pfn()
KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns
KVM: x86/mmu: Don't mark unused faultin pages as accessed
KVM: Move x86's API to release a faultin page to common KVM
KVM: VMX: Hold mmu_lock until page is released when updating APIC
access page
KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map()
KVM: PPC: e500: Mark "struct page" pfn accessed before dropping
mmu_lock
KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults
KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping
mmu_lock
KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts
KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is
installed
KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV
KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s
Radix
KVM: PPC: Drop unused @kvm_ro param from
kvmppc_book3s_instantiate_page()
KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after
installing PTE
KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR
KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page
fault path
KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page
fault path
KVM: LoongArch: Mark "struct page" pfn accessed before dropping
mmu_lock
KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest
KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault
path
KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault
path
KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock
KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest
KVM: PPC: Remove extra get_page() to fix page refcount leak
KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz
instructions
KVM: Convert gfn_to_page() to use kvm_follow_pfn()
KVM: Add support for read-only usage of gfn_to_page()
KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from
userspace
KVM: PPC: Explicitly require struct page memory for Ultravisor sharing
KVM: Drop gfn_to_pfn() APIs now that all users are gone
KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory
KVM: Make kvm_follow_pfn.refcounted_page a required field
KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs
KVM: arm64: Don't mark "struct page" accessed when making SPTE young
KVM: Drop APIs that manipulate "struct page" via pfns
KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct
page"
Documentation/virt/kvm/locking.rst | 80 ++--
arch/arm64/include/asm/kvm_pgtable.h | 4 +-
arch/arm64/kvm/guest.c | 25 +-
arch/arm64/kvm/hyp/pgtable.c | 7 +-
arch/arm64/kvm/mmu.c | 21 +-
arch/loongarch/kvm/mmu.c | 40 +-
arch/mips/kvm/mmu.c | 26 +-
arch/powerpc/include/asm/kvm_book3s.h | 4 +-
arch/powerpc/kvm/book3s.c | 7 +-
arch/powerpc/kvm/book3s_32_mmu_host.c | 7 +-
arch/powerpc/kvm/book3s_64_mmu_host.c | 12 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 25 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 35 +-
arch/powerpc/kvm/book3s_hv_nested.c | 4 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 25 +-
arch/powerpc/kvm/book3s_pr.c | 14 +-
arch/powerpc/kvm/book3s_xive_native.c | 2 +-
arch/powerpc/kvm/e500_mmu_host.c | 19 +-
arch/riscv/kvm/mmu.c | 9 +-
arch/s390/kvm/vsie.c | 4 +-
arch/x86/kvm/lapic.c | 15 +-
arch/x86/kvm/mmu/mmu.c | 191 ++++----
arch/x86/kvm/mmu/mmu_internal.h | 5 +-
arch/x86/kvm/mmu/paging_tmpl.h | 29 +-
arch/x86/kvm/mmu/spte.c | 23 +-
arch/x86/kvm/mmu/tdp_mmu.c | 16 -
arch/x86/kvm/svm/nested.c | 4 +-
arch/x86/kvm/svm/sev.c | 12 +-
arch/x86/kvm/svm/svm.c | 8 +-
arch/x86/kvm/vmx/nested.c | 42 +-
arch/x86/kvm/vmx/vmx.c | 28 +-
arch/x86/kvm/vmx/vmx.h | 2 -
arch/x86/kvm/x86.c | 16 +-
include/linux/kvm_host.h | 124 +++--
virt/kvm/guest_memfd.c | 19 +-
virt/kvm/kvm_main.c | 603 ++++++++++---------------
virt/kvm/kvm_mm.h | 36 +-
virt/kvm/pfncache.c | 20 +-
38 files changed, 698 insertions(+), 865 deletions(-)
base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply [flat|nested] 150+ messages in thread
* [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-31 16:23 ` Alex Bennée
` (3 more replies)
2024-07-26 23:51 ` [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Sean Christopherson
` (84 subsequent siblings)
85 siblings, 4 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Put the page reference acquired by gfn_to_pfn_prot() if
kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
than-stellar heuristics for dealing with pfn-mapped memory mean that KVM
can get a page reference to ZONE_DEVICE memory.
Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/guest.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 11098eb7eb44..e1f0ff08836a 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1059,6 +1059,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
page = pfn_to_online_page(pfn);
if (!page) {
/* Reject ZONE_DEVICE memory */
+ kvm_release_pfn_clean(pfn);
ret = -EFAULT;
goto out;
}
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 7:34 ` Aneesh Kumar K.V
` (2 more replies)
2024-07-26 23:51 ` [PATCH v12 03/84] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error Sean Christopherson
` (83 subsequent siblings)
85 siblings, 3 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Disallow copying MTE tags to guest memory while KVM is dirty logging, as
writing guest memory without marking the gfn as dirty in the memslot could
result in userspace failing to migrate the updated page. Ideally (maybe?),
KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
and presumably the only use case for copying MTE tags _to_ the guest is when
restoring state on the target.
Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/guest.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index e1f0ff08836a..962f985977c2 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
mutex_lock(&kvm->slots_lock);
+ if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
while (length > 0) {
kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
void *maddr;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 03/84] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 8:57 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 04/84] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer Sean Christopherson
` (82 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Remove KVM_ERR_PTR_BAD_PAGE and instead return NULL, as "bad page" is just
a leftover bit of weirdness from days of old when KVM stuffed a "bad" page
into the guest instead of actually handling missing pages. See commit
cea7bb21280e ("KVM: MMU: Make gfn_to_page() always safe").
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_xive_native.c | 2 +-
arch/s390/kvm/vsie.c | 2 +-
arch/x86/kvm/lapic.c | 2 +-
include/linux/kvm_host.h | 7 -------
virt/kvm/kvm_main.c | 15 ++++++---------
6 files changed, 10 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index a7d7137ea0c8..1bdcd4ee4813 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -645,7 +645,7 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
int i;
hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
- if (is_error_page(hpage))
+ if (!hpage)
return;
hpage_offset = pte->raddr & ~PAGE_MASK;
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 6e2ebbd8aaac..d9bf1bc3ff61 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -654,7 +654,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
}
page = gfn_to_page(kvm, gfn);
- if (is_error_page(page)) {
+ if (!page) {
srcu_read_unlock(&kvm->srcu, srcu_idx);
pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
return -EINVAL;
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 54deafd0d698..566697ee37eb 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -661,7 +661,7 @@ static int pin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t *hpa)
struct page *page;
page = gfn_to_page(kvm, gpa_to_gfn(gpa));
- if (is_error_page(page))
+ if (!page)
return -EINVAL;
*hpa = (hpa_t)page_to_phys(page) + (gpa & ~PAGE_MASK);
return 0;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a7172ba59ad2..6d65b36fac29 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2629,7 +2629,7 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
}
page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
- if (is_error_page(page)) {
+ if (!page) {
ret = -EFAULT;
goto out;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 689e8be873a7..3d9617d1de41 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -153,13 +153,6 @@ static inline bool kvm_is_error_gpa(gpa_t gpa)
return gpa == INVALID_GPA;
}
-#define KVM_ERR_PTR_BAD_PAGE (ERR_PTR(-ENOENT))
-
-static inline bool is_error_page(struct page *page)
-{
- return IS_ERR(page);
-}
-
#define KVM_REQUEST_MASK GENMASK(7,0)
#define KVM_REQUEST_NO_WAKEUP BIT(8)
#define KVM_REQUEST_WAIT BIT(9)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0788d0a72cc..fd8c212b8de7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3085,19 +3085,14 @@ EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
*/
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
- struct page *page;
kvm_pfn_t pfn;
pfn = gfn_to_pfn(kvm, gfn);
if (is_error_noslot_pfn(pfn))
- return KVM_ERR_PTR_BAD_PAGE;
+ return NULL;
- page = kvm_pfn_to_refcounted_page(pfn);
- if (!page)
- return KVM_ERR_PTR_BAD_PAGE;
-
- return page;
+ return kvm_pfn_to_refcounted_page(pfn);
}
EXPORT_SYMBOL_GPL(gfn_to_page);
@@ -3191,7 +3186,8 @@ static void kvm_set_page_accessed(struct page *page)
void kvm_release_page_clean(struct page *page)
{
- WARN_ON(is_error_page(page));
+ if (WARN_ON(!page))
+ return;
kvm_set_page_accessed(page);
put_page(page);
@@ -3215,7 +3211,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
void kvm_release_page_dirty(struct page *page)
{
- WARN_ON(is_error_page(page));
+ if (WARN_ON(!page))
+ return;
kvm_set_page_dirty(page);
kvm_release_page_clean(page);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 04/84] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (2 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 03/84] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 9:03 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes Sean Christopherson
` (81 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Allow passing a NULL @page to kvm_release_page_{clean,dirty}(), as there's no
tangible benefit to forcing the callers to pre-check @page, and it ends up
generating a lot of duplicate boilerplate code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fd8c212b8de7..656e931ac39e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3186,7 +3186,7 @@ static void kvm_set_page_accessed(struct page *page)
void kvm_release_page_clean(struct page *page)
{
- if (WARN_ON(!page))
+ if (!page)
return;
kvm_set_page_accessed(page);
@@ -3211,7 +3211,7 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
void kvm_release_page_dirty(struct page *page)
{
- if (WARN_ON(!page))
+ if (!page)
return;
kvm_set_page_dirty(page);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (3 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 04/84] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 9:20 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 06/84] KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE Sean Christopherson
` (80 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add an API to release an unused page, i.e. to put a page without marking
it accessed or dirty. The API will be used when KVM faults in a page but
bails before installing the guest mapping (and other similar flows).
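E.g. a hedged sketch of the intended usage, where "aborted" is a stand-in
for whatever condition (mmu_notifier invalidation, emulation failure, etc.)
prevents the mapping from being installed:

	/* Faulting in the page grabbed a reference, but nothing was mapped. */
	if (aborted) {
		/* The guest never saw the page, don't mark it accessed/dirty. */
		kvm_release_page_unused(page);
		return;
	}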
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3d9617d1de41..c5d39a337aa3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1201,6 +1201,15 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
bool *writable);
+
+static inline void kvm_release_page_unused(struct page *page)
+{
+ if (!page)
+ return;
+
+ put_page(page);
+}
+
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 06/84] KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (4 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 07/84] KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying Sean Christopherson
` (79 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Apply make_spte()'s optimization to skip trying to unsync shadow pages if
and only if the old SPTE was a leaf SPTE, as non-leaf SPTEs in direct MMUs
are always writable, i.e. could trigger a false positive and incorrectly
lead to KVM creating a SPTE without write-protecting or marking shadow
pages unsync.
This bug only affects the TDP MMU, as the shadow MMU only overwrites a
shadow-present SPTE when synchronizing SPTEs (and only 4KiB SPTEs can be
unsync). Specifically, mmu_set_spte() drops any non-leaf SPTEs *before*
calling make_spte(), whereas the TDP MMU can do a direct replacement of a
page table with the leaf SPTE.
Opportunistically update the comment to explain why skipping the unsync
stuff is safe, as opposed to simply saying "it's someone else's problem".
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/spte.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index d4527965e48c..a3baf0cadbee 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -226,12 +226,20 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;
/*
- * Optimization: for pte sync, if spte was writable the hash
- * lookup is unnecessary (and expensive). Write protection
- * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
- * Same reasoning can be applied to dirty page accounting.
+ * When overwriting an existing leaf SPTE, and the old SPTE was
+ * writable, skip trying to unsync shadow pages as any relevant
+ * shadow pages must already be unsync, i.e. the hash lookup is
+ * unnecessary (and expensive).
+ *
+ * The same reasoning applies to dirty page/folio accounting;
+ * KVM will mark the folio dirty using the old SPTE, thus
+ * there's no need to immediately mark the new SPTE as dirty.
+ *
+ * Note, both cases rely on KVM not changing PFNs without first
+ * zapping the old SPTE, which is guaranteed by both the shadow
+ * MMU and the TDP MMU.
*/
- if (is_writable_pte(old_spte))
+ if (is_last_spte(old_spte, level) && is_writable_pte(old_spte))
goto out;
/*
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 07/84] KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (5 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 06/84] KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 08/84] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs Sean Christopherson
` (78 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages/folios dirty when creating SPTEs to map PFNs into the guest,
not when zapping or modifying SPTEs, as marking folios dirty when zapping
or modifying SPTEs can be extremely inefficient. E.g. when KVM is zapping
collapsible SPTEs to reconstitute a hugepage after disabling dirty logging,
KVM will mark every 4KiB pfn as dirty, even though _at least_ 512 pfns are
guaranteed to be in a single folio (the SPTE couldn't potentially be huge
if that weren't the case). The problem only becomes worse for 1GiB
HugeTLB pages, as KVM can mark a single folio dirty 512*512 times.
Marking a folio dirty when mapping is functionally safe as KVM drops all
relevant SPTEs in response to an mmu_notifier invalidation, i.e. ensures
that the guest can't dirty a folio after access has been removed.
And because KVM already marks folios dirty when zapping/modifying SPTEs
for KVM reasons, i.e. not in response to an mmu_notifier invalidation,
there is no danger of "prematurely" marking a folio dirty. E.g. if a
filesystem cleans a folio without first removing write access, then there
already exist races where KVM could mark a folio dirty before remote TLBs
are flushed, i.e. before guest writes are guaranteed to stop. Furthermore,
x86 is literally the only architecture that marks folios dirty on the
backend; every other KVM architecture marks folios dirty at map time.
x86's unique behavior likely stems from the fact that x86's MMU predates
mmu_notifiers. Long, long ago, before mmu_notifiers were added, marking
pages dirty when zapping SPTEs was logical, and perhaps even necessary, as
KVM held references to pages, i.e. kept a page's refcount elevated while
the page was mapped into the guest. At the time, KVM's rmap_remove()
simply did:
if (is_writeble_pte(*spte))
kvm_release_pfn_dirty(pfn);
else
kvm_release_pfn_clean(pfn);
i.e. dropped the refcount and marked the page dirty at the same time.
After mmu_notifiers were introduced, commit acb66dd051d0 ("KVM: MMU:
don't hold pagecount reference for mapped sptes pages") removed the
refcount logic, but kept the dirty logic, i.e. converted the above to:
if (is_writeble_pte(*spte))
kvm_release_pfn_dirty(pfn);
And for KVM x86, that's essentially how things have stayed over the last
~15 years, without anyone revisiting *why* KVM marks pages/folios dirty at
zap/modification time, e.g. the behavior was blindly carried forward to
the TDP MMU.
Practically speaking, the only downside to marking a folio dirty during
mapping is that KVM could trigger writeback of memory that was never
actually written. Except that can't actually happen if KVM marks folios
dirty if and only if a writable SPTE is created (as done here), because
KVM always marks writable SPTEs as dirty during make_spte(). See commit
9b51a63024bd ("KVM: MMU: Explicitly set D-bit for writable spte."), circa
2015.
Note, KVM's access tracking logic for prefetched SPTEs is a bit odd. If a
guest PTE is dirty and writable, KVM will create a writable SPTE, but then
mark the SPTE for access tracking. Which isn't wrong, just a bit odd, as
it results in _more_ precise dirty tracking for MMUs _without_ A/D bits.
To keep things simple, mark the folio dirty before access tracking comes
into play, as an access-tracked SPTE can be restored in the fast page
fault path, i.e. without holding mmu_lock. While writing SPTEs and
accessing memslots outside of mmu_lock is safe, marking a folio dirty is
not. E.g. if the fast path gets interrupted _just_ after setting a SPTE,
the primary MMU could theoretically invalidate and free a folio before KVM
marks it dirty. Unlike the shadow MMU, which waits for CPUs to respond to
an IPI, the TDP MMU only guarantees the page tables themselves won't be
freed (via RCU).
Opportunistically update a few stale comments.
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 29 ++++-------------------------
arch/x86/kvm/mmu/paging_tmpl.h | 6 +++---
arch/x86/kvm/mmu/spte.c | 20 ++++++++++++++++++--
arch/x86/kvm/mmu/tdp_mmu.c | 12 ------------
4 files changed, 25 insertions(+), 42 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 901be9e420a4..2e6daa6d1cc0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -547,10 +547,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
- if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
+ if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
flush = true;
- kvm_set_pfn_dirty(spte_to_pfn(old_spte));
- }
return flush;
}
@@ -593,9 +591,6 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
if (is_accessed_spte(old_spte))
kvm_set_pfn_accessed(pfn);
- if (is_dirty_spte(old_spte))
- kvm_set_pfn_dirty(pfn);
-
return old_spte;
}
@@ -626,13 +621,6 @@ static bool mmu_spte_age(u64 *sptep)
clear_bit((ffs(shadow_accessed_mask) - 1),
(unsigned long *)sptep);
} else {
- /*
- * Capture the dirty status of the page, so that it doesn't get
- * lost when the SPTE is marked for access tracking.
- */
- if (is_writable_pte(spte))
- kvm_set_pfn_dirty(spte_to_pfn(spte));
-
spte = mark_spte_for_access_track(spte);
mmu_spte_update_no_track(sptep, spte);
}
@@ -1275,16 +1263,6 @@ static bool spte_clear_dirty(u64 *sptep)
return mmu_spte_update(sptep, spte);
}
-static bool spte_wrprot_for_clear_dirty(u64 *sptep)
-{
- bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
- (unsigned long *)sptep);
- if (was_writable && !spte_ad_enabled(*sptep))
- kvm_set_pfn_dirty(spte_to_pfn(*sptep));
-
- return was_writable;
-}
-
/*
* Gets the GFN ready for another round of dirty logging by clearing the
* - D bit on ad-enabled SPTEs, and
@@ -1300,7 +1278,8 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
for_each_rmap_spte(rmap_head, &iter, sptep)
if (spte_ad_need_write_protect(*sptep))
- flush |= spte_wrprot_for_clear_dirty(sptep);
+ flush |= test_and_clear_bit(PT_WRITABLE_SHIFT,
+ (unsigned long *)sptep);
else
flush |= spte_clear_dirty(sptep);
@@ -3381,7 +3360,7 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
* harm. This also avoids the TLB flush needed after setting dirty bit
* so non-PML cases won't be impacted.
*
- * Compare with set_spte where instead shadow_dirty_mask is set.
+ * Compare with make_spte() where instead shadow_dirty_mask is set.
*/
if (!try_cmpxchg64(sptep, &old_spte, new_spte))
return false;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 69941cebb3a8..ef0b3b213e5b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -891,9 +891,9 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
/*
* Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
- * safe because:
- * - The spte has a reference to the struct page, so the pfn for a given gfn
- * can't change unless all sptes pointing to it are nuked first.
+ * safe because SPTEs are protected by mmu_notifiers and memslot generations, so
+ * the pfn for a given gfn can't change unless all SPTEs pointing to the gfn are
+ * nuked first.
*
* Returns
* < 0: failed to sync spte
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index a3baf0cadbee..9b8795bd2f04 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -232,8 +232,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* unnecessary (and expensive).
*
* The same reasoning applies to dirty page/folio accounting;
- * KVM will mark the folio dirty using the old SPTE, thus
- * there's no need to immediately mark the new SPTE as dirty.
+ * KVM marked the folio dirty when the old SPTE was created,
+ * thus there's no need to mark the folio dirty again.
*
* Note, both cases rely on KVM not changing PFNs without first
* zapping the old SPTE, which is guaranteed by both the shadow
@@ -266,12 +266,28 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
"spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+ /*
+ * Mark the memslot dirty *after* modifying it for access tracking.
+ * Unlike folios, memslots can be safely marked dirty out of mmu_lock,
+ * i.e. in the fast page fault handler.
+ */
if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
/* Enforced by kvm_mmu_hugepage_adjust. */
WARN_ON_ONCE(level > PG_LEVEL_4K);
mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
}
+ /*
+ * If the page that KVM got from the primary MMU is writable, i.e. if
+ * it's host-writable, mark the page/folio dirty. As alluded to above,
+ * folios can't be safely marked dirty in the fast page fault handler,
+ * and so KVM must (somewhat) speculatively mark the folio dirty even
+ * though it isn't guaranteed to be written as KVM won't mark the folio
+ * dirty if/when the SPTE is made writable.
+ */
+ if (host_writable)
+ kvm_set_pfn_dirty(pfn);
+
*new_spte = spte;
return wrprot;
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c7dc49ee7388..7ac43d1ce918 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -511,10 +511,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (is_leaf != was_leaf)
kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
- if (was_leaf && is_dirty_spte(old_spte) &&
- (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
- kvm_set_pfn_dirty(spte_to_pfn(old_spte));
-
/*
* Recursively handle child PTs if the change removed a subtree from
* the paging structure. Note the WARN on the PFN changing without the
@@ -1248,13 +1244,6 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
iter->level);
new_spte = iter->old_spte & ~shadow_accessed_mask;
} else {
- /*
- * Capture the dirty status of the page, so that it doesn't get
- * lost when the SPTE is marked for access tracking.
- */
- if (is_writable_pte(iter->old_spte))
- kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
-
new_spte = mark_spte_for_access_track(iter->old_spte);
iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
iter->old_spte, new_spte,
@@ -1595,7 +1584,6 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
iter.old_spte,
iter.old_spte & ~dbit);
- kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
}
rcu_read_unlock();
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 08/84] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (6 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 07/84] KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 09/84] KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit Sean Christopherson
` (77 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark folios as accessed only when zapping leaf SPTEs, which is a rough
heuristic for "only in response to an mmu_notifier invalidation". Page
aging and LRUs are tolerant of false negatives, i.e. KVM doesn't need to
be precise for correctness, and re-marking folios as accessed when zapping
entire roots or when zapping collapsible SPTEs is expensive and adds very
little value.
E.g. when a VM is dying, all of its memory is being freed; marking folios
accessed at that time provides no known value. Similarly, because KVM
marks folios as accessed when creating SPTEs, marking all folios as
accessed when userspace happens to delete a memslot doesn't add value.
The folio was marked access when the old SPTE was created, and will be
marked accessed yet again if a vCPU accesses the pfn again after reloading
a new root. Zapping collapsible SPTEs is a similar story; marking folios
accessed just because userspace disables dirty logging is a side effect of
KVM behavior, not a deliberate goal.
As an intermediate step, a.k.a. bisection point, towards *never* marking
folios accessed when dropping SPTEs, mark folios accessed when the primary
MMU might be invalidating mappings, as such zappings are not KVM initiated,
i.e. might actually be related to page aging and LRU activity.
Note, x86 is the only KVM architecture that "double dips"; every other
arch marks pfns as accessed only when mapping into the guest, not when
mapping into the guest _and_ when removing from the guest.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/locking.rst | 76 +++++++++++++++---------------
arch/x86/kvm/mmu/mmu.c | 4 +-
arch/x86/kvm/mmu/tdp_mmu.c | 7 ++-
3 files changed, 43 insertions(+), 44 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 02880d5552d5..8b3bb9fe60bf 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -138,49 +138,51 @@ Then, we can ensure the dirty bitmaps is correctly set for a gfn.
2) Dirty bit tracking
-In the origin code, the spte can be fast updated (non-atomically) if the
+In the original code, the spte can be fast updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set since the
Accessed bit and Dirty bit can not be lost.
But it is not true after fast page fault since the spte can be marked
writable between reading spte and updating spte. Like below case:
-+------------------------------------------------------------------------+
-| At the beginning:: |
-| |
-| spte.W = 0 |
-| spte.Accessed = 1 |
-+------------------------------------+-----------------------------------+
-| CPU 0: | CPU 1: |
-+------------------------------------+-----------------------------------+
-| In mmu_spte_clear_track_bits():: | |
-| | |
-| old_spte = *spte; | |
-| | |
-| | |
-| /* 'if' condition is satisfied. */| |
-| if (old_spte.Accessed == 1 && | |
-| old_spte.W == 0) | |
-| spte = 0ull; | |
-+------------------------------------+-----------------------------------+
-| | on fast page fault path:: |
-| | |
-| | spte.W = 1 |
-| | |
-| | memory write on the spte:: |
-| | |
-| | spte.Dirty = 1 |
-+------------------------------------+-----------------------------------+
-| :: | |
-| | |
-| else | |
-| old_spte = xchg(spte, 0ull) | |
-| if (old_spte.Accessed == 1) | |
-| kvm_set_pfn_accessed(spte.pfn);| |
-| if (old_spte.Dirty == 1) | |
-| kvm_set_pfn_dirty(spte.pfn); | |
-| OOPS!!! | |
-+------------------------------------+-----------------------------------+
++-------------------------------------------------------------------------+
+| At the beginning:: |
+| |
+| spte.W = 0 |
+| spte.Accessed = 1 |
++-------------------------------------+-----------------------------------+
+| CPU 0: | CPU 1: |
++-------------------------------------+-----------------------------------+
+| In mmu_spte_update():: | |
+| | |
+| old_spte = *spte; | |
+| | |
+| | |
+| /* 'if' condition is satisfied. */ | |
+| if (old_spte.Accessed == 1 && | |
+| old_spte.W == 0) | |
+| spte = new_spte; | |
++-------------------------------------+-----------------------------------+
+| | on fast page fault path:: |
+| | |
+| | spte.W = 1 |
+| | |
+| | memory write on the spte:: |
+| | |
+| | spte.Dirty = 1 |
++-------------------------------------+-----------------------------------+
+| :: | |
+| | |
+| else | |
+| old_spte = xchg(spte, new_spte);| |
+| if (old_spte.Accessed && | |
+| !new_spte.Accessed) | |
+| flush = true; | |
+| if (old_spte.Dirty && | |
+| !new_spte.Dirty) | |
+| flush = true; | |
+| OOPS!!! | |
++-------------------------------------+-----------------------------------+
The Dirty bit is lost in this case.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e6daa6d1cc0..58b70328b20c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -542,10 +542,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
* to guarantee consistency between TLB and page tables.
*/
- if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
+ if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte))
flush = true;
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
- }
if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
flush = true;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7ac43d1ce918..d1de5f28c445 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -520,10 +520,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (was_present && !was_leaf &&
(is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-
- if (was_leaf && is_accessed_spte(old_spte) &&
- (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
static inline int __must_check __tdp_mmu_set_spte_atomic(struct tdp_iter *iter,
@@ -865,6 +861,9 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
+ if (is_accessed_spte(iter.old_spte))
+ kvm_set_pfn_accessed(spte_to_pfn(iter.old_spte));
+
/*
* Zappings SPTEs in invalid roots doesn't require a TLB flush,
* see kvm_tdp_mmu_zap_invalidated_roots() for details.
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 09/84] KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (7 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 08/84] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 10/84] KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect PTEs Sean Christopherson
` (76 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Don't force a TLB flush if mmu_spte_update() clears Accessed bit, as
access tracking tolerates false negatives, as evidenced by the
mmu_notifier hooks that explicitly test and age SPTEs without doing a TLB
flush.
In practice, this is very nearly a nop. spte_write_protect() and
spte_clear_dirty() never clear the Accessed bit. make_spte() always
sets the Accessed bit for !prefetch scenarios. FNAME(sync_spte) only sets
SPTE if the protection bits are changing, i.e. if a flush will be needed
regardless of the Accessed bits. And FNAME(pte_prefetch) sets SPTE if and
only if the old SPTE is !PRESENT.
That leaves kvm_arch_async_page_ready() as the one path that will generate
a !ACCESSED SPTE *and* overwrite a PRESENT SPTE. And that's very arguably
a bug, as clobbering a valid SPTE in that case is nonsensical.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 31 +++++++++----------------------
1 file changed, 9 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 58b70328b20c..b7642f1f993f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -518,37 +518,24 @@ static u64 mmu_spte_update_no_track(u64 *sptep, u64 new_spte)
* TLBs must be flushed. Otherwise rmap_write_protect will find a read-only
* spte, even though the writable spte might be cached on a CPU's TLB.
*
+ * Remote TLBs also need to be flushed if the Dirty bit is cleared, as false
+ * negatives are not acceptable, e.g. if KVM is using D-bit based PML on VMX.
+ *
+ * Don't flush if the Accessed bit is cleared, as access tracking tolerates
+ * false negatives, and the one path that does care about TLB flushes,
+ * kvm_mmu_notifier_clear_flush_young(), uses mmu_spte_update_no_track().
+ *
* Returns true if the TLB needs to be flushed
*/
static bool mmu_spte_update(u64 *sptep, u64 new_spte)
{
- bool flush = false;
u64 old_spte = mmu_spte_update_no_track(sptep, new_spte);
if (!is_shadow_present_pte(old_spte))
return false;
- /*
- * For the spte updated out of mmu-lock is safe, since
- * we always atomically update it, see the comments in
- * spte_has_volatile_bits().
- */
- if (is_mmu_writable_spte(old_spte) &&
- !is_writable_pte(new_spte))
- flush = true;
-
- /*
- * Flush TLB when accessed/dirty states are changed in the page tables,
- * to guarantee consistency between TLB and page tables.
- */
-
- if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte))
- flush = true;
-
- if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
- flush = true;
-
- return flush;
+ return (is_mmu_writable_spte(old_spte) && !is_writable_pte(new_spte)) ||
+ (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte));
}
/*
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 10/84] KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect PTEs
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (8 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 09/84] KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 11/84] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() Sean Christopherson
` (75 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use gfn_to_page_many_atomic() instead of gfn_to_pfn_memslot_atomic() when
prefetching indirect PTEs (direct_pte_prefetch_many() already uses the
"to page" APIS). Functionally, the two are subtly equivalent, as the "to
pfn" API short-circuits hva_to_pfn() if hva_to_pfn_fast() fails, i.e. is
just a wrapper for get_user_page_fast_only()/get_user_pages_fast_only().
Switching to the "to page" API will allow dropping the @atomic parameter
from the entire hva_to_pfn() callchain.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ef0b3b213e5b..6b215a932158 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -535,8 +535,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
{
struct kvm_memory_slot *slot;
unsigned pte_access;
+ struct page *page;
gfn_t gfn;
- kvm_pfn_t pfn;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
return false;
@@ -549,12 +549,11 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (!slot)
return false;
- pfn = gfn_to_pfn_memslot_atomic(slot, gfn);
- if (is_error_pfn(pfn))
+ if (gfn_to_page_many_atomic(slot, gfn, &page, 1) != 1)
return false;
- mmu_set_spte(vcpu, slot, spte, pte_access, gfn, pfn, NULL);
- kvm_release_pfn_clean(pfn);
+ mmu_set_spte(vcpu, slot, spte, pte_access, gfn, page_to_pfn(page), NULL);
+ kvm_release_page_clean(page);
return true;
}
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 11/84] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (9 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 10/84] KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect PTEs Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-02 11:16 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 12/84] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs Sean Christopherson
` (74 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() to try and
communicate its true purpose, as the "atomic" aspect is essentially a
side effect of the fact that x86 uses the API while holding mmu_lock.
E.g. even if mmu_lock weren't held, KVM wouldn't want to fault-in pages,
as the goal is to opportunistically grab surrounding pages that have
already been accessed and/or dirtied by the host, and to do so quickly.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
include/linux/kvm_host.h | 4 ++--
virt/kvm/kvm_main.c | 6 +++---
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b7642f1f993f..c1914f02c5e1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2912,7 +2912,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
if (!slot)
return -1;
- ret = gfn_to_page_many_atomic(slot, gfn, pages, end - start);
+ ret = kvm_prefetch_pages(slot, gfn, pages, end - start);
if (ret <= 0)
return -1;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 6b215a932158..bc801d454f41 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -549,7 +549,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (!slot)
return false;
- if (gfn_to_page_many_atomic(slot, gfn, &page, 1) != 1)
+ if (kvm_prefetch_pages(slot, gfn, &page, 1) != 1)
return false;
mmu_set_spte(vcpu, slot, spte, pte_access, gfn, page_to_pfn(page), NULL);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c5d39a337aa3..79fed9fea638 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1192,8 +1192,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm);
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot);
-int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
- struct page **pages, int nr_pages);
+int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
+ struct page **pages, int nr_pages);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 656e931ac39e..803299778cf8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3060,8 +3060,8 @@ kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
-int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
- struct page **pages, int nr_pages)
+int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
+ struct page **pages, int nr_pages)
{
unsigned long addr;
gfn_t entry = 0;
@@ -3075,7 +3075,7 @@ int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
return get_user_pages_fast_only(addr, nr_pages, FOLL_WRITE, pages);
}
-EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
+EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
/*
* Do not use this helper unless you are absolutely certain the gfn _must_ be
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 12/84] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (10 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 11/84] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 9:31 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep Sean Christopherson
` (73 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Drop @atomic from the myriad "to_pfn" APIs now that all callers pass
"false".
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/locking.rst | 4 +--
arch/arm64/kvm/mmu.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 12 ++++-----
include/linux/kvm_host.h | 4 +--
virt/kvm/kvm_main.c | 36 +++++---------------------
virt/kvm/kvm_mm.h | 4 +--
virt/kvm/pfncache.c | 2 +-
9 files changed, 22 insertions(+), 46 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8b3bb9fe60bf..9af511e7aa53 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -126,8 +126,8 @@ We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.
For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, we disabled fast page fault for simplicity.
-A solution for indirect sp could be to pin the gfn, for example via
-kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:
+A solution for indirect sp could be to pin the gfn before the cmpxchg. After
+the pinning:
- We have held the refcount of pfn; that means the pfn can not be freed and
be reused for another gfn.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6981b1bc0946..30dd62f56a11 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1562,7 +1562,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
+ pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
write_fault, &writable, NULL);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1b51b1c4713b..8cd02ca4b1b8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -613,7 +613,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
write_ok = true;
} else {
/* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
+ pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
writing, &write_ok, NULL);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 408d98f8a514..26a969e935e3 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -852,7 +852,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long pfn;
/* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
+ pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
writing, upgrade_p, NULL);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c1914f02c5e1..d76390ef49b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4334,9 +4334,9 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return kvm_faultin_pfn_private(vcpu, fault);
async = false;
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, false,
- &async, fault->write,
- &fault->map_writable, &fault->hva);
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, &async,
+ fault->write, &fault->map_writable,
+ &fault->hva);
if (!async)
return RET_PF_CONTINUE; /* *pfn has correct page already */
@@ -4356,9 +4356,9 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* to wait for IO. Note, gup always bails if it is unable to quickly
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
- NULL, fault->write,
- &fault->map_writable, &fault->hva);
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, NULL,
+ fault->write, &fault->map_writable,
+ &fault->hva);
return RET_PF_CONTINUE;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 79fed9fea638..6d4503e8eabe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1217,9 +1217,8 @@ kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool atomic, bool interruptible, bool *async,
+ bool interruptible, bool *async,
bool write_fault, bool *writable, hva_t *hva);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
@@ -1300,7 +1299,6 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 803299778cf8..84c73b4fc804 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2929,7 +2929,6 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
/*
* Pin guest page in memory and return its pfn.
* @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function is forbidden from sleeping
* @interruptible: whether the process can be interrupted by non-fatal signals
* @async: whether this function need to wait IO complete if the
* host page is not in the memory
@@ -2941,22 +2940,16 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* 2): @write_fault = false && @writable, @writable will tell the caller
* whether the mapping is writable.
*/
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
- bool *async, bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+ bool write_fault, bool *writable)
{
struct vm_area_struct *vma;
kvm_pfn_t pfn;
int npages, r;
- /* we can do it either atomically or asynchronously, not both */
- BUG_ON(atomic && async);
-
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
return pfn;
- if (atomic)
- return KVM_PFN_ERR_FAULT;
-
npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
writable, &pfn);
if (npages == 1)
@@ -2993,7 +2986,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
}
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool atomic, bool interruptible, bool *async,
+ bool interruptible, bool *async,
bool write_fault, bool *writable, hva_t *hva)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
@@ -3015,39 +3008,24 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
writable = NULL;
}
- return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
- writable);
+ return hva_to_pfn(addr, interruptible, async, write_fault, writable);
}
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
- NULL, write_fault, writable, NULL);
+ return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
+ write_fault, writable, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
- NULL, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
-{
- return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
- NULL, NULL);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
-
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
- return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
-}
-EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
-
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{
return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 715f19669d01..a3fa86f60d6c 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,8 +20,8 @@
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
- bool *async, bool write_fault, bool *writable);
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+ bool write_fault, bool *writable);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f0039efb9e1e..58c706a610e5 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -198,7 +198,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
}
/* We always request a writeable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
+ new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (11 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 12/84] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-08 12:00 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 14/84] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code Sean Christopherson
` (72 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Now that hva_to_pfn() no longer supports being called in atomic context,
move the might_sleep() annotation from hva_to_pfn_slow() to hva_to_pfn().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 84c73b4fc804..03af1a0090b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2807,8 +2807,6 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
struct page *page;
int npages;
- might_sleep();
-
if (writable)
*writable = write_fault;
@@ -2947,6 +2945,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
kvm_pfn_t pfn;
int npages, r;
+ might_sleep();
+
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
return pfn;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 14/84] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (12 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 15/84] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva Sean Christopherson
` (71 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
From: David Stevens <stevensd@chromium.org>
Add a pfn error code to communicate that hva_to_pfn() failed because I/O
was needed and disallowed, and convert @async to a constant @no_wait
boolean. This will allow eliminating the @no_wait param by having callers
pass in FOLL_NOWAIT along with other FOLL_* flags.
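For illustration, the intended calling pattern looks something like the
sketch below (example_faultin() and the try_async_pf() hook are invented
stand-ins for arch code, e.g. x86's async #PF setup):

static kvm_pfn_t example_faultin(const struct kvm_memory_slot *slot, gfn_t gfn,
                                 bool write, bool *writable,
                                 bool (*try_async_pf)(gfn_t gfn))
{
        kvm_pfn_t pfn;

        /* First pass with @no_wait = true, i.e. fail instead of doing I/O. */
        pfn = __gfn_to_pfn_memslot(slot, gfn, false, true, write, writable, NULL);

        /* Only KVM_PFN_ERR_NEEDS_IO means retrying with I/O can succeed. */
        if (pfn != KVM_PFN_ERR_NEEDS_IO)
                return pfn;

        /* Let arch code handle the I/O asynchronously if it's able to... */
        if (try_async_pf(gfn))
                return KVM_PFN_ERR_NEEDS_IO;

        /* ...else retry with @no_wait = false and wait for the I/O. */
        return __gfn_to_pfn_memslot(slot, gfn, true, false, write, writable, NULL);
}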
Signed-off-by: David Stevens <stevensd@chromium.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 18 +++++++++++-------
include/linux/kvm_host.h | 3 ++-
virt/kvm/kvm_main.c | 29 +++++++++++++++--------------
virt/kvm/kvm_mm.h | 2 +-
virt/kvm/pfncache.c | 4 ++--
5 files changed, 31 insertions(+), 25 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d76390ef49b2..eb9ad0283fd5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4328,17 +4328,21 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool async;
-
if (fault->is_private)
return kvm_faultin_pfn_private(vcpu, fault);
- async = false;
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, &async,
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
fault->write, &fault->map_writable,
&fault->hva);
- if (!async)
- return RET_PF_CONTINUE; /* *pfn has correct page already */
+
+ /*
+ * If resolving the page failed because I/O is needed to fault-in the
+ * page, then either set up an asynchronous #PF to do the I/O, or if
+ * doing an async #PF isn't possible, retry with I/O allowed. All
+ * other failures are terminal, i.e. retrying won't help.
+ */
+ if (fault->pfn != KVM_PFN_ERR_NEEDS_IO)
+ return RET_PF_CONTINUE;
if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(fault->addr, fault->gfn);
@@ -4356,7 +4360,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* to wait for IO. Note, gup always bails if it is unable to quickly
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, NULL,
+ fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, false,
fault->write, &fault->map_writable,
&fault->hva);
return RET_PF_CONTINUE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d4503e8eabe..92b2922e2216 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -97,6 +97,7 @@
#define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1)
#define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
#define KVM_PFN_ERR_SIGPENDING (KVM_PFN_ERR_MASK + 3)
+#define KVM_PFN_ERR_NEEDS_IO (KVM_PFN_ERR_MASK + 4)
/*
* error pfns indicate that the gfn is in slot but faild to
@@ -1218,7 +1219,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool *async,
+ bool interruptible, bool no_wait,
bool write_fault, bool *writable, hva_t *hva);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 03af1a0090b1..c2efdfe26d5b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2789,7 +2789,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
* The slow path to get the pfn of the specified host virtual address,
* 1 indicates success, -errno is returned if error is detected.
*/
-static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
+static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
bool interruptible, bool *writable, kvm_pfn_t *pfn)
{
/*
@@ -2812,7 +2812,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
if (write_fault)
flags |= FOLL_WRITE;
- if (async)
+ if (no_wait)
flags |= FOLL_NOWAIT;
if (interruptible)
flags |= FOLL_INTERRUPTIBLE;
@@ -2928,8 +2928,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* Pin guest page in memory and return its pfn.
* @addr: host virtual address which maps memory to the guest
* @interruptible: whether the process can be interrupted by non-fatal signals
- * @async: whether this function need to wait IO complete if the
- * host page is not in the memory
+ * @no_wait: whether or not this function need to wait IO complete if the
+ * host page is not in the memory
* @write_fault: whether we should get a writable host page
* @writable: whether it allows to map a writable host page for !@write_fault
*
@@ -2938,7 +2938,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* 2): @write_fault = false && @writable, @writable will tell the caller
* whether the mapping is writable.
*/
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
bool write_fault, bool *writable)
{
struct vm_area_struct *vma;
@@ -2950,7 +2950,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
return pfn;
- npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
+ npages = hva_to_pfn_slow(addr, no_wait, write_fault, interruptible,
writable, &pfn);
if (npages == 1)
return pfn;
@@ -2959,7 +2959,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
mmap_read_lock(current->mm);
if (npages == -EHWPOISON ||
- (!async && check_user_page_hwpoison(addr))) {
+ (!no_wait && check_user_page_hwpoison(addr))) {
pfn = KVM_PFN_ERR_HWPOISON;
goto exit;
}
@@ -2976,9 +2976,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
if (r < 0)
pfn = KVM_PFN_ERR_FAULT;
} else {
- if (async && vma_is_valid(vma, write_fault))
- *async = true;
- pfn = KVM_PFN_ERR_FAULT;
+ if (no_wait && vma_is_valid(vma, write_fault))
+ pfn = KVM_PFN_ERR_NEEDS_IO;
+ else
+ pfn = KVM_PFN_ERR_FAULT;
}
exit:
mmap_read_unlock(current->mm);
@@ -2986,7 +2987,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
}
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool *async,
+ bool interruptible, bool no_wait,
bool write_fault, bool *writable, hva_t *hva)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
@@ -3008,21 +3009,21 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
writable = NULL;
}
- return hva_to_pfn(addr, interruptible, async, write_fault, writable);
+ return hva_to_pfn(addr, interruptible, no_wait, write_fault, writable);
}
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
+ return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
write_fault, writable, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index a3fa86f60d6c..51f3fee4ca3f 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,7 +20,7 @@
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
+kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
bool write_fault, bool *writable);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 58c706a610e5..32dc61f48c81 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -197,8 +197,8 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
cond_resched();
}
- /* We always request a writeable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
+ /* We always request a writable mapping */
+ new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 15/84] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (13 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 14/84] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 16/84] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot() Sean Christopherson
` (70 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Remove kvm_page_fault.hva as it is never read, only written. This will
allow removing the @hva param from __gfn_to_pfn_memslot().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 5 ++---
arch/x86/kvm/mmu/mmu_internal.h | 2 --
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eb9ad0283fd5..e0bfbf95646c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3248,7 +3248,6 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
fault->slot = NULL;
fault->pfn = KVM_PFN_NOSLOT;
fault->map_writable = false;
- fault->hva = KVM_HVA_ERR_BAD;
/*
* If MMIO caching is disabled, emulate immediately without
@@ -4333,7 +4332,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
fault->write, &fault->map_writable,
- &fault->hva);
+ NULL);
/*
* If resolving the page failed because I/O is needed to fault-in the
@@ -4362,7 +4361,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
*/
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, false,
fault->write, &fault->map_writable,
- &fault->hva);
+ NULL);
return RET_PF_CONTINUE;
}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1721d97743e9..f67396c435df 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -238,7 +238,6 @@ struct kvm_page_fault {
/* Outputs of kvm_faultin_pfn. */
unsigned long mmu_seq;
kvm_pfn_t pfn;
- hva_t hva;
bool map_writable;
/*
@@ -310,7 +309,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.is_private = err & PFERR_PRIVATE_ACCESS,
.pfn = KVM_PFN_ERR_FAULT,
- .hva = KVM_HVA_ERR_BAD,
};
int r;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 16/84] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (14 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 15/84] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 17/84] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs Sean Christopherson
` (69 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Drop @hva from __gfn_to_pfn_memslot() now that all callers pass NULL.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/mmu.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 6 ++----
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 9 +++------
6 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 30dd62f56a11..22ee37360c4e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1563,7 +1563,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmap_read_unlock(current->mm);
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- write_fault, &writable, NULL);
+ write_fault, &writable);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8cd02ca4b1b8..2f1d58984b41 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -614,7 +614,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
} else {
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, &write_ok, NULL);
+ writing, &write_ok);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 26a969e935e3..8304b6f8fe45 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -853,7 +853,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
/* Call KVM generic code to do the slow-path check */
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, upgrade_p, NULL);
+ writing, upgrade_p);
if (is_error_noslot_pfn(pfn))
return -EFAULT;
page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e0bfbf95646c..a201b56728ae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4331,8 +4331,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return kvm_faultin_pfn_private(vcpu, fault);
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
- fault->write, &fault->map_writable,
- NULL);
+ fault->write, &fault->map_writable);
/*
* If resolving the page failed because I/O is needed to fault-in the
@@ -4360,8 +4359,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, false,
- fault->write, &fault->map_writable,
- NULL);
+ fault->write, &fault->map_writable);
return RET_PF_CONTINUE;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 92b2922e2216..f42e030f69a4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1220,7 +1220,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool interruptible, bool no_wait,
- bool write_fault, bool *writable, hva_t *hva);
+ bool write_fault, bool *writable);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
void kvm_release_pfn_dirty(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c2efdfe26d5b..6e3bb202c1b3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2988,13 +2988,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool interruptible, bool no_wait,
- bool write_fault, bool *writable, hva_t *hva)
+ bool write_fault, bool *writable)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
- if (hva)
- *hva = addr;
-
if (kvm_is_error_hva(addr)) {
if (writable)
*writable = false;
@@ -3017,13 +3014,13 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
- write_fault, writable, NULL);
+ write_fault, writable);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL);
+ return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 17/84] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (15 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 16/84] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 18/84] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map() Sean Christopherson
` (68 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
From: David Stevens <stevensd@chromium.org>
Introduce kvm_follow_pfn() to eventually supplant the various "gfn_to_pfn"
APIs, albeit by adding more wrappers. The primary motivation of the new
helper is to pass a structure instead of an ever changing set of parameters,
e.g. so that tweaking the behavior, inputs, and/or outputs of the "to pfn"
helpers doesn't require churning half of KVM.
In the more distant future, the APIs exposed to arch code could also
follow suit, e.g. by adding something akin to x86's "struct kvm_page_fault"
when faulting in guest memory. But for now, the goal is purely to clean
up KVM's "internal" MMU code.
As part of the conversion, replace the write_fault, interruptible, and
no-wait boolean flags with FOLL_WRITE, FOLL_INTERRUPTIBLE, and FOLL_NOWAIT
respectively. Collecting the various FOLL_* flags into a single field
will again ease the pain of passing new flags.
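For illustration, a hypothetical new "to pfn" variant then only needs to fill
in the fields it cares about instead of threading another parameter through
every helper (the wrapper name below is invented; kvm_follow_pfn() is internal
to kvm_main.c at this point in the series):

static kvm_pfn_t example_gfn_to_pfn_interruptible(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_follow_pfn kfp = {
                .slot = gfn_to_memslot(kvm, gfn),
                .gfn = gfn,
                .flags = FOLL_WRITE | FOLL_INTERRUPTIBLE,
        };

        /* A future knob becomes a new field or FOLL_* flag, not a new param. */
        return kvm_follow_pfn(&kfp);
}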
Signed-off-by: David Stevens <stevensd@chromium.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 166 +++++++++++++++++++++++---------------------
virt/kvm/kvm_mm.h | 20 +++++-
virt/kvm/pfncache.c | 9 ++-
3 files changed, 111 insertions(+), 84 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6e3bb202c1b3..56c2d11761e0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2761,8 +2761,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
* true indicates success, otherwise false is returned. It's also the
* only part that runs if we can in atomic context.
*/
-static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
- bool *writable, kvm_pfn_t *pfn)
+static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
struct page *page[1];
@@ -2771,14 +2770,13 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
* or the caller allows to map a writable pfn for a read fault
* request.
*/
- if (!(write_fault || writable))
+ if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
return false;
- if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+ if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, page)) {
*pfn = page_to_pfn(page[0]);
-
- if (writable)
- *writable = true;
+ if (kfp->map_writable)
+ *kfp->map_writable = true;
return true;
}
@@ -2789,8 +2787,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
* The slow path to get the pfn of the specified host virtual address,
* 1 indicates success, -errno is returned if error is detected.
*/
-static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
- bool interruptible, bool *writable, kvm_pfn_t *pfn)
+static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
/*
* When a VCPU accesses a page that is not mapped into the secondary
@@ -2803,34 +2800,30 @@ static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault,
* Note that get_user_page_fast_only() and FOLL_WRITE for now
* implicitly honor NUMA hinting faults and don't need this flag.
*/
- unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
- struct page *page;
+ unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT | kfp->flags;
+ struct page *page, *wpage;
int npages;
- if (writable)
- *writable = write_fault;
-
- if (write_fault)
- flags |= FOLL_WRITE;
- if (no_wait)
- flags |= FOLL_NOWAIT;
- if (interruptible)
- flags |= FOLL_INTERRUPTIBLE;
-
- npages = get_user_pages_unlocked(addr, 1, &page, flags);
+ npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
if (npages != 1)
return npages;
+ if (!kfp->map_writable)
+ goto out;
+
+ if (kfp->flags & FOLL_WRITE) {
+ *kfp->map_writable = true;
+ goto out;
+ }
+
/* map read fault as writable if possible */
- if (unlikely(!write_fault) && writable) {
- struct page *wpage;
-
- if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
- *writable = true;
- put_page(page);
- page = wpage;
- }
+ if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
+ *kfp->map_writable = true;
+ put_page(page);
+ page = wpage;
}
+
+out:
*pfn = page_to_pfn(page);
return npages;
}
@@ -2857,23 +2850,23 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
}
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
- unsigned long addr, bool write_fault,
- bool *writable, kvm_pfn_t *p_pfn)
+ struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
{
kvm_pfn_t pfn;
pte_t *ptep;
pte_t pte;
spinlock_t *ptl;
+ bool write_fault = kfp->flags & FOLL_WRITE;
int r;
- r = follow_pte(vma, addr, &ptep, &ptl);
+ r = follow_pte(vma, kfp->hva, &ptep, &ptl);
if (r) {
/*
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
* not call the fault handler, so do it here.
*/
bool unlocked = false;
- r = fixup_user_fault(current->mm, addr,
+ r = fixup_user_fault(current->mm, kfp->hva,
(write_fault ? FAULT_FLAG_WRITE : 0),
&unlocked);
if (unlocked)
@@ -2881,7 +2874,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
if (r)
return r;
- r = follow_pte(vma, addr, &ptep, &ptl);
+ r = follow_pte(vma, kfp->hva, &ptep, &ptl);
if (r)
return r;
}
@@ -2893,8 +2886,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
goto out;
}
- if (writable)
- *writable = pte_write(pte);
+ if (kfp->map_writable)
+ *kfp->map_writable = pte_write(pte);
pfn = pte_pfn(pte);
/*
@@ -2924,22 +2917,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
return r;
}
-/*
- * Pin guest page in memory and return its pfn.
- * @addr: host virtual address which maps memory to the guest
- * @interruptible: whether the process can be interrupted by non-fatal signals
- * @no_wait: whether or not this function need to wait IO complete if the
- * host page is not in the memory
- * @write_fault: whether we should get a writable host page
- * @writable: whether it allows to map a writable host page for !@write_fault
- *
- * The function will map a writable host page for these two cases:
- * 1): @write_fault = true
- * 2): @write_fault = false && @writable, @writable will tell the caller
- * whether the mapping is writable.
- */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
- bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
{
struct vm_area_struct *vma;
kvm_pfn_t pfn;
@@ -2947,11 +2925,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
might_sleep();
- if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
+ if (hva_to_pfn_fast(kfp, &pfn))
return pfn;
- npages = hva_to_pfn_slow(addr, no_wait, write_fault, interruptible,
- writable, &pfn);
+ npages = hva_to_pfn_slow(kfp, &pfn);
if (npages == 1)
return pfn;
if (npages == -EINTR)
@@ -2959,24 +2936,25 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
mmap_read_lock(current->mm);
if (npages == -EHWPOISON ||
- (!no_wait && check_user_page_hwpoison(addr))) {
+ (!(kfp->flags & FOLL_NOWAIT) && check_user_page_hwpoison(kfp->hva))) {
pfn = KVM_PFN_ERR_HWPOISON;
goto exit;
}
retry:
- vma = vma_lookup(current->mm, addr);
+ vma = vma_lookup(current->mm, kfp->hva);
if (vma == NULL)
pfn = KVM_PFN_ERR_FAULT;
else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
- r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
+ r = hva_to_pfn_remapped(vma, kfp, &pfn);
if (r == -EAGAIN)
goto retry;
if (r < 0)
pfn = KVM_PFN_ERR_FAULT;
} else {
- if (no_wait && vma_is_valid(vma, write_fault))
+ if ((kfp->flags & FOLL_NOWAIT) &&
+ vma_is_valid(vma, kfp->flags & FOLL_WRITE))
pfn = KVM_PFN_ERR_NEEDS_IO;
else
pfn = KVM_PFN_ERR_FAULT;
@@ -2986,41 +2964,69 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
return pfn;
}
+static kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
+{
+ kfp->hva = __gfn_to_hva_many(kfp->slot, kfp->gfn, NULL,
+ kfp->flags & FOLL_WRITE);
+
+ if (kfp->hva == KVM_HVA_ERR_RO_BAD)
+ return KVM_PFN_ERR_RO_FAULT;
+
+ if (kvm_is_error_hva(kfp->hva))
+ return KVM_PFN_NOSLOT;
+
+ if (memslot_is_readonly(kfp->slot) && kfp->map_writable) {
+ *kfp->map_writable = false;
+ kfp->map_writable = NULL;
+ }
+
+ return hva_to_pfn(kfp);
+}
+
kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
bool interruptible, bool no_wait,
bool write_fault, bool *writable)
{
- unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
-
- if (kvm_is_error_hva(addr)) {
- if (writable)
- *writable = false;
-
- return addr == KVM_HVA_ERR_RO_BAD ? KVM_PFN_ERR_RO_FAULT :
- KVM_PFN_NOSLOT;
- }
-
- /* Do not map writable pfn in the readonly memslot. */
- if (writable && memslot_is_readonly(slot)) {
- *writable = false;
- writable = NULL;
- }
-
- return hva_to_pfn(addr, interruptible, no_wait, write_fault, writable);
+ struct kvm_follow_pfn kfp = {
+ .slot = slot,
+ .gfn = gfn,
+ .map_writable = writable,
+ };
+
+ if (write_fault)
+ kfp.flags |= FOLL_WRITE;
+ if (no_wait)
+ kfp.flags |= FOLL_NOWAIT;
+ if (interruptible)
+ kfp.flags |= FOLL_INTERRUPTIBLE;
+
+ return kvm_follow_pfn(&kfp);
}
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
- write_fault, writable);
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(kvm, gfn),
+ .gfn = gfn,
+ .flags = write_fault ? FOLL_WRITE : 0,
+ .map_writable = writable,
+ };
+
+ return kvm_follow_pfn(&kfp);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
- return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL);
+ struct kvm_follow_pfn kfp = {
+ .slot = slot,
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ };
+
+ return kvm_follow_pfn(&kfp);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 51f3fee4ca3f..d5a215958f06 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,8 +20,24 @@
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool no_wait,
- bool write_fault, bool *writable);
+
+struct kvm_follow_pfn {
+ const struct kvm_memory_slot *slot;
+ const gfn_t gfn;
+
+ unsigned long hva;
+
+ /* FOLL_* flags modifying lookup behavior, e.g. FOLL_WRITE. */
+ unsigned int flags;
+
+ /*
+ * If non-NULL, try to get a writable mapping even for a read fault.
+ * Set to true if a writable mapping was obtained.
+ */
+ bool *map_writable;
+};
+
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
#ifdef CONFIG_HAVE_KVM_PFNCACHE
void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 32dc61f48c81..067daf9ad6ef 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -159,6 +159,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;
+ struct kvm_follow_pfn kfp = {
+ .slot = gpc->memslot,
+ .gfn = gpa_to_gfn(gpc->gpa),
+ .flags = FOLL_WRITE,
+ .hva = gpc->uhva,
+ };
lockdep_assert_held(&gpc->refresh_lock);
@@ -197,8 +203,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
cond_resched();
}
- /* We always request a writable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL);
+ new_pfn = hva_to_pfn(&kfp);
if (is_error_noslot_pfn(new_pfn))
goto out_error;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 18/84] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (16 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 17/84] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 19/84] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map() Sean Christopherson
` (67 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Drop kvm_vcpu_{,un}map()'s useless checks on @map being non-NULL. The map
is 100% kernel controlled; any caller that passes a NULL pointer is broken
and needs to be fixed, i.e. a crash due to a NULL pointer dereference is
desirable (though obviously not as desirable as not having a bug in the
first place).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 56c2d11761e0..21ff0f4fa02c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3092,9 +3092,6 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
void *hva = NULL;
struct page *page = KVM_UNMAPPED_PAGE;
- if (!map)
- return -EINVAL;
-
pfn = gfn_to_pfn(vcpu->kvm, gfn);
if (is_error_noslot_pfn(pfn))
return -EINVAL;
@@ -3122,9 +3119,6 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
{
- if (!map)
- return;
-
if (!map->hva)
return;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 19/84] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (17 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 18/84] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 20/84] KVM: Use NULL for struct page pointer to indicate mremapped memory Sean Christopherson
` (66 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Explicitly initialize the entire kvm_host_map structure when mapping a
pfn, as some callers declare their struct on the stack, i.e. don't
zero-initialize the struct, which makes the map->hva check in
kvm_vcpu_unmap() *very* suspect.
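A sketch of the usage pattern that makes the current behavior dicey (the
caller below is invented for illustration):

static void example_user(struct kvm_vcpu *vcpu, gfn_t gfn)
{
        struct kvm_host_map map;        /* on-stack, i.e. not zeroed */
        int r;

        r = kvm_vcpu_map(vcpu, gfn, &map);
        if (!r) {
                /* ... access guest memory via map.hva ... */
        }

        /*
         * Reaching here after a failed map is safe only if kvm_vcpu_map()
         * wrote map.hva (NULL) before bailing, otherwise kvm_vcpu_unmap()
         * reads uninitialized stack memory.
         */
        kvm_vcpu_unmap(vcpu, &map, true);
}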
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 40 ++++++++++++++++------------------------
1 file changed, 16 insertions(+), 24 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 21ff0f4fa02c..67a50b87bb87 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3088,32 +3088,24 @@ void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
- kvm_pfn_t pfn;
- void *hva = NULL;
- struct page *page = KVM_UNMAPPED_PAGE;
-
- pfn = gfn_to_pfn(vcpu->kvm, gfn);
- if (is_error_noslot_pfn(pfn))
- return -EINVAL;
-
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- hva = kmap(page);
-#ifdef CONFIG_HAS_IOMEM
- } else {
- hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
- }
-
- if (!hva)
- return -EFAULT;
-
- map->page = page;
- map->hva = hva;
- map->pfn = pfn;
+ map->page = KVM_UNMAPPED_PAGE;
+ map->hva = NULL;
map->gfn = gfn;
- return 0;
+ map->pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ if (is_error_noslot_pfn(map->pfn))
+ return -EINVAL;
+
+ if (pfn_valid(map->pfn)) {
+ map->page = pfn_to_page(map->pfn);
+ map->hva = kmap(map->page);
+#ifdef CONFIG_HAS_IOMEM
+ } else {
+ map->hva = memremap(pfn_to_hpa(map->pfn), PAGE_SIZE, MEMREMAP_WB);
+#endif
+ }
+
+ return map->hva ? 0 : -EFAULT;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_map);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 20/84] KVM: Use NULL for struct page pointer to indicate mremapped memory
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (18 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 19/84] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 21/84] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping Sean Christopherson
` (65 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Drop yet another unnecessary magic page value from KVM, as there's zero
reason to use a poisoned pointer to indicate "no page". If KVM uses a
NULL page pointer, the kernel will explode just as quickly as if KVM uses
a poisoned pointer. Never mind the fact that such usage would be a
blatant and egregious KVM bug.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 4 ----
virt/kvm/kvm_main.c | 4 ++--
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f42e030f69a4..a5dcb72bab00 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -273,16 +273,12 @@ enum {
READING_SHADOW_PAGE_TABLES,
};
-#define KVM_UNMAPPED_PAGE ((void *) 0x500 + POISON_POINTER_DELTA)
-
struct kvm_host_map {
/*
* Only valid if the 'pfn' is managed by the host kernel (i.e. There is
* a 'struct page' for it. When using mem= kernel parameter some memory
* can be used as guest memory but they are not managed by host
* kernel).
- * If 'pfn' is not managed by the host kernel, this field is
- * initialized to KVM_UNMAPPED_PAGE.
*/
struct page *page;
void *hva;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 67a50b87bb87..3d717a131906 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3088,7 +3088,7 @@ void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
- map->page = KVM_UNMAPPED_PAGE;
+ map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
@@ -3114,7 +3114,7 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
if (!map->hva)
return;
- if (map->page != KVM_UNMAPPED_PAGE)
+ if (map->page)
kunmap(map->page);
#ifdef CONFIG_HAS_IOMEM
else
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 21/84] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (19 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 20/84] KVM: Use NULL for struct page pointer to indicate mremapped memory Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 22/84] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx Sean Christopherson
` (64 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Remove the explicit evmptr12 validity check when deciding whether or not
to unmap the eVMCS pointer, and instead rely on kvm_vcpu_unmap() to play
nice with a NULL map->hva, i.e. to do nothing if the map is invalid.
Note, vmx->nested.hv_evmcs_map is zero-allocated along with the rest of
vcpu_vmx, i.e. the map starts out invalid/NULL.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2392a7ef254d..a34b49ea64b5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -231,11 +231,8 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
- if (nested_vmx_is_evmptr12_valid(vmx)) {
- kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
- vmx->nested.hv_evmcs = NULL;
- }
-
+ kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
+ vmx->nested.hv_evmcs = NULL;
vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
if (hv_vcpu) {
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 22/84] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (20 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 21/84] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 23/84] KVM: nVMX: Add helper to put (unmap) vmcs12 pages Sean Christopherson
` (63 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Remove vcpu_vmx.msr_bitmap_map and instead use an on-stack structure in
the one function that uses the map, nested_vmx_prepare_msr_bitmap().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 8 ++++----
arch/x86/kvm/vmx/vmx.h | 2 --
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a34b49ea64b5..372d005e09e7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -621,7 +621,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
int msr;
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap;
- struct kvm_host_map *map = &vmx->nested.msr_bitmap_map;
+ struct kvm_host_map msr_bitmap_map;
/* Nothing to do if the MSR bitmap is not in use. */
if (!cpu_has_vmx_msr_bitmap() ||
@@ -644,10 +644,10 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
return true;
}
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), map))
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), &msr_bitmap_map))
return false;
- msr_bitmap_l1 = (unsigned long *)map->hva;
+ msr_bitmap_l1 = (unsigned long *)msr_bitmap_map.hva;
/*
* To keep the control flow simple, pay eight 8-byte writes (sixteen
@@ -711,7 +711,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
- kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
+ kvm_vcpu_unmap(vcpu, &msr_bitmap_map, false);
vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 42498fa63abb..889c6c42ee27 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -204,8 +204,6 @@ struct nested_vmx {
struct kvm_host_map virtual_apic_map;
struct kvm_host_map pi_desc_map;
- struct kvm_host_map msr_bitmap_map;
-
struct pi_desc *pi_desc;
bool pi_pending;
u16 posted_intr_nv;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 23/84] KVM: nVMX: Add helper to put (unmap) vmcs12 pages
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (21 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 22/84] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 24/84] KVM: Use plain "struct page" pointer instead of single-entry array Sean Christopherson
` (62 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add a helper to dedup unmapping the vmcs12 pages. This will reduce the
amount of churn when a future patch refactors the kvm_vcpu_unmap() API.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 372d005e09e7..8d05d1d9f544 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -314,6 +314,21 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
vcpu->arch.regs_dirty = 0;
}
+static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /*
+ * Unpin physical memory we referred to in the vmcs02. The APIC access
+ * page's backing page (yeah, confusing) shouldn't actually be accessed,
+ * and if it is written, the contents are irrelevant.
+ */
+ kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
+ vmx->nested.pi_desc = NULL;
+}
+
/*
* Free whatever needs to be freed from vmx->nested when L1 goes down, or
* just stops using VMX.
@@ -346,15 +361,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
vmx->nested.cached_vmcs12 = NULL;
kfree(vmx->nested.cached_shadow_vmcs12);
vmx->nested.cached_shadow_vmcs12 = NULL;
- /*
- * Unpin physical memory we referred to in the vmcs02. The APIC access
- * page's backing page (yeah, confusing) shouldn't actually be accessed,
- * and if it is written, the contents are irrelevant.
- */
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
- kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
- vmx->nested.pi_desc = NULL;
+
+ nested_put_vmcs12_pages(vcpu);
kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
@@ -4942,11 +4950,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_update_cpu_dirty_logging(vcpu);
}
- /* Unpin physical memory we referred to in vmcs02 */
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
- kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
- vmx->nested.pi_desc = NULL;
+ nested_put_vmcs12_pages(vcpu);
if (vmx->nested.reload_vmcs01_apic_access_page) {
vmx->nested.reload_vmcs01_apic_access_page = false;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 24/84] KVM: Use plain "struct page" pointer instead of single-entry array
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (22 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 23/84] KVM: nVMX: Add helper to put (unmap) vmcs12 pages Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 9:53 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 25/84] KVM: Provide refcounted page as output field in struct kvm_follow_pfn Sean Christopherson
` (61 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use a single pointer instead of a single-entry array for the struct page
pointer in hva_to_pfn_fast(). Using an array makes the code unnecessarily
annoying to read and update.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3d717a131906..8e83d3f043f1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2763,7 +2763,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
*/
static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
- struct page *page[1];
+ struct page *page;
/*
* Fast pin a writable pfn only if it is a write fault request
@@ -2773,8 +2773,8 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
return false;
- if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, page)) {
- *pfn = page_to_pfn(page[0]);
+ if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
+ *pfn = page_to_pfn(page);
if (kfp->map_writable)
*kfp->map_writable = true;
return true;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 25/84] KVM: Provide refcounted page as output field in struct kvm_follow_pfn
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (23 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 24/84] KVM: Use plain "struct page" pointer instead of single-entry array Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 26/84] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c Sean Christopherson
` (60 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add kvm_follow_pfn.refcounted_page as an output for the "to pfn" APIs to
"return" the struct page that is associated with the returned pfn (if KVM
acquired a reference to the page). This will eventually allow removing
KVM's hacky kvm_pfn_to_refcounted_page() code, which is error prone and
can't detect pfns that are valid, but aren't (currently) refcounted.
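A minimal sketch of the intended caller pattern, using names from this series (slot and gfn are placeholders for the caller's memslot and gfn):
        struct page *refcounted_page = NULL;
        struct kvm_follow_pfn kfp = {
                .slot = slot,
                .gfn = gfn,
                .flags = FOLL_WRITE,
                .refcounted_page = &refcounted_page,
        };
        kvm_pfn_t pfn;

        pfn = kvm_follow_pfn(&kfp);
        /* refcounted_page is NULL if the pfn has no refcounted struct page. */

        /* ... consume pfn ... */

        kvm_release_page_clean(refcounted_page);        /* tolerates NULL */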
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 100 +++++++++++++++++++++-----------------------
virt/kvm/kvm_mm.h | 9 ++++
2 files changed, 56 insertions(+), 53 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8e83d3f043f1..31570c5627e3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2756,6 +2756,46 @@ static inline int check_user_page_hwpoison(unsigned long addr)
return rc == -EHWPOISON;
}
+static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
+ pte_t *pte, bool writable)
+{
+ kvm_pfn_t pfn;
+
+ WARN_ON_ONCE(!!page == !!pte);
+
+ if (kfp->map_writable)
+ *kfp->map_writable = writable;
+
+ /*
+ * FIXME: Remove this once KVM no longer blindly calls put_page() on
+ * every pfn that points at a struct page.
+ *
+ * Get a reference for follow_pte() pfns if they happen to point at a
+ * struct page, as KVM will ultimately call kvm_release_pfn_clean() on
+ * the returned pfn, i.e. KVM expects to have a reference.
+ *
+ * Certain IO or PFNMAP mappings can be backed with valid struct pages,
+ * but be allocated without refcounting, e.g. tail pages of
+ * non-compound higher order allocations. Grabbing and putting a
+ * reference to such pages would cause KVM to prematurely free a page
+ * it doesn't own (KVM gets and puts the one and only reference).
+ * Don't allow those pages until the FIXME is resolved.
+ */
+ if (pte) {
+ pfn = pte_pfn(*pte);
+ page = kvm_pfn_to_refcounted_page(pfn);
+ if (page && !get_page_unless_zero(page))
+ return KVM_PFN_ERR_FAULT;
+ } else {
+ pfn = page_to_pfn(page);
+ }
+
+ if (kfp->refcounted_page)
+ *kfp->refcounted_page = page;
+
+ return pfn;
+}
+
/*
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned. It's also the
@@ -2774,9 +2814,7 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
return false;
if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
- *pfn = page_to_pfn(page);
- if (kfp->map_writable)
- *kfp->map_writable = true;
+ *pfn = kvm_resolve_pfn(kfp, page, NULL, true);
return true;
}
@@ -2808,23 +2846,15 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
if (npages != 1)
return npages;
- if (!kfp->map_writable)
- goto out;
-
- if (kfp->flags & FOLL_WRITE) {
- *kfp->map_writable = true;
- goto out;
- }
-
/* map read fault as writable if possible */
- if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
- *kfp->map_writable = true;
+ if (!(flags & FOLL_WRITE) && kfp->map_writable &&
+ get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
put_page(page);
page = wpage;
+ flags |= FOLL_WRITE;
}
-out:
- *pfn = page_to_pfn(page);
+ *pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
return npages;
}
@@ -2839,20 +2869,9 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
return true;
}
-static int kvm_try_get_pfn(kvm_pfn_t pfn)
-{
- struct page *page = kvm_pfn_to_refcounted_page(pfn);
-
- if (!page)
- return 1;
-
- return get_page_unless_zero(page);
-}
-
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
{
- kvm_pfn_t pfn;
pte_t *ptep;
pte_t pte;
spinlock_t *ptl;
@@ -2882,38 +2901,13 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
pte = ptep_get(ptep);
if (write_fault && !pte_write(pte)) {
- pfn = KVM_PFN_ERR_RO_FAULT;
+ *p_pfn = KVM_PFN_ERR_RO_FAULT;
goto out;
}
- if (kfp->map_writable)
- *kfp->map_writable = pte_write(pte);
- pfn = pte_pfn(pte);
-
- /*
- * Get a reference here because callers of *hva_to_pfn* and
- * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
- * returned pfn. This is only needed if the VMA has VM_MIXEDMAP
- * set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
- * simply do nothing for reserved pfns.
- *
- * Whoever called remap_pfn_range is also going to call e.g.
- * unmap_mapping_range before the underlying pages are freed,
- * causing a call to our MMU notifier.
- *
- * Certain IO or PFNMAP mappings can be backed with valid
- * struct pages, but be allocated without refcounting e.g.,
- * tail pages of non-compound higher order allocations, which
- * would then underflow the refcount when the caller does the
- * required put_page. Don't allow those pages here.
- */
- if (!kvm_try_get_pfn(pfn))
- r = -EFAULT;
-
+ *p_pfn = kvm_resolve_pfn(kfp, NULL, &pte, pte_write(pte));
out:
pte_unmap_unlock(ptep, ptl);
- *p_pfn = pfn;
-
return r;
}
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d5a215958f06..d3ac1ba8ba66 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -35,6 +35,15 @@ struct kvm_follow_pfn {
* Set to true if a writable mapping was obtained.
*/
bool *map_writable;
+
+ /*
+ * Optional output. Set to a valid "struct page" if the returned pfn
+ * is for a refcounted or pinned struct page, NULL if the returned pfn
+ * has no struct page or if the struct page is not being refcounted
+ * (e.g. tail pages of non-compound higher order allocations from
+ * IO/PFNMAP mappings).
+ */
+ struct page **refcounted_page;
};
kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 26/84] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (24 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 25/84] KVM: Provide refcounted page as output field in struct kvm_follow_pfn Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-08-01 9:55 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 27/84] KVM: pfncache: Precisely track refcounted pages Sean Christopherson
` (59 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Hoist the kvm_{set,release}_page_{clean,dirty}() APIs further up in
kvm_main.c so that they can be used by the kvm_follow_pfn family of APIs.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 82 ++++++++++++++++++++++-----------------------
1 file changed, 41 insertions(+), 41 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 31570c5627e3..48b626f1b5f3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2748,6 +2748,47 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
return gfn_to_hva_memslot_prot(slot, gfn, writable);
}
+static bool kvm_is_ad_tracked_page(struct page *page)
+{
+ /*
+ * Per page-flags.h, pages tagged PG_reserved "should in general not be
+ * touched (e.g. set dirty) except by its owner".
+ */
+ return !PageReserved(page);
+}
+
+static void kvm_set_page_dirty(struct page *page)
+{
+ if (kvm_is_ad_tracked_page(page))
+ SetPageDirty(page);
+}
+
+static void kvm_set_page_accessed(struct page *page)
+{
+ if (kvm_is_ad_tracked_page(page))
+ mark_page_accessed(page);
+}
+
+void kvm_release_page_clean(struct page *page)
+{
+ if (!page)
+ return;
+
+ kvm_set_page_accessed(page);
+ put_page(page);
+}
+EXPORT_SYMBOL_GPL(kvm_release_page_clean);
+
+void kvm_release_page_dirty(struct page *page)
+{
+ if (!page)
+ return;
+
+ kvm_set_page_dirty(page);
+ kvm_release_page_clean(page);
+}
+EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
+
static inline int check_user_page_hwpoison(unsigned long addr)
{
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
@@ -3125,37 +3166,6 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
-static bool kvm_is_ad_tracked_page(struct page *page)
-{
- /*
- * Per page-flags.h, pages tagged PG_reserved "should in general not be
- * touched (e.g. set dirty) except by its owner".
- */
- return !PageReserved(page);
-}
-
-static void kvm_set_page_dirty(struct page *page)
-{
- if (kvm_is_ad_tracked_page(page))
- SetPageDirty(page);
-}
-
-static void kvm_set_page_accessed(struct page *page)
-{
- if (kvm_is_ad_tracked_page(page))
- mark_page_accessed(page);
-}
-
-void kvm_release_page_clean(struct page *page)
-{
- if (!page)
- return;
-
- kvm_set_page_accessed(page);
- put_page(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_page_clean);
-
void kvm_release_pfn_clean(kvm_pfn_t pfn)
{
struct page *page;
@@ -3171,16 +3181,6 @@ void kvm_release_pfn_clean(kvm_pfn_t pfn)
}
EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
-void kvm_release_page_dirty(struct page *page)
-{
- if (!page)
- return;
-
- kvm_set_page_dirty(page);
- kvm_release_page_clean(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
-
void kvm_release_pfn_dirty(kvm_pfn_t pfn)
{
struct page *page;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 27/84] KVM: pfncache: Precisely track refcounted pages
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (25 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 26/84] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 28/84] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() Sean Christopherson
` (58 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Track refcounted struct page memory using kvm_follow_pfn.refcounted_page
instead of relying on kvm_release_pfn_clean() to correctly detect that the
pfn is associated with a struct page.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/pfncache.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 067daf9ad6ef..728d2c1b488a 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -159,11 +159,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;
+ struct page *page;
+
struct kvm_follow_pfn kfp = {
.slot = gpc->memslot,
.gfn = gpa_to_gfn(gpc->gpa),
.flags = FOLL_WRITE,
.hva = gpc->uhva,
+ .refcounted_page = &page,
};
lockdep_assert_held(&gpc->refresh_lock);
@@ -198,7 +201,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
if (new_khva != old_khva)
gpc_unmap(new_pfn, new_khva);
- kvm_release_pfn_clean(new_pfn);
+ kvm_release_page_unused(page);
cond_resched();
}
@@ -218,7 +221,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
new_khva = gpc_map(new_pfn);
if (!new_khva) {
- kvm_release_pfn_clean(new_pfn);
+ kvm_release_page_unused(page);
goto out_error;
}
@@ -236,11 +239,11 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
gpc->khva = new_khva + offset_in_page(gpc->uhva);
/*
- * Put the reference to the _new_ pfn. The pfn is now tracked by the
+ * Put the reference to the _new_ page. The page is now tracked by the
* cache and can be safely migrated, swapped, etc... as the cache will
* invalidate any mappings in response to relevant mmu_notifier events.
*/
- kvm_release_pfn_clean(new_pfn);
+ kvm_release_page_clean(page);
return 0;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 28/84] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (26 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 27/84] KVM: pfncache: Precisely track refcounted pages Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 29/84] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map() Sean Christopherson
` (57 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
From: David Stevens <stevensd@chromium.org>
Migrate kvm_vcpu_map() to kvm_follow_pfn(), and have it track whether or
not the map holds a refcounted struct page. Precisely tracking struct
page references will eventually allow removing kvm_pfn_to_refcounted_page()
and its various wrappers.
Signed-off-by: David Stevens <stevensd@chromium.org>
[sean: use a pointer instead of a boolean]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 26 ++++++++++++++++----------
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a5dcb72bab00..8b5ac3305b05 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -280,6 +280,7 @@ struct kvm_host_map {
* can be used as guest memory but they are not managed by host
* kernel).
*/
+ struct page *refcounted_page;
struct page *page;
void *hva;
kvm_pfn_t pfn;
@@ -1223,7 +1224,6 @@ void kvm_release_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_accessed(kvm_pfn_t pfn);
-void kvm_release_pfn(kvm_pfn_t pfn, bool dirty);
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len);
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48b626f1b5f3..255cbed83b40 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3113,21 +3113,21 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_page);
-void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
-{
- if (dirty)
- kvm_release_pfn_dirty(pfn);
- else
- kvm_release_pfn_clean(pfn);
-}
-
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(vcpu->kvm, gfn),
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ .refcounted_page = &map->refcounted_page,
+ };
+
+ map->refcounted_page = NULL;
map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
- map->pfn = gfn_to_pfn(vcpu->kvm, gfn);
+ map->pfn = kvm_follow_pfn(&kfp);
if (is_error_noslot_pfn(map->pfn))
return -EINVAL;
@@ -3159,10 +3159,16 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
if (dirty)
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
- kvm_release_pfn(map->pfn, dirty);
+ if (map->refcounted_page) {
+ if (dirty)
+ kvm_release_page_dirty(map->refcounted_page);
+ else
+ kvm_release_page_clean(map->refcounted_page);
+ }
map->hva = NULL;
map->page = NULL;
+ map->refcounted_page = NULL;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 29/84] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (27 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 28/84] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 30/84] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping Sean Christopherson
` (56 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Pin, as in FOLL_PIN, pages when mapping them for direct access by KVM.
As per Documentation/core-api/pin_user_pages.rst, writing to a page that
was gotten via FOLL_GET is explicitly disallowed.
Correct (uses FOLL_PIN calls):
pin_user_pages()
write to the data within the pages
unpin_user_pages()
INCORRECT (uses FOLL_GET calls):
get_user_pages()
write to the data within the pages
put_page()
Unfortunately, FOLL_PIN is a "private" flag, and so kvm_follow_pfn must
use a one-off bool instead of being able to piggyback the "flags" field.
Link: https://lwn.net/Articles/930667
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
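For reference, a bare-bones sketch of the pin/unpin pattern the map path switches to (single page, write access, hva is a placeholder for the resolved host address; the real flow goes through kvm_follow_pfn() with .pin = true):
        struct page *page;

        if (pin_user_pages_fast(hva, 1, FOLL_WRITE, &page) != 1)
                return -EFAULT;

        /* ... write to the page through a kernel mapping ... */

        unpin_user_page(page);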
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 54 +++++++++++++++++++++++++++++-----------
virt/kvm/kvm_mm.h | 7 ++++++
3 files changed, 47 insertions(+), 16 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8b5ac3305b05..3d4094ece479 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -280,7 +280,7 @@ struct kvm_host_map {
* can be used as guest memory but they are not managed by host
* kernel).
*/
- struct page *refcounted_page;
+ struct page *pinned_page;
struct page *page;
void *hva;
kvm_pfn_t pfn;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 255cbed83b40..4a9b99c11355 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2824,9 +2824,12 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
*/
if (pte) {
pfn = pte_pfn(*pte);
- page = kvm_pfn_to_refcounted_page(pfn);
- if (page && !get_page_unless_zero(page))
- return KVM_PFN_ERR_FAULT;
+
+ if (!kfp->pin) {
+ page = kvm_pfn_to_refcounted_page(pfn);
+ if (page && !get_page_unless_zero(page))
+ return KVM_PFN_ERR_FAULT;
+ }
} else {
pfn = page_to_pfn(page);
}
@@ -2845,16 +2848,24 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
{
struct page *page;
+ bool r;
/*
- * Fast pin a writable pfn only if it is a write fault request
- * or the caller allows to map a writable pfn for a read fault
- * request.
+ * Try the fast-only path when the caller wants to pin/get the page for
+ * writing. If the caller only wants to read the page, KVM must go
+ * down the full, slow path in order to avoid racing an operation that
+ * breaks Copy-on-Write (CoW), e.g. so that KVM doesn't end up pointing
+ * at the old, read-only page while mm/ points at a new, writable page.
*/
if (!((kfp->flags & FOLL_WRITE) || kfp->map_writable))
return false;
- if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page)) {
+ if (kfp->pin)
+ r = pin_user_pages_fast(kfp->hva, 1, FOLL_WRITE, &page) == 1;
+ else
+ r = get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page);
+
+ if (r) {
*pfn = kvm_resolve_pfn(kfp, page, NULL, true);
return true;
}
@@ -2883,10 +2894,21 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
struct page *page, *wpage;
int npages;
- npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
+ if (kfp->pin)
+ npages = pin_user_pages_unlocked(kfp->hva, 1, &page, flags);
+ else
+ npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
if (npages != 1)
return npages;
+ /*
+ * Pinning is mutually exclusive with opportunistically mapping a read
+ * fault as writable, as KVM should never pin pages when mapping memory
+ * into the guest (pinning is only for direct accesses from KVM).
+ */
+ if (WARN_ON_ONCE(kfp->map_writable && kfp->pin))
+ goto out;
+
/* map read fault as writable if possible */
if (!(flags & FOLL_WRITE) && kfp->map_writable &&
get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
@@ -2895,6 +2917,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
flags |= FOLL_WRITE;
}
+out:
*pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
return npages;
}
@@ -3119,10 +3142,11 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
.slot = gfn_to_memslot(vcpu->kvm, gfn),
.gfn = gfn,
.flags = FOLL_WRITE,
- .refcounted_page = &map->refcounted_page,
+ .refcounted_page = &map->pinned_page,
+ .pin = true,
};
- map->refcounted_page = NULL;
+ map->pinned_page = NULL;
map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
@@ -3159,16 +3183,16 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
if (dirty)
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
- if (map->refcounted_page) {
+ if (map->pinned_page) {
if (dirty)
- kvm_release_page_dirty(map->refcounted_page);
- else
- kvm_release_page_clean(map->refcounted_page);
+ kvm_set_page_dirty(map->pinned_page);
+ kvm_set_page_accessed(map->pinned_page);
+ unpin_user_page(map->pinned_page);
}
map->hva = NULL;
map->page = NULL;
- map->refcounted_page = NULL;
+ map->pinned_page = NULL;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index d3ac1ba8ba66..acef3f5c582a 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -30,6 +30,13 @@ struct kvm_follow_pfn {
/* FOLL_* flags modifying lookup behavior, e.g. FOLL_WRITE. */
unsigned int flags;
+ /*
+ * Pin the page (effectively FOLL_PIN, which is an mm/ internal flag).
+ * The page *must* be pinned if KVM will write to the page via a kernel
+ * mapping, e.g. via kmap(), memremap(), etc.
+ */
+ bool pin;
+
/*
* If non-NULL, try to get a writable mapping even for a read fault.
* Set to true if a writable mapping was obtained.
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 30/84] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (28 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 29/84] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 31/84] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap() Sean Christopherson
` (55 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark the APIC access page as dirty when unmapping it from KVM. The fact
that the page _shouldn't_ be written doesn't guarantee the page _won't_ be
written. And while the contents are likely irrelevant, the values _are_
visible to the guest, i.e. dropping writes would be visible to the guest
(though obviously highly unlikely to be problematic in practice).
Marking the map dirty will allow specifying the write vs. read-only when
*mapping* the memory, which in turn will allow creating read-only maps.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8d05d1d9f544..3096f6f5ecdb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -318,12 +318,7 @@ static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- /*
- * Unpin physical memory we referred to in the vmcs02. The APIC access
- * page's backing page (yeah, confusing) shouldn't actually be accessed,
- * and if it is written, the contents are irrelevant.
- */
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, true);
kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
vmx->nested.pi_desc = NULL;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 31/84] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (29 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 30/84] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 32/84] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary Sean Christopherson
` (54 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Now that all kvm_vcpu_{,un}map() users pass "true" for @dirty, have them
pass "true" as a @writable param to kvm_vcpu_map(), and thus create a
read-only mapping when possible.
Note, creating read-only mappings can be theoretically slower, as they
don't play nice with fast GUP due to the need to break CoW before mapping
the underlying PFN. But practically speaking, creating a mapping isn't
a super hot path, and getting a writable mapping for reading is weird and
confusing.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/nested.c | 4 ++--
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/kvm/svm/svm.c | 8 ++++----
arch/x86/kvm/vmx/nested.c | 16 ++++++++--------
include/linux/kvm_host.h | 20 ++++++++++++++++++--
virt/kvm/kvm_main.c | 12 +++++++-----
6 files changed, 40 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6f704c1037e5..23b3a228cd0a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -922,7 +922,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_svm_vmexit(svm);
out:
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
return ret;
}
@@ -1126,7 +1126,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
vmcb12->control.exit_int_info_err,
KVM_ISA_SVM);
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
nested_svm_transition_tlb_flush(vcpu);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a16c873b3232..62f63fd714df 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3466,7 +3466,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
sev_es_sync_to_ghcb(svm);
- kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map, true);
+ kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map);
svm->sev_es.ghcb = NULL;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c115d26844f7..742a2cec04ce 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2299,7 +2299,7 @@ static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
svm_copy_vmloadsave_state(vmcb12, svm->vmcb);
}
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
return ret;
}
@@ -4690,7 +4690,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
svm_copy_vmrun_state(map_save.hva + 0x400,
&svm->vmcb01.ptr->save);
- kvm_vcpu_unmap(vcpu, &map_save, true);
+ kvm_vcpu_unmap(vcpu, &map_save);
return 0;
}
@@ -4750,9 +4750,9 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
svm->nested.nested_run_pending = 1;
unmap_save:
- kvm_vcpu_unmap(vcpu, &map_save, true);
+ kvm_vcpu_unmap(vcpu, &map_save);
unmap_map:
- kvm_vcpu_unmap(vcpu, &map, true);
+ kvm_vcpu_unmap(vcpu, &map);
return ret;
}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3096f6f5ecdb..f7dde74ff565 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -231,7 +231,7 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
- kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map);
vmx->nested.hv_evmcs = NULL;
vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
@@ -318,9 +318,9 @@ static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
- kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map);
+ kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map);
vmx->nested.pi_desc = NULL;
}
@@ -624,7 +624,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
int msr;
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap;
- struct kvm_host_map msr_bitmap_map;
+ struct kvm_host_map map;
/* Nothing to do if the MSR bitmap is not in use. */
if (!cpu_has_vmx_msr_bitmap() ||
@@ -647,10 +647,10 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
return true;
}
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), &msr_bitmap_map))
+ if (kvm_vcpu_map_readonly(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), &map))
return false;
- msr_bitmap_l1 = (unsigned long *)msr_bitmap_map.hva;
+ msr_bitmap_l1 = (unsigned long *)map.hva;
/*
* To keep the control flow simple, pay eight 8-byte writes (sixteen
@@ -714,7 +714,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
- kvm_vcpu_unmap(vcpu, &msr_bitmap_map, false);
+ kvm_vcpu_unmap(vcpu, &map);
vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3d4094ece479..82ca0971c156 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,7 @@ struct kvm_host_map {
void *hva;
kvm_pfn_t pfn;
kvm_pfn_t gfn;
+ bool writable;
};
/*
@@ -1297,8 +1298,23 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
-void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
+
+int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
+ bool writable);
+void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map);
+
+static inline int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa,
+ struct kvm_host_map *map)
+{
+ return __kvm_vcpu_map(vcpu, gpa, map, true);
+}
+
+static inline int kvm_vcpu_map_readonly(struct kvm_vcpu *vcpu, gpa_t gpa,
+ struct kvm_host_map *map)
+{
+ return __kvm_vcpu_map(vcpu, gpa, map, false);
+}
+
unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4a9b99c11355..a46c7bf1f902 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3136,7 +3136,8 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_page);
-int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
+int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
+ bool writable)
{
struct kvm_follow_pfn kfp = {
.slot = gfn_to_memslot(vcpu->kvm, gfn),
@@ -3150,6 +3151,7 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
map->page = NULL;
map->hva = NULL;
map->gfn = gfn;
+ map->writable = writable;
map->pfn = kvm_follow_pfn(&kfp);
if (is_error_noslot_pfn(map->pfn))
@@ -3166,9 +3168,9 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
return map->hva ? 0 : -EFAULT;
}
-EXPORT_SYMBOL_GPL(kvm_vcpu_map);
+EXPORT_SYMBOL_GPL(__kvm_vcpu_map);
-void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
+void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map)
{
if (!map->hva)
return;
@@ -3180,11 +3182,11 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
memunmap(map->hva);
#endif
- if (dirty)
+ if (map->writable)
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
if (map->pinned_page) {
- if (dirty)
+ if (map->writable)
kvm_set_page_dirty(map->pinned_page);
kvm_set_page_accessed(map->pinned_page);
unpin_user_page(map->pinned_page);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 32/84] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (30 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 31/84] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 33/84] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default Sean Christopherson
` (53 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
When creating a memory map for read, don't request a writable pfn from the
primary MMU. While creating read-only mappings can be theoretically slower,
as they don't play nice with fast GUP due to the need to break CoW before
mapping the underlying PFN, practically speaking, creating a mapping isn't
a super hot path, and getting a writable mapping for reading is weird and
confusing.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a46c7bf1f902..a28479629488 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3142,7 +3142,7 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
struct kvm_follow_pfn kfp = {
.slot = gfn_to_memslot(vcpu->kvm, gfn),
.gfn = gfn,
- .flags = FOLL_WRITE,
+ .flags = writable ? FOLL_WRITE : 0,
.refcounted_page = &map->pinned_page,
.pin = true,
};
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 33/84] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (31 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 32/84] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference Sean Christopherson
` (52 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add an off-by-default module param to control whether or not KVM is allowed
to map memory that isn't pinned, i.e. that KVM can't guarantee won't be
freed while it is mapped into KVM and/or the guest. Don't remove the
functionality entirely, as there are use cases where mapping unpinned
memory is safe (as defined by the platform owner), e.g. when memory is
hidden from the kernel and managed by userspace, in which case userspace
is already fully trusted to not muck with guest memory mappings.
But for more typical setups, mapping unpinned memory is wildly unsafe, and
unnecessary. The APIs are used exclusively by x86's nested virtualization
support, and there is no known (or sane) use case for mapping PFN-mapped
memory into a KVM guest _and_ letting the guest use it for virtualization
structures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a28479629488..0b3c0bddaa07 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -94,6 +94,13 @@ unsigned int halt_poll_ns_shrink = 2;
module_param(halt_poll_ns_shrink, uint, 0644);
EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
+/*
+ * Allow direct access (from KVM or the CPU) without MMU notifier protection
+ * to unpinned pages.
+ */
+static bool allow_unsafe_mappings;
+module_param(allow_unsafe_mappings, bool, 0444);
+
/*
* Ordering of locks:
*
@@ -2821,6 +2828,9 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
* reference to such pages would cause KVM to prematurely free a page
* it doesn't own (KVM gets and puts the one and only reference).
* Don't allow those pages until the FIXME is resolved.
+ *
+ * Don't grab a reference for pins, callers that pin pages are required
+ * to check refcounted_page, i.e. must not blindly release the pfn.
*/
if (pte) {
pfn = pte_pfn(*pte);
@@ -2942,6 +2952,14 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
bool write_fault = kfp->flags & FOLL_WRITE;
int r;
+ /*
+ * Remapped memory cannot be pinned in any meaningful sense. Bail if
+ * the caller wants to pin the page, i.e. access the page outside of
+ * MMU notifier protection, and unsafe mappings are disallowed.
+ */
+ if (kfp->pin && !allow_unsafe_mappings)
+ return -EINVAL;
+
r = follow_pte(vma, kfp->hva, &ptep, &ptl);
if (r) {
/*
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (32 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 33/84] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-30 10:41 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 35/84] KVM: x86: Use kvm_lookup_pfn() to check if retrying #PF is useful Sean Christopherson
` (51 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add a kvm_follow_pfn() wrapper, kvm_lookup_pfn(), to allow looking up a
gfn=>pfn mapping without the caller getting a reference to any underlying
page. The API will be used in flows that want to know if a gfn points at
a valid pfn, but don't actually need to do anything with the pfn.
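A caller-side sketch (mirroring how later patches in the series use the helper; kvm and gfn are placeholders):
        kvm_pfn_t pfn = kvm_lookup_pfn(kvm, gfn);

        if (is_error_noslot_pfn(pfn))
                return -EFAULT;         /* gfn doesn't resolve to a usable pfn */

        /* No reference is held; don't release or otherwise use a struct page. */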
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 16 ++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 82ca0971c156..5a572cef4adc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1212,6 +1212,8 @@ static inline void kvm_release_page_unused(struct page *page)
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
+kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn);
+
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0b3c0bddaa07..ad84dab8c5dc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3118,6 +3118,22 @@ kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
+kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn)
+{
+ struct page *refcounted_page = NULL;
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(kvm, gfn),
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ .refcounted_page = &refcounted_page,
+ };
+ kvm_pfn_t pfn;
+
+ pfn = kvm_follow_pfn(&kfp);
+ kvm_release_page_unused(refcounted_page);
+ return pfn;
+}
+
int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
struct page **pages, int nr_pages)
{
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 35/84] KVM: x86: Use kvm_lookup_pfn() to check if retrying #PF is useful
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (33 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 36/84] KVM: x86: Use kvm_lookup_pfn() to check if APIC access page was installed Sean Christopherson
` (50 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use kvm_lookup_pfn() instead of an open coded equivalent when checking to
see if KVM should exit to userspace or re-enter the guest after failed
instruction emulation triggered by a guest page fault.
Note, there is a small functional change as kvm_lookup_pfn() doesn't mark
the page as accessed, whereas kvm_release_pfn_clean() does mark the page
accessed (if the pfn is backed by a refcounted struct page). Neither
behavior is wrong per se, e.g. querying the gfn=>pfn mapping doesn't
actually access the page, but the guest _did_ access the gfn, otherwise
the fault wouldn't have occurred.
That said, either KVM will exit to userspace and the guest will likely be
terminated, or KVM will re-enter the guest and, barring weirdness in the
guest, the guest will re-access the gfn, and KVM will fault-in the pfn and
mark it accessed.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af6c8cf6a37a..59501ad6e7f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8867,7 +8867,6 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type)
{
gpa_t gpa = cr2_or_gpa;
- kvm_pfn_t pfn;
if (!(emulation_type & EMULTYPE_ALLOW_RETRY_PF))
return false;
@@ -8892,22 +8891,15 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
}
/*
- * Do not retry the unhandleable instruction if it faults on the
- * readonly host memory, otherwise it will goto a infinite loop:
+ * Do not retry the unhandleable instruction if emulation was triggered
+ * for emulated MMIO, e.g. by a readonly memslot or lack of a memslot,
+ * otherwise KVM will send the vCPU into an infinite loop:
* retry instruction -> write #PF -> emulation fail -> retry
* instruction -> ...
*/
- pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
-
- /*
- * If the instruction failed on the error pfn, it can not be fixed,
- * report the error to userspace.
- */
- if (is_error_noslot_pfn(pfn))
+ if (is_error_noslot_pfn(kvm_lookup_pfn(vcpu->kvm, gpa_to_gfn(gpa))))
return false;
- kvm_release_pfn_clean(pfn);
-
/*
* If emulation may have been triggered by a write to a shadowed page
* table, unprotect the gfn (zap any relevant SPTEs) and re-enter the
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 36/84] KVM: x86: Use kvm_lookup_pfn() to check if APIC access page was installed
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (34 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 35/84] KVM: x86: Use kvm_lookup_pfn() to check if retrying #PF is useful Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 37/84] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names Sean Christopherson
` (49 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use kvm_lookup_pfn() to verify that the APIC access page was allocated and
installed as expected. The mapping is controlled by KVM, i.e. it's
guaranteed to be backed by struct page, the purpose of the check is purely
to ensure the page is allocated, i.e. that KVM doesn't point the guest at
garbage.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6d65b36fac29..88dc43660d23 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2612,8 +2612,8 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
int kvm_alloc_apic_access_page(struct kvm *kvm)
{
- struct page *page;
void __user *hva;
+ kvm_pfn_t pfn;
int ret = 0;
mutex_lock(&kvm->slots_lock);
@@ -2628,17 +2628,16 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
goto out;
}
- page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
- if (!page) {
- ret = -EFAULT;
- goto out;
- }
-
/*
* Do not pin the page in memory, so that memory hot-unplug
* is able to migrate it.
*/
- put_page(page);
+ pfn = kvm_lookup_pfn(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+ if (is_error_noslot_pfn(pfn)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
kvm->arch.apic_access_memslot_enabled = true;
out:
mutex_unlock(&kvm->slots_lock);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 37/84] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (35 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 36/84] KVM: x86: Use kvm_lookup_pfn() to check if APIC access page was installed Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 38/84] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean() Sean Christopherson
` (48 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Prefix x86's faultin_pfn helpers with "mmu" so that the mmu-less names can
be used by common KVM for similar APIs.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 19 ++++++++++---------
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a201b56728ae..4d30920f653d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4301,8 +4301,8 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
return req_max_level;
}
-static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault)
+static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
int max_order, r;
@@ -4325,10 +4325,11 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return RET_PF_CONTINUE;
}
-static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
if (fault->is_private)
- return kvm_faultin_pfn_private(vcpu, fault);
+ return kvm_mmu_faultin_pfn_private(vcpu, fault);
fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
fault->write, &fault->map_writable);
@@ -4363,8 +4364,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return RET_PF_CONTINUE;
}
-static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
- unsigned int access)
+static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault, unsigned int access)
{
struct kvm_memory_slot *slot = fault->slot;
int ret;
@@ -4447,7 +4448,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
return RET_PF_RETRY;
- ret = __kvm_faultin_pfn(vcpu, fault);
+ ret = __kvm_mmu_faultin_pfn(vcpu, fault);
if (ret != RET_PF_CONTINUE)
return ret;
@@ -4524,7 +4525,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (r)
return r;
- r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
+ r = kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL);
if (r != RET_PF_CONTINUE)
return r;
@@ -4617,7 +4618,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
if (r)
return r;
- r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
+ r = kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL);
if (r != RET_PF_CONTINUE)
return r;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f67396c435df..a5113347bb12 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -235,7 +235,7 @@ struct kvm_page_fault {
/* The memslot containing gfn. May be NULL. */
struct kvm_memory_slot *slot;
- /* Outputs of kvm_faultin_pfn. */
+ /* Outputs of kvm_mmu_faultin_pfn(). */
unsigned long mmu_seq;
kvm_pfn_t pfn;
bool map_writable;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index bc801d454f41..b02d0abfca68 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -811,7 +811,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (r)
return r;
- r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
+ r = kvm_mmu_faultin_pfn(vcpu, fault, walker.pte_access);
if (r != RET_PF_CONTINUE)
return r;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 38/84] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (36 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 37/84] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 39/84] KVM: x86/mmu: Add common helper to handle prefetching SPTEs Sean Christopherson
` (47 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use kvm_release_page_clean() to put prefetched pages instead of calling
put_page() directly. This will allow de-duplicating the prefetch code
between indirect and direct MMUs.
Note, there's a small functional change as kvm_release_page_clean() marks
the page/folio as accessed. While it's not strictly guaranteed that the
guest will access the page, KVM won't intercept guest accesses, i.e. won't
mark the page accessed if it _is_ accessed by the guest (unless A/D bits
are disabled, but running without A/D bits is effectively limited to
pre-HSW Intel CPUs).
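As a rough conceptual sketch (an approximation for illustration, not the
actual kvm_release_page_clean() implementation), the delta versus a bare
put_page() boils down to the "accessed" hint:

  /*
   * Approximate model only: kvm_release_page_clean() is assumed to mark
   * the folio accessed before dropping the reference, whereas put_page()
   * only drops the reference.
   */
  static void release_clean_sketch(struct page *page)
  {
          folio_mark_accessed(page_folio(page));  /* the extra side effect */
          put_page(page);
  }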
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4d30920f653d..0def1444c01c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,7 +2919,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
for (i = 0; i < ret; i++, gfn++, start++) {
mmu_set_spte(vcpu, slot, start, access, gfn,
page_to_pfn(pages[i]), NULL);
- put_page(pages[i]);
+ kvm_release_page_clean(pages[i]);
}
return 0;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 39/84] KVM: x86/mmu: Add common helper to handle prefetching SPTEs
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (37 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 38/84] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 40/84] KVM: x86/mmu: Add helper to "finish" handling a guest page fault Sean Christopherson
` (46 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Deduplicate the prefetching code for indirect and direct MMUs. The core
logic is the same, the only difference is that indirect MMUs need to
prefetch SPTEs one-at-a-time, as contiguous guest virtual addresses aren't
guaranteed to yield contiguous guest physical addresses.
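For reference, the two call shapes after deduplication, condensed from the
diff below (no new logic, just both existing callers funneling into one
helper):

  /* Direct MMU: prefetch a contiguous run of SPTEs in one call. */
  kvm_mmu_prefetch_sptes(vcpu, gfn, start, end - start, access);

  /* Indirect MMU: guest-physical contiguity isn't guaranteed, so prefetch
   * a single SPTE at a time. */
  kvm_mmu_prefetch_sptes(vcpu, gfn, spte, 1, pte_access);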
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++-------------
arch/x86/kvm/mmu/paging_tmpl.h | 13 +----------
2 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0def1444c01c..e76f64f55c4a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2897,32 +2897,41 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
return ret;
}
-static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp,
- u64 *start, u64 *end)
+static bool kvm_mmu_prefetch_sptes(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep,
+ int nr_pages, unsigned int access)
{
struct page *pages[PTE_PREFETCH_NUM];
struct kvm_memory_slot *slot;
- unsigned int access = sp->role.access;
- int i, ret;
- gfn_t gfn;
+ int i;
+
+ if (WARN_ON_ONCE(nr_pages > PTE_PREFETCH_NUM))
+ return false;
- gfn = kvm_mmu_page_get_gfn(sp, spte_index(start));
slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK);
if (!slot)
- return -1;
+ return false;
- ret = kvm_prefetch_pages(slot, gfn, pages, end - start);
- if (ret <= 0)
- return -1;
+ nr_pages = kvm_prefetch_pages(slot, gfn, pages, nr_pages);
+ if (nr_pages <= 0)
+ return false;
- for (i = 0; i < ret; i++, gfn++, start++) {
- mmu_set_spte(vcpu, slot, start, access, gfn,
+ for (i = 0; i < nr_pages; i++, gfn++, sptep++) {
+ mmu_set_spte(vcpu, slot, sptep, access, gfn,
page_to_pfn(pages[i]), NULL);
kvm_release_page_clean(pages[i]);
}
- return 0;
+ return true;
+}
+
+static bool direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *sp,
+ u64 *start, u64 *end)
+{
+ gfn_t gfn = kvm_mmu_page_get_gfn(sp, spte_index(start));
+ unsigned int access = sp->role.access;
+
+ return kvm_mmu_prefetch_sptes(vcpu, gfn, start, end - start, access);
}
static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
@@ -2940,8 +2949,9 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
if (is_shadow_present_pte(*spte) || spte == sptep) {
if (!start)
continue;
- if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
+ if (!direct_pte_prefetch_many(vcpu, sp, start, spte))
return;
+
start = NULL;
} else if (!start)
start = spte;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index b02d0abfca68..e1c2f098d9d5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -533,9 +533,7 @@ static bool
FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 *spte, pt_element_t gpte)
{
- struct kvm_memory_slot *slot;
unsigned pte_access;
- struct page *page;
gfn_t gfn;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
@@ -545,16 +543,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
pte_access = sp->role.access & FNAME(gpte_access)(gpte);
FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, pte_access & ACC_WRITE_MASK);
- if (!slot)
- return false;
-
- if (kvm_prefetch_pages(slot, gfn, &page, 1) != 1)
- return false;
-
- mmu_set_spte(vcpu, slot, spte, pte_access, gfn, page_to_pfn(page), NULL);
- kvm_release_page_clean(page);
- return true;
+ return kvm_mmu_prefetch_sptes(vcpu, gfn, spte, 1, pte_access);
}
static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 40/84] KVM: x86/mmu: Add helper to "finish" handling a guest page fault
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (38 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 39/84] KVM: x86/mmu: Add common helper to handle prefetching SPTEs Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 41/84] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte() Sean Christopherson
` (45 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add a helper to finish/complete the handling of a guest page fault, e.g. to
mark the pages accessed and put any held references. In the near
future, this will allow improving the logic without having to copy+paste
changes into all page fault paths. And in the less-near future, it will
allow sharing the "finish" API across all architectures.
No functional change intended.
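Condensed from the diff below, the exit path shared by the x86 page fault
handlers ends up looking like this (error handling elided):

  r = direct_map(vcpu, fault);    /* or kvm_tdp_mmu_map() / FNAME(fetch) */
out_unlock:
  kvm_mmu_finish_page_fault(vcpu, fault, r);
  write_unlock(&vcpu->kvm->mmu_lock);     /* read_unlock() for the TDP MMU */
  return r;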
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 12 +++++++++---
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e76f64f55c4a..1cdd67707461 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4311,6 +4311,12 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
return req_max_level;
}
+static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault, int r)
+{
+ kvm_release_pfn_clean(fault->pfn);
+}
+
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
@@ -4476,7 +4482,7 @@ static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
* mmu_lock is acquired.
*/
if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn)) {
- kvm_release_pfn_clean(fault->pfn);
+ kvm_mmu_finish_page_fault(vcpu, fault, RET_PF_RETRY);
return RET_PF_RETRY;
}
@@ -4552,8 +4558,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = direct_map(vcpu, fault);
out_unlock:
+ kvm_mmu_finish_page_fault(vcpu, fault, r);
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
return r;
}
@@ -4641,8 +4647,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
r = kvm_tdp_mmu_map(vcpu, fault);
out_unlock:
+ kvm_mmu_finish_page_fault(vcpu, fault, r);
read_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
return r;
}
#endif
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e1c2f098d9d5..b6897916c76b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -835,8 +835,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = FNAME(fetch)(vcpu, fault, &walker);
out_unlock:
+ kvm_mmu_finish_page_fault(vcpu, fault, r);
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
return r;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 41/84] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (39 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 40/84] KVM: x86/mmu: Add helper to "finish" handling a guest page fault Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-30 8:57 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 42/84] KVM: Move declarations of memslot accessors up in kvm_host.h Sean Christopherson
` (44 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Move the marking of folios dirty from make_spte() out to its callers,
which have access to the _struct page_, not just the underlying pfn.
Once all architectures follow suit, this will allow removing KVM's ugly
hack where KVM elevates the refcount of VM_MIXEDMAP pfns that happen to
be struct page memory.
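Condensed from the diff below, the dirty marking now lives in the two spots
that hold the relevant context (mmu_lock and/or the struct page), instead of
deep in make_spte():

  /* Prefetch: pages are GUP'd writable, so mark them dirty on release. */
  kvm_release_page_dirty(pages[i]);

  /* Page fault "finish": mark dirty only if the primary MMU mapping was
   * writable and KVM installed (or could locklessly install) a SPTE. */
  if (!fault->map_writable || r == RET_PF_RETRY)
          kvm_release_pfn_clean(fault->pfn);
  else
          kvm_release_pfn_dirty(fault->pfn);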
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++++++++++--
arch/x86/kvm/mmu/paging_tmpl.h | 5 +++++
arch/x86/kvm/mmu/spte.c | 11 -----------
3 files changed, 32 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1cdd67707461..7e7b855ce1e1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2918,7 +2918,16 @@ static bool kvm_mmu_prefetch_sptes(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep,
for (i = 0; i < nr_pages; i++, gfn++, sptep++) {
mmu_set_spte(vcpu, slot, sptep, access, gfn,
page_to_pfn(pages[i]), NULL);
- kvm_release_page_clean(pages[i]);
+
+ /*
+ * KVM always prefetches writable pages from the primary MMU,
+ * and KVM can make its SPTE writable in the fast page fault path, without
+ * notifying the primary MMU. Mark pages/folios dirty now to
+ * ensure file data is written back if it ends up being written
+ * by the guest. Because KVM's prefetching GUPs writable PTEs,
+ * the probability of unnecessary writeback is extremely low.
+ */
+ kvm_release_page_dirty(pages[i]);
}
return true;
@@ -4314,7 +4323,23 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault, int r)
{
- kvm_release_pfn_clean(fault->pfn);
+ lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
+ r == RET_PF_RETRY);
+
+ /*
+ * If the page that KVM got from the *primary MMU* is writable, and KVM
+ * installed or reused a SPTE, mark the page/folio dirty. Note, this
+ * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
+ * the GFN is write-protected. Folios can't be safely marked dirty
+ * outside of mmu_lock as doing so could race with writeback on the
+ * folio. As a result, KVM can't mark folios dirty in the fast page
+ * fault handler, and so KVM must (somewhat) speculatively mark the
+ * folio dirty if KVM could locklessly make the SPTE writable.
+ */
+ if (!fault->map_writable || r == RET_PF_RETRY)
+ kvm_release_pfn_clean(fault->pfn);
+ else
+ kvm_release_pfn_dirty(fault->pfn);
}
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index b6897916c76b..2e2d87a925ac 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -953,6 +953,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
spte_to_pfn(spte), spte, true, false,
host_writable, &spte);
+ /*
+ * There is no need to mark the pfn dirty, as the new protections must
+ * be a subset of the old protections, i.e. synchronizing a SPTE cannot
+ * change the SPTE from read-only to writable.
+ */
return mmu_spte_update(sptep, spte);
}
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 9b8795bd2f04..2c5650390d3b 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -277,17 +277,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
}
- /*
- * If the page that KVM got from the primary MMU is writable, i.e. if
- * it's host-writable, mark the page/folio dirty. As alluded to above,
- * folios can't be safely marked dirty in the fast page fault handler,
- * and so KVM must (somewhat) speculatively mark the folio dirty even
- * though it isn't guaranteed to be written as KVM won't mark the folio
- * dirty if/when the SPTE is made writable.
- */
- if (host_writable)
- kvm_set_pfn_dirty(pfn);
-
*new_spte = spte;
return wrprot;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 42/84] KVM: Move declarations of memslot accessors up in kvm_host.h
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (40 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 41/84] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 43/84] KVM: Add kvm_faultin_pfn() to specifically service guest page faults Sean Christopherson
` (43 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Move the memslot lookup helpers further up in kvm_host.h so that they can
be used by inlined "to pfn" wrappers.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5a572cef4adc..ef0277b77375 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1153,6 +1153,10 @@ static inline bool kvm_memslot_iter_is_valid(struct kvm_memslot_iter *iter, gfn_
kvm_memslot_iter_is_valid(iter, end); \
kvm_memslot_iter_next(iter))
+struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
+struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
+struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
+
/*
* KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
* - create a new memory slot
@@ -1290,15 +1294,13 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
})
int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
-struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
-struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
-struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
+
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 43/84] KVM: Add kvm_faultin_pfn() to specifically service guest page faults
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (41 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 42/84] KVM: Move declarations of memslot accessors up in kvm_host.h Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 44/84] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn() Sean Christopherson
` (42 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Add a new dedicated API, kvm_faultin_pfn(), for servicing guest page
faults, i.e. for getting pages/pfns that will be mapped into the guest via
an mmu_notifier-protected KVM MMU. Keep struct kvm_follow_pfn buried in
internal code, as having __kvm_faultin_pfn() take "out" params is actually
cleaner for several architectures, e.g. it allows the caller to have its
own "page fault" structure without having to marshal data to/from
kvm_follow_pfn.
Long term, common KVM would ideally provide a kvm_page_fault structure, a
la x86's struct of the same name. But all architectures need to be
converted to a common API before that can happen.
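A hypothetical usage sketch, not part of this patch, of how an arch fault
handler might consume the new API (variable names are illustrative only):

  struct page *refcounted_page = NULL;
  bool writable = false;
  kvm_pfn_t pfn;

  pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writable, &refcounted_page);
  if (is_error_noslot_pfn(pfn))
          return -EFAULT;

  /* ... install the mapping under mmu_lock, honoring mmu_notifier retry ... */

  /* Release via the page, not the pfn, once KVM is done with the mapping. */
  if (refcounted_page)
          kvm_release_page_clean(refcounted_page);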
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 11 +++++++++++
virt/kvm/kvm_main.c | 22 ++++++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ef0277b77375..e0548ae92659 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1217,6 +1217,17 @@ void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn);
+kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
+ unsigned int foll, bool *writable,
+ struct page **refcounted_page);
+
+static inline kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool write, bool *writable,
+ struct page **refcounted_page)
+{
+ return __kvm_faultin_pfn(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn,
+ write ? FOLL_WRITE : 0, writable, refcounted_page);
+}
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ad84dab8c5dc..6dc448602751 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3134,6 +3134,28 @@ kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn)
return pfn;
}
+kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
+ unsigned int foll, bool *writable,
+ struct page **refcounted_page)
+{
+ struct kvm_follow_pfn kfp = {
+ .slot = slot,
+ .gfn = gfn,
+ .flags = foll,
+ .map_writable = writable,
+ .refcounted_page = refcounted_page,
+ };
+
+ if (WARN_ON_ONCE(!writable || !refcounted_page))
+ return KVM_PFN_ERR_FAULT;
+
+ *writable = false;
+ *refcounted_page = NULL;
+
+ return kvm_follow_pfn(&kfp);
+}
+EXPORT_SYMBOL_GPL(__kvm_faultin_pfn);
+
int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
struct page **pages, int nr_pages)
{
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 44/84] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (42 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 43/84] KVM: Add kvm_faultin_pfn() to specifically service guest page faults Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn() Sean Christopherson
` (41 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert KVM x86 to use the recently introduced __kvm_faultin_pfn().
Opportunistically capture the refcounted_page grabbed by KVM for use in
future changes.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
arch/x86/kvm/mmu/mmu_internal.h | 1 +
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7e7b855ce1e1..53555ea5e5bb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4369,11 +4369,14 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
+ unsigned int foll = fault->write ? FOLL_WRITE : 0;
+
if (fault->is_private)
return kvm_mmu_faultin_pfn_private(vcpu, fault);
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
- fault->write, &fault->map_writable);
+ foll |= FOLL_NOWAIT;
+ fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
+ &fault->map_writable, &fault->refcounted_page);
/*
* If resolving the page failed because I/O is needed to fault-in the
@@ -4400,8 +4403,11 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
* to wait for IO. Note, gup always bails if it is unable to quickly
* get a page and a fatal signal, i.e. SIGKILL, is pending.
*/
- fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, true, true,
- fault->write, &fault->map_writable);
+ foll |= FOLL_INTERRUPTIBLE;
+ foll &= ~FOLL_NOWAIT;
+ fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
+ &fault->map_writable, &fault->refcounted_page);
+
return RET_PF_CONTINUE;
}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index a5113347bb12..e1f8385105a5 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -238,6 +238,7 @@ struct kvm_page_fault {
/* Outputs of kvm_mmu_faultin_pfn(). */
unsigned long mmu_seq;
kvm_pfn_t pfn;
+ struct page *refcounted_page;
bool map_writable;
/*
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (43 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 44/84] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-30 9:05 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 46/84] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns Sean Christopherson
` (40 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Provide the "struct page" associated with a guest_memfd pfn as an output
from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can
directly put the page instead of having to rely on
kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/sev.c | 10 ++++++----
include/linux/kvm_host.h | 6 ++++--
virt/kvm/guest_memfd.c | 19 +++++++++++--------
4 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 53555ea5e5bb..146e57c9c86d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4353,7 +4353,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
}
r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
- &max_order);
+ &fault->refcounted_page, &max_order);
if (r) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return r;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 62f63fd714df..5c125e4c1096 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3847,6 +3847,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
struct kvm_memory_slot *slot;
+ struct page *page;
kvm_pfn_t pfn;
slot = gfn_to_memslot(vcpu->kvm, gfn);
@@ -3857,7 +3858,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
* The new VMSA will be private memory guest memory, so
* retrieve the PFN from the gmem backend.
*/
- if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
return -EINVAL;
/*
@@ -3886,7 +3887,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
* changes then care should be taken to ensure
* svm->sev_es.vmsa is pinned through some other means.
*/
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
}
/*
@@ -4686,6 +4687,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
struct kvm_memory_slot *slot;
struct kvm *kvm = vcpu->kvm;
int order, rmp_level, ret;
+ struct page *page;
bool assigned;
kvm_pfn_t pfn;
gfn_t gfn;
@@ -4712,7 +4714,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
return;
}
- ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, &order);
if (ret) {
pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
gpa);
@@ -4770,7 +4772,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
trace_kvm_rmp_fault(vcpu, gpa, pfn, error_code, rmp_level, ret);
out_no_trace:
- put_page(pfn_to_page(pfn));
+ kvm_release_page_unused(page);
}
static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e0548ae92659..9d2a97eb30e4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2462,11 +2462,13 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
#ifdef CONFIG_KVM_PRIVATE_MEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+ gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+ int *max_order);
#else
static inline int kvm_gmem_get_pfn(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
- kvm_pfn_t *pfn, int *max_order)
+ kvm_pfn_t *pfn, struct page **page,
+ int *max_order)
{
KVM_BUG_ON(1, kvm);
return -EIO;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 1c509c351261..ad1f9e73cd13 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -542,12 +542,12 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
}
static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
+ gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+ int *max_order, bool prepare)
{
pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
struct kvm_gmem *gmem = file->private_data;
struct folio *folio;
- struct page *page;
int r;
if (file != slot->gmem.file) {
@@ -571,9 +571,9 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
return -EHWPOISON;
}
- page = folio_file_page(folio, index);
+ *page = folio_file_page(folio, index);
- *pfn = page_to_pfn(page);
+ *pfn = page_to_pfn(*page);
if (max_order)
*max_order = 0;
@@ -585,7 +585,8 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
}
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+ gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+ int *max_order)
{
struct file *file = kvm_gmem_get_file(slot);
int r;
@@ -593,7 +594,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
if (!file)
return -EFAULT;
- r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
+ r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, page, max_order, true);
fput(file);
return r;
}
@@ -604,6 +605,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
{
struct file *file;
struct kvm_memory_slot *slot;
+ struct page *page;
void __user *p;
int ret = 0, max_order;
@@ -633,7 +635,8 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
break;
}
- ret = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &max_order, false);
+ ret = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &page,
+ &max_order, false);
if (ret)
break;
@@ -644,7 +647,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
p = src ? src + i * PAGE_SIZE : NULL;
ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
- put_page(pfn_to_page(pfn));
+ put_page(page);
if (ret)
break;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 46/84] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (44 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn() Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 47/84] KVM: x86/mmu: Don't mark unused faultin pages as accessed Sean Christopherson
` (39 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Now that all x86 page fault paths precisely track refcounted pages, use
kvm_page_fault.refcounted_page to put references to struct page memory
when finishing page faults. This is a baby step towards eliminating
kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 146e57c9c86d..3cdb1bd80823 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4326,6 +4326,9 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
r == RET_PF_RETRY);
+ if (!fault->refcounted_page)
+ return;
+
/*
* If the page that KVM got from the *primary MMU* is writable, and KVM
* installed or reused a SPTE, mark the page/folio dirty. Note, this
@@ -4337,9 +4340,9 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
* folio dirty if KVM could locklessly make the SPTE writable.
*/
if (!fault->map_writable || r == RET_PF_RETRY)
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_page_clean(fault->refcounted_page);
else
- kvm_release_pfn_dirty(fault->pfn);
+ kvm_release_page_dirty(fault->refcounted_page);
}
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 47/84] KVM: x86/mmu: Don't mark unused faultin pages as accessed
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (45 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 46/84] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM Sean Christopherson
` (38 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
When finishing guest page faults, don't mark pages as accessed if KVM
is resuming the guest _without_ installing a mapping, i.e. if the page
isn't being used. While it's possible that marking the page accessed
could avoid minor thrashing due to reclaiming a page that the guest is
about to access, it's far more likely that the gfn=>pfn mapping was
invalidated, e.g. due to a memslot change, or because the corresponding
VMA is being modified.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3cdb1bd80823..95beb50748fc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4339,7 +4339,9 @@ static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
* fault handler, and so KVM must (somewhat) speculatively mark the
* folio dirty if KVM could locklessly make the SPTE writable.
*/
- if (!fault->map_writable || r == RET_PF_RETRY)
+ if (r == RET_PF_RETRY)
+ kvm_release_page_unused(fault->refcounted_page);
+ else if (!fault->map_writable)
kvm_release_page_clean(fault->refcounted_page);
else
kvm_release_page_dirty(fault->refcounted_page);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (46 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 47/84] KVM: x86/mmu: Don't mark unused faultin pages as accessed Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-30 8:58 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 49/84] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page Sean Christopherson
` (37 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Move KVM x86's helper that "finishes" the faultin process to common KVM
so that the logic can be shared across all architectures. Note, not all
architectures implement a fast page fault path, but the gist of the
comment applies to all architectures.
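An illustrative call site for a non-x86 architecture (hypothetical;
"installed", "writable" and the lock flavor are stand-ins for whatever the
arch's fault handler actually tracks):

out_unlock:
  /* Placeholder booleans; the helper picks unused/clean/dirty accordingly. */
  kvm_release_faultin_page(kvm, page, !installed, writable);
  spin_unlock(&kvm->mmu_lock);
  return ret;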
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 24 ++----------------------
include/linux/kvm_host.h | 26 ++++++++++++++++++++++++++
2 files changed, 28 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 95beb50748fc..2a0cfa225c8d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4323,28 +4323,8 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault, int r)
{
- lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
- r == RET_PF_RETRY);
-
- if (!fault->refcounted_page)
- return;
-
- /*
- * If the page that KVM got from the *primary MMU* is writable, and KVM
- * installed or reused a SPTE, mark the page/folio dirty. Note, this
- * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
- * the GFN is write-protected. Folios can't be safely marked dirty
- * outside of mmu_lock as doing so could race with writeback on the
- * folio. As a result, KVM can't mark folios dirty in the fast page
- * fault handler, and so KVM must (somewhat) speculatively mark the
- * folio dirty if KVM could locklessly make the SPTE writable.
- */
- if (r == RET_PF_RETRY)
- kvm_release_page_unused(fault->refcounted_page);
- else if (!fault->map_writable)
- kvm_release_page_clean(fault->refcounted_page);
- else
- kvm_release_page_dirty(fault->refcounted_page);
+ kvm_release_faultin_page(vcpu->kvm, fault->refcounted_page,
+ r == RET_PF_RETRY, fault->map_writable);
}
static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9d2a97eb30e4..91341cdc6562 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1216,6 +1216,32 @@ static inline void kvm_release_page_unused(struct page *page)
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
+static inline void kvm_release_faultin_page(struct kvm *kvm, struct page *page,
+ bool unused, bool dirty)
+{
+ lockdep_assert_once(lockdep_is_held(&kvm->mmu_lock) || unused);
+
+ if (!page)
+ return;
+
+ /*
+ * If the page that KVM got from the *primary MMU* is writable, and KVM
+ * installed or reused a SPTE, mark the page/folio dirty. Note, this
+ * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
+ * the GFN is write-protected. Folios can't be safely marked dirty
+ * outside of mmu_lock as doing so could race with writeback on the
+ * folio. As a result, KVM can't mark folios dirty in the fast page
+ * fault handler, and so KVM must (somewhat) speculatively mark the
+ * folio dirty if KVM could locklessly make the SPTE writable.
+ */
+ if (unused)
+ kvm_release_page_unused(page);
+ else if (dirty)
+ kvm_release_page_dirty(page);
+ else
+ kvm_release_page_clean(page);
+}
+
kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn);
kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
unsigned int foll, bool *writable,
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 49/84] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (47 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 50/84] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
` (36 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Hold mmu_lock across kvm_release_pfn_clean() when refreshing the APIC
access page address to ensure that KVM doesn't mark a page/folio as
accessed after it has been unmapped. Practically speaking, marking a folio
accessed is benign in this scenario, as KVM does hold a reference (it's
really just marking folios dirty that is problematic), but there's no
reason not to be paranoid (moving the APIC access page isn't a hot path),
and no reason to be different from other mmu_notifier-protected flows in
KVM.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f18c2d8c7476..30032585f7dc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6828,25 +6828,22 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
return;
read_lock(&vcpu->kvm->mmu_lock);
- if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
+ if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn))
kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
- read_unlock(&vcpu->kvm->mmu_lock);
- goto out;
- }
+ else
+ vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
- vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
- read_unlock(&vcpu->kvm->mmu_lock);
-
- /*
- * No need for a manual TLB flush at this point, KVM has already done a
- * flush if there were SPTEs pointing at the previous page.
- */
-out:
/*
* Do not pin apic access page in memory, the MMU notifier
* will call us again if it is migrated or swapped out.
*/
kvm_release_pfn_clean(pfn);
+
+ /*
+ * No need for a manual TLB flush at this point, KVM has already done a
+ * flush if there were SPTEs pointing at the previous page.
+ */
+ read_unlock(&vcpu->kvm->mmu_lock);
}
void vmx_hwapic_isr_update(int max_isr)
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 50/84] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (48 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 49/84] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page Sean Christopherson
@ 2024-07-26 23:51 ` Sean Christopherson
2024-07-30 8:59 ` Paolo Bonzini
2024-07-26 23:52 ` [PATCH v12 51/84] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map() Sean Christopherson
` (35 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:51 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use __kvm_faultin_page() to get the APIC access page so that KVM can
precisely release the refcounted page, i.e. to remove yet another user
of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
page fault, the semantics are effectively the same; KVM just happens to
be mapping the pfn into a VMCS field instead of a secondary MMU.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 30032585f7dc..b109bd282a52 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6786,8 +6786,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
struct kvm *kvm = vcpu->kvm;
struct kvm_memslots *slots = kvm_memslots(kvm);
struct kvm_memory_slot *slot;
+ struct page *refcounted_page;
unsigned long mmu_seq;
kvm_pfn_t pfn;
+ bool ign;
/* Defer reload until vmcs01 is the current VMCS. */
if (is_guest_mode(vcpu)) {
@@ -6823,7 +6825,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
* controls the APIC-access page memslot, and only deletes the memslot
* if APICv is permanently inhibited, i.e. the memslot won't reappear.
*/
- pfn = gfn_to_pfn_memslot(slot, gfn);
+ pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &ign, &refcounted_page);
if (is_error_noslot_pfn(pfn))
return;
@@ -6834,10 +6836,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
/*
- * Do not pin apic access page in memory, the MMU notifier
- * will call us again if it is migrated or swapped out.
+ * Do not pin the APIC access page in memory so that it can be freely
+ * migrated, the MMU notifier will call us again if it is migrated or
+ * swapped out. KVM backs the memslot with anonymous memory, the pfn
+ * should always point at a refcounted page (if the pfn is valid).
*/
- kvm_release_pfn_clean(pfn);
+ if (!WARN_ON_ONCE(!refcounted_page))
+ kvm_release_page_clean(refcounted_page);
/*
* No need for a manual TLB flush at this point, KVM has already done a
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 51/84] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (49 preceding siblings ...)
2024-07-26 23:51 ` [PATCH v12 50/84] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 52/84] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
` (34 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark the underlying page as dirty in kvmppc_e500_ref_setup()'s sole
caller, kvmppc_e500_shadow_map(), which will allow converting e500 to
__kvm_faultin_pfn() + kvm_release_faultin_page() without having to do
a weird dance between ref_setup() and shadow_map().
Opportunistically drop the redundant kvm_set_pfn_accessed(), as
shadow_map() puts the page via kvm_release_pfn_clean().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index c664fdec75b1..5c2adfd19e12 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -242,7 +242,7 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
return tlbe->mas7_3 & (MAS3_SW|MAS3_UW);
}
-static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
+static inline bool kvmppc_e500_ref_setup(struct tlbe_ref *ref,
struct kvm_book3e_206_tlb_entry *gtlbe,
kvm_pfn_t pfn, unsigned int wimg)
{
@@ -252,11 +252,7 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
/* Use guest supplied MAS2_G and MAS2_E */
ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
- /* Mark the page accessed */
- kvm_set_pfn_accessed(pfn);
-
- if (tlbe_is_writable(gtlbe))
- kvm_set_pfn_dirty(pfn);
+ return tlbe_is_writable(gtlbe);
}
static inline void kvmppc_e500_ref_release(struct tlbe_ref *ref)
@@ -337,6 +333,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
unsigned int wimg = 0;
pgd_t *pgdir;
unsigned long flags;
+ bool writable = false;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
@@ -490,7 +487,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
goto out;
}
}
- kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
+ if (writable)
+ kvm_set_pfn_dirty(pfn);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 52/84] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (50 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 51/84] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map() Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 53/84] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults Sean Christopherson
` (33 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that shadow_map() can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held. Marking pages
accessed outside of mmu_lock is ok (not great, but safe), but marking
pages _dirty_ outside of mmu_lock can make filesystems unhappy.
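For reference, the lockdep assertion being referred to lives in the new common
helper added earlier in the series; a minimal sketch of its expected shape,
inferred from how it is used in the conversions below (details may differ from
the actual patch):
static inline void kvm_release_faultin_page(struct kvm *kvm, struct page *page,
                                            bool unused, bool dirty)
{
        /* Callers must mark pages dirty before dropping mmu_lock. */
        lockdep_assert_once(lockdep_is_held(&kvm->mmu_lock));
        if (!page)
                return;
        if (unused)
                kvm_release_page_unused(page);
        else if (dirty)
                kvm_release_page_dirty(page);
        else
                kvm_release_page_clean(page);
}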
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 5c2adfd19e12..334dd96f8081 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -498,11 +498,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
kvmppc_mmu_flush_icache(pfn);
out:
- spin_unlock(&kvm->mmu_lock);
-
/* Drop refcount on page, so that mmu notifiers can clear it */
kvm_release_pfn_clean(pfn);
-
+ spin_unlock(&kvm->mmu_lock);
return ret;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 53/84] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (51 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 52/84] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock Sean Christopherson
` (32 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert PPC e500 to use __kvm_faultin_pfn()+kvm_release_faultin_page(),
and continue the inexorable march towards the demise of
kvm_pfn_to_refcounted_page().
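The overall shape that e500 (and the other converted fault handlers) ends up
with is roughly the sketch below; names, locals, and error handling are
simplified relative to the actual diff that follows:
static int faultin_and_map_sketch(struct kvm *kvm,
                                  struct kvm_memory_slot *slot, gfn_t gfn,
                                  unsigned long mmu_seq)
{
        struct page *page = NULL;
        bool writable = false;
        kvm_pfn_t pfn;
        int ret = 0;
        /* Fault in the pfn outside of mmu_lock. */
        pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, NULL, &page);
        if (is_error_noslot_pfn(pfn))
                return -EFAULT;
        spin_lock(&kvm->mmu_lock);
        if (mmu_invalidate_retry(kvm, mmu_seq)) {
                ret = -EAGAIN;
                goto out;
        }
        /* ... install the shadow/stage-2 mapping, set 'writable' ... */
out:
        /* Dirty/accessed updates and the put happen while mmu_lock is held. */
        kvm_release_faultin_page(kvm, page, !!ret, writable);
        spin_unlock(&kvm->mmu_lock);
        return ret;
}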
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 334dd96f8081..e5a145b578a4 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -322,6 +322,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
{
struct kvm_memory_slot *slot;
unsigned long pfn = 0; /* silence GCC warning */
+ struct page *page = NULL;
unsigned long hva;
int pfnmap = 0;
int tsize = BOOK3E_PAGESZ_4K;
@@ -443,7 +444,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
if (likely(!pfnmap)) {
tsize_pages = 1UL << (tsize + 10 - PAGE_SHIFT);
- pfn = gfn_to_pfn_memslot(slot, gfn);
+ pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, NULL, &page);
if (is_error_noslot_pfn(pfn)) {
if (printk_ratelimit())
pr_err("%s: real page not found for gfn %lx\n",
@@ -488,8 +489,6 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
}
}
writable = kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
- if (writable)
- kvm_set_pfn_dirty(pfn);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
@@ -498,8 +497,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
kvmppc_mmu_flush_icache(pfn);
out:
- /* Drop refcount on page, so that mmu notifiers can clear it */
- kvm_release_pfn_clean(pfn);
+ kvm_release_faultin_page(kvm, page, !!ret, writable);
spin_unlock(&kvm->mmu_lock);
return ret;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (52 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 53/84] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-08-05 23:25 ` Oliver Upton
2024-07-26 23:52 ` [PATCH v12 55/84] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts Sean Christopherson
` (31 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints). See the link below
for details.
This will also allow converting arm64 to kvm_release_faultin_page(), which
requires that mmu_lock be held (for the aforementioned reason).
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/mmu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 22ee37360c4e..ce13c3d884d5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
out_unlock:
+ if (writable && !ret)
+ kvm_set_pfn_dirty(pfn);
+ else
+ kvm_release_pfn_clean(pfn);
+
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
- if (writable && !ret) {
- kvm_set_pfn_dirty(pfn);
+ if (writable && !ret)
mark_page_dirty_in_slot(kvm, memslot, gfn);
- }
- kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 55/84] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (53 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
` (30 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert arm64 to use __kvm_faultin_pfn()+kvm_release_faultin_page().
Three down, six to go.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/mmu.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ce13c3d884d5..756fc856ab44 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1439,6 +1439,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
long vma_pagesize, fault_granule;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt;
+ struct page *page;
if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1553,7 +1554,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
+ * vma_lookup() or __kvm_faultin_pfn() become stale prior to
* acquiring kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@@ -1562,8 +1563,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- write_fault, &writable);
+ pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
+ &writable, &page);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
@@ -1576,7 +1577,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* If the page was identified as device early by looking at
* the VMA flags, vma_pagesize is already representing the
* largest quantity we can map. If instead it was mapped
- * via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE
+ * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
* and must not be upgraded.
*
* In both cases, we don't let transparent_hugepage_adjust()
@@ -1685,11 +1686,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
out_unlock:
- if (writable && !ret)
- kvm_set_pfn_dirty(pfn);
- else
- kvm_release_pfn_clean(pfn);
-
+ kvm_release_faultin_page(kvm, page, !!ret, writable);
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (54 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 55/84] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-31 8:11 ` Andrew Jones
2024-08-06 15:03 ` Anup Patel
2024-07-26 23:52 ` [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
` (29 subsequent siblings)
85 siblings, 2 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Don't mark pages dirty if KVM bails from the page fault handler without
installing a stage-2 mapping, i.e. if the page is guaranteed to not be
written by the guest.
In addition to being a (very) minor fix, this paves the way for converting
RISC-V to use kvm_release_faultin_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/riscv/kvm/mmu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index b63650f9b966..06aa5a0d056d 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -669,7 +669,6 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
goto out_unlock;
if (writable) {
- kvm_set_pfn_dirty(hfn);
mark_page_dirty(kvm, gfn);
ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
vma_pagesize, false, true);
@@ -682,6 +681,9 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
kvm_err("Failed to map in G-stage\n");
out_unlock:
+ if ((!ret || ret == -EEXIST) && writable)
+ kvm_set_pfn_dirty(hfn);
+
spin_unlock(&kvm->mmu_lock);
kvm_set_pfn_accessed(hfn);
kvm_release_pfn_clean(hfn);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (55 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-31 8:12 ` Andrew Jones
2024-08-06 15:04 ` Anup Patel
2024-07-26 23:52 ` [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
` (28 subsequent siblings)
85 siblings, 2 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that RISC-V can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held. Marking pages accessed
outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_
outside of mmu_lock can make filesystems unhappy.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/riscv/kvm/mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 06aa5a0d056d..806f68e70642 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -683,10 +683,10 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
out_unlock:
if ((!ret || ret == -EEXIST) && writable)
kvm_set_pfn_dirty(hfn);
+ else
+ kvm_release_pfn_clean(hfn);
spin_unlock(&kvm->mmu_lock);
- kvm_set_pfn_accessed(hfn);
- kvm_release_pfn_clean(hfn);
return ret;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (56 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-31 8:11 ` Andrew Jones
2024-08-06 15:04 ` Anup Patel
2024-07-26 23:52 ` [PATCH v12 59/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV Sean Christopherson
` (27 subsequent siblings)
85 siblings, 2 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
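For readers following along, the new APIs' signatures as inferred from the
call sites in this and the neighboring patches (a sketch; the authoritative
declarations are in the common KVM patches earlier in the series):
kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
                            unsigned int foll, bool *writable,
                            struct page **refcounted_page);
kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool write,
                          bool *writable, struct page **refcounted_page);
void kvm_release_faultin_page(struct kvm *kvm, struct page *page,
                              bool unused, bool dirty);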
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/riscv/kvm/mmu.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 806f68e70642..f73d6a79a78c 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -601,6 +601,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
bool logging = (memslot->dirty_bitmap &&
!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
unsigned long vma_pagesize, mmu_seq;
+ struct page *page;
/* We need minimum second+third level pages */
ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
@@ -631,7 +632,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
/*
* Read mmu_invalidate_seq so that KVM can detect if the results of
- * vma_lookup() or gfn_to_pfn_prot() become stale priort to acquiring
+ * vma_lookup() or __kvm_faultin_pfn() become stale prior to acquiring
* kvm->mmu_lock.
*
* Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@@ -647,7 +648,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
return -EFAULT;
}
- hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);
+ hfn = kvm_faultin_pfn(vcpu, gfn, is_write, &writable, &page);
if (hfn == KVM_PFN_ERR_HWPOISON) {
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
vma_pageshift, current);
@@ -681,11 +682,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
kvm_err("Failed to map in G-stage\n");
out_unlock:
- if ((!ret || ret == -EEXIST) && writable)
- kvm_set_pfn_dirty(hfn);
- else
- kvm_release_pfn_clean(hfn);
-
+ kvm_release_faultin_page(kvm, page, ret && ret != -EEXIST, writable);
spin_unlock(&kvm->mmu_lock);
return ret;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 59/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (57 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 60/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix Sean Christopherson
` (26 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(),
which functionally does pretty much the exact same thing.
Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 25 ++++---------------------
1 file changed, 4 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2f1d58984b41..f305395cf26e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -603,27 +603,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
write_ok = writing;
hva = gfn_to_hva_memslot(memslot, gfn);
- /*
- * Do a fast check first, since __gfn_to_pfn_memslot doesn't
- * do it with !atomic && !async, which is how we call it.
- * We always ask for write permission since the common case
- * is that the page is writable.
- */
- if (get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
- write_ok = true;
- } else {
- /* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, &write_ok);
- if (is_error_noslot_pfn(pfn))
- return -EFAULT;
- page = NULL;
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageReserved(page))
- page = NULL;
- }
- }
+ pfn = __kvm_faultin_pfn(memslot, gfn, writing ? FOLL_WRITE : 0,
+ &write_ok, &page);
+ if (is_error_noslot_pfn(pfn))
+ return -EFAULT;
/*
* Read the PTE from the process' radix tree and use that
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 60/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (58 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 59/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 61/84] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page() Sean Christopherson
` (25 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Replace Book3s Radix's homebrewed (read: copy+pasted) fault-in logic with
__kvm_faultin_pfn(), which functionally does pretty much the exact same
thing.
Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_radix.c | 29 +++++---------------------
1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 8304b6f8fe45..14891d0a3b73 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -829,40 +829,21 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long mmu_seq;
unsigned long hva, gfn = gpa >> PAGE_SHIFT;
bool upgrade_write = false;
- bool *upgrade_p = &upgrade_write;
pte_t pte, *ptep;
unsigned int shift, level;
int ret;
bool large_enable;
+ kvm_pfn_t pfn;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();
- /*
- * Do a fast check first, since __gfn_to_pfn_memslot doesn't
- * do it with !atomic && !async, which is how we call it.
- * We always ask for write permission since the common case
- * is that the page is writable.
- */
hva = gfn_to_hva_memslot(memslot, gfn);
- if (!kvm_ro && get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
- upgrade_write = true;
- } else {
- unsigned long pfn;
-
- /* Call KVM generic code to do the slow-path check */
- pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
- writing, upgrade_p);
- if (is_error_noslot_pfn(pfn))
- return -EFAULT;
- page = NULL;
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageReserved(page))
- page = NULL;
- }
- }
+ pfn = __kvm_faultin_pfn(memslot, gfn, writing ? FOLL_WRITE : 0,
+ &upgrade_write, &page);
+ if (is_error_noslot_pfn(pfn))
+ return -EFAULT;
/*
* Read the PTE from the process' radix tree and use that
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 61/84] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (59 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 60/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 62/84] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE Sean Christopherson
` (24 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Drop @kvm_ro from kvmppc_book3s_instantiate_page() as it is now only
written, and never read.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/include/asm/kvm_book3s.h | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++----
arch/powerpc/kvm/book3s_hv_nested.c | 4 +---
3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 3e1e2a698c9e..34e8f0b7b345 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -203,7 +203,7 @@ extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested,
extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
- bool writing, bool kvm_ro,
+ bool writing,
pte_t *inserted_pte, unsigned int *levelp);
extern int kvmppc_init_vm_radix(struct kvm *kvm);
extern void kvmppc_free_radix(struct kvm *kvm);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 14891d0a3b73..b3e6e73d6a08 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -821,7 +821,7 @@ bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested, bool writing,
int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
- bool writing, bool kvm_ro,
+ bool writing,
pte_t *inserted_pte, unsigned int *levelp)
{
struct kvm *kvm = vcpu->kvm;
@@ -931,7 +931,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot;
long ret;
bool writing = !!(dsisr & DSISR_ISSTORE);
- bool kvm_ro = false;
/* Check for unusual errors */
if (dsisr & DSISR_UNSUPP_MMU) {
@@ -984,7 +983,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
ea, DSISR_ISSTORE | DSISR_PROTFAULT);
return RESUME_GUEST;
}
- kvm_ro = true;
}
/* Failed to set the reference/change bits */
@@ -1002,7 +1000,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
/* Try to insert a pte */
ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot, writing,
- kvm_ro, NULL, NULL);
+ NULL, NULL);
if (ret == 0 || ret == -EAGAIN)
ret = RESUME_GUEST;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 05f5220960c6..771173509617 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1527,7 +1527,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
unsigned long n_gpa, gpa, gfn, perm = 0UL;
unsigned int shift, l1_shift, level;
bool writing = !!(dsisr & DSISR_ISSTORE);
- bool kvm_ro = false;
long int ret;
if (!gp->l1_gr_to_hr) {
@@ -1607,7 +1606,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
ea, DSISR_ISSTORE | DSISR_PROTFAULT);
return RESUME_GUEST;
}
- kvm_ro = true;
}
/* 2. Find the host pte for this L1 guest real address */
@@ -1629,7 +1627,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
if (!pte_present(pte) || (writing && !(pte_val(pte) & _PAGE_WRITE))) {
/* No suitable pte found -> try to insert a mapping */
ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot,
- writing, kvm_ro, &pte, &level);
+ writing, &pte, &level);
if (ret == -EAGAIN)
return RESUME_GUEST;
else if (ret)
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 62/84] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (60 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 61/84] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page() Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 63/84] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR Sean Christopherson
` (23 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages/folios dirty/accessed after installing a PTE, and more
specifically after acquiring mmu_lock and checking for an mmu_notifier
invalidation. Marking a page/folio dirty after it has been written back
can make some filesystems unhappy (backing KVM guests with such filesystem
files is uncommon, and the race is minuscule, hence the lack of complaints).
See the link below for details.
This will also allow converting Book3S to kvm_release_faultin_page(),
which requires that mmu_lock be held (for the aforementioned reason).
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_64_mmu_host.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index bc6a381b5346..d0e4f7bbdc3d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -121,13 +121,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
vpn = hpt_vpn(orig_pte->eaddr, map->host_vsid, MMU_SEGSIZE_256M);
- kvm_set_pfn_accessed(pfn);
if (!orig_pte->may_write || !writable)
rflags |= PP_RXRX;
- else {
+ else
mark_page_dirty(vcpu->kvm, gfn);
- kvm_set_pfn_dirty(pfn);
- }
if (!orig_pte->may_execute)
rflags |= HPTE_R_N;
@@ -202,8 +199,11 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
}
out_unlock:
+ if (!orig_pte->may_write || !writable)
+ kvm_release_pfn_clean(pfn);
+ else
+ kvm_release_pfn_dirty(pfn);
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
if (cpte)
kvmppc_mmu_hpte_cache_free(cpte);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 63/84] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (61 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 62/84] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
` (22 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert Book3S PR to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/include/asm/kvm_book3s.h | 2 +-
arch/powerpc/kvm/book3s.c | 7 ++++---
arch/powerpc/kvm/book3s_32_mmu_host.c | 7 ++++---
arch/powerpc/kvm/book3s_64_mmu_host.c | 10 +++++-----
4 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 34e8f0b7b345..343c10dda80f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -235,7 +235,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_vcpu *vcpu);
extern kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa,
- bool writing, bool *writable);
+ bool writing, bool *writable, struct page **page);
extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
extern void kvmppc_update_dirty_map(const struct kvm_memory_slot *memslot,
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index ff6c38373957..d79c5d1098c0 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -422,7 +422,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter);
kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
- bool *writable)
+ bool *writable, struct page **page)
{
ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM;
gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -437,13 +437,14 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
kvm_pfn_t pfn;
pfn = (kvm_pfn_t)virt_to_phys((void*)shared_page) >> PAGE_SHIFT;
- get_page(pfn_to_page(pfn));
+ *page = pfn_to_page(pfn);
+ get_page(*page);
if (writable)
*writable = true;
return pfn;
}
- return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
+ return kvm_faultin_pfn(vcpu, gfn, writing, writable, page);
}
EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 4b3a8d80cfa3..5b7212edbb13 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -130,6 +130,7 @@ extern char etext[];
int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
bool iswrite)
{
+ struct page *page;
kvm_pfn_t hpaddr;
u64 vpn;
u64 vsid;
@@ -145,7 +146,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
bool writable;
/* Get host physical address for gpa */
- hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
+ hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable, &page);
if (is_error_noslot_pfn(hpaddr)) {
printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
orig_pte->raddr);
@@ -232,7 +233,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
pte = kvmppc_mmu_hpte_cache_next(vcpu);
if (!pte) {
- kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
+ kvm_release_page_unused(page);
r = -EAGAIN;
goto out;
}
@@ -250,7 +251,7 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
kvmppc_mmu_hpte_cache_map(vcpu, pte);
- kvm_release_pfn_clean(hpaddr >> PAGE_SHIFT);
+ kvm_release_page_clean(page);
out:
return r;
}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index d0e4f7bbdc3d..be20aee6fd7d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -88,13 +88,14 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
struct hpte_cache *cpte;
unsigned long gfn = orig_pte->raddr >> PAGE_SHIFT;
unsigned long pfn;
+ struct page *page;
/* used to check for invalidations in progress */
mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();
/* Get host physical address for gpa */
- pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
+ pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable, &page);
if (is_error_noslot_pfn(pfn)) {
printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
orig_pte->raddr);
@@ -199,10 +200,9 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
}
out_unlock:
- if (!orig_pte->may_write || !writable)
- kvm_release_pfn_clean(pfn);
- else
- kvm_release_pfn_dirty(pfn);
+ /* FIXME: Don't unconditionally pass unused=false. */
+ kvm_release_faultin_page(kvm, page, false,
+ orig_pte->may_write && writable);
spin_unlock(&kvm->mmu_lock);
if (cpte)
kvmppc_mmu_hpte_cache_free(cpte);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (62 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 63/84] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-08-02 7:53 ` maobibo
2024-08-08 11:38 ` maobibo
2024-07-26 23:52 ` [PATCH v12 65/84] KVM: LoongArch: Mark "struct page" pfns accessed " Sean Christopherson
` (21 subsequent siblings)
85 siblings, 2 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages/folios dirty only in the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
See the link below for details.
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 2634a9e8d82c..364dd35e0557 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
if (kvm_pte_young(changed))
kvm_set_pfn_accessed(pfn);
- if (kvm_pte_dirty(changed)) {
- mark_page_dirty(kvm, gfn);
- kvm_set_pfn_dirty(pfn);
- }
if (page)
put_page(page);
}
+
+ if (kvm_pte_dirty(changed))
+ mark_page_dirty(kvm, gfn);
+
return ret;
out:
spin_unlock(&kvm->mmu_lock);
@@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
else
++kvm->stat.pages;
kvm_set_pte(ptep, new_pte);
- spin_unlock(&kvm->mmu_lock);
- if (prot_bits & _PAGE_DIRTY) {
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ if (writeable)
kvm_set_pfn_dirty(pfn);
- }
+
+ spin_unlock(&kvm->mmu_lock);
+
+ if (prot_bits & _PAGE_DIRTY)
+ mark_page_dirty_in_slot(kvm, memslot, gfn);
kvm_release_pfn_clean(pfn);
out:
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 65/84] KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault path
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (63 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-08-02 7:34 ` maobibo
2024-07-26 23:52 ` [PATCH v12 66/84] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
` (20 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages accessed only in the slow path, before dropping mmu_lock when
faulting in guest memory, so that LoongArch can convert to
kvm_release_faultin_page() without tripping its lockdep assertion on
mmu_lock being held.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 364dd35e0557..52b5c16cf250 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -552,12 +552,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
{
int ret = 0;
- kvm_pfn_t pfn = 0;
kvm_pte_t *ptep, changed, new;
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *slot;
- struct page *page;
spin_lock(&kvm->mmu_lock);
@@ -570,8 +568,6 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
/* Track access to pages marked old */
new = kvm_pte_mkyoung(*ptep);
- /* call kvm_set_pfn_accessed() after unlock */
-
if (write && !kvm_pte_dirty(new)) {
if (!kvm_pte_write(new)) {
ret = -EFAULT;
@@ -595,23 +591,11 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
}
changed = new ^ (*ptep);
- if (changed) {
+ if (changed)
kvm_set_pte(ptep, new);
- pfn = kvm_pte_pfn(new);
- page = kvm_pfn_to_refcounted_page(pfn);
- if (page)
- get_page(page);
- }
+
spin_unlock(&kvm->mmu_lock);
- if (changed) {
- if (kvm_pte_young(changed))
- kvm_set_pfn_accessed(pfn);
-
- if (page)
- put_page(page);
- }
-
if (kvm_pte_dirty(changed))
mark_page_dirty(kvm, gfn);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 66/84] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (64 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 65/84] KVM: LoongArch: Mark "struct page" pfns accessed " Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-08-08 11:47 ` maobibo
2024-07-26 23:52 ` [PATCH v12 67/84] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
` (19 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that LoongArch can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 52b5c16cf250..230cafa178d7 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -902,13 +902,13 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
if (writeable)
kvm_set_pfn_dirty(pfn);
+ kvm_release_pfn_clean(pfn);
spin_unlock(&kvm->mmu_lock);
if (prot_bits & _PAGE_DIRTY)
mark_page_dirty_in_slot(kvm, memslot, gfn);
- kvm_release_pfn_clean(pfn);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
return err;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 67/84] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (65 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 66/84] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 68/84] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
` (18 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert LoongArch to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/loongarch/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 230cafa178d7..83e4376deabb 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -780,6 +780,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *memslot;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+ struct page *page;
/* Try the fast path to handle old / clean pages */
srcu_idx = srcu_read_lock(&kvm->srcu);
@@ -807,7 +808,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
mmu_seq = kvm->mmu_invalidate_seq;
/*
* Ensure the read of mmu_invalidate_seq isn't reordered with PTE reads in
- * gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
+ * kvm_faultin_pfn() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_invalidate_retry() noticing.
*
@@ -819,7 +820,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
smp_rmb();
/* Slow path - ask KVM core whether we can access this GPA */
- pfn = gfn_to_pfn_prot(kvm, gfn, write, &writeable);
+ pfn = kvm_faultin_pfn(vcpu, gfn, write, &writeable, &page);
if (is_error_noslot_pfn(pfn)) {
err = -EFAULT;
goto out;
@@ -831,10 +832,10 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
/*
* This can happen when mappings are changed asynchronously, but
* also synchronously if a COW is triggered by
- * gfn_to_pfn_prot().
+ * kvm_faultin_pfn().
*/
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
if (retry_no > 100) {
retry_no = 0;
schedule();
@@ -900,10 +901,7 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
++kvm->stat.pages;
kvm_set_pte(ptep, new_pte);
- if (writeable)
- kvm_set_pfn_dirty(pfn);
- kvm_release_pfn_clean(pfn);
-
+ kvm_release_faultin_page(kvm, page, false, writeable);
spin_unlock(&kvm->mmu_lock);
if (prot_bits & _PAGE_DIRTY)
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 68/84] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (66 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 67/84] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 69/84] KVM: MIPS: Mark "struct page" pfns accessed " Sean Christopherson
` (17 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages/folios dirty only in the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests with such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
See the link below for details.
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index c17157e700c0..4da9ce4eb54d 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -514,7 +514,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
set_pte(ptep, pte_mkdirty(*ptep));
pfn = pte_pfn(*ptep);
mark_page_dirty(kvm, gfn);
- kvm_set_pfn_dirty(pfn);
}
if (out_entry)
@@ -628,7 +627,6 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (write_fault) {
prot_bits |= __WRITEABLE;
mark_page_dirty(kvm, gfn);
- kvm_set_pfn_dirty(pfn);
}
}
entry = pfn_pte(pfn, __pgprot(prot_bits));
@@ -642,6 +640,9 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (out_buddy)
*out_buddy = *ptep_buddy(ptep);
+ if (writeable)
+ kvm_set_pfn_dirty(pfn);
+
spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
kvm_set_pfn_accessed(pfn);
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 69/84] KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault path
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (67 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 68/84] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 70/84] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock Sean Christopherson
` (16 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages accessed only in the slow page fault path in order to remove
an unnecessary user of kvm_pfn_to_refcounted_page(). Marking pages
accessed in the primary MMU during KVM page fault handling isn't harmful,
but it's largely pointless and likely a waste of cycles since the
primary MMU will call into KVM via mmu_notifiers when aging pages. I.e.
KVM participates in a "pull" model, so there's no need to also "push"
updates.
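A sketch of the "pull" model described above: aging is driven by the primary
MMU through the mmu_notifiers, and KVM answers from its own stage-2 PTEs.  The
walker below is purely illustrative, and stage2_ptep_lookup() is a hypothetical
stand-in for the arch-specific lookup:
/*
 * Illustrative only: how aging reaches KVM without any "push" from the fault
 * path.  The primary MMU's reclaim/aging code invokes the .test_young and
 * .clear_flush_young mmu_notifiers, which land in hooks like this one.
 */
bool kvm_test_age_gfn_sketch(struct kvm *kvm, struct kvm_gfn_range *range)
{
        gfn_t gfn;
        for (gfn = range->start; gfn < range->end; gfn++) {
                /* stage2_ptep_lookup() is hypothetical, for illustration. */
                pte_t *ptep = stage2_ptep_lookup(kvm, gfn);
                if (ptep && pte_young(*ptep))
                        return true;
        }
        return false;
}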
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 4da9ce4eb54d..f1e4b618ec6d 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -484,8 +484,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
struct kvm *kvm = vcpu->kvm;
gfn_t gfn = gpa >> PAGE_SHIFT;
pte_t *ptep;
- kvm_pfn_t pfn = 0; /* silence bogus GCC warning */
- bool pfn_valid = false;
int ret = 0;
spin_lock(&kvm->mmu_lock);
@@ -498,12 +496,9 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
}
/* Track access to pages marked old */
- if (!pte_young(*ptep)) {
+ if (!pte_young(*ptep))
set_pte(ptep, pte_mkyoung(*ptep));
- pfn = pte_pfn(*ptep);
- pfn_valid = true;
- /* call kvm_set_pfn_accessed() after unlock */
- }
+
if (write_fault && !pte_dirty(*ptep)) {
if (!pte_write(*ptep)) {
ret = -EFAULT;
@@ -512,7 +507,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
/* Track dirtying of writeable pages */
set_pte(ptep, pte_mkdirty(*ptep));
- pfn = pte_pfn(*ptep);
mark_page_dirty(kvm, gfn);
}
@@ -523,8 +517,6 @@ static int _kvm_mips_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa,
out:
spin_unlock(&kvm->mmu_lock);
- if (pfn_valid)
- kvm_set_pfn_accessed(pfn);
return ret;
}
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 70/84] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (68 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 69/84] KVM: MIPS: Mark "struct page" pfns accessed " Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 71/84] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
` (15 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that MIPS can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index f1e4b618ec6d..69463ab24d97 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -634,10 +634,9 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (writeable)
kvm_set_pfn_dirty(pfn);
-
- spin_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
- kvm_set_pfn_accessed(pfn);
+
+ spin_unlock(&kvm->mmu_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
return err;
--
2.46.0.rc1.232.g9752f9e123-goog
* [PATCH v12 71/84] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (69 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 70/84] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 72/84] KVM: PPC: Remove extra get_page() to fix page refcount leak Sean Christopherson
` (14 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert MIPS to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/mips/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 69463ab24d97..d2c3b6b41f18 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -557,6 +557,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
bool writeable;
unsigned long prot_bits;
unsigned long mmu_seq;
+ struct page *page;
/* Try the fast path to handle old / clean pages */
srcu_idx = srcu_read_lock(&kvm->srcu);
@@ -578,7 +579,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
mmu_seq = kvm->mmu_invalidate_seq;
/*
* Ensure the read of mmu_invalidate_seq isn't reordered with PTE reads
- * in gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
+ * in kvm_faultin_pfn() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_invalidate_retry() noticing.
*
@@ -590,7 +591,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
smp_rmb();
/* Slow path - ask KVM core whether we can access this GPA */
- pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writeable);
+ pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writeable, &page);
if (is_error_noslot_pfn(pfn)) {
err = -EFAULT;
goto out;
@@ -602,10 +603,10 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
/*
* This can happen when mappings are changed asynchronously, but
* also synchronously if a COW is triggered by
- * gfn_to_pfn_prot().
+ * kvm_faultin_pfn().
*/
spin_unlock(&kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
goto retry;
}
@@ -632,10 +633,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
if (out_buddy)
*out_buddy = *ptep_buddy(ptep);
- if (writeable)
- kvm_set_pfn_dirty(pfn);
- kvm_release_pfn_clean(pfn);
-
+ kvm_release_faultin_page(kvm, page, false, writeable);
spin_unlock(&kvm->mmu_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 72/84] KVM: PPC: Remove extra get_page() to fix page refcount leak
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (70 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 71/84] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 73/84] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions Sean Christopherson
` (13 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Don't manually do get_page() when patching dcbz, as gfn_to_page() gifts
the caller a reference. I.e. the extra get_page() leaks the page, as only
one of the two references is ever put.
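In other words, the intended refcount contract is simply the sketch below;
the kvm/gfn arguments and the early return are illustrative, the point is the
single put pairing with gfn_to_page():

	struct page *page = gfn_to_page(kvm, gfn);

	if (!page)
		return;

	/*
	 * gfn_to_page() already elevated the refcount, so put exactly one
	 * reference when done and don't take another.
	 */
	/* ... read/modify the page ... */
	put_page(page);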
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_pr.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 1bdcd4ee4813..ae4757ac0848 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -652,7 +652,6 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
hpage_offset &= ~0xFFFULL;
hpage_offset /= 4;
- get_page(hpage);
page = kmap_atomic(hpage);
/* patch dcbz into reserved instruction, so we trap */
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 73/84] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (71 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 72/84] KVM: PPC: Remove extra get_page() to fix page refcount leak Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 74/84] KVM: Convert gfn_to_page() to use kvm_follow_pfn() Sean Christopherson
` (12 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use kvm_vcpu_map() when patching dcbz in guest memory, as a regular GUP
isn't technically sufficient when writing to data in the target pages.
As per Documentation/core-api/pin_user_pages.rst:
Correct (uses FOLL_PIN calls):
pin_user_pages()
write to the data within the pages
unpin_user_pages()
INCORRECT (uses FOLL_GET calls):
get_user_pages()
write to the data within the pages
put_page()
As a happy bonus, using kvm_vcpu_{,un}map() takes care of creating a
mapping and marking the page dirty.
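The resulting flow is roughly the sketch below; the gfn and the patching-loop
placeholder are illustrative, while kvm_vcpu_map(), map.hva, and
kvm_vcpu_unmap() are the calls used in the diff:

	struct kvm_host_map map;
	u32 *va;

	if (kvm_vcpu_map(vcpu, gfn, &map))
		return;

	va = map.hva;
	/* ... scan and patch instructions via @va ... */

	/* Unmapping also marks the underlying page dirty. */
	kvm_vcpu_unmap(vcpu, &map);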
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_pr.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index ae4757ac0848..393c18958a5b 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -639,28 +639,27 @@ static void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
*/
static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
{
- struct page *hpage;
+ struct kvm_host_map map;
u64 hpage_offset;
u32 *page;
- int i;
+ int i, r;
- hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
- if (!hpage)
+ r = kvm_vcpu_map(vcpu, pte->raddr >> PAGE_SHIFT, &map);
+ if (r)
return;
hpage_offset = pte->raddr & ~PAGE_MASK;
hpage_offset &= ~0xFFFULL;
hpage_offset /= 4;
- page = kmap_atomic(hpage);
+ page = map.hva;
/* patch dcbz into reserved instruction, so we trap */
for (i=hpage_offset; i < hpage_offset + (HW_PAGE_SIZE / 4); i++)
if ((be32_to_cpu(page[i]) & 0xff0007ff) == INS_DCBZ)
page[i] &= cpu_to_be32(0xfffffff7);
- kunmap_atomic(page);
- put_page(hpage);
+ kvm_vcpu_unmap(vcpu, &map);
}
static bool kvmppc_visible_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 74/84] KVM: Convert gfn_to_page() to use kvm_follow_pfn()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (72 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 73/84] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 75/84] KVM: Add support for read-only usage of gfn_to_page() Sean Christopherson
` (11 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Convert gfn_to_page() to the new kvm_follow_pfn() internal API, which will
eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6dc448602751..d0f55a6ecb31 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3181,14 +3181,16 @@ EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
*/
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
- kvm_pfn_t pfn;
+ struct page *refcounted_page = NULL;
+ struct kvm_follow_pfn kfp = {
+ .slot = gfn_to_memslot(kvm, gfn),
+ .gfn = gfn,
+ .flags = FOLL_WRITE,
+ .refcounted_page = &refcounted_page,
+ };
- pfn = gfn_to_pfn(kvm, gfn);
-
- if (is_error_noslot_pfn(pfn))
- return NULL;
-
- return kvm_pfn_to_refcounted_page(pfn);
+ (void)kvm_follow_pfn(&kfp);
+ return refcounted_page;
}
EXPORT_SYMBOL_GPL(gfn_to_page);
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 75/84] KVM: Add support for read-only usage of gfn_to_page()
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (73 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 74/84] KVM: Convert gfn_to_page() to use kvm_follow_pfn() Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 76/84] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace Sean Christopherson
` (10 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Rework gfn_to_page() to support read-only accesses so that it can be used
by arm64 to get MTE tags out of guest memory.
Opportunistically rewrite the comment to be even more stern about using
gfn_to_page(), as there are very few scenarios where requiring a struct
page is actually the right thing to do (though there are such scenarios).
Add a FIXME to call out that KVM probably should be pinning pages, not
just getting pages.
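E.g. a read-only user ends up with roughly the sketch below (the -EFAULT
return and surrounding context are illustrative):

	/* Read-only lookup: don't force a writable mapping just to read. */
	struct page *page = __gfn_to_page(kvm, gfn, false);

	if (!page)
		return -EFAULT;

	/* ... read guest data, e.g. MTE tags, from the page ... */
	kvm_release_page_clean(page);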
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 7 ++++++-
virt/kvm/kvm_main.c | 15 ++++++++-------
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 91341cdc6562..f2d3c3c436cc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1198,7 +1198,12 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
struct page **pages, int nr_pages);
-struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *__gfn_to_page(struct kvm *kvm, gfn_t gfn, bool write);
+static inline struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+{
+ return __gfn_to_page(kvm, gfn, true);
+}
+
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0f55a6ecb31..16bc3ac3ff84 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3174,25 +3174,26 @@ int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
/*
- * Do not use this helper unless you are absolutely certain the gfn _must_ be
- * backed by 'struct page'. A valid example is if the backing memslot is
- * controlled by KVM. Note, if the returned page is valid, it's refcount has
- * been elevated by gfn_to_pfn().
+ * Don't use this API unless you are absolutely, positively certain that KVM
+ * needs to get a struct page, e.g. to pin the page for firmware DMA.
+ *
+ * FIXME: Users of this API likely need to FOLL_PIN the page, not just elevate
+ * its refcount.
*/
-struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
+struct page *__gfn_to_page(struct kvm *kvm, gfn_t gfn, bool write)
{
struct page *refcounted_page = NULL;
struct kvm_follow_pfn kfp = {
.slot = gfn_to_memslot(kvm, gfn),
.gfn = gfn,
- .flags = FOLL_WRITE,
+ .flags = write ? FOLL_WRITE : 0,
.refcounted_page = &refcounted_page,
};
(void)kvm_follow_pfn(&kfp);
return refcounted_page;
}
-EXPORT_SYMBOL_GPL(gfn_to_page);
+EXPORT_SYMBOL_GPL(__gfn_to_page);
int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
bool writable)
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 76/84] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (74 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 75/84] KVM: Add support for read-only usage of gfn_to_page() Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 77/84] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing Sean Christopherson
` (9 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use __gfn_to_page() instead of gfn_to_pfn_prot() when copying MTE tags between guest and
userspace. This will eventually allow removing gfn_to_pfn_prot(),
gfn_to_pfn(), kvm_pfn_to_refcounted_page(), and related APIs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/kvm/guest.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 962f985977c2..4cd7ffa76794 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1051,20 +1051,18 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
}
while (length > 0) {
- kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+ struct page *page = __gfn_to_page(kvm, gfn, write);
void *maddr;
unsigned long num_tags;
- struct page *page;
- if (is_error_noslot_pfn(pfn)) {
- ret = -EFAULT;
- goto out;
- }
-
- page = pfn_to_online_page(pfn);
if (!page) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (!pfn_to_online_page(page_to_pfn(page))) {
/* Reject ZONE_DEVICE memory */
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
ret = -EFAULT;
goto out;
}
@@ -1078,7 +1076,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
/* No tags in memory, so write zeros */
num_tags = MTE_GRANULES_PER_PAGE -
clear_user(tags, MTE_GRANULES_PER_PAGE);
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
} else {
/*
* Only locking to serialise with a concurrent
@@ -1093,8 +1091,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
if (num_tags != MTE_GRANULES_PER_PAGE)
mte_clear_page_tags(maddr);
set_page_mte_tagged(page);
-
- kvm_release_pfn_dirty(pfn);
+ kvm_release_page_dirty(page);
}
if (num_tags != MTE_GRANULES_PER_PAGE) {
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 77/84] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (75 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 76/84] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 78/84] KVM: Drop gfn_to_pfn() APIs now that all users are gone Sean Christopherson
` (8 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Explicitly require "struct page" memory when sharing memory between
guest and host via an Ultravisor. Given the number of pfn_to_page()
calls in the code, it's safe to assume that KVM already requires that the
pfn returned by gfn_to_pfn() is backed by struct page, i.e. this is
likely a bug fix, not a reduction in KVM capabilities.
Switching to gfn_to_page() will eventually allow removing gfn_to_pfn()
and kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/powerpc/kvm/book3s_hv_uvmem.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 92f33115144b..3a6592a31a10 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -879,9 +879,8 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
{
int ret = H_PARAMETER;
- struct page *uvmem_page;
+ struct page *page, *uvmem_page;
struct kvmppc_uvmem_page_pvt *pvt;
- unsigned long pfn;
unsigned long gfn = gpa >> page_shift;
int srcu_idx;
unsigned long uvmem_pfn;
@@ -901,8 +900,8 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
retry:
mutex_unlock(&kvm->arch.uvmem_lock);
- pfn = gfn_to_pfn(kvm, gfn);
- if (is_error_noslot_pfn(pfn))
+ page = gfn_to_page(kvm, gfn);
+ if (!page)
goto out;
mutex_lock(&kvm->arch.uvmem_lock);
@@ -911,16 +910,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
pvt = uvmem_page->zone_device_data;
pvt->skip_page_out = true;
pvt->remove_gfn = false; /* it continues to be a valid GFN */
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_unused(page);
goto retry;
}
- if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
+ if (!uv_page_in(kvm->arch.lpid, page_to_pfn(page) << page_shift, gpa, 0,
page_shift)) {
kvmppc_gfn_shared(gfn, kvm);
ret = H_SUCCESS;
}
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
mutex_unlock(&kvm->arch.uvmem_lock);
out:
srcu_read_unlock(&kvm->srcu, srcu_idx);
@@ -1083,21 +1082,21 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
{
- unsigned long pfn;
+ struct page *page;
int ret = U_SUCCESS;
- pfn = gfn_to_pfn(kvm, gfn);
- if (is_error_noslot_pfn(pfn))
+ page = gfn_to_page(kvm, gfn);
+ if (!page)
return -EFAULT;
mutex_lock(&kvm->arch.uvmem_lock);
if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
goto out;
- ret = uv_page_in(kvm->arch.lpid, pfn << PAGE_SHIFT, gfn << PAGE_SHIFT,
- 0, PAGE_SHIFT);
+ ret = uv_page_in(kvm->arch.lpid, page_to_pfn(page) << PAGE_SHIFT,
+ gfn << PAGE_SHIFT, 0, PAGE_SHIFT);
out:
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
mutex_unlock(&kvm->arch.uvmem_lock);
return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
}
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 78/84] KVM: Drop gfn_to_pfn() APIs now that all users are gone
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (76 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 77/84] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 79/84] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory Sean Christopherson
` (7 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Drop gfn_to_pfn() and all its variants now that all users are gone.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 11 --------
virt/kvm/kvm_main.c | 59 ----------------------------------------
2 files changed, 70 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f2d3c3c436cc..34a1cadb1b80 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1260,14 +1260,6 @@ static inline kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
write ? FOLL_WRITE : 0, writable, refcounted_page);
}
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
- bool *writable);
-kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool no_wait,
- bool write_fault, bool *writable);
-
void kvm_release_pfn_clean(kvm_pfn_t pfn);
void kvm_release_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_dirty(kvm_pfn_t pfn);
@@ -1342,9 +1334,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
-
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-
int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
bool writable);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 16bc3ac3ff84..5dcf3561b829 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3059,65 +3059,6 @@ static kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
return hva_to_pfn(kfp);
}
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
- bool interruptible, bool no_wait,
- bool write_fault, bool *writable)
-{
- struct kvm_follow_pfn kfp = {
- .slot = slot,
- .gfn = gfn,
- .map_writable = writable,
- };
-
- if (write_fault)
- kfp.flags |= FOLL_WRITE;
- if (no_wait)
- kfp.flags |= FOLL_NOWAIT;
- if (interruptible)
- kfp.flags |= FOLL_INTERRUPTIBLE;
-
- return kvm_follow_pfn(&kfp);
-}
-EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
-
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
- bool *writable)
-{
- struct kvm_follow_pfn kfp = {
- .slot = gfn_to_memslot(kvm, gfn),
- .gfn = gfn,
- .flags = write_fault ? FOLL_WRITE : 0,
- .map_writable = writable,
- };
-
- return kvm_follow_pfn(&kfp);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
-
-kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
-{
- struct kvm_follow_pfn kfp = {
- .slot = slot,
- .gfn = gfn,
- .flags = FOLL_WRITE,
- };
-
- return kvm_follow_pfn(&kfp);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
-
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
-{
- return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn);
-
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
- return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
-}
-EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
-
kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn)
{
struct page *refcounted_page = NULL;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 79/84] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (77 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 78/84] KVM: Drop gfn_to_pfn() APIs now that all users are gone Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 80/84] KVM: Make kvm_follow_pfn.refcounted_page a required field Sean Christopherson
` (6 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Use kvm_release_page_dirty() when unpinning guest pages, as the pfn was
retrieved via pin_guest_page(), i.e. is guaranteed to be backed by struct
page memory. This will allow dropping kvm_release_pfn_dirty() and
friends.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/s390/kvm/vsie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 566697ee37eb..f6f1569be1cc 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -670,7 +670,7 @@ static int pin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t *hpa)
/* Unpins a page previously pinned via pin_guest_page, marking it as dirty. */
static void unpin_guest_page(struct kvm *kvm, gpa_t gpa, hpa_t hpa)
{
- kvm_release_pfn_dirty(hpa >> PAGE_SHIFT);
+ kvm_release_page_dirty(pfn_to_page(hpa >> PAGE_SHIFT));
/* mark the page always as dirty for migration */
mark_page_dirty(kvm, gpa_to_gfn(gpa));
}
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 80/84] KVM: Make kvm_follow_pfn.refcounted_page a required field
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (78 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 79/84] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 81/84] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs Sean Christopherson
` (5 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Now that the legacy gfn_to_pfn() APIs are gone, and all callers of
hva_to_pfn() pass in a refcounted_page pointer, make it a required field
to ensure all future usage in KVM plays nice.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5dcf3561b829..030a08d4b21d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2844,8 +2844,7 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
pfn = page_to_pfn(page);
}
- if (kfp->refcounted_page)
- *kfp->refcounted_page = page;
+ *kfp->refcounted_page = page;
return pfn;
}
@@ -3001,6 +3000,9 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
might_sleep();
+ if (WARN_ON_ONCE(!kfp->refcounted_page))
+ return KVM_PFN_ERR_FAULT;
+
if (hva_to_pfn_fast(kfp, &pfn))
return pfn;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 81/84] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (79 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 80/84] KVM: Make kvm_follow_pfn.refcounted_page a required field Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 82/84] KVM: arm64: Don't mark "struct page" accessed when making SPTE young Sean Christopherson
` (4 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Don't mark pages/folios as accessed in the primary MMU when zapping SPTEs,
as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking
is unnecessary and wasteful. KVM participates in page aging via
mmu_notifiers, so there's no need to push "accessed" updates to the
primary MMU.
And if KVM zaps a SPTE in response to an mmu_notifier, marking the page accessed
_after_ the primary MMU has decided to zap the page is likely to go
unnoticed, i.e. odds are good that, if the page is being zapped for
reclaim, the page will be swapped out regardless of whether or not KVM
marks the page accessed.
Dropping x86's use of kvm_set_pfn_accessed() also paves the way for
removing kvm_pfn_to_refcounted_page() and all its users.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 17 -----------------
arch/x86/kvm/mmu/tdp_mmu.c | 3 ---
2 files changed, 20 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2a0cfa225c8d..5979eeb916cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -546,10 +546,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
*/
static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
{
- kvm_pfn_t pfn;
u64 old_spte = *sptep;
int level = sptep_to_sp(sptep)->role.level;
- struct page *page;
if (!is_shadow_present_pte(old_spte) ||
!spte_has_volatile_bits(old_spte))
@@ -561,21 +559,6 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
return old_spte;
kvm_update_page_stats(kvm, level, -1);
-
- pfn = spte_to_pfn(old_spte);
-
- /*
- * KVM doesn't hold a reference to any pages mapped into the guest, and
- * instead uses the mmu_notifier to ensure that KVM unmaps any pages
- * before they are reclaimed. Sanity check that, if the pfn is backed
- * by a refcounted page, the refcount is elevated.
- */
- page = kvm_pfn_to_refcounted_page(pfn);
- WARN_ON_ONCE(page && !page_count(page));
-
- if (is_accessed_spte(old_spte))
- kvm_set_pfn_accessed(pfn);
-
return old_spte;
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d1de5f28c445..dc153cf92a40 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -861,9 +861,6 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
- if (is_accessed_spte(iter.old_spte))
- kvm_set_pfn_accessed(spte_to_pfn(iter.old_spte));
-
/*
* Zappings SPTEs in invalid roots doesn't require a TLB flush,
* see kvm_tdp_mmu_zap_invalidated_roots() for details.
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 82/84] KVM: arm64: Don't mark "struct page" accessed when making SPTE young
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (80 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 81/84] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 83/84] KVM: Drop APIs that manipulate "struct page" via pfns Sean Christopherson
` (3 subsequent siblings)
85 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Don't mark pages/folios as accessed in the primary MMU when making a SPTE
young in KVM's secondary MMU, as doing so relies on
kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and
wasteful. KVM participates in page aging via mmu_notifiers, so there's no
need to push "accessed" updates to the primary MMU.
Dropping use of kvm_set_pfn_accessed() also paves the way for removing
kvm_pfn_to_refcounted_page() and all its users.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/arm64/include/asm/kvm_pgtable.h | 4 +---
arch/arm64/kvm/hyp/pgtable.c | 7 ++-----
arch/arm64/kvm/mmu.c | 6 +-----
3 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 19278dfe7978..676d80723c38 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -632,10 +632,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
*
* If there is a valid, leaf page-table entry used to translate @addr, then
* set the access flag in that entry.
- *
- * Return: The old page-table entry prior to setting the flag, 0 on failure.
*/
-kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
+void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
* kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 9e2bbee77491..6679e02a02c4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1287,19 +1287,16 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
NULL, NULL, 0);
}
-kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
+void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
{
- kvm_pte_t pte = 0;
int ret;
ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
- &pte, NULL,
+ NULL, NULL,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
if (!ret)
dsb(ishst);
-
- return pte;
}
struct stage2_age_data {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 756fc856ab44..8fd8ea5b5795 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1699,18 +1699,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/* Resolve the access fault by making the page young again. */
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
- kvm_pte_t pte;
struct kvm_s2_mmu *mmu;
trace_kvm_access_fault(fault_ipa);
read_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
- pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
+ kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
read_unlock(&vcpu->kvm->mmu_lock);
-
- if (kvm_pte_valid(pte))
- kvm_set_pfn_accessed(kvm_pte_to_pfn(pte));
}
/**
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 83/84] KVM: Drop APIs that manipulate "struct page" via pfns
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (81 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 82/84] KVM: arm64: Don't mark "struct page" accessed when making SPTE young Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-08-02 11:03 ` Alex Bennée
2024-07-26 23:52 ` [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page" Sean Christopherson
` (2 subsequent siblings)
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Remove all kvm_{release,set}_pfn_*() APIs now that all users are gone.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 5 ----
virt/kvm/kvm_main.c | 55 ----------------------------------------
2 files changed, 60 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 34a1cadb1b80..87d61f16a449 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1260,11 +1260,6 @@ static inline kvm_pfn_t kvm_faultin_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
write ? FOLL_WRITE : 0, writable, refcounted_page);
}
-void kvm_release_pfn_clean(kvm_pfn_t pfn);
-void kvm_release_pfn_dirty(kvm_pfn_t pfn);
-void kvm_set_pfn_dirty(kvm_pfn_t pfn);
-void kvm_set_pfn_accessed(kvm_pfn_t pfn);
-
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len);
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 030a08d4b21d..8b85e1130a63 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3200,61 +3200,6 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
-void kvm_release_pfn_clean(kvm_pfn_t pfn)
-{
- struct page *page;
-
- if (is_error_noslot_pfn(pfn))
- return;
-
- page = kvm_pfn_to_refcounted_page(pfn);
- if (!page)
- return;
-
- kvm_release_page_clean(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
-
-void kvm_release_pfn_dirty(kvm_pfn_t pfn)
-{
- struct page *page;
-
- if (is_error_noslot_pfn(pfn))
- return;
-
- page = kvm_pfn_to_refcounted_page(pfn);
- if (!page)
- return;
-
- kvm_release_page_dirty(page);
-}
-EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
-
-/*
- * Note, checking for an error/noslot pfn is the caller's responsibility when
- * directly marking a page dirty/accessed. Unlike the "release" helpers, the
- * "set" helpers are not to be used when the pfn might point at garbage.
- */
-void kvm_set_pfn_dirty(kvm_pfn_t pfn)
-{
- if (WARN_ON(is_error_noslot_pfn(pfn)))
- return;
-
- if (pfn_valid(pfn))
- kvm_set_page_dirty(pfn_to_page(pfn));
-}
-EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
-
-void kvm_set_pfn_accessed(kvm_pfn_t pfn)
-{
- if (WARN_ON(is_error_noslot_pfn(pfn)))
- return;
-
- if (pfn_valid(pfn))
- kvm_set_page_accessed(pfn_to_page(pfn));
-}
-EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
-
static int next_segment(unsigned long len, int offset)
{
if (len > PAGE_SIZE - offset)
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (82 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 83/84] KVM: Drop APIs that manipulate "struct page" via pfns Sean Christopherson
@ 2024-07-26 23:52 ` Sean Christopherson
2024-07-30 11:38 ` Paolo Bonzini
2024-07-30 11:52 ` [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
2024-08-27 9:06 ` Alex Bennée
85 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:52 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Now that KVM no longer relies on an ugly heuristic to find its struct page
references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
pfns, remove KVM's hack to elevate the refcount for pfns that happen to
have a valid struct page. In addition to removing a long-standing wart
in KVM, this allows KVM to map non-refcounted struct page memory into the
guest, e.g. for exposing GPU TTM buffers to KVM guests.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 3 --
virt/kvm/kvm_main.c | 75 ++--------------------------------------
2 files changed, 2 insertions(+), 76 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 87d61f16a449..d4513ffaf2e1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1702,9 +1702,6 @@ void kvm_arch_sync_events(struct kvm *kvm);
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
-struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
-bool kvm_is_zone_device_page(struct page *page);
-
struct kvm_irq_ack_notifier {
struct hlist_node link;
unsigned gsi;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b85e1130a63..e279140f2425 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -160,52 +160,6 @@ __weak void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
{
}
-bool kvm_is_zone_device_page(struct page *page)
-{
- /*
- * The metadata used by is_zone_device_page() to determine whether or
- * not a page is ZONE_DEVICE is guaranteed to be valid if and only if
- * the device has been pinned, e.g. by get_user_pages(). WARN if the
- * page_count() is zero to help detect bad usage of this helper.
- */
- if (WARN_ON_ONCE(!page_count(page)))
- return false;
-
- return is_zone_device_page(page);
-}
-
-/*
- * Returns a 'struct page' if the pfn is "valid" and backed by a refcounted
- * page, NULL otherwise. Note, the list of refcounted PG_reserved page types
- * is likely incomplete, it has been compiled purely through people wanting to
- * back guest with a certain type of memory and encountering issues.
- */
-struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)
-{
- struct page *page;
-
- if (!pfn_valid(pfn))
- return NULL;
-
- page = pfn_to_page(pfn);
- if (!PageReserved(page))
- return page;
-
- /* The ZERO_PAGE(s) is marked PG_reserved, but is refcounted. */
- if (is_zero_pfn(pfn))
- return page;
-
- /*
- * ZONE_DEVICE pages currently set PG_reserved, but from a refcounting
- * perspective they are "normal" pages, albeit with slightly different
- * usage rules.
- */
- if (kvm_is_zone_device_page(page))
- return page;
-
- return NULL;
-}
-
/*
* Switches to specified vcpu, until a matching vcpu_put()
*/
@@ -2814,35 +2768,10 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
if (kfp->map_writable)
*kfp->map_writable = writable;
- /*
- * FIXME: Remove this once KVM no longer blindly calls put_page() on
- * every pfn that points at a struct page.
- *
- * Get a reference for follow_pte() pfns if they happen to point at a
- * struct page, as KVM will ultimately call kvm_release_pfn_clean() on
- * the returned pfn, i.e. KVM expects to have a reference.
- *
- * Certain IO or PFNMAP mappings can be backed with valid struct pages,
- * but be allocated without refcounting, e.g. tail pages of
- * non-compound higher order allocations. Grabbing and putting a
- * reference to such pages would cause KVM to prematurely free a page
- * it doesn't own (KVM gets and puts the one and only reference).
- * Don't allow those pages until the FIXME is resolved.
- *
- * Don't grab a reference for pins, callers that pin pages are required
- * to check refcounted_page, i.e. must not blindly release the pfn.
- */
- if (pte) {
+ if (pte)
pfn = pte_pfn(*pte);
-
- if (!kfp->pin) {
- page = kvm_pfn_to_refcounted_page(pfn);
- if (page && !get_page_unless_zero(page))
- return KVM_PFN_ERR_FAULT;
- }
- } else {
+ else
pfn = page_to_pfn(page);
- }
*kfp->refcounted_page = page;
--
2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply related [flat|nested] 150+ messages in thread
* Re: [PATCH v12 41/84] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()
2024-07-26 23:51 ` [PATCH v12 41/84] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte() Sean Christopherson
@ 2024-07-30 8:57 ` Paolo Bonzini
0 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 8:57 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:51, Sean Christopherson wrote:
> Move the marking of folios dirty from make_spte() out to its callers,
> which have access to the _struct page_, not just the underlying pfn.
> Once all architectures follow suit, this will allow removing KVM's ugly
> hack where KVM elevates the refcount of VM_MIXEDMAP pfns that happen to
> be struct page memory.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++++++++++--
> arch/x86/kvm/mmu/paging_tmpl.h | 5 +++++
> arch/x86/kvm/mmu/spte.c | 11 -----------
> 3 files changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1cdd67707461..7e7b855ce1e1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2918,7 +2918,16 @@ static bool kvm_mmu_prefetch_sptes(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep,
> for (i = 0; i < nr_pages; i++, gfn++, sptep++) {
> mmu_set_spte(vcpu, slot, sptep, access, gfn,
> page_to_pfn(pages[i]), NULL);
> - kvm_release_page_clean(pages[i]);
> +
> + /*
> + * KVM always prefetches writable pages from the primary MMU,
> + * and KVM can make its SPTE writable in the fast page, without
"with a fast page fault"
Paolo
> + * notifying the primary MMU. Mark pages/folios dirty now to
> + * ensure file data is written back if it ends up being written
> + * by the guest. Because KVM's prefetching GUPs writable PTEs,
> + * the probability of unnecessary writeback is extremely low.
> + */
> + kvm_release_page_dirty(pages[i]);
> }
>
> return true;
> @@ -4314,7 +4323,23 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault, int r)
> {
> - kvm_release_pfn_clean(fault->pfn);
> + lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
> + r == RET_PF_RETRY);
> +
> + /*
> + * If the page that KVM got from the *primary MMU* is writable, and KVM
> + * installed or reused a SPTE, mark the page/folio dirty. Note, this
> + * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
> + * the GFN is write-protected. Folios can't be safely marked dirty
> + * outside of mmu_lock as doing so could race with writeback on the
> + * folio. As a result, KVM can't mark folios dirty in the fast page
> + * fault handler, and so KVM must (somewhat) speculatively mark the
> + * folio dirty if KVM could locklessly make the SPTE writable.
> + */
> + if (!fault->map_writable || r == RET_PF_RETRY)
> + kvm_release_pfn_clean(fault->pfn);
> + else
> + kvm_release_pfn_dirty(fault->pfn);
> }
>
> static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index b6897916c76b..2e2d87a925ac 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -953,6 +953,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
> spte_to_pfn(spte), spte, true, false,
> host_writable, &spte);
>
> + /*
> + * There is no need to mark the pfn dirty, as the new protections must
> + * be a subset of the old protections, i.e. synchronizing a SPTE cannot
> + * change the SPTE from read-only to writable.
> + */
> return mmu_spte_update(sptep, spte);
> }
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 9b8795bd2f04..2c5650390d3b 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -277,17 +277,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
> }
>
> - /*
> - * If the page that KVM got from the primary MMU is writable, i.e. if
> - * it's host-writable, mark the page/folio dirty. As alluded to above,
> - * folios can't be safely marked dirty in the fast page fault handler,
> - * and so KVM must (somewhat) speculatively mark the folio dirty even
> - * though it isn't guaranteed to be written as KVM won't mark the folio
> - * dirty if/when the SPTE is made writable.
> - */
> - if (host_writable)
> - kvm_set_pfn_dirty(pfn);
> -
> *new_spte = spte;
> return wrprot;
> }
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM
2024-07-26 23:51 ` [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM Sean Christopherson
@ 2024-07-30 8:58 ` Paolo Bonzini
2024-07-30 19:15 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 8:58 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:51, Sean Christopherson wrote:
> Move KVM x86's helper that "finishes" the faultin process to common KVM
> so that the logic can be shared across all architectures. Note, not all
> architectures implement a fast page fault path, but the gist of the
> comment applies to all architectures.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 24 ++----------------------
> include/linux/kvm_host.h | 26 ++++++++++++++++++++++++++
> 2 files changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 95beb50748fc..2a0cfa225c8d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4323,28 +4323,8 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault, int r)
> {
> - lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
> - r == RET_PF_RETRY);
> -
> - if (!fault->refcounted_page)
> - return;
> -
> - /*
> - * If the page that KVM got from the *primary MMU* is writable, and KVM
> - * installed or reused a SPTE, mark the page/folio dirty. Note, this
> - * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
> - * the GFN is write-protected. Folios can't be safely marked dirty
> - * outside of mmu_lock as doing so could race with writeback on the
> - * folio. As a result, KVM can't mark folios dirty in the fast page
> - * fault handler, and so KVM must (somewhat) speculatively mark the
> - * folio dirty if KVM could locklessly make the SPTE writable.
> - */
> - if (r == RET_PF_RETRY)
> - kvm_release_page_unused(fault->refcounted_page);
> - else if (!fault->map_writable)
> - kvm_release_page_clean(fault->refcounted_page);
> - else
> - kvm_release_page_dirty(fault->refcounted_page);
> + kvm_release_faultin_page(vcpu->kvm, fault->refcounted_page,
> + r == RET_PF_RETRY, fault->map_writable);
Does it make sense to move RET_PF_* to common code, and avoid a bool
argument here?
Paolo
> }
>
> static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9d2a97eb30e4..91341cdc6562 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1216,6 +1216,32 @@ static inline void kvm_release_page_unused(struct page *page)
> void kvm_release_page_clean(struct page *page);
> void kvm_release_page_dirty(struct page *page);
>
> +static inline void kvm_release_faultin_page(struct kvm *kvm, struct page *page,
> + bool unused, bool dirty)
> +{
> + lockdep_assert_once(lockdep_is_held(&kvm->mmu_lock) || unused);
> +
> + if (!page)
> + return;
> +
> + /*
> + * If the page that KVM got from the *primary MMU* is writable, and KVM
> + * installed or reused a SPTE, mark the page/folio dirty. Note, this
> + * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
> + * the GFN is write-protected. Folios can't be safely marked dirty
> + * outside of mmu_lock as doing so could race with writeback on the
> + * folio. As a result, KVM can't mark folios dirty in the fast page
> + * fault handler, and so KVM must (somewhat) speculatively mark the
> + * folio dirty if KVM could locklessly make the SPTE writable.
> + */
> + if (unused)
> + kvm_release_page_unused(page);
> + else if (dirty)
> + kvm_release_page_dirty(page);
> + else
> + kvm_release_page_clean(page);
> +}
> +
> kvm_pfn_t kvm_lookup_pfn(struct kvm *kvm, gfn_t gfn);
> kvm_pfn_t __kvm_faultin_pfn(const struct kvm_memory_slot *slot, gfn_t gfn,
> unsigned int foll, bool *writable,
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 50/84] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
2024-07-26 23:51 ` [PATCH v12 50/84] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
@ 2024-07-30 8:59 ` Paolo Bonzini
0 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 8:59 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:51, Sean Christopherson wrote:
> Use __kvm_faultin_page() to get the APIC access page so that KVM can
> precisely release the refcounted page, i.e. to remove yet another user
> of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
> page fault, the semantics are effectively the same; KVM just happens to
> be mapping the pfn into a VMCS field instead of a secondary MMU.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 30032585f7dc..b109bd282a52 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6786,8 +6786,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> struct kvm *kvm = vcpu->kvm;
> struct kvm_memslots *slots = kvm_memslots(kvm);
> struct kvm_memory_slot *slot;
> + struct page *refcounted_page;
> unsigned long mmu_seq;
> kvm_pfn_t pfn;
> + bool ign;
Even if you don't use it, call the out argument "writable".
Paolo
>
> /* Defer reload until vmcs01 is the current VMCS. */
> if (is_guest_mode(vcpu)) {
> @@ -6823,7 +6825,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> * controls the APIC-access page memslot, and only deletes the memslot
> * if APICv is permanently inhibited, i.e. the memslot won't reappear.
> */
> - pfn = gfn_to_pfn_memslot(slot, gfn);
> + pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &ign, &refcounted_page);
> if (is_error_noslot_pfn(pfn))
> return;
>
> @@ -6834,10 +6836,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
>
> /*
> - * Do not pin apic access page in memory, the MMU notifier
> - * will call us again if it is migrated or swapped out.
> + * Do not pin the APIC access page in memory so that it can be freely
> + * migrated, the MMU notifier will call us again if it is migrated or
> + * swapped out. KVM backs the memslot with anonymous memory, the pfn
> + * should always point at a refcounted page (if the pfn is valid).
> */
> - kvm_release_pfn_clean(pfn);
> + if (!WARN_ON_ONCE(!refcounted_page))
> + kvm_release_page_clean(refcounted_page);
>
> /*
> * No need for a manual TLB flush at this point, KVM has already done a
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
2024-07-26 23:51 ` [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn() Sean Christopherson
@ 2024-07-30 9:05 ` Paolo Bonzini
2024-07-30 20:00 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 9:05 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:51, Sean Christopherson wrote:
> Provide the "struct page" associated with a guest_memfd pfn as an output
> from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can
^^^^^^^^^^^^^^^^^^^^
Just "kvm_gmem_get_pfn()".
> directly put the page instead of having to rely on
> kvm_pfn_to_refcounted_page().
This will conflict with my series, where I'm introducing
folio_file_pfn() and using it here:
> - page = folio_file_page(folio, index);
> + *page = folio_file_page(folio, index);
>
> - *pfn = page_to_pfn(page);
> + *pfn = page_to_pfn(*page);
> if (max_order)
> *max_order = 0;
That said, I think it's better to turn kvm_gmem_get_pfn() into
kvm_gmem_get_page() here, and pull the page_to_pfn() or page_to_phys()
to the caller as applicable. This highlights that the caller always
gets a refcounted page with guest_memfd.
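For illustration, the renamed helper could keep the exact shape of today's
kvm_gmem_get_pfn() with the pfn out-param swapped for a page (the prototype
below is a sketch, not taken from the series):

	int kvm_gmem_get_page(struct kvm *kvm, struct kvm_memory_slot *slot,
			      gfn_t gfn, struct page **page, int *max_order);

Callers then do page_to_pfn()/page_to_phys() themselves, as in the hunks below.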
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 901be9e420a4..bcc4a4c594ef 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4348,13 +4348,14 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return -EFAULT;
}
- r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
+ r = kvm_gmem_get_page(vcpu->kvm, fault->slot, fault->gfn, &fault->refcounted_page,
&max_order);
if (r) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return r;
}
+ fault->pfn = page_to_pfn(fault->refcounted_page);
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
fault->max_level, max_order);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a16c873b3232..db4181d11f2e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3847,7 +3847,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
struct kvm_memory_slot *slot;
- kvm_pfn_t pfn;
+ struct page *page;
slot = gfn_to_memslot(vcpu->kvm, gfn);
if (!slot)
@@ -3857,7 +3857,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
* The new VMSA will be private memory guest memory, so
* retrieve the PFN from the gmem backend.
*/
- if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ if (kvm_gmem_get_page(vcpu->kvm, slot, gfn, &page, NULL))
return -EINVAL;
/*
@@ -3873,7 +3873,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
svm->sev_es.snp_has_guest_vmsa = true;
/* Use the new VMSA */
- svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = page_to_phys(page);
/* Mark the vCPU as runnable */
vcpu->arch.pv.pv_unhalted = false;
@@ -3886,7 +3886,7 @@ static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
* changes then care should be taken to ensure
* svm->sev_es.vmsa is pinned through some other means.
*/
- kvm_release_pfn_clean(pfn);
+ kvm_release_page_clean(page);
}
/*
@@ -4687,6 +4687,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
struct kvm *kvm = vcpu->kvm;
int order, rmp_level, ret;
bool assigned;
+ struct page *page;
kvm_pfn_t pfn;
gfn_t gfn;
@@ -4712,13 +4713,14 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
return;
}
- ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ ret = kvm_gmem_get_page(kvm, slot, gfn, &page, &order);
if (ret) {
pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
gpa);
return;
}
+ pfn = page_to_pfn(page);
ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
if (ret || !assigned) {
pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
@@ -4770,7 +4772,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
trace_kvm_rmp_fault(vcpu, gpa, pfn, error_code, rmp_level, ret);
out_no_trace:
- put_page(pfn_to_page(pfn));
+ kvm_release_page_unused(page);
}
static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
And the change in virt/kvm/guest_memfd.c then is just as trivial, apart
from all the renaming:
- *pfn = folio_file_pfn(folio, index);
+ *page = folio_file_page(folio, index);
Paolo
^ permalink raw reply related [flat|nested] 150+ messages in thread
* Re: [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference
2024-07-26 23:51 ` [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference Sean Christopherson
@ 2024-07-30 10:41 ` Paolo Bonzini
2024-07-30 20:15 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 10:41 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:51, Sean Christopherson wrote:
> Add a kvm_follow_pfn() wrapper, kvm_lookup_pfn(), to allow looking up a
> gfn=>pfn mapping without the caller getting a reference to any underlying
> page. The API will be used in flows that want to know if a gfn points at
> a valid pfn, but don't actually need to do anything with the pfn.
Can you rename the function kvm_gfn_has_pfn(), or
kvm_gfn_can_be_mapped(), and make it return a bool?
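Purely as a sketch of the bool-returning shape (assuming kvm_lookup_pfn()
keeps the kvm+gfn signature introduced in this patch; the name is just one
of the candidates above):

	static inline bool kvm_gfn_can_be_mapped(struct kvm *kvm, gfn_t gfn)
	{
		/* Look up the pfn without taking a reference to any page. */
		return !is_error_noslot_pfn(kvm_lookup_pfn(kvm, gfn));
	}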
(As an aside, I wonder if reexecute_instruction() could just use
kvm_is_error_hva(kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa))) instead of
going all the way to a pfn. But it's ok to be more restrictive).
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"
2024-07-26 23:52 ` [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page" Sean Christopherson
@ 2024-07-30 11:38 ` Paolo Bonzini
2024-07-30 20:21 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 11:38 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:52, Sean Christopherson wrote:
> Now that KVM no longer relies on an ugly heuristic to find its struct page
> references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
> pfns, remove KVM's hack to elevate the refcount for pfns that happen to
> have a valid struct page. In addition to removing a long-standing wart
> in KVM, this allows KVM to map non-refcounted struct page memory into the
> guest, e.g. for exposing GPU TTM buffers to KVM guests.
Feel free to leave it to me for later, but there are more cleanups that
can be made, given how simple kvm_resolve_pfn() is now:
> @@ -2814,35 +2768,10 @@ static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
> if (kfp->map_writable)
> *kfp->map_writable = writable;
>
> if (pte)
> pfn = pte_pfn(*pte);
> else
> pfn = page_to_pfn(page);
>
> *kfp->refcounted_page = page;
>
Something like (untested/uncompiled):
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2758,32 +2758,12 @@ static inline int check_user_page_hwpois
return rc == -EHWPOISON;
}
-static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
- pte_t *pte, bool writable)
-{
- kvm_pfn_t pfn;
-
- WARN_ON_ONCE(!!page == !!pte);
-
- if (kfp->map_writable)
- *kfp->map_writable = writable;
-
- if (pte)
- pfn = pte_pfn(*pte);
- else
- pfn = page_to_pfn(page);
-
- *kfp->refcounted_page = page;
-
- return pfn;
-}
-
/*
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned. It's also the
* only part that runs if we can in atomic context.
*/
-static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
+static bool hva_to_page_fast(struct kvm_follow_pfn *kfp)
{
struct page *page;
bool r;
@@ -2799,23 +2779,21 @@ static bool hva_to_pfn_fast(struct kvm_f
return false;
if (kfp->pin)
- r = pin_user_pages_fast(kfp->hva, 1, FOLL_WRITE, &page) == 1;
+ r = pin_user_pages_fast(kfp->hva, 1, FOLL_WRITE, kfp->refcounted_page) == 1;
else
- r = get_user_page_fast_only(kfp->hva, FOLL_WRITE, &page);
+ r = get_user_page_fast_only(kfp->hva, FOLL_WRITE, kfp->refcounted_page);
- if (r) {
- *pfn = kvm_resolve_pfn(kfp, page, NULL, true);
- return true;
- }
+ if (r)
+ kfp->flags |= FOLL_WRITE;
- return false;
+ return r;
}
/*
* The slow path to get the pfn of the specified host virtual address,
* 1 indicates success, -errno is returned if error is detected.
*/
-static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
+static int hva_to_page(struct kvm_follow_pfn *kfp)
{
/*
* When a VCPU accesses a page that is not mapped into the secondary
@@ -2829,34 +2807,32 @@ static int hva_to_pfn_slow(struct kvm_fo
* implicitly honor NUMA hinting faults and don't need this flag.
*/
unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT | kfp->flags;
- struct page *page, *wpage;
+ struct page *wpage;
int npages;
+ if (hva_to_page_fast(kfp))
+ return 1;
+
if (kfp->pin)
- npages = pin_user_pages_unlocked(kfp->hva, 1, &page, flags);
+ npages = pin_user_pages_unlocked(kfp->hva, 1, kfp->refcounted_page, flags);
else
- npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
- if (npages != 1)
- return npages;
+ npages = get_user_pages_unlocked(kfp->hva, 1, kfp->refcounted_page, flags);
/*
- * Pinning is mutually exclusive with opportunistically mapping a read
- * fault as writable, as KVM should never pin pages when mapping memory
- * into the guest (pinning is only for direct accesses from KVM).
+ * Map read fault as writable if possible; pinning is mutually exclusive
+ * with opportunistically mapping a read fault as writable, as KVM
+ * should never pin pages when mapping memory into the guest (pinning is
+ * only for direct accesses from KVM).
*/
- if (WARN_ON_ONCE(kfp->map_writable && kfp->pin))
- goto out;
-
- /* map read fault as writable if possible */
- if (!(flags & FOLL_WRITE) && kfp->map_writable &&
+ if (npages == 1 &&
+ kfp->map_writable && !WARN_ON_ONCE(kfp->pin) &&
+ !(flags & FOLL_WRITE) &&
get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
- put_page(page);
- page = wpage;
- flags |= FOLL_WRITE;
+ put_page(kfp->refcounted_page);
+ kfp->refcounted_page = wpage;
+ kfp->flags |= FOLL_WRITE;
}
-out:
- *pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
return npages;
}
@@ -2915,7 +2891,9 @@ static int hva_to_pfn_remapped(struct vm
goto out;
}
- *p_pfn = kvm_resolve_pfn(kfp, NULL, &pte, pte_write(pte));
+ if (kfp->map_writable)
+ *kfp->map_writable = pte_write(pte);
+ *p_pfn = pte_pfn(pte);
out:
pte_unmap_unlock(ptep, ptl);
return r;
@@ -2932,12 +2910,13 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_p
if (WARN_ON_ONCE(!kfp->refcounted_page))
return KVM_PFN_ERR_FAULT;
- if (hva_to_pfn_fast(kfp, &pfn))
- return pfn;
+ npages = hva_to_page(kfp);
+ if (npages == 1) {
+ if (kfp->map_writable)
+ *kfp->map_writable = kfp->flags & FOLL_WRITE;
+ return page_to_pfn(kfp->refcounted_page);
+ }
- npages = hva_to_pfn_slow(kfp, &pfn);
- if (npages == 1)
- return pfn;
if (npages == -EINTR)
return KVM_PFN_ERR_SIGPENDING;
Also, check_user_page_hwpoison() should not be needed anymore, probably
not since commit 234b239bea39 ("kvm: Faults which trigger IO release the
mmap_sem", 2014-09-24) removed get_user_pages_fast() from hva_to_pfn_slow().
The only way that you could get a poisoned page without returning -EHWPOISON,
is if FOLL_HWPOISON was not passed. But even without these patches,
the cases are:
- npages == 0, then you must have FOLL_NOWAIT and you'd not use
check_user_page_hwpoison()
- npages == 1 or npages == -EHWPOISON, all good
- npages == -EAGAIN from mmap_read_lock_killable() - should handle that like -EINTR
- everything else including -EFAULT can go down the vma_lookup() path, because
npages < 0 means we went through hva_to_pfn_slow() which uses FOLL_HWPOISON
This means that you can simply have
if (npages == -EHWPOISON)
return KVM_PFN_ERR_HWPOISON;
before the mmap_read_lock() line. You may either sneak this at the beginning
of the series or leave it for later.
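I.e. the top of hva_to_pfn() could end up looking roughly like this (a sketch
only, with -EAGAIN folded into the -EINTR handling as suggested above; the
exact surrounding code will differ depending on where in the series it lands):

	npages = hva_to_pfn_slow(kfp, &pfn);
	if (npages == 1)
		return pfn;
	if (npages == -EINTR || npages == -EAGAIN)
		return KVM_PFN_ERR_SIGPENDING;
	if (npages == -EHWPOISON)
		return KVM_PFN_ERR_HWPOISON;

	mmap_read_lock(current->mm);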
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (83 preceding siblings ...)
2024-07-26 23:52 ` [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page" Sean Christopherson
@ 2024-07-30 11:52 ` Paolo Bonzini
2024-07-30 22:35 ` Sean Christopherson
2024-08-27 9:06 ` Alex Bennée
85 siblings, 1 reply; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-30 11:52 UTC (permalink / raw)
To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/27/24 01:51, Sean Christopherson wrote:
> arm64 folks, the first two patches are bug fixes, but I have very low
> confidence that they are correct and/or desirable. If they are more or
> less correct, I can post them separately if that'd make life easier. I
> included them here to avoid conflicts, and because I'm pretty sure how
> KVM deals with MTE tags vs. dirty logging will impact what APIs KVM needs
> to provide to arch code.
>
> On to the series... The TL;DR is that I would like to get input on two
> things:
>
> 1. Marking folios dirty/accessed only on the intial stage-2 page fault
> 2. The new APIs for faulting, prefetching, and doing "lookups" on pfns
Wow!
Splitting out prefetching makes a lot of sense, as it's the only one
with npages > 1 and it doesn't need all the complexity of hva_to_pfn().
I've left a comment on the lookup API, which is probably the only one
that can be simplified further.
The faulting API looks good as a first iteration. Code-wise,
kvm_resolve_pfn() is probably unnecessary at the end of the series but I
can see why you had to restrain yourself and declare it done. :)
An interesting evolution of the API could be to pass a struct
kvm_follow_pfn pointer to {,__}kvm_faultin_pfn() and __gfn_to_page()
(the "constructors"); and on the other side to
kvm_release_faultin_page() and kvm_release_page_*(). The struct
kvm_follow_pfn could be embedded in the (x86) kvm_page_fault and
(generic) kvm_host_map structs. But certainly not as part of this
already huge work.
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM
2024-07-30 8:58 ` Paolo Bonzini
@ 2024-07-30 19:15 ` Sean Christopherson
2024-07-31 10:18 ` Paolo Bonzini
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-30 19:15 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Tue, Jul 30, 2024, Paolo Bonzini wrote:
> On 7/27/24 01:51, Sean Christopherson wrote:
> > Move KVM x86's helper that "finishes" the faultin process to common KVM
> > so that the logic can be shared across all architectures. Note, not all
> > architectures implement a fast page fault path, but the gist of the
> > comment applies to all architectures.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 24 ++----------------------
> > include/linux/kvm_host.h | 26 ++++++++++++++++++++++++++
> > 2 files changed, 28 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 95beb50748fc..2a0cfa225c8d 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4323,28 +4323,8 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> > static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
> > struct kvm_page_fault *fault, int r)
> > {
> > - lockdep_assert_once(lockdep_is_held(&vcpu->kvm->mmu_lock) ||
> > - r == RET_PF_RETRY);
> > -
> > - if (!fault->refcounted_page)
> > - return;
> > -
> > - /*
> > - * If the page that KVM got from the *primary MMU* is writable, and KVM
> > - * installed or reused a SPTE, mark the page/folio dirty. Note, this
> > - * may mark a folio dirty even if KVM created a read-only SPTE, e.g. if
> > - * the GFN is write-protected. Folios can't be safely marked dirty
> > - * outside of mmu_lock as doing so could race with writeback on the
> > - * folio. As a result, KVM can't mark folios dirty in the fast page
> > - * fault handler, and so KVM must (somewhat) speculatively mark the
> > - * folio dirty if KVM could locklessly make the SPTE writable.
> > - */
> > - if (r == RET_PF_RETRY)
> > - kvm_release_page_unused(fault->refcounted_page);
> > - else if (!fault->map_writable)
> > - kvm_release_page_clean(fault->refcounted_page);
> > - else
> > - kvm_release_page_dirty(fault->refcounted_page);
> > + kvm_release_faultin_page(vcpu->kvm, fault->refcounted_page,
> > + r == RET_PF_RETRY, fault->map_writable);
>
> Does it make sense to move RET_PF_* to common code, and avoid a bool
> argument here?
After this series, probably? Especially if/when we make "struct kvm_page_fault"
a common structure and converge all arch code. In this series, definitely not,
as it would require even more patches to convert other architectures, and it's
not clear that it would be a net win, at least not without even more massaging.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
2024-07-30 9:05 ` Paolo Bonzini
@ 2024-07-30 20:00 ` Sean Christopherson
2024-07-31 10:12 ` Paolo Bonzini
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-30 20:00 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Tue, Jul 30, 2024, Paolo Bonzini wrote:
> On 7/27/24 01:51, Sean Christopherson wrote:
> > Provide the "struct page" associated with a guest_memfd pfn as an output
> > from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can
> ^^^^^^^^^^^^^^^^^^^^
>
> Just "kvm_gmem_get_pfn()".
>
> > directly put the page instead of having to rely on
> > kvm_pfn_to_refcounted_page().
>
> This will conflict with my series, where I'm introducing
> folio_file_pfn() and using it here:
> > - page = folio_file_page(folio, index);
> > + *page = folio_file_page(folio, index);
> > - *pfn = page_to_pfn(page);
> > + *pfn = page_to_pfn(*page);
> > if (max_order)
> > *max_order = 0;
>
> That said, I think it's better to turn kvm_gmem_get_pfn() into
> kvm_gmem_get_page() here, and pull the page_to_pfn() or page_to_phys()
> to the caller as applicable. This highlights that the caller always
> gets a refcounted page with guest_memfd.
I have mixed feelings on this.
On one hand, it's silly/confusing to return a pfn+page pair and thus imply that
guest_memfd can return a pfn without a page.
On the other hand, if guest_memfd does ever serve pfns without a struct page,
it could be quite painful to unwind all of the arch code we'll accrue that
assumes guest_memfd only ever returns a refcounted page (as evidenced by this
series).
The probability of guest_memfd not having struct page for mapped pfns is likely
very low, but at the same time, providing a pfn+page pair doesn't cost us much.
And if it turns out that not having struct page is nonsensical, deferring the
kvm_gmem_get_pfn() => kvm_gmem_get_page() conversion could be annoying, but highly
unlikely to be painful since it should be 100% mechanical. Whereas reverting back
to kvm_gmem_get_pfn() if we make the wrong decision now could mean doing surgery
on a pile of arch code.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference
2024-07-30 10:41 ` Paolo Bonzini
@ 2024-07-30 20:15 ` Sean Christopherson
2024-07-31 10:11 ` Paolo Bonzini
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-30 20:15 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Tue, Jul 30, 2024, Paolo Bonzini wrote:
> On 7/27/24 01:51, Sean Christopherson wrote:
> > Add a kvm_follow_pfn() wrapper, kvm_lookup_pfn(), to allow looking up a
> > gfn=>pfn mapping without the caller getting a reference to any underlying
> > page. The API will be used in flows that want to know if a gfn points at
> > a valid pfn, but don't actually need to do anything with the pfn.
>
> Can you rename the function kvm_gfn_has_pfn(), or kvm_gfn_can_be_mapped(),
> and make it return a bool?
Heh, sure. I initially planned on having it return a bool, but I couldn't figure
out a name, mainly because the kernel's pfn_valid() makes things like
kvm_gfn_has_valid_pfn() confusing/misleading :-(
> (As an aside, I wonder if reexecute_instruction() could just use
> kvm_is_error_hva(kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa))) instead of going
> all the way to a pfn. But it's ok to be more restrictive).
Heh #2, I wondered the same thing. I think it would work? Verifying that there's
a usable pfn also protects against retrying an access that hit -EHWPOISON, but I'm
pretty sure that would require a rare race, and I don't think it could result in
the guest being put into an infinite loop.
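E.g. a minimal sketch of the hva-based check (hypothetical helper, not
something in this series):

	static bool reexecute_gpa_is_mapped(struct kvm_vcpu *vcpu, gpa_t gpa)
	{
		unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa));

		/* A usable userspace mapping is enough; no pfn lookup needed. */
		return !kvm_is_error_hva(hva);
	}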
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"
2024-07-30 11:38 ` Paolo Bonzini
@ 2024-07-30 20:21 ` Sean Christopherson
2024-07-31 9:50 ` Paolo Bonzini
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-07-30 20:21 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Tue, Jul 30, 2024, Paolo Bonzini wrote:
> On 7/27/24 01:52, Sean Christopherson wrote:
> > Now that KVM no longer relies on an ugly heuristic to find its struct page
> > references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
> > pfns, remove KVM's hack to elevate the refcount for pfns that happen to
> > have a valid struct page. In addition to removing a long-standing wart
> > in KVM, this allows KVM to map non-refcounted struct page memory into the
> > guest, e.g. for exposing GPU TTM buffers to KVM guests.
>
> Feel free to leave it to me for later, but there are more cleanups that
> can be made, given how simple kvm_resolve_pfn() is now:
I'll revisit kvm_resolve_pfn(), Maxim also wasn't a fan of a similar helper that
existed in v11.
> Also, check_user_page_hwpoison() should not be needed anymore, probably
> not since commit 234b239bea39 ("kvm: Faults which trigger IO release the
> mmap_sem", 2014-09-24) removed get_user_pages_fast() from hva_to_pfn_slow().
Ha, I *knew* this sounded familiar. Past me apparently came to the same
conclusion[*], though I wrongly suspected a memory leak and promptly forgot to
ever send a patch. I'll tack one on this time around.
[*] https://lore.kernel.org/all/ZGKC9fHoE+kDs0ar@google.com
> The only way that you could get a poisoned page without returning -EHWPOISON,
> is if FOLL_HWPOISON was not passed. But even without these patches,
> the cases are:
> - npages == 0, then you must have FOLL_NOWAIT and you'd not use
> check_user_page_hwpoison()
> - npages == 1 or npages == -EHWPOISON, all good
> - npages == -EAGAIN from mmap_read_lock_killable() - should handle that like -EINTR
> - everything else including -EFAULT can go downt the vma_lookup() path, because
> npages < 0 means we went through hva_to_pfn_slow() which uses FOLL_HWPOISON
>
> This means that you can simply have
>
> if (npages == -EHWPOISON)
> return KVM_PFN_ERR_HWPOISON;
>
> before the mmap_read_lock() line. You may either sneak this at the beginning
> of the series or leave it for later.
>
> Paolo
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages
2024-07-30 11:52 ` [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
@ 2024-07-30 22:35 ` Sean Christopherson
0 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-30 22:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Tue, Jul 30, 2024, Paolo Bonzini wrote:
> An interesting evolution of the API could be to pass a struct kvm_follow_pfn
> pointer to {,__}kvm_faultin_pfn() and __gfn_to_page() (the "constructors");
> and on the other side to kvm_release_faultin_page() and
> kvm_release_page_*(). The struct kvm_follow_pfn could be embedded in the
> (x86) kvm_page_fault and (generic) kvm_host_map structs. But certainly not
> as part of this already huge work.
For kvm_faultin_pfn(), my hope/dream is to make kvm_page_fault a common struct,
with an arch member (a la kvm_vcpu), and get to something like:
static int arch_page_fault_handler(...)
{
struct kvm_page_fault fault = {
<const common stuff>,
.arch.xxx = <arch stuff>,
};
<arch code>
r = kvm_faultin_pfn();
...
}
In theory, that would allow moving the kvm->mmu_invalidate_seq handling, memslot
lookup, etc. into kvm_faultin_pfn(), or maybe another helper that is invoked to
setup the fault structure. I.e. it would give us a way to drive convergence for
at least some of the fault handling logic, without having to tackle gory arch
details, at least not right away.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
2024-07-26 23:52 ` [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
@ 2024-07-31 8:11 ` Andrew Jones
2024-08-06 15:04 ` Anup Patel
1 sibling, 0 replies; 150+ messages in thread
From: Andrew Jones @ 2024-07-31 8:11 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Fri, Jul 26, 2024 at 04:52:07PM GMT, Sean Christopherson wrote:
> Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
> are new APIs to consolidate arch code and provide consistent behavior
> across all KVM architectures.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/riscv/kvm/mmu.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 806f68e70642..f73d6a79a78c 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -601,6 +601,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> bool logging = (memslot->dirty_bitmap &&
> !(memslot->flags & KVM_MEM_READONLY)) ? true : false;
> unsigned long vma_pagesize, mmu_seq;
> + struct page *page;
>
> /* We need minimum second+third level pages */
> ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
> @@ -631,7 +632,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
>
> /*
> * Read mmu_invalidate_seq so that KVM can detect if the results of
> - * vma_lookup() or gfn_to_pfn_prot() become stale priort to acquiring
> + * vma_lookup() or __kvm_faultin_pfn() become stale priort to acquiring
^ while here
could fix this typo
> * kvm->mmu_lock.
> *
> * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> @@ -647,7 +648,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> return -EFAULT;
> }
>
> - hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);
> + hfn = kvm_faultin_pfn(vcpu, gfn, is_write, &writable, &page);
> if (hfn == KVM_PFN_ERR_HWPOISON) {
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
> vma_pageshift, current);
> @@ -681,11 +682,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> kvm_err("Failed to map in G-stage\n");
>
> out_unlock:
> - if ((!ret || ret == -EEXIST) && writable)
> - kvm_set_pfn_dirty(hfn);
> - else
> - kvm_release_pfn_clean(hfn);
> -
> + kvm_release_faultin_page(kvm, page, ret && ret != -EEXIST, writable);
> spin_unlock(&kvm->mmu_lock);
> return ret;
> }
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed
2024-07-26 23:52 ` [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
@ 2024-07-31 8:11 ` Andrew Jones
2024-08-06 15:03 ` Anup Patel
1 sibling, 0 replies; 150+ messages in thread
From: Andrew Jones @ 2024-07-31 8:11 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Fri, Jul 26, 2024 at 04:52:05PM GMT, Sean Christopherson wrote:
> Don't mark pages dirty if KVM bails from the page fault handler without
> installing a stage-2 mapping, i.e. if the page is guaranteed to not be
> written by the guest.
>
> In addition to being a (very) minor fix, this paves the way for converting
> RISC-V to use kvm_release_faultin_page().
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/riscv/kvm/mmu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index b63650f9b966..06aa5a0d056d 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -669,7 +669,6 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> goto out_unlock;
>
> if (writable) {
> - kvm_set_pfn_dirty(hfn);
> mark_page_dirty(kvm, gfn);
> ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
> vma_pagesize, false, true);
> @@ -682,6 +681,9 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> kvm_err("Failed to map in G-stage\n");
>
> out_unlock:
> + if ((!ret || ret == -EEXIST) && writable)
> + kvm_set_pfn_dirty(hfn);
> +
> spin_unlock(&kvm->mmu_lock);
> kvm_set_pfn_accessed(hfn);
> kvm_release_pfn_clean(hfn);
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
2024-07-26 23:52 ` [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
@ 2024-07-31 8:12 ` Andrew Jones
2024-08-06 15:04 ` Anup Patel
1 sibling, 0 replies; 150+ messages in thread
From: Andrew Jones @ 2024-07-31 8:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Fri, Jul 26, 2024 at 04:52:06PM GMT, Sean Christopherson wrote:
> Mark pages accessed before dropping mmu_lock when faulting in guest memory
> so that RISC-V can convert to kvm_release_faultin_page() without tripping
> its lockdep assertion on mmu_lock being held. Marking pages accessed
> outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_
> outside of mmu_lock can make filesystems unhappy.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/riscv/kvm/mmu.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 06aa5a0d056d..806f68e70642 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -683,10 +683,10 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> out_unlock:
> if ((!ret || ret == -EEXIST) && writable)
> kvm_set_pfn_dirty(hfn);
> + else
> + kvm_release_pfn_clean(hfn);
>
> spin_unlock(&kvm->mmu_lock);
> - kvm_set_pfn_accessed(hfn);
> - kvm_release_pfn_clean(hfn);
> return ret;
> }
>
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"
2024-07-30 20:21 ` Sean Christopherson
@ 2024-07-31 9:50 ` Paolo Bonzini
0 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-31 9:50 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/30/24 22:21, Sean Christopherson wrote:
> On Tue, Jul 30, 2024, Paolo Bonzini wrote:
>> On 7/27/24 01:52, Sean Christopherson wrote:
>>> Now that KVM no longer relies on an ugly heuristic to find its struct page
>>> references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
>>> pfns, remove KVM's hack to elevate the refcount for pfns that happen to
>>> have a valid struct page. In addition to removing a long-standing wart
>>> in KVM, this allows KVM to map non-refcounted struct page memory into the
>>> guest, e.g. for exposing GPU TTM buffers to KVM guests.
>>
>> Feel free to leave it to me for later, but there are more cleanups that
>> can be made, given how simple kvm_resolve_pfn() is now:
>
> I'll revisit kvm_resolve_pfn(), Maxim also wasn't a fan of a similar helper that
> existed in v11.
FWIW kvm_resolve_pfn() is totally fine as an intermediate step. Just
food for thought for possible follow-ups.
>> Also, check_user_page_hwpoison() should not be needed anymore, probably
>> not since commit 234b239bea39 ("kvm: Faults which trigger IO release the
>> mmap_sem", 2014-09-24) removed get_user_pages_fast() from hva_to_pfn_slow().
>
> Ha, I *knew* this sounded familiar. Past me apparently came to the same
> conclusion[*], though I wrongly suspected a memory leak and promptly forgot to
> ever send a patch. I'll tack one on this time around.
As you prefer.
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference
2024-07-30 20:15 ` Sean Christopherson
@ 2024-07-31 10:11 ` Paolo Bonzini
0 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-31 10:11 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/30/24 22:15, Sean Christopherson wrote:
> On Tue, Jul 30, 2024, Paolo Bonzini wrote:
>> On 7/27/24 01:51, Sean Christopherson wrote:
>>> Add a kvm_follow_pfn() wrapper, kvm_lookup_pfn(), to allow looking up a
>>> gfn=>pfn mapping without the caller getting a reference to any underlying
>>> page. The API will be used in flows that want to know if a gfn points at
>>> a valid pfn, but don't actually need to do anything with the pfn.
>>
>> Can you rename the function kvm_gfn_has_pfn(), or kvm_gfn_can_be_mapped(),
>> and make it return a bool?
>
> Heh, sure. I initially planned on having it return a bool, but I couldn't figure
> out a name, mainly because the kernel's pfn_valid() makes things like
> kvm_gfn_has_valid_pfn() confusing/misleading :-(
>
>> (As an aside, I wonder if reexecute_instruction() could just use
>> kvm_is_error_hva(kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa))) instead of going
>> all the way to a pfn. But it's ok to be more restrictive).
>
> Heh #2, I wondered the same thing. I think it would work? Verifying that there's
> a usable pfn also protects against retrying an access that hit -EHWPOISON, but I'm
> prety sure that would require a rare race, and I don't think it could result in
> the guest being put into an infinite loop.
Indeed, and even the check in kvm_alloc_apic_access_page() is totally
useless. The page can go away at any time between the call and
vmx_set_apic_access_page_addr() or, for AMD, the #NPF on
APIC_DEFAULT_PHYS_BASE.
Yes, it's verifying that the system isn't under extreme memory pressure,
but in practice a 4K get_user_pages is never going to fail, it's just
going to cause something else to be swapped. I'd just get rid of both
of them, so there's no need for kvm_lookup_pfn().
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
2024-07-30 20:00 ` Sean Christopherson
@ 2024-07-31 10:12 ` Paolo Bonzini
0 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-31 10:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/30/24 22:00, Sean Christopherson wrote:
> The probability of guest_memfd not having struct page for mapped pfns is likely
> very low, but at the same time, providing a pfn+page pair doesn't cost us much.
> And if it turns out that not having struct page is nonsensical, deferring the
> kvm_gmem_get_pfn() => kvm_gmem_get_page() conversion could be annoying, but highly
> unlikely to be painful since it should be 100% mechanical. Whereas reverting back
> to kvm_gmem_get_pfn() if we make the wrong decision now could mean doing surgery
> on a pile of arch code.
Ok, fair enough. The conflict resolution is trivial either way (I also
checked the TDX series and miraculously it has only one conflict which
is also trivial).
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM
2024-07-30 19:15 ` Sean Christopherson
@ 2024-07-31 10:18 ` Paolo Bonzini
0 siblings, 0 replies; 150+ messages in thread
From: Paolo Bonzini @ 2024-07-31 10:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 7/30/24 21:15, Sean Christopherson wrote:
>> Does it make sense to move RET_PF_* to common code, and avoid a bool
>> argument here?
> After this series, probably? Especially if/when we make "struct kvm_page_fault"
> a common structure and converge all arch code. In this series, definitely not,
> as it would require even more patches to convert other architectures, and it's
> not clear that it would be a net win, at least not without even more massaging.
It does not seem to be hard, but I agree that all the other
architectures right now use 0/-errno in the callers of
kvm_release_faultin_page().
Paolo
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
@ 2024-07-31 16:23 ` Alex Bennée
2024-07-31 20:36 ` Sean Christopherson
2024-08-01 10:07 ` Marc Zyngier
` (2 subsequent siblings)
3 siblings, 1 reply; 150+ messages in thread
From: Alex Bennée @ 2024-07-31 16:23 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Put the page reference acquired by gfn_to_pfn_prot() if
> kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
> than-stellar heuristics for dealing with pfn-mapped memory means that KVM
> can get a page reference to ZONE_DEVICE memory.
>
> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/kvm/guest.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 11098eb7eb44..e1f0ff08836a 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -1059,6 +1059,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> page = pfn_to_online_page(pfn);
> if (!page) {
> /* Reject ZONE_DEVICE memory */
> + kvm_release_pfn_clean(pfn);
I guess this gets renamed later in the series.
However my main comment is: does lack of a page always mean ZONE_DEVICE memory?
Looking at pfn_to_online_page() I see a bunch of other checks first. Why
isn't it that function's responsibility to clean up after itself if it's
returning NULLs?
> ret = -EFAULT;
> goto out;
> }
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-07-31 16:23 ` Alex Bennée
@ 2024-07-31 20:36 ` Sean Christopherson
0 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-07-31 20:36 UTC (permalink / raw)
To: Alex Bennée
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Wed, Jul 31, 2024, Alex Bennée wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > Put the page reference acquired by gfn_to_pfn_prot() if
> > kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
> > than-stellar heuristics for dealing with pfn-mapped memory means that KVM
> > can get a page reference to ZONE_DEVICE memory.
> >
> > Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/arm64/kvm/guest.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 11098eb7eb44..e1f0ff08836a 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -1059,6 +1059,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> > page = pfn_to_online_page(pfn);
> > if (!page) {
> > /* Reject ZONE_DEVICE memory */
> > + kvm_release_pfn_clean(pfn);
>
> I guess this gets renamed later in the series.
>
> However my main comment is: does lack of a page always mean ZONE_DEVICE memory?
Nope.
> Looking at pfn_to_online_page() I see a bunch of other checks first. Why
> isn't it that function's responsibility to clean up after itself if it's
> returning NULLs?
pfn_to_online_page() is more strict than gfn_to_pfn_prot(). At least in theory,
gfn_to_pfn_prot() could return a pfn that has an associated "struct page", with
a reference held to said page. But for that same pfn, pfn_to_online_page() could
return NULL, in which case KVM needs to put the reference it acquired via
gfn_to_pfn_prot().
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-07-26 23:51 ` [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Sean Christopherson
@ 2024-08-01 7:34 ` Aneesh Kumar K.V
2024-08-01 18:01 ` Sean Christopherson
2024-08-07 16:21 ` Catalin Marinas
2024-08-22 14:24 ` (subset) " Marc Zyngier
2 siblings, 1 reply; 150+ messages in thread
From: Aneesh Kumar K.V @ 2024-08-01 7:34 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda, Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Disallow copying MTE tags to guest memory while KVM is dirty logging, as
> writing guest memory without marking the gfn as dirty in the memslot could
> result in userspace failing to migrate the updated page. Ideally (maybe?),
> KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
> and presumably the only use case for copying MTE tags _to_ the guest is when
> restoring state on the target.
>
> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/kvm/guest.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index e1f0ff08836a..962f985977c2 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>
> mutex_lock(&kvm->slots_lock);
>
> + if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
>
Is this equivalent to kvm_follow_pfn() with kfp->pin = 1? Should all
those pin requests fail if kvm->nr_memslots_dirty_logging != 0?
> while (length > 0) {
> kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
> void *maddr;
> --
> 2.46.0.rc1.232.g9752f9e123-goog
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 03/84] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error
2024-07-26 23:51 ` [PATCH v12 03/84] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error Sean Christopherson
@ 2024-08-01 8:57 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-01 8:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Remove KVM_ERR_PTR_BAD_PAGE and instead return NULL, as "bad page" is just
> a leftover bit of weirdness from days of old when KVM stuffed a "bad" page
> into the guest instead of actually handling missing pages. See commit
> cea7bb21280e ("KVM: MMU: Make gfn_to_page() always safe").
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 04/84] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer
2024-07-26 23:51 ` [PATCH v12 04/84] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer Sean Christopherson
@ 2024-08-01 9:03 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-01 9:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Allow passing a NULL @page to kvm_release_page_{clean,dirty}(), there's no
> tangible benefit to forcing the callers to pre-check @page, and it ends up
> generating a lot of duplicate boilerplate code.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes
2024-07-26 23:51 ` [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes Sean Christopherson
@ 2024-08-01 9:20 ` Alex Bennée
2024-08-01 14:43 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Alex Bennée @ 2024-08-01 9:20 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Add an API to release an unused page, i.e. to put a page without marking
> it accessed or dirty. The API will be used when KVM faults-in a page but
> bails before installing the guest mapping (and other similar flows).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> include/linux/kvm_host.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3d9617d1de41..c5d39a337aa3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1201,6 +1201,15 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
> unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
> unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
> bool *writable);
> +
> +static inline void kvm_release_page_unused(struct page *page)
> +{
> + if (!page)
> + return;
> +
> + put_page(page);
> +}
I guess it's unfamiliarity with the mm layout but I was trying to find
where the get_pages come from to see the full pattern of allocate and
return. I guess somewhere in the depths of hva_to_pfn() from
hva_to_pfn_retry()? I think the indirection of the page walking confuses
me ;-)
Anyway the API seems reasonable enough given the other kvm_release_
functions.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 12/84] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
2024-07-26 23:51 ` [PATCH v12 12/84] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs Sean Christopherson
@ 2024-08-01 9:31 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-01 9:31 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Drop @atomic from the myriad "to_pfn" APIs now that all callers pass
> "false".
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 24/84] KVM: Use plain "struct page" pointer instead of single-entry array
2024-07-26 23:51 ` [PATCH v12 24/84] KVM: Use plain "struct page" pointer instead of single-entry array Sean Christopherson
@ 2024-08-01 9:53 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-01 9:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Use a single pointer instead of a single-entry array for the struct page
> pointer in hva_to_pfn_fast(). Using an array makes the code unnecessarily
> annoying to read and update.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 26/84] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c
2024-07-26 23:51 ` [PATCH v12 26/84] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c Sean Christopherson
@ 2024-08-01 9:55 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-01 9:55 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Hoist the kvm_{set,release}_page_{clean,dirty}() APIs further up in
> kvm_main.c so that they can be used by the kvm_follow_pfn family of APIs.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
2024-07-31 16:23 ` Alex Bennée
@ 2024-08-01 10:07 ` Marc Zyngier
2024-08-07 14:15 ` Catalin Marinas
2024-08-22 14:24 ` (subset) " Marc Zyngier
3 siblings, 0 replies; 150+ messages in thread
From: Marc Zyngier @ 2024-08-01 10:07 UTC (permalink / raw)
To: Sean Christopherson, Steven Price
Cc: Paolo Bonzini, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
+ Steven Price for this patch (and the following one), as this really
is his turf.
On Sat, 27 Jul 2024 00:51:10 +0100,
Sean Christopherson <seanjc@google.com> wrote:
>
> Put the page reference acquired by gfn_to_pfn_prot() if
> kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
> than-stellar heuristics for dealing with pfn-mapped memory means that KVM
> can get a page reference to ZONE_DEVICE memory.
>
> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/kvm/guest.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 11098eb7eb44..e1f0ff08836a 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -1059,6 +1059,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> page = pfn_to_online_page(pfn);
> if (!page) {
> /* Reject ZONE_DEVICE memory */
> + kvm_release_pfn_clean(pfn);
> ret = -EFAULT;
> goto out;
> }
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
>
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes
2024-08-01 9:20 ` Alex Bennée
@ 2024-08-01 14:43 ` Sean Christopherson
0 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-08-01 14:43 UTC (permalink / raw)
To: Alex Bennée
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Thu, Aug 01, 2024, Alex Bennée wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > Add an API to release an unused page, i.e. to put a page without marking
> > it accessed or dirty. The API will be used when KVM faults-in a page but
> > bails before installing the guest mapping (and other similar flows).
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > include/linux/kvm_host.h | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3d9617d1de41..c5d39a337aa3 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1201,6 +1201,15 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
> > unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
> > unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
> > bool *writable);
> > +
> > +static inline void kvm_release_page_unused(struct page *page)
> > +{
> > + if (!page)
> > + return;
> > +
> > + put_page(page);
> > +}
>
> I guess it's unfamiliarity with the mm layout but I was trying to find
> where the get_pages come from to see the full pattern of allocate and
> return. I guess somewhere in the depths of hva_to_pfn() from
> hva_to_pfn_retry()?
If successful, get_user_page_fast_only() and get_user_pages_unlocked() grab a
reference on behalf of the caller.
As of this patch, hva_to_pfn_remapped() also grabs a reference to pages that
appear to be refcounted, which is the underlying wart this series aims to fix.
In KVM's early days, it _only_ supported GUP, i.e. if KVM got a pfn, that pfn
was (a) backed by struct page and (b) KVM had a reference to said page. That
led to the current mess, as KVM didn't get reworked to properly track pages vs.
pfns when support for VM_MIXEDMAP was added.
/*
* Get a reference here because callers of *hva_to_pfn* and
* *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
* returned pfn. This is only needed if the VMA has VM_MIXEDMAP
* set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
* simply do nothing for reserved pfns.
*
* Whoever called remap_pfn_range is also going to call e.g.
* unmap_mapping_range before the underlying pages are freed,
* causing a call to our MMU notifier.
*
* Certain IO or PFNMAP mappings can be backed with valid
* struct pages, but be allocated without refcounting e.g.,
* tail pages of non-compound higher order allocations, which
* would then underflow the refcount when the caller does the
* required put_page. Don't allow those pages here.
*/
if (!kvm_try_get_pfn(pfn))
r = -EFAULT;
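To make the pairing concrete, here is a minimal sketch of the flow described above (the helper and placeholder names are assumptions for illustration only; the real logic lives in hva_to_pfn_fast()/hva_to_pfn_slow() and their callers):
static kvm_pfn_t faultin_sketch(unsigned long hva, bool write, struct page **page)
{
	/* On success, GUP leaves a page reference held on behalf of the caller. */
	if (!get_user_page_fast_only(hva, write ? FOLL_WRITE : 0, page))
		return KVM_PFN_ERR_FAULT;
	return page_to_pfn(*page);
}
	/*
	 * Caller side: every exit path must drop that reference.  If the fault
	 * is abandoned before a stage-2 mapping is installed, the new helper
	 * puts the page without updating its accessed/dirty state.
	 */
	pfn = faultin_sketch(hva, write, &page);
	if (!is_error_noslot_pfn(pfn) && fault_was_abandoned)	/* placeholder condition */
		kvm_release_page_unused(page);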
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-08-01 7:34 ` Aneesh Kumar K.V
@ 2024-08-01 18:01 ` Sean Christopherson
2024-08-05 7:57 ` Aneesh Kumar K.V
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-08-01 18:01 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Thu, Aug 01, 2024, Aneesh Kumar K.V wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > Disallow copying MTE tags to guest memory while KVM is dirty logging, as
> > writing guest memory without marking the gfn as dirty in the memslot could
> > result in userspace failing to migrate the updated page. Ideally (maybe?),
> > KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
> > and presumably the only use case for copy MTE tags _to_ the guest is when
> > restoring state on the target.
> >
> > Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/arm64/kvm/guest.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index e1f0ff08836a..962f985977c2 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> >
> > mutex_lock(&kvm->slots_lock);
> >
> > + if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> >
>
> is this equivalent to kvm_follow_pfn() with kfp->pin = 1 ?
No, gfn_to_pfn_prot() == FOLL_GET, kfp->pin == FOLL_PIN. But that's not really
relevant.
> Should all those pin request fail if kvm->nr_memslots_dirty_logging != 0?
No, the conflict with dirty logging is specifically that this code doesn't invoke
mark_page_dirty(). And it can't easily do that, because there's no loaded ("running")
vCPU, i.e. doing so would trip this WARN:
#ifdef CONFIG_HAVE_KVM_DIRTY_RING
if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
return;
WARN_ON_ONCE(!vcpu && !kvm_arch_allow_write_without_running_vcpu(kvm)); <====
#endif
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 65/84] KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault path
2024-07-26 23:52 ` [PATCH v12 65/84] KVM: LoongArch: Mark "struct page" pfns accessed " Sean Christopherson
@ 2024-08-02 7:34 ` maobibo
0 siblings, 0 replies; 150+ messages in thread
From: maobibo @ 2024-08-02 7:34 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 2024/7/27 上午7:52, Sean Christopherson wrote:
> Mark pages accessed only in the slow path, before dropping mmu_lock when
> faulting in guest memory so that LoongArch can convert to
> kvm_release_faultin_page() without tripping its lockdep assertion on
> mmu_lock being held.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/loongarch/kvm/mmu.c | 20 ++------------------
> 1 file changed, 2 insertions(+), 18 deletions(-)
>
> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> index 364dd35e0557..52b5c16cf250 100644
> --- a/arch/loongarch/kvm/mmu.c
> +++ b/arch/loongarch/kvm/mmu.c
> @@ -552,12 +552,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
> {
> int ret = 0;
> - kvm_pfn_t pfn = 0;
> kvm_pte_t *ptep, changed, new;
> gfn_t gfn = gpa >> PAGE_SHIFT;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_memory_slot *slot;
> - struct page *page;
>
> spin_lock(&kvm->mmu_lock);
>
> @@ -570,8 +568,6 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
>
> /* Track access to pages marked old */
> new = kvm_pte_mkyoung(*ptep);
> - /* call kvm_set_pfn_accessed() after unlock */
> -
> if (write && !kvm_pte_dirty(new)) {
> if (!kvm_pte_write(new)) {
> ret = -EFAULT;
> @@ -595,23 +591,11 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
> }
>
> changed = new ^ (*ptep);
> - if (changed) {
> + if (changed)
> kvm_set_pte(ptep, new);
> - pfn = kvm_pte_pfn(new);
> - page = kvm_pfn_to_refcounted_page(pfn);
> - if (page)
> - get_page(page);
> - }
> +
> spin_unlock(&kvm->mmu_lock);
>
> - if (changed) {
> - if (kvm_pte_young(changed))
> - kvm_set_pfn_accessed(pfn);
> -
> - if (page)
> - put_page(page);
> - }
> -
> if (kvm_pte_dirty(changed))
> mark_page_dirty(kvm, gfn);
>
>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-07-26 23:52 ` [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
@ 2024-08-02 7:53 ` maobibo
2024-08-02 19:32 ` Sean Christopherson
2024-08-08 11:38 ` maobibo
1 sibling, 1 reply; 150+ messages in thread
From: maobibo @ 2024-08-02 7:53 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 2024/7/27 上午7:52, Sean Christopherson wrote:
> Mark pages/folios dirty only the slow page fault path, i.e. only when
> mmu_lock is held and the operation is mmu_notifier-protected, as marking a
> page/folio dirty after it has been written back can make some filesystems
> unhappy (backing KVM guests will such filesystem files is uncommon, and
> the race is minuscule, hence the lack of complaints).
>
> See the link below for details.
>
> Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> index 2634a9e8d82c..364dd35e0557 100644
> --- a/arch/loongarch/kvm/mmu.c
> +++ b/arch/loongarch/kvm/mmu.c
> @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
> if (kvm_pte_young(changed))
> kvm_set_pfn_accessed(pfn);
>
> - if (kvm_pte_dirty(changed)) {
> - mark_page_dirty(kvm, gfn);
> - kvm_set_pfn_dirty(pfn);
> - }
> if (page)
> put_page(page);
> }
> +
> + if (kvm_pte_dirty(changed))
> + mark_page_dirty(kvm, gfn);
> +
> return ret;
> out:
> spin_unlock(&kvm->mmu_lock);
> @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
> else
> ++kvm->stat.pages;
> kvm_set_pte(ptep, new_pte);
> - spin_unlock(&kvm->mmu_lock);
>
> - if (prot_bits & _PAGE_DIRTY) {
> - mark_page_dirty_in_slot(kvm, memslot, gfn);
> + if (writeable)
Is it better to use write or (prot_bits & _PAGE_DIRTY) here? writable
is pte permission from function hva_to_pfn_slow(), write is fault action.
Regards
Bibo Mao
> kvm_set_pfn_dirty(pfn);
> - }
> +
> + spin_unlock(&kvm->mmu_lock);
> +
> + if (prot_bits & _PAGE_DIRTY)
> + mark_page_dirty_in_slot(kvm, memslot, gfn);
>
> kvm_release_pfn_clean(pfn);
> out:
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 83/84] KVM: Drop APIs that manipulate "struct page" via pfns
2024-07-26 23:52 ` [PATCH v12 83/84] KVM: Drop APIs that manipulate "struct page" via pfns Sean Christopherson
@ 2024-08-02 11:03 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-02 11:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Remove all kvm_{release,set}_pfn_*() APIs not that all users are gone.
now?
Otherwise:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 11/84] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages()
2024-07-26 23:51 ` [PATCH v12 11/84] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() Sean Christopherson
@ 2024-08-02 11:16 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-02 11:16 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() to try and
> communicate its true purpose, as the "atomic" aspect is essentially a
> side effect of the fact that x86 uses the API while holding mmu_lock.
It's never too late to start adding some kdoc annotations to a function
and renaming a kvm_host API call seems like a good time to do it.
> E.g. even if mmu_lock weren't held, KVM wouldn't want to fault-in pages,
> as the goal is to opportunistically grab surrounding pages that have
> already been accessed and/or dirtied by the host, and to do so quickly.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
<snip>
/**
* kvm_prefetch_pages() - opportunistically grab previously accessed pages
* @slot: which @kvm_memory_slot the pages are in
* @gfn: guest frame
* @pages: array to receive page pointers
* @nr_pages: number of pages
*
* Returns the number of pages actually mapped.
*/
?
>
> -int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
> - struct page **pages, int nr_pages)
> +int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
> + struct page **pages, int nr_pages)
> {
> unsigned long addr;
> gfn_t entry = 0;
> @@ -3075,7 +3075,7 @@ int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
>
> return get_user_pages_fast_only(addr, nr_pages, FOLL_WRITE, pages);
> }
> -EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
> +EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
>
> /*
> * Do not use this helper unless you are absolutely certain the gfn _must_ be
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-08-02 7:53 ` maobibo
@ 2024-08-02 19:32 ` Sean Christopherson
2024-08-03 3:02 ` maobibo
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-08-02 19:32 UTC (permalink / raw)
To: maobibo
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Fri, Aug 02, 2024, maobibo wrote:
> On 2024/7/27 上午7:52, Sean Christopherson wrote:
> > Mark pages/folios dirty only the slow page fault path, i.e. only when
> > mmu_lock is held and the operation is mmu_notifier-protected, as marking a
> > page/folio dirty after it has been written back can make some filesystems
> > unhappy (backing KVM guests will such filesystem files is uncommon, and
> > the race is minuscule, hence the lack of complaints).
> >
> > See the link below for details.
> >
> > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
> > 1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> > index 2634a9e8d82c..364dd35e0557 100644
> > --- a/arch/loongarch/kvm/mmu.c
> > +++ b/arch/loongarch/kvm/mmu.c
> > @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
> > if (kvm_pte_young(changed))
> > kvm_set_pfn_accessed(pfn);
> > - if (kvm_pte_dirty(changed)) {
> > - mark_page_dirty(kvm, gfn);
> > - kvm_set_pfn_dirty(pfn);
> > - }
> > if (page)
> > put_page(page);
> > }
> > +
> > + if (kvm_pte_dirty(changed))
> > + mark_page_dirty(kvm, gfn);
> > +
> > return ret;
> > out:
> > spin_unlock(&kvm->mmu_lock);
> > @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
> > else
> > ++kvm->stat.pages;
> > kvm_set_pte(ptep, new_pte);
> > - spin_unlock(&kvm->mmu_lock);
> > - if (prot_bits & _PAGE_DIRTY) {
> > - mark_page_dirty_in_slot(kvm, memslot, gfn);
> > + if (writeable)
> Is it better to use write or (prot_bits & _PAGE_DIRTY) here? writable is
> pte permission from function hva_to_pfn_slow(), write is fault action.
Marking folios dirty in the slow/full path basically necessitates marking the
folio dirty if KVM creates a writable SPTE, as KVM won't mark the folio dirty
if/when _PAGE_DIRTY is set.
Practically speaking, I'm 99.9% certain it doesn't matter. The folio is marked
dirty by core MM when the folio is made writable, and cleaning the folio triggers
an mmu_notifier invalidation. I.e. if the page is mapped writable in KVM's
stage-2 PTEs, then its folio has already been marked dirty.
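Condensed from the kvm_map_page() hunk quoted above, the ordering the patch establishes in the slow path looks roughly like this (a sketch, not a verbatim excerpt):
	spin_lock(&kvm->mmu_lock);
	kvm_set_pte(ptep, new_pte);
	if (writeable)
		kvm_set_pfn_dirty(pfn);		/* folio dirtied while mmu_lock is held */
	spin_unlock(&kvm->mmu_lock);
	if (prot_bits & _PAGE_DIRTY)
		mark_page_dirty_in_slot(kvm, memslot, gfn);	/* dirty log update after unlock is fine */
	kvm_release_pfn_clean(pfn);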
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-08-02 19:32 ` Sean Christopherson
@ 2024-08-03 3:02 ` maobibo
2024-08-05 23:22 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: maobibo @ 2024-08-03 3:02 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On 2024/8/3 上午3:32, Sean Christopherson wrote:
> On Fri, Aug 02, 2024, maobibo wrote:
>> On 2024/7/27 上午7:52, Sean Christopherson wrote:
>>> Mark pages/folios dirty only the slow page fault path, i.e. only when
>>> mmu_lock is held and the operation is mmu_notifier-protected, as marking a
>>> page/folio dirty after it has been written back can make some filesystems
>>> unhappy (backing KVM guests will such filesystem files is uncommon, and
>>> the race is minuscule, hence the lack of complaints).
>>>
>>> See the link below for details.
>>>
>>> Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> ---
>>> arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
>>> 1 file changed, 10 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
>>> index 2634a9e8d82c..364dd35e0557 100644
>>> --- a/arch/loongarch/kvm/mmu.c
>>> +++ b/arch/loongarch/kvm/mmu.c
>>> @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
>>> if (kvm_pte_young(changed))
>>> kvm_set_pfn_accessed(pfn);
>>> - if (kvm_pte_dirty(changed)) {
>>> - mark_page_dirty(kvm, gfn);
>>> - kvm_set_pfn_dirty(pfn);
>>> - }
>>> if (page)
>>> put_page(page);
>>> }
>>> +
>>> + if (kvm_pte_dirty(changed))
>>> + mark_page_dirty(kvm, gfn);
>>> +
>>> return ret;
>>> out:
>>> spin_unlock(&kvm->mmu_lock);
>>> @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
>>> else
>>> ++kvm->stat.pages;
>>> kvm_set_pte(ptep, new_pte);
>>> - spin_unlock(&kvm->mmu_lock);
>>> - if (prot_bits & _PAGE_DIRTY) {
>>> - mark_page_dirty_in_slot(kvm, memslot, gfn);
>>> + if (writeable)
>> Is it better to use write or (prot_bits & _PAGE_DIRTY) here? writable is
>> pte permission from function hva_to_pfn_slow(), write is fault action.
>
> Marking folios dirty in the slow/full path basically necessitates marking the
> folio dirty if KVM creates a writable SPTE, as KVM won't mark the folio dirty
> if/when _PAGE_DIRTY is set.
>
> Practically speaking, I'm 99.9% certain it doesn't matter. The folio is marked
> dirty by core MM when the folio is made writable, and cleaning the folio triggers
> an mmu_notifier invalidation. I.e. if the page is mapped writable in KVM's
Yes, it is. Thanks for the explanation. kvm_set_pfn_dirty() can be done
only in the slow page fault path. My only concern is the fault type: a read
fault can set the pte entry writable but not _PAGE_DIRTY in the stage-2 mmu
table.
> stage-2 PTEs, then its folio has already been marked dirty.
Considering one condition, although I do not know whether it actually
occurs: the user mode VMM writes the folio through its hva first, then a
vCPU thread *reads* the folio. In the primary mmu table the pte entry is
writable and _PAGE_DIRTY is set; in the secondary mmu table (stage-2 PTE
table) it is pte_none since the folio is accessed for the first time, so
the slow page fault path fills in the stage-2 mmu page table.
Since it is a read fault, the stage-2 PTE will be created with
_PAGE_WRITE (coming from function hva_to_pfn_slow()), however _PAGE_DIRTY
is not set. Do we need to call kvm_set_pfn_dirty() in this situation?
Regards
Bibo Mao
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-08-01 18:01 ` Sean Christopherson
@ 2024-08-05 7:57 ` Aneesh Kumar K.V
2024-08-05 22:09 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Aneesh Kumar K.V @ 2024-08-05 7:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> On Thu, Aug 01, 2024, Aneesh Kumar K.V wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>>
>> > Disallow copying MTE tags to guest memory while KVM is dirty logging, as
>> > writing guest memory without marking the gfn as dirty in the memslot could
>> > result in userspace failing to migrate the updated page. Ideally (maybe?),
>> > KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
>> > and presumably the only use case for copy MTE tags _to_ the guest is when
>> > restoring state on the target.
>> >
>> > Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
>> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>> > ---
>> > arch/arm64/kvm/guest.c | 5 +++++
>> > 1 file changed, 5 insertions(+)
>> >
>> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> > index e1f0ff08836a..962f985977c2 100644
>> > --- a/arch/arm64/kvm/guest.c
>> > +++ b/arch/arm64/kvm/guest.c
>> > @@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>> >
>> > mutex_lock(&kvm->slots_lock);
>> >
>> > + if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
>> > + ret = -EBUSY;
>> > + goto out;
>> > + }
>> > +
>> >
>>
>> is this equivalent to kvm_follow_pfn() with kfp->pin = 1 ?
>
> No, gfn_to_pfn_prot() == FOLL_GET, kfp->pin == FOLL_PIN. But that's not really
> relevant.
>
What I meant was, should we consider mte_copy_tags_from_user() as one
that updates the page contents (even though it is updating tags) and
use kvm_follow_pfn() with kfp->pin = 1 instead?
Is my understanding correct in that, if we want to look up a pfn/page
from gfn with the intent of updating the page contents, we should use
kfp->pin == 1?
-aneesh
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-08-05 7:57 ` Aneesh Kumar K.V
@ 2024-08-05 22:09 ` Sean Christopherson
0 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-08-05 22:09 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Mon, Aug 05, 2024, Aneesh Kumar K.V wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > On Thu, Aug 01, 2024, Aneesh Kumar K.V wrote:
> >> Sean Christopherson <seanjc@google.com> writes:
> >>
> >> > Disallow copying MTE tags to guest memory while KVM is dirty logging, as
> >> > writing guest memory without marking the gfn as dirty in the memslot could
> >> > result in userspace failing to migrate the updated page. Ideally (maybe?),
> >> > KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
> >> > and presumably the only use case for copy MTE tags _to_ the guest is when
> >> > restoring state on the target.
> >> >
> >> > Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> >> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> >> > ---
> >> > arch/arm64/kvm/guest.c | 5 +++++
> >> > 1 file changed, 5 insertions(+)
> >> >
> >> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> >> > index e1f0ff08836a..962f985977c2 100644
> >> > --- a/arch/arm64/kvm/guest.c
> >> > +++ b/arch/arm64/kvm/guest.c
> >> > @@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> >> >
> >> > mutex_lock(&kvm->slots_lock);
> >> >
> >> > + if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
> >> > + ret = -EBUSY;
> >> > + goto out;
> >> > + }
> >> > +
> >> >
> >>
> >> is this equivalent to kvm_follow_pfn() with kfp->pin = 1 ?
> >
> > No, gfn_to_pfn_prot() == FOLL_GET, kfp->pin == FOLL_PIN. But that's not really
> > relevant.
>
> What I meant was, should we consider mte_copy_tags_from_user() as one
> that updates the page contents (even though it is updating tags) and
> use kvm_follow_pfn() with kfp->pin = 1 instead?
Yes, that's my understanding as well. However, this series is already ludicrously
long, and I don't have the ability to test the affected code, so rather than blindly
churn more arch code, I opted to add a FIXME in patch 76 instead.
https://lore.kernel.org/all/20240726235234.228822-76-seanjc@google.com
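For background, the FOLL_GET vs. FOLL_PIN distinction maps onto the plain mm APIs roughly as below; this is a sketch of the underlying primitives, not of the kvm_follow_pfn() wrappers added by this series:
	struct page *page;
	/* FOLL_GET-style: take a reference, e.g. to look up or map the page. */
	if (get_user_pages_unlocked(hva, 1, &page, FOLL_WRITE) == 1)
		put_page(page);
	/* FOLL_PIN-style: pin the page because its contents will be written. */
	if (pin_user_pages_unlocked(hva, 1, &page, FOLL_WRITE) == 1)
		unpin_user_page(page);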
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-08-03 3:02 ` maobibo
@ 2024-08-05 23:22 ` Sean Christopherson
2024-08-06 1:16 ` maobibo
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-08-05 23:22 UTC (permalink / raw)
To: maobibo
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Sat, Aug 03, 2024, maobibo wrote:
> On 2024/8/3 上午3:32, Sean Christopherson wrote:
> > On Fri, Aug 02, 2024, maobibo wrote:
> > > On 2024/7/27 上午7:52, Sean Christopherson wrote:
> > > > Mark pages/folios dirty only the slow page fault path, i.e. only when
> > > > mmu_lock is held and the operation is mmu_notifier-protected, as marking a
> > > > page/folio dirty after it has been written back can make some filesystems
> > > > unhappy (backing KVM guests will such filesystem files is uncommon, and
> > > > the race is minuscule, hence the lack of complaints).
> > > >
> > > > See the link below for details.
> > > >
> > > > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > > arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
> > > > 1 file changed, 10 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> > > > index 2634a9e8d82c..364dd35e0557 100644
> > > > --- a/arch/loongarch/kvm/mmu.c
> > > > +++ b/arch/loongarch/kvm/mmu.c
> > > > @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
> > > > if (kvm_pte_young(changed))
> > > > kvm_set_pfn_accessed(pfn);
> > > > - if (kvm_pte_dirty(changed)) {
> > > > - mark_page_dirty(kvm, gfn);
> > > > - kvm_set_pfn_dirty(pfn);
> > > > - }
> > > > if (page)
> > > > put_page(page);
> > > > }
> > > > +
> > > > + if (kvm_pte_dirty(changed))
> > > > + mark_page_dirty(kvm, gfn);
> > > > +
> > > > return ret;
> > > > out:
> > > > spin_unlock(&kvm->mmu_lock);
> > > > @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
> > > > else
> > > > ++kvm->stat.pages;
> > > > kvm_set_pte(ptep, new_pte);
> > > > - spin_unlock(&kvm->mmu_lock);
> > > > - if (prot_bits & _PAGE_DIRTY) {
> > > > - mark_page_dirty_in_slot(kvm, memslot, gfn);
> > > > + if (writeable)
> > > Is it better to use write or (prot_bits & _PAGE_DIRTY) here? writable is
> > > pte permission from function hva_to_pfn_slow(), write is fault action.
> >
> > Marking folios dirty in the slow/full path basically necessitates marking the
> > folio dirty if KVM creates a writable SPTE, as KVM won't mark the folio dirty
> > if/when _PAGE_DIRTY is set.
> >
> > Practically speaking, I'm 99.9% certain it doesn't matter. The folio is marked
> > dirty by core MM when the folio is made writable, and cleaning the folio triggers
> > an mmu_notifier invalidation. I.e. if the page is mapped writable in KVM's
> Yes, it is. Thanks for the explanation. kvm_set_pfn_dirty() can be done only
> in the slow page fault path. My only concern is the fault type: a read fault
> can set the pte entry writable but not _PAGE_DIRTY in the stage-2 mmu table.
>
> > stage-2 PTEs, then its folio has already been marked dirty.
> Considering one condition, although I do not know whether it actually occurs:
> the user mode VMM writes the folio through its hva first, then a vCPU thread
> *reads* the folio. In the primary mmu table the pte entry is writable and
> _PAGE_DIRTY is set; in the secondary mmu table (stage-2 PTE table) it is
> pte_none since the folio is accessed for the first time, so the slow page
> fault path fills in the stage-2 mmu page table.
>
> Since it is a read fault, the stage-2 PTE will be created with _PAGE_WRITE
> (coming from function hva_to_pfn_slow()), however _PAGE_DIRTY is not set. Do
> we need to call kvm_set_pfn_dirty() in this situation?
If KVM doesn't mark the folio dirty when the stage-2 _PAGE_DIRTY flag is set,
i.e. as proposed in this series, then yes, KVM needs to call kvm_set_pfn_dirty()
even though the VM hasn't (yet) written to the memory. In practice, KVM calling
kvm_set_pfn_dirty() is redundant the majority of the time, as the stage-1 PTE
will have _PAGE_DIRTY set, and that will get propagated to the folio when the
primary MMU does anything relevant with the PTE. And for file systems that care
about writeback, odds are very good that the folio was marked dirty even earlier,
when MM invoked vm_operations_struct.page_mkwrite().
The reason I am pushing to have all architectures mark pages/folios dirty in the
slow page fault path is that a false positive (marking a folio dirty without the
folio ever being written in _any_ context since the last pte_mkclean()) is rare,
and at worst results in an unnecessary writeback. On the other hand, marking folios
dirty in fast page fault handlers (or anywhere else that isn't protected by
mmu_notifiers) is technically unsafe.
In other words, the intent is to sacrifice accuracy to improve stability/robustness,
because the vast majority of time the loss in accuracy has no effect, and the worst
case scenario is that the kernel does I/O that wasn't necessary.
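To make "protected by mmu_notifiers" concrete, the slow path runs inside a window that looks roughly like the sketch below (condensed, details elided; names as used in kvm_main.c and the arch fault handlers at the time of this series):
	mmu_seq = kvm->mmu_invalidate_seq;
	smp_rmb();
	pfn = __gfn_to_pfn_memslot(...);	/* may sleep; takes the page reference */
	spin_lock(&kvm->mmu_lock);
	if (mmu_invalidate_retry(kvm, mmu_seq)) {
		/* An invalidation raced with the fault: drop the page and retry. */
		goto out_unlock;
	}
	/* Install the stage-2 PTE and mark the folio accessed/dirty here. */
	spin_unlock(&kvm->mmu_lock);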
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-07-26 23:52 ` [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock Sean Christopherson
@ 2024-08-05 23:25 ` Oliver Upton
2024-08-05 23:26 ` Oliver Upton
2024-08-06 8:24 ` Fuad Tabba
0 siblings, 2 replies; 150+ messages in thread
From: Oliver Upton @ 2024-08-05 23:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
[+cc Fuad]
Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
the page as dirty out of the critical section") that restructuring
around the MMU lock was helpful for reuse (presumably for pKVM), but I
lack the context there.
On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> page/folio dirty after it has been written back can make some filesystems
> unhappy (backing KVM guests will such filesystem files is uncommon, and
typo: s/will/with/
> the race is minuscule, hence the lack of complaints). See the link below
> for details.
>
> This will also allow converting arm64 to kvm_release_faultin_page(), which
> requires that mmu_lock be held (for the aforementioned reason).
>
> Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/kvm/mmu.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 22ee37360c4e..ce13c3d884d5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> }
>
> out_unlock:
> + if (writable && !ret)
> + kvm_set_pfn_dirty(pfn);
I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
a reference.
> + else
> + kvm_release_pfn_clean(pfn);
> +
> read_unlock(&kvm->mmu_lock);
>
> /* Mark the page dirty only if the fault is handled successfully */
> - if (writable && !ret) {
> - kvm_set_pfn_dirty(pfn);
> + if (writable && !ret)
> mark_page_dirty_in_slot(kvm, memslot, gfn);
> - }
>
> - kvm_release_pfn_clean(pfn);
> return ret != -EAGAIN ? ret : 0;
> }
>
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
--
Thanks,
Oliver
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-08-05 23:25 ` Oliver Upton
@ 2024-08-05 23:26 ` Oliver Upton
2024-08-05 23:53 ` Sean Christopherson
2024-08-06 8:55 ` Marc Zyngier
2024-08-06 8:24 ` Fuad Tabba
1 sibling, 2 replies; 150+ messages in thread
From: Oliver Upton @ 2024-08-05 23:26 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens, Fuad Tabba
On Mon, Aug 05, 2024 at 11:26:03PM +0000, Oliver Upton wrote:
> [+cc Fuad]
Take 2!
> Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> the page as dirty out of the critical section") that restructuring
> around the MMU lock was helpful for reuse (presumably for pKVM), but I
> lack the context there.
>
> On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > page/folio dirty after it has been written back can make some filesystems
> > unhappy (backing KVM guests will such filesystem files is uncommon, and
>
> typo: s/will/with/
>
> > the race is minuscule, hence the lack of complaints). See the link below
> > for details.
> >
> > This will also allow converting arm64 to kvm_release_faultin_page(), which
> > requires that mmu_lock be held (for the aforementioned reason).
> >
> > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/arm64/kvm/mmu.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 22ee37360c4e..ce13c3d884d5 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > }
> >
> > out_unlock:
> > + if (writable && !ret)
> > + kvm_set_pfn_dirty(pfn);
>
> I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> a reference.
>
> > + else
> > + kvm_release_pfn_clean(pfn);
> > +
> > read_unlock(&kvm->mmu_lock);
> >
> > /* Mark the page dirty only if the fault is handled successfully */
> > - if (writable && !ret) {
> > - kvm_set_pfn_dirty(pfn);
> > + if (writable && !ret)
> > mark_page_dirty_in_slot(kvm, memslot, gfn);
> > - }
> >
> > - kvm_release_pfn_clean(pfn);
> > return ret != -EAGAIN ? ret : 0;
> > }
> >
> > --
> > 2.46.0.rc1.232.g9752f9e123-goog
> >
>
> --
> Thanks,
> Oliver
--
Thanks,
Oliver
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-08-05 23:26 ` Oliver Upton
@ 2024-08-05 23:53 ` Sean Christopherson
2024-08-05 23:56 ` Oliver Upton
2024-08-06 8:55 ` Marc Zyngier
1 sibling, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-08-05 23:53 UTC (permalink / raw)
To: Oliver Upton
Cc: Paolo Bonzini, Marc Zyngier, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens, Fuad Tabba
On Mon, Aug 05, 2024, Oliver Upton wrote:
> > > ---
> > > arch/arm64/kvm/mmu.c | 10 ++++++----
> > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 22ee37360c4e..ce13c3d884d5 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > }
> > >
> > > out_unlock:
> > > + if (writable && !ret)
> > > + kvm_set_pfn_dirty(pfn);
> >
> > I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> > a reference.
Doh, I did indeed. Alternatively, this could be:
if (writable && !ret)
kvm_set_pfn_dirty(pfn);
kvm_release_pfn_clean(pfn);
It won't matter in the end, because this just becomes:
kvm_release_faultin_page(kvm, page, !!ret, writable);
So I guess the question is if you prefer to make the switch to an if-else in this
path, or more implicitly in the conversion to kvm_release_faultin_page().
I made the same goof for RISC-V, perhaps to prove that I too can copy+paste arm64's
MMU code ;-)
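For reference, a plausible shape for that helper, assuming the kvm_release_page_{clean,dirty}() APIs hoisted earlier in the series (a sketch, not the patch's actual implementation):
	static inline void kvm_release_faultin_page_sketch(struct kvm *kvm, struct page *page,
							   bool unused, bool dirty)
	{
		/* The real helper presumably also asserts mmu_lock is held unless @unused. */
		if (!page)
			return;
		if (dirty && !unused)
			kvm_release_page_dirty(page);	/* put the reference and mark the folio dirty */
		else
			kvm_release_page_clean(page);	/* just put the reference */
	}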
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-08-05 23:53 ` Sean Christopherson
@ 2024-08-05 23:56 ` Oliver Upton
0 siblings, 0 replies; 150+ messages in thread
From: Oliver Upton @ 2024-08-05 23:56 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens, Fuad Tabba
On Mon, Aug 05, 2024 at 04:53:01PM -0700, Sean Christopherson wrote:
> On Mon, Aug 05, 2024, Oliver Upton wrote:
> > > > ---
> > > > arch/arm64/kvm/mmu.c | 10 ++++++----
> > > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 22ee37360c4e..ce13c3d884d5 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > > }
> > > >
> > > > out_unlock:
> > > > + if (writable && !ret)
> > > > + kvm_set_pfn_dirty(pfn);
> > >
> > > I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> > > a reference.
>
> Doh, I did indeed. Alternatively, this could be:
>
> if (writable && !ret)
> kvm_set_pfn_dirty(pfn);
>
> kvm_release_pfn_clean(pfn);
>
> It won't matter in the end, because this just becomes:
>
> kvm_release_faultin_page(kvm, page, !!ret, writable);
>
> So I guess the question is if you prefer to make the switch to an if-else in this
> path, or more implicitly in the conversion to kvm_release_faultin_page().
>
> I made the same goof for RISC-V, perhaps to prove that I too can copy+paste arm64's
> MMU code ;-)
LOL, whatever way you want to address it is fine by me, just wanted to
make sure this intermediate bug wouldn't bite an unlucky bisection.
--
Thanks,
Oliver
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-08-05 23:22 ` Sean Christopherson
@ 2024-08-06 1:16 ` maobibo
0 siblings, 0 replies; 150+ messages in thread
From: maobibo @ 2024-08-06 1:16 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On 2024/8/6 上午7:22, Sean Christopherson wrote:
> On Sat, Aug 03, 2024, maobibo wrote:
>> On 2024/8/3 上午3:32, Sean Christopherson wrote:
>>> On Fri, Aug 02, 2024, maobibo wrote:
>>>> On 2024/7/27 上午7:52, Sean Christopherson wrote:
>>>>> Mark pages/folios dirty only the slow page fault path, i.e. only when
>>>>> mmu_lock is held and the operation is mmu_notifier-protected, as marking a
>>>>> page/folio dirty after it has been written back can make some filesystems
>>>>> unhappy (backing KVM guests will such filesystem files is uncommon, and
>>>>> the race is minuscule, hence the lack of complaints).
>>>>>
>>>>> See the link below for details.
>>>>>
>>>>> Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
>>>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>>>> ---
>>>>> arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
>>>>> 1 file changed, 10 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
>>>>> index 2634a9e8d82c..364dd35e0557 100644
>>>>> --- a/arch/loongarch/kvm/mmu.c
>>>>> +++ b/arch/loongarch/kvm/mmu.c
>>>>> @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
>>>>> if (kvm_pte_young(changed))
>>>>> kvm_set_pfn_accessed(pfn);
>>>>> - if (kvm_pte_dirty(changed)) {
>>>>> - mark_page_dirty(kvm, gfn);
>>>>> - kvm_set_pfn_dirty(pfn);
>>>>> - }
>>>>> if (page)
>>>>> put_page(page);
>>>>> }
>>>>> +
>>>>> + if (kvm_pte_dirty(changed))
>>>>> + mark_page_dirty(kvm, gfn);
>>>>> +
>>>>> return ret;
>>>>> out:
>>>>> spin_unlock(&kvm->mmu_lock);
>>>>> @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
>>>>> else
>>>>> ++kvm->stat.pages;
>>>>> kvm_set_pte(ptep, new_pte);
>>>>> - spin_unlock(&kvm->mmu_lock);
>>>>> - if (prot_bits & _PAGE_DIRTY) {
>>>>> - mark_page_dirty_in_slot(kvm, memslot, gfn);
>>>>> + if (writeable)
>>>> Is it better to use write or (prot_bits & _PAGE_DIRTY) here? writable is
>>>> pte permission from function hva_to_pfn_slow(), write is fault action.
>>>
>>> Marking folios dirty in the slow/full path basically necessitates marking the
>>> folio dirty if KVM creates a writable SPTE, as KVM won't mark the folio dirty
>>> if/when _PAGE_DIRTY is set.
>>>
>>> Practically speaking, I'm 99.9% certain it doesn't matter. The folio is marked
>>> dirty by core MM when the folio is made writable, and cleaning the folio triggers
>>> an mmu_notifier invalidation. I.e. if the page is mapped writable in KVM's
>> Yes, it is. Thanks for the explanation. kvm_set_pfn_dirty() can be done only
>> in the slow page fault path. My only concern is the fault type: a read fault
>> can set the pte entry writable but not _PAGE_DIRTY in the stage-2 mmu table.
>>
>>> stage-2 PTEs, then its folio has already been marked dirty.
>> Considering one condition, although I do not know whether it actually occurs:
>> the user mode VMM writes the folio through its hva first, then a vCPU thread
>> *reads* the folio. In the primary mmu table the pte entry is writable and
>> _PAGE_DIRTY is set; in the secondary mmu table (stage-2 PTE table) it is
>> pte_none since the folio is accessed for the first time, so the slow page
>> fault path fills in the stage-2 mmu page table.
>>
>> Since it is a read fault, the stage-2 PTE will be created with _PAGE_WRITE
>> (coming from function hva_to_pfn_slow()), however _PAGE_DIRTY is not set. Do
>> we need to call kvm_set_pfn_dirty() in this situation?
>
> If KVM doesn't mark the folio dirty when the stage-2 _PAGE_DIRTY flag is set,
> i.e. as proposed in this series, then yes, KVM needs to call kvm_set_pfn_dirty()
> even though the VM hasn't (yet) written to the memory. In practice, KVM calling
> kvm_set_pfn_dirty() is redundant the majority of the time, as the stage-1 PTE
> will have _PAGE_DIRTY set, and that will get propagated to the folio when the
> primary MMU does anything relevant with the PTE. And for file systems that care
> about writeback, odds are very good that the folio was marked dirty even earlier,
> when MM invoked vm_operations_struct.page_mkwrite().
>
> The reason I am pushing to have all architectures mark pages/folios dirty in the
> slow page fault path is that a false positive (marking a folio dirty without the
> folio ever being written in _any_ context since the last pte_mkclean()) is rare,
> and at worst results in an unnecessary writeback. On the other hand, marking folios
It does not influence the result. At worst there is one unnecessary
kvm_set_pfn_dirty() before the last pte_mkclean(). That is OK with me;
thanks for your detailed explanation.
> dirty in fast page fault handlers (or anywhere else that isn't protected by
> mmu_notifiers) is technically unsafe.
Yes, moving the marking of folios dirty into the slow fault handler makes
the logic clear and simple here, and is technically safer.
Regards
Bibo Mao
>
> In other words, the intent is to sacrifice accuracy to improve stability/robustness,
> because the vast majority of time the loss in accuracy has no effect, and the worst
> case scenario is that the kernel does I/O that wasn't necessary.
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-08-05 23:25 ` Oliver Upton
2024-08-05 23:26 ` Oliver Upton
@ 2024-08-06 8:24 ` Fuad Tabba
1 sibling, 0 replies; 150+ messages in thread
From: Fuad Tabba @ 2024-08-06 8:24 UTC (permalink / raw)
To: Oliver Upton
Cc: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Tianrui Zhao,
Bibo Mao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm,
loongarch, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv,
linux-kernel, David Matlack, David Stevens
Hi Oliver,
On Tue, 6 Aug 2024 at 00:26, Oliver Upton <oliver.upton@linux.dev> wrote:
>
> [+cc Fuad]
>
> Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> the page as dirty out of the critical section") that restructuring
> around the MMU lock was helpful for reuse (presumably for pKVM), but I
> lack the context there.
That was for some refactoring I'd done later on for mem_aborts in
pKVM. That said, I didn't know at the time that there might be a race
with some filesystems. I'll keep this in mind for the pKVM code we
have for now, and when upstreaming.
Thanks,
/fuad
> On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > page/folio dirty after it has been written back can make some filesystems
> > unhappy (backing KVM guests will such filesystem files is uncommon, and
>
> typo: s/will/with/
>
> > the race is minuscule, hence the lack of complaints). See the link below
> > for details.
> >
> > This will also allow converting arm64 to kvm_release_faultin_page(), which
> > requires that mmu_lock be held (for the aforementioned reason).
> >
> > Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/arm64/kvm/mmu.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 22ee37360c4e..ce13c3d884d5 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1685,15 +1685,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > }
> >
> > out_unlock:
> > + if (writable && !ret)
> > + kvm_set_pfn_dirty(pfn);
>
> I'm guessing you meant kvm_release_pfn_dirty() here, because this leaks
> a reference.
>
> > + else
> > + kvm_release_pfn_clean(pfn);
> > +
> > read_unlock(&kvm->mmu_lock);
> >
> > /* Mark the page dirty only if the fault is handled successfully */
> > - if (writable && !ret) {
> > - kvm_set_pfn_dirty(pfn);
> > + if (writable && !ret)
> > mark_page_dirty_in_slot(kvm, memslot, gfn);
> > - }
> >
> > - kvm_release_pfn_clean(pfn);
> > return ret != -EAGAIN ? ret : 0;
> > }
> >
> > --
> > 2.46.0.rc1.232.g9752f9e123-goog
> >
>
> --
> Thanks,
> Oliver
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-08-05 23:26 ` Oliver Upton
2024-08-05 23:53 ` Sean Christopherson
@ 2024-08-06 8:55 ` Marc Zyngier
2024-08-06 15:19 ` Sean Christopherson
1 sibling, 1 reply; 150+ messages in thread
From: Marc Zyngier @ 2024-08-06 8:55 UTC (permalink / raw)
To: Oliver Upton
Cc: Sean Christopherson, Paolo Bonzini, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens, Fuad Tabba
On Tue, 06 Aug 2024 00:26:54 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Mon, Aug 05, 2024 at 11:26:03PM +0000, Oliver Upton wrote:
> > [+cc Fuad]
>
> Take 2!
>
> > Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> > the page as dirty out of the critical section") that restructuring
> > around the MMU lock was helpful for reuse (presumably for pKVM), but I
> > lack the context there.
> >
> > On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > > page/folio dirty after it has been written back can make some filesystems
> > > unhappy (backing KVM guests will such filesystem files is uncommon, and
> >
> > typo: s/will/with/
> >
> > > the race is minuscule, hence the lack of complaints). See the link below
> > > for details.
Should we consider reverting 9c30fc615daa then?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed
2024-07-26 23:52 ` [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
2024-07-31 8:11 ` Andrew Jones
@ 2024-08-06 15:03 ` Anup Patel
1 sibling, 0 replies; 150+ messages in thread
From: Anup Patel @ 2024-08-06 15:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Sat, Jul 27, 2024 at 5:24 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Don't mark pages dirty if KVM bails from the page fault handler without
> installing a stage-2 mapping, i.e. if the page is guaranteed to not be
> written by the guest.
>
> In addition to being a (very) minor fix, this paves the way for converting
> RISC-V to use kvm_release_faultin_page().
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
For KVM RISC-V:
Acked-by: Anup Patel <anup@brainfault.org>
Regards,
Anup
> ---
> arch/riscv/kvm/mmu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index b63650f9b966..06aa5a0d056d 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -669,7 +669,6 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> goto out_unlock;
>
> if (writable) {
> - kvm_set_pfn_dirty(hfn);
> mark_page_dirty(kvm, gfn);
> ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
> vma_pagesize, false, true);
> @@ -682,6 +681,9 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> kvm_err("Failed to map in G-stage\n");
>
> out_unlock:
> + if ((!ret || ret == -EEXIST) && writable)
> + kvm_set_pfn_dirty(hfn);
> +
> spin_unlock(&kvm->mmu_lock);
> kvm_set_pfn_accessed(hfn);
> kvm_release_pfn_clean(hfn);
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
2024-07-26 23:52 ` [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
2024-07-31 8:12 ` Andrew Jones
@ 2024-08-06 15:04 ` Anup Patel
1 sibling, 0 replies; 150+ messages in thread
From: Anup Patel @ 2024-08-06 15:04 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Sat, Jul 27, 2024 at 5:24 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Mark pages accessed before dropping mmu_lock when faulting in guest memory
> so that RISC-V can convert to kvm_release_faultin_page() without tripping
> its lockdep assertion on mmu_lock being held. Marking pages accessed
> outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_
> outside of mmu_lock can make filesystems unhappy.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
For KVM RISC-V:
Acked-by: Anup Patel <anup@brainfault.org>
Regards,
Anup
> ---
> arch/riscv/kvm/mmu.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 06aa5a0d056d..806f68e70642 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -683,10 +683,10 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> out_unlock:
> if ((!ret || ret == -EEXIST) && writable)
> kvm_set_pfn_dirty(hfn);
> + else
> + kvm_release_pfn_clean(hfn);
>
> spin_unlock(&kvm->mmu_lock);
> - kvm_set_pfn_accessed(hfn);
> - kvm_release_pfn_clean(hfn);
> return ret;
> }
>
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
2024-07-26 23:52 ` [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
2024-07-31 8:11 ` Andrew Jones
@ 2024-08-06 15:04 ` Anup Patel
1 sibling, 0 replies; 150+ messages in thread
From: Anup Patel @ 2024-08-06 15:04 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Sat, Jul 27, 2024 at 5:24 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
> are new APIs to consolidate arch code and provide consistent behavior
> across all KVM architectures.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
For KVM RISC-V:
Acked-by: Anup Patel <anup@brainfault.org>
Regards,
Anup
> ---
> arch/riscv/kvm/mmu.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 806f68e70642..f73d6a79a78c 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -601,6 +601,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> bool logging = (memslot->dirty_bitmap &&
> !(memslot->flags & KVM_MEM_READONLY)) ? true : false;
> unsigned long vma_pagesize, mmu_seq;
> + struct page *page;
>
> /* We need minimum second+third level pages */
> ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
> @@ -631,7 +632,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
>
> /*
> * Read mmu_invalidate_seq so that KVM can detect if the results of
> - * vma_lookup() or gfn_to_pfn_prot() become stale priort to acquiring
> + * vma_lookup() or __kvm_faultin_pfn() become stale priort to acquiring
> * kvm->mmu_lock.
> *
> * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> @@ -647,7 +648,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> return -EFAULT;
> }
>
> - hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);
> + hfn = kvm_faultin_pfn(vcpu, gfn, is_write, &writable, &page);
> if (hfn == KVM_PFN_ERR_HWPOISON) {
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
> vma_pageshift, current);
> @@ -681,11 +682,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
> kvm_err("Failed to map in G-stage\n");
>
> out_unlock:
> - if ((!ret || ret == -EEXIST) && writable)
> - kvm_set_pfn_dirty(hfn);
> - else
> - kvm_release_pfn_clean(hfn);
> -
> + kvm_release_faultin_page(kvm, page, ret && ret != -EEXIST, writable);
> spin_unlock(&kvm->mmu_lock);
> return ret;
> }
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
2024-08-06 8:55 ` Marc Zyngier
@ 2024-08-06 15:19 ` Sean Christopherson
0 siblings, 0 replies; 150+ messages in thread
From: Sean Christopherson @ 2024-08-06 15:19 UTC (permalink / raw)
To: Marc Zyngier
Cc: Oliver Upton, Paolo Bonzini, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens, Fuad Tabba
On Tue, Aug 06, 2024, Marc Zyngier wrote:
> On Tue, 06 Aug 2024 00:26:54 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Mon, Aug 05, 2024 at 11:26:03PM +0000, Oliver Upton wrote:
> > > [+cc Fuad]
> >
> > Take 2!
> >
> > > Fuad, you mentioned in commit 9c30fc615daa ("KVM: arm64: Move setting
> > > the page as dirty out of the critical section") that restructuring
> > > around the MMU lock was helpful for reuse (presumably for pKVM), but I
> > > lack the context there.
> > >
> > > On Fri, Jul 26, 2024 at 04:52:03PM -0700, Sean Christopherson wrote:
> > > > Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
> > > > page/folio dirty after it has been written back can make some filesystems
> > > > unhappy (backing KVM guests will such filesystem files is uncommon, and
> > >
> > > typo: s/will/with/
> > >
> > > > the race is minuscule, hence the lack of complaints). See the link below
> > > > for details.
>
> Should we consider reverting 9c30fc615daa then?
Aha! After thinking through things more, I don't think a revert is necessary.
I _think_ the worst case scenario is that KVM would trigger this WARN in
filemap_unaccount_folio():
/*
* At this point folio must be either written or cleaned by
* truncate. Dirty folio here signals a bug and loss of
* unwritten data - on ordinary filesystems.
*
* But it's harmless on in-memory filesystems like tmpfs; and can
* occur when a driver which did get_user_pages() sets page dirty
* before putting it, while the inode is being finally evicted.
*
* Below fixes dirty accounting after removing the folio entirely
* but leaves the dirty flag set: it has no effect for truncated
* folio and anyway will be cleared before returning folio to
* buddy allocator.
*/
if (WARN_ON_ONCE(folio_test_dirty(folio) &&
mapping_can_writeback(mapping)))
folio_account_cleaned(folio, inode_to_wb(mapping->host));
KVM won't actually write memory because the stage-2 mappings are protected by the
mmu_notifier, i.e. there is no risk of loss of data, even if the VM were backed
by memory that needs writeback.
And FWIW, given that multiple other KVM architectures mark folios dirty outside
of mmu_notifier protection and have never tripped over this, I think it's highly
unlikely the WARN will ever be triggered by a sane virtualization setup.
I can add something to that effect to the changelog, e.g. to document that this
isn't super urgent.
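To make the window concrete, here is a minimal sketch of the two orderings under discussion; the surrounding fault-handler code is made up for illustration, and only the placement of the dirty-marking relative to mmu_lock reflects the actual patches:

/* Ordering after commit 9c30fc615daa: mark the folio dirty after dropping mmu_lock. */
spin_lock(&kvm->mmu_lock);
/* ... install the writable stage-2 mapping ... */
spin_unlock(&kvm->mmu_lock);
/*
 * By now an mmu_notifier invalidation, writeback and truncate may all have
 * completed, so this can dirty an already-cleaned folio and trip the WARN
 * above -- but there is no data loss, as the stage-2 mapping is already gone.
 */
kvm_set_pfn_dirty(pfn);

/* Ordering in patch 54: mark the folio dirty before dropping mmu_lock. */
spin_lock(&kvm->mmu_lock);
/* ... install the writable stage-2 mapping ... */
/*
 * mmu_notifier invalidation also takes mmu_lock, so it can't finish tearing
 * down the mapping (and let writeback/truncate race ahead) before the folio
 * has been marked dirty.
 */
kvm_set_pfn_dirty(pfn);
spin_unlock(&kvm->mmu_lock);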
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
2024-07-31 16:23 ` Alex Bennée
2024-08-01 10:07 ` Marc Zyngier
@ 2024-08-07 14:15 ` Catalin Marinas
2024-08-08 9:54 ` Steven Price
2024-08-22 14:24 ` (subset) " Marc Zyngier
3 siblings, 1 reply; 150+ messages in thread
From: Catalin Marinas @ 2024-08-07 14:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens, Steven Price
On Fri, Jul 26, 2024 at 04:51:10PM -0700, Sean Christopherson wrote:
> Put the page reference acquired by gfn_to_pfn_prot() if
> kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
> than-stellar heuristics for dealing with pfn-mapped memory means that KVM
> can get a page reference to ZONE_DEVICE memory.
>
> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/kvm/guest.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 11098eb7eb44..e1f0ff08836a 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -1059,6 +1059,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> page = pfn_to_online_page(pfn);
> if (!page) {
> /* Reject ZONE_DEVICE memory */
> + kvm_release_pfn_clean(pfn);
> ret = -EFAULT;
> goto out;
> }
This patch makes sense irrespective of whether the above pfn is a
ZONE_DEVICE or not. gfn_to_pfn_prot() increased the page refcount via
GUP, so it must be released before bailing out of this loop.
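As a sketch of the resulting pattern (a simplified paraphrase of the tag-copy loop, not the verbatim code -- the tag copy itself and the clean-vs-dirty release on the normal path are elided):

while (length > 0) {
	kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
	struct page *page;

	/* No reference is held on failure, so just bail. */
	if (is_error_noslot_pfn(pfn)) {
		ret = -EFAULT;
		goto out;
	}

	page = pfn_to_online_page(pfn);
	if (!page) {
		/* Reject ZONE_DEVICE memory */
		kvm_release_pfn_clean(pfn);	/* the fix: put the GUP reference */
		ret = -EFAULT;
		goto out;
	}

	/* ... copy tags to/from page_address(page) ... */

	/* Normal path: the reference is dropped here too (dirty or clean). */
	kvm_release_pfn_clean(pfn);
	gfn++;
	length -= PAGE_SIZE;
}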
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-07-26 23:51 ` [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Sean Christopherson
2024-08-01 7:34 ` Aneesh Kumar K.V
@ 2024-08-07 16:21 ` Catalin Marinas
2024-08-08 9:54 ` Steven Price
2024-08-22 14:24 ` (subset) " Marc Zyngier
2 siblings, 1 reply; 150+ messages in thread
From: Catalin Marinas @ 2024-08-07 16:21 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Fri, Jul 26, 2024 at 04:51:11PM -0700, Sean Christopherson wrote:
> Disallow copying MTE tags to guest memory while KVM is dirty logging, as
> writing guest memory without marking the gfn as dirty in the memslot could
> result in userspace failing to migrate the updated page. Ideally (maybe?),
> KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
> and presumably the only use case for copy MTE tags _to_ the guest is when
> restoring state on the target.
>
> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/arm64/kvm/guest.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index e1f0ff08836a..962f985977c2 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>
> mutex_lock(&kvm->slots_lock);
>
> + if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
> + ret = -EBUSY;
> + goto out;
> + }
There are ways to actually log the page dirtying but I don't think
it's worth it. AFAICT, reading the tags still works and that's what's
used during migration (on the VM where dirty tracking takes place).
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-08-07 14:15 ` Catalin Marinas
@ 2024-08-08 9:54 ` Steven Price
0 siblings, 0 replies; 150+ messages in thread
From: Steven Price @ 2024-08-08 9:54 UTC (permalink / raw)
To: Catalin Marinas, Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On 07/08/2024 15:15, Catalin Marinas wrote:
> On Fri, Jul 26, 2024 at 04:51:10PM -0700, Sean Christopherson wrote:
>> Put the page reference acquired by gfn_to_pfn_prot() if
>> kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
>> than-stellar heuristics for dealing with pfn-mapped memory means that KVM
>> can get a page reference to ZONE_DEVICE memory.
>>
>> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>> arch/arm64/kvm/guest.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 11098eb7eb44..e1f0ff08836a 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -1059,6 +1059,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>> page = pfn_to_online_page(pfn);
>> if (!page) {
>> /* Reject ZONE_DEVICE memory */
>> + kvm_release_pfn_clean(pfn);
>> ret = -EFAULT;
>> goto out;
>> }
>
> This patch makes sense irrespective of whether the above pfn is a
> ZONE_DEVICE or not. gfn_to_pfn_prot() increased the page refcount via
> GUP, so it must be released before bailing out of this loop.
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>
Yep, as Catalin says, this is an 'obviously' correct fix - the reference
needs releasing before bailing out. The comment there is perhaps
misleading - it's not just ZONE_DEVICE memory that will be rejected, but
this is the case that was in my mind when I wrote it. Although clearly I
wasn't thinking hard enough when writing the code in the first place... ;)
Reviewed-by: Steven Price <steven.price@arm.com>
Thanks,
Steve
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-08-07 16:21 ` Catalin Marinas
@ 2024-08-08 9:54 ` Steven Price
0 siblings, 0 replies; 150+ messages in thread
From: Steven Price @ 2024-08-08 9:54 UTC (permalink / raw)
To: Catalin Marinas, Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On 07/08/2024 17:21, Catalin Marinas wrote:
> On Fri, Jul 26, 2024 at 04:51:11PM -0700, Sean Christopherson wrote:
>> Disallow copying MTE tags to guest memory while KVM is dirty logging, as
>> writing guest memory without marking the gfn as dirty in the memslot could
>> result in userspace failing to migrate the updated page. Ideally (maybe?),
>> KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
>> and presumably the only use case for copy MTE tags _to_ the guest is when
>> restoring state on the target.
>>
>> Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>> arch/arm64/kvm/guest.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index e1f0ff08836a..962f985977c2 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -1045,6 +1045,11 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>>
>> mutex_lock(&kvm->slots_lock);
>>
>> + if (write && atomic_read(&kvm->nr_memslots_dirty_logging)) {
>> + ret = -EBUSY;
>> + goto out;
>> + }
>
> There are ways to actually log the page dirtying but I don't think
> it's worth it. AFAICT, reading the tags still works and that's what's
> used during migration (on the VM where dirty tracking takes place).
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>
Looks sensible to me - my initial thought was "why would a VMM do
that?". But it would make sense to actually return a failure rather than
letting the VMM shoot itself in the foot.
If there's actually a use-case then we could look at making the dirty
tracking work, but I'm not convinced there is a good reason.
Reviewed-by: Steven Price <steven.price@arm.com>
Thanks,
Steve
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
2024-07-26 23:52 ` [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
2024-08-02 7:53 ` maobibo
@ 2024-08-08 11:38 ` maobibo
1 sibling, 0 replies; 150+ messages in thread
From: maobibo @ 2024-08-08 11:38 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 2024/7/27 7:52 AM, Sean Christopherson wrote:
> Mark pages/folios dirty only the slow page fault path, i.e. only when
> mmu_lock is held and the operation is mmu_notifier-protected, as marking a
> page/folio dirty after it has been written back can make some filesystems
> unhappy (backing KVM guests will such filesystem files is uncommon, and
> the race is minuscule, hence the lack of complaints).
>
> See the link below for details.
>
> Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/loongarch/kvm/mmu.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> index 2634a9e8d82c..364dd35e0557 100644
> --- a/arch/loongarch/kvm/mmu.c
> +++ b/arch/loongarch/kvm/mmu.c
> @@ -608,13 +608,13 @@ static int kvm_map_page_fast(struct kvm_vcpu *vcpu, unsigned long gpa, bool writ
> if (kvm_pte_young(changed))
> kvm_set_pfn_accessed(pfn);
>
> - if (kvm_pte_dirty(changed)) {
> - mark_page_dirty(kvm, gfn);
> - kvm_set_pfn_dirty(pfn);
> - }
> if (page)
> put_page(page);
> }
> +
> + if (kvm_pte_dirty(changed))
> + mark_page_dirty(kvm, gfn);
> +
> return ret;
> out:
> spin_unlock(&kvm->mmu_lock);
> @@ -915,12 +915,14 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
> else
> ++kvm->stat.pages;
> kvm_set_pte(ptep, new_pte);
> - spin_unlock(&kvm->mmu_lock);
>
> - if (prot_bits & _PAGE_DIRTY) {
> - mark_page_dirty_in_slot(kvm, memslot, gfn);
> + if (writeable)
> kvm_set_pfn_dirty(pfn);
> - }
> +
> + spin_unlock(&kvm->mmu_lock);
> +
> + if (prot_bits & _PAGE_DIRTY)
> + mark_page_dirty_in_slot(kvm, memslot, gfn);
>
> kvm_release_pfn_clean(pfn);
> out:
>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 66/84] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock
2024-07-26 23:52 ` [PATCH v12 66/84] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
@ 2024-08-08 11:47 ` maobibo
0 siblings, 0 replies; 150+ messages in thread
From: maobibo @ 2024-08-08 11:47 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Huacai Chen, Michael Ellerman, Anup Patel,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Christian Borntraeger,
Janosch Frank, Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On 2024/7/27 7:52 AM, Sean Christopherson wrote:
> Mark pages accessed before dropping mmu_lock when faulting in guest memory
> so that LoongArch can convert to kvm_release_faultin_page() without
> tripping its lockdep assertion on mmu_lock being held.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/loongarch/kvm/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> index 52b5c16cf250..230cafa178d7 100644
> --- a/arch/loongarch/kvm/mmu.c
> +++ b/arch/loongarch/kvm/mmu.c
> @@ -902,13 +902,13 @@ static int kvm_map_page(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
>
> if (writeable)
> kvm_set_pfn_dirty(pfn);
> + kvm_release_pfn_clean(pfn);
>
> spin_unlock(&kvm->mmu_lock);
>
> if (prot_bits & _PAGE_DIRTY)
> mark_page_dirty_in_slot(kvm, memslot, gfn);
>
> - kvm_release_pfn_clean(pfn);
> out:
> srcu_read_unlock(&kvm->srcu, srcu_idx);
> return err;
>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-07-26 23:51 ` [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep Sean Christopherson
@ 2024-08-08 12:00 ` Alex Bennée
2024-08-08 13:16 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Alex Bennée @ 2024-08-08 12:00 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> Now that hva_to_pfn() no longer supports being called in atomic context,
> move the might_sleep() annotation from hva_to_pfn_slow() to
> hva_to_pfn().
The commentary for hva_to_pfn_fast disagrees.
/*
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned. It's also the
* only part that runs if we can in atomic context.
*/
static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
At which point did it lose the ability to run in the atomic context? I
couldn't work it out from the commits.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/kvm_main.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 84c73b4fc804..03af1a0090b1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2807,8 +2807,6 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> struct page *page;
> int npages;
>
> - might_sleep();
> -
> if (writable)
> *writable = write_fault;
>
> @@ -2947,6 +2945,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool interruptible, bool *async,
> kvm_pfn_t pfn;
> int npages, r;
>
> + might_sleep();
> +
> if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
> return pfn;
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-08-08 12:00 ` Alex Bennée
@ 2024-08-08 13:16 ` Sean Christopherson
2024-08-08 15:18 ` Alex Bennée
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-08-08 13:16 UTC (permalink / raw)
To: Alex Bennée
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Thu, Aug 08, 2024, Alex Bennée wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > Now that hva_to_pfn() no longer supports being called in atomic context,
> > move the might_sleep() annotation from hva_to_pfn_slow() to
> > hva_to_pfn().
>
> The commentary for hva_to_pfn_fast disagrees.
>
> /*
> * The fast path to get the writable pfn which will be stored in @pfn,
> * true indicates success, otherwise false is returned. It's also the
> * only part that runs if we can in atomic context.
> */
> static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
>
> At which point did it lose the ability to run in the atomic context? I
> couldn't work it out from the commits.
It didn't lose the ability per se (calling hva_to_pfn_fast() in atomic context
would still be functionally ok), rather the previous patch
KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
removed support for doing so in order to simplify hva_to_pfn() as a whole.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-08-08 13:16 ` Sean Christopherson
@ 2024-08-08 15:18 ` Alex Bennée
2024-08-08 15:31 ` Sean Christopherson
0 siblings, 1 reply; 150+ messages in thread
From: Alex Bennée @ 2024-08-08 15:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> On Thu, Aug 08, 2024, Alex Bennée wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>>
>> > Now that hva_to_pfn() no longer supports being called in atomic context,
>> > move the might_sleep() annotation from hva_to_pfn_slow() to
>> > hva_to_pfn().
>>
>> The commentary for hva_to_pfn_fast disagrees.
>>
>> /*
>> * The fast path to get the writable pfn which will be stored in @pfn,
>> * true indicates success, otherwise false is returned. It's also the
>> * only part that runs if we can in atomic context.
>> */
>> static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
>>
>> At which point did it lose the ability to run in the atomic context? I
>> couldn't work it out from the commits.
>
> It didn't lose the ability per se (calling hva_to_pfn_fast() in atomic context
> would still be functionally ok), rather the previous patch
>
> KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
>
> removed support for doing so in order to simplify hva_to_pfn() as a whole.
It still sticks out given the only caller no longer enforces this.
How about:
* true indicates success, otherwise false is returned. It's also the
* only part that could run in an atomic context if we wanted to
* (although no callers expect it to).
?
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-08-08 15:18 ` Alex Bennée
@ 2024-08-08 15:31 ` Sean Christopherson
2024-08-08 16:16 ` Alex Bennée
0 siblings, 1 reply; 150+ messages in thread
From: Sean Christopherson @ 2024-08-08 15:31 UTC (permalink / raw)
To: Alex Bennée
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
On Thu, Aug 08, 2024, Alex Bennée wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > On Thu, Aug 08, 2024, Alex Bennée wrote:
> >> Sean Christopherson <seanjc@google.com> writes:
> >>
> >> > Now that hva_to_pfn() no longer supports being called in atomic context,
> >> > move the might_sleep() annotation from hva_to_pfn_slow() to
> >> > hva_to_pfn().
> >>
> >> The commentary for hva_to_pfn_fast disagrees.
> >>
> >> /*
> >> * The fast path to get the writable pfn which will be stored in @pfn,
> >> * true indicates success, otherwise false is returned. It's also the
> >> * only part that runs if we can in atomic context.
> >> */
> >> static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
> >>
> >> At which point did it lose the ability to run in the atomic context? I
> >> couldn't work it out from the commits.
> >
> > It didn't lose the ability per se (calling hva_to_pfn_fast() in atomic context
> > would still be functionally ok), rather the previous patch
> >
> > KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
> >
> > removed support for doing so in order to simplify hva_to_pfn() as a whole.
>
> It still sticks out given the only caller no longer enforces this.
Oh, sorry, I should have been more explicit. I'll fix the comment, I simply
missed it.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep
2024-08-08 15:31 ` Sean Christopherson
@ 2024-08-08 16:16 ` Alex Bennée
0 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-08 16:16 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> On Thu, Aug 08, 2024, Alex Bennée wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>>
>> > On Thu, Aug 08, 2024, Alex Bennée wrote:
>> >> Sean Christopherson <seanjc@google.com> writes:
>> >>
>> >> > Now that hva_to_pfn() no longer supports being called in atomic context,
>> >> > move the might_sleep() annotation from hva_to_pfn_slow() to
>> >> > hva_to_pfn().
>> >>
>> >> The commentary for hva_to_pfn_fast disagrees.
>> >>
>> >> /*
>> >> * The fast path to get the writable pfn which will be stored in @pfn,
>> >> * true indicates success, otherwise false is returned. It's also the
>> >> * only part that runs if we can in atomic context.
>> >> */
>> >> static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
>> >>
>> >> At which point did it lose the ability to run in the atomic context? I
>> >> couldn't work it out from the commits.
>> >
>> > It didn't lose the ability per se (calling hva_to_pfn_fast() in atomic context
>> > would still be functionally ok), rather the previous patch
>> >
>> > KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs
>> >
>> > removed support for doing so in order to simplify hva_to_pfn() as a whole.
>>
>> It still sticks out given the only caller no longer enforces this.
>
> Oh, sorry, I should have been more explicit. I'll fix the comment, I simply
> missed it.
No worries, with the fixed comment:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: (subset) [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
` (2 preceding siblings ...)
2024-08-07 14:15 ` Catalin Marinas
@ 2024-08-22 14:24 ` Marc Zyngier
3 siblings, 0 replies; 150+ messages in thread
From: Marc Zyngier @ 2024-08-22 14:24 UTC (permalink / raw)
To: Paolo Bonzini, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Fri, 26 Jul 2024 16:51:10 -0700, Sean Christopherson wrote:
> Put the page reference acquired by gfn_to_pfn_prot() if
> kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-
> than-stellar heuristics for dealing with pfn-mapped memory means that KVM
> can get a page reference to ZONE_DEVICE memory.
>
>
Applied to next, thanks!
[01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
commit: ae41d7dbaeb4f79134136cd65ad7015cf9ccf78a
Cheers,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: (subset) [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
2024-07-26 23:51 ` [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Sean Christopherson
2024-08-01 7:34 ` Aneesh Kumar K.V
2024-08-07 16:21 ` Catalin Marinas
@ 2024-08-22 14:24 ` Marc Zyngier
2 siblings, 0 replies; 150+ messages in thread
From: Marc Zyngier @ 2024-08-22 14:24 UTC (permalink / raw)
To: Paolo Bonzini, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, linux-mips,
linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel, David Matlack,
David Stevens
On Fri, 26 Jul 2024 16:51:11 -0700, Sean Christopherson wrote:
> Disallow copying MTE tags to guest memory while KVM is dirty logging, as
> writing guest memory without marking the gfn as dirty in the memslot could
> result in userspace failing to migrate the updated page. Ideally (maybe?),
> KVM would simply mark the gfn as dirty, but there is no vCPU to work with,
> and presumably the only use case for copy MTE tags _to_ the guest is when
> restoring state on the target.
>
> [...]
Applied to next, thanks!
[02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
commit: e0b7de4fd18c47ebd47ec0dd1af6503d4071b943
Cheers,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
` (84 preceding siblings ...)
2024-07-30 11:52 ` [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
@ 2024-08-27 9:06 ` Alex Bennée
85 siblings, 0 replies; 150+ messages in thread
From: Alex Bennée @ 2024-08-27 9:06 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda, kvm, linux-arm-kernel, kvmarm, loongarch,
linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-kernel,
David Matlack, David Stevens
Sean Christopherson <seanjc@google.com> writes:
> arm64 folks, the first two patches are bug fixes, but I have very low
> confidence that they are correct and/or desirable. If they are more or
> less correct, I can post them separately if that'd make life easier. I
> included them here to avoid conflicts, and because I'm pretty sure how
> KVM deals with MTE tags vs. dirty logging will impact what APIs KVM needs
> to provide to arch code.
>
> On to the series... The TL;DR is that I would like to get input on two
> things:
>
> 1. Marking folios dirty/accessed only on the intial stage-2 page fault
> 2. The new APIs for faulting, prefetching, and doing "lookups" on
> pfns
I've finally managed to get virtio-vulkan working on my Arm64 devbox
with an AMD graphics card plugged into the PCI. I'm confident that the
graphics path is using the discrete card memory (as it has been mapped
as device memory with alignment handlers to deal with the broken Altra
PCI). However, aside from running graphics workloads in KVM guests, is
there anything else I can check to see things are behaving as expected?
The predecessor series did break launching some KVM guests on my x86
system but with this series launching guests works fine and I haven't
noticed any weirdness.
So for those caveats you can certainly have a:
Tested-by: Alex Bennée <alex.bennee@linaro.org>
However if there is anything else I can do to further stress test this
code do let me know.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 150+ messages in thread
end of thread
Thread overview: 150+ messages
2024-07-26 23:51 [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 01/84] KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Sean Christopherson
2024-07-31 16:23 ` Alex Bennée
2024-07-31 20:36 ` Sean Christopherson
2024-08-01 10:07 ` Marc Zyngier
2024-08-07 14:15 ` Catalin Marinas
2024-08-08 9:54 ` Steven Price
2024-08-22 14:24 ` (subset) " Marc Zyngier
2024-07-26 23:51 ` [PATCH v12 02/84] KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Sean Christopherson
2024-08-01 7:34 ` Aneesh Kumar K.V
2024-08-01 18:01 ` Sean Christopherson
2024-08-05 7:57 ` Aneesh Kumar K.V
2024-08-05 22:09 ` Sean Christopherson
2024-08-07 16:21 ` Catalin Marinas
2024-08-08 9:54 ` Steven Price
2024-08-22 14:24 ` (subset) " Marc Zyngier
2024-07-26 23:51 ` [PATCH v12 03/84] KVM: Drop KVM_ERR_PTR_BAD_PAGE and instead return NULL to indicate an error Sean Christopherson
2024-08-01 8:57 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 04/84] KVM: Allow calling kvm_release_page_{clean,dirty}() on a NULL page pointer Sean Christopherson
2024-08-01 9:03 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 05/84] KVM: Add kvm_release_page_unused() API to put pages that KVM never consumes Sean Christopherson
2024-08-01 9:20 ` Alex Bennée
2024-08-01 14:43 ` Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 06/84] KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 07/84] KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 08/84] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 09/84] KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 10/84] KVM: x86/mmu: Use gfn_to_page_many_atomic() when prefetching indirect PTEs Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 11/84] KVM: Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() Sean Christopherson
2024-08-02 11:16 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 12/84] KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs Sean Christopherson
2024-08-01 9:31 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 13/84] KVM: Annotate that all paths in hva_to_pfn() might sleep Sean Christopherson
2024-08-08 12:00 ` Alex Bennée
2024-08-08 13:16 ` Sean Christopherson
2024-08-08 15:18 ` Alex Bennée
2024-08-08 15:31 ` Sean Christopherson
2024-08-08 16:16 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 14/84] KVM: Replace "async" pointer in gfn=>pfn with "no_wait" and error code Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 15/84] KVM: x86/mmu: Drop kvm_page_fault.hva, i.e. don't track intermediate hva Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 16/84] KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 17/84] KVM: Introduce kvm_follow_pfn() to eventually replace "gfn_to_pfn" APIs Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 18/84] KVM: Remove pointless sanity check on @map param to kvm_vcpu_(un)map() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 19/84] KVM: Explicitly initialize all fields at the start of kvm_vcpu_map() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 20/84] KVM: Use NULL for struct page pointer to indicate mremapped memory Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 21/84] KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 22/84] KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 23/84] KVM: nVMX: Add helper to put (unmap) vmcs12 pages Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 24/84] KVM: Use plain "struct page" pointer instead of single-entry array Sean Christopherson
2024-08-01 9:53 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 25/84] KVM: Provide refcounted page as output field in struct kvm_follow_pfn Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 26/84] KVM: Move kvm_{set,release}_page_{clean,dirty}() helpers up in kvm_main.c Sean Christopherson
2024-08-01 9:55 ` Alex Bennée
2024-07-26 23:51 ` [PATCH v12 27/84] KVM: pfncache: Precisely track refcounted pages Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 28/84] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 29/84] KVM: Pin (as in FOLL_PIN) pages during kvm_vcpu_map() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 30/84] KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 31/84] KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 32/84] KVM: Get writable mapping for __kvm_vcpu_map() only when necessary Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 33/84] KVM: Disallow direct access (w/o mmu_notifier) to unpinned pfn by default Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 34/84] KVM: Add a helper to lookup a pfn without grabbing a reference Sean Christopherson
2024-07-30 10:41 ` Paolo Bonzini
2024-07-30 20:15 ` Sean Christopherson
2024-07-31 10:11 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 35/84] KVM: x86: Use kvm_lookup_pfn() to check if retrying #PF is useful Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 36/84] KVM: x86: Use kvm_lookup_pfn() to check if APIC access page was installed Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 37/84] KVM: x86/mmu: Add "mmu" prefix fault-in helpers to free up generic names Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 38/84] KVM: x86/mmu: Put direct prefetched pages via kvm_release_page_clean() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 39/84] KVM: x86/mmu: Add common helper to handle prefetching SPTEs Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 40/84] KVM: x86/mmu: Add helper to "finish" handling a guest page fault Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 41/84] KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte() Sean Christopherson
2024-07-30 8:57 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 42/84] KVM: Move declarations of memslot accessors up in kvm_host.h Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 43/84] KVM: Add kvm_faultin_pfn() to specifically service guest page faults Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 44/84] KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn() Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 45/84] KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn() Sean Christopherson
2024-07-30 9:05 ` Paolo Bonzini
2024-07-30 20:00 ` Sean Christopherson
2024-07-31 10:12 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 46/84] KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 47/84] KVM: x86/mmu: Don't mark unused faultin pages as accessed Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 48/84] KVM: Move x86's API to release a faultin page to common KVM Sean Christopherson
2024-07-30 8:58 ` Paolo Bonzini
2024-07-30 19:15 ` Sean Christopherson
2024-07-31 10:18 ` Paolo Bonzini
2024-07-26 23:51 ` [PATCH v12 49/84] KVM: VMX: Hold mmu_lock until page is released when updating APIC access page Sean Christopherson
2024-07-26 23:51 ` [PATCH v12 50/84] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn Sean Christopherson
2024-07-30 8:59 ` Paolo Bonzini
2024-07-26 23:52 ` [PATCH v12 51/84] KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map() Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 52/84] KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 53/84] KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 54/84] KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock Sean Christopherson
2024-08-05 23:25 ` Oliver Upton
2024-08-05 23:26 ` Oliver Upton
2024-08-05 23:53 ` Sean Christopherson
2024-08-05 23:56 ` Oliver Upton
2024-08-06 8:55 ` Marc Zyngier
2024-08-06 15:19 ` Sean Christopherson
2024-08-06 8:24 ` Fuad Tabba
2024-07-26 23:52 ` [PATCH v12 55/84] KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 56/84] KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed Sean Christopherson
2024-07-31 8:11 ` Andrew Jones
2024-08-06 15:03 ` Anup Patel
2024-07-26 23:52 ` [PATCH v12 57/84] KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock Sean Christopherson
2024-07-31 8:12 ` Andrew Jones
2024-08-06 15:04 ` Anup Patel
2024-07-26 23:52 ` [PATCH v12 58/84] KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest Sean Christopherson
2024-07-31 8:11 ` Andrew Jones
2024-08-06 15:04 ` Anup Patel
2024-07-26 23:52 ` [PATCH v12 59/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 60/84] KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 61/84] KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page() Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 62/84] KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 63/84] KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 64/84] KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
2024-08-02 7:53 ` maobibo
2024-08-02 19:32 ` Sean Christopherson
2024-08-03 3:02 ` maobibo
2024-08-05 23:22 ` Sean Christopherson
2024-08-06 1:16 ` maobibo
2024-08-08 11:38 ` maobibo
2024-07-26 23:52 ` [PATCH v12 65/84] KVM: LoongArch: Mark "struct page" pfns accessed " Sean Christopherson
2024-08-02 7:34 ` maobibo
2024-07-26 23:52 ` [PATCH v12 66/84] KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock Sean Christopherson
2024-08-08 11:47 ` maobibo
2024-07-26 23:52 ` [PATCH v12 67/84] KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 68/84] KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 69/84] KVM: MIPS: Mark "struct page" pfns accessed " Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 70/84] KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 71/84] KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 72/84] KVM: PPC: Remove extra get_page() to fix page refcount leak Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 73/84] KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 74/84] KVM: Convert gfn_to_page() to use kvm_follow_pfn() Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 75/84] KVM: Add support for read-only usage of gfn_to_page() Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 76/84] KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 77/84] KVM: PPC: Explicitly require struct page memory for Ultravisor sharing Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 78/84] KVM: Drop gfn_to_pfn() APIs now that all users are gone Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 79/84] KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 80/84] KVM: Make kvm_follow_pfn.refcounted_page a required field Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 81/84] KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 82/84] KVM: arm64: Don't mark "struct page" accessed when making SPTE young Sean Christopherson
2024-07-26 23:52 ` [PATCH v12 83/84] KVM: Drop APIs that manipulate "struct page" via pfns Sean Christopherson
2024-08-02 11:03 ` Alex Bennée
2024-07-26 23:52 ` [PATCH v12 84/84] KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page" Sean Christopherson
2024-07-30 11:38 ` Paolo Bonzini
2024-07-30 20:21 ` Sean Christopherson
2024-07-31 9:50 ` Paolo Bonzini
2024-07-30 11:52 ` [PATCH v12 00/84] KVM: Stop grabbing references to PFNMAP'd pages Paolo Bonzini
2024-07-30 22:35 ` Sean Christopherson
2024-08-27 9:06 ` Alex Bennée