Re: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  pbonzini@redhat.com,
	oliver.upton@linux.dev, maz@kernel.org,  hshuai@redhat.com,
	zhenyzha@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot
Date: Mon, 12 Jun 2023 07:41:16 -0700	[thread overview]
Message-ID: <ZIcujNKVosLeQEqR@google.com> (raw)
In-Reply-To: <20230608090348.414990-1-gshan@redhat.com>

On Thu, Jun 08, 2023, Gavin Shan wrote:
> We run into guest hang in edk2 firmware when KSM is kept as running
> on the host. The edk2 firmware is waiting for status 0x80 from QEMU's
> pflash device (TYPE_PFLASH_CFI01) during the operation for sector
> erasing or buffered write. The status is returned by reading the
> memory region of the pflash device and the read request should
> have been forwarded to QEMU and emulated by it. Unfortunately, the
> read request is covered by an illegal stage2 mapping when the guest
> hang issue occurs. The read request is completed with QEMU bypassed and
> wrong status is fetched.
> 
> The illegal stage2 mapping is populated due to same page mering by
> KSM at (C) even the associated memory slot has been marked as invalid
> at (B).
> 
>   CPU-A                    CPU-B
>   -----                    -----
>                            ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
>                            kvm_vm_ioctl_set_memory_region
>                            kvm_set_memory_region
>                            __kvm_set_memory_region
>                            kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
>                              kvm_invalidate_memslot
>                                kvm_copy_memslot
>                                kvm_replace_memslot
>                                kvm_swap_active_memslots        (A)
>                                kvm_arch_flush_shadow_memslot   (B)
>   same page merging by KSM
>   kvm_mmu_notifier_change_pte
>   kvm_handle_hva_range
>   __kvm_handle_hva_range       (C)
> 
> Fix the issue by skipping the invalid memory slot at (C) to avoid the
> illegal stage2 mapping. Without the illegal stage2 mapping, the read
> request for the pflash's status is forwarded to QEMU and emulated by
> it. The correct pflash's status can be returned from QEMU to break
> the infinite wait in edk2 firmware.
> 
> Cc: stable@vger.kernel.org # v5.13+
> Fixes: 3039bcc74498 ("KVM: Move x86's MMU notifier memslot walkers to generic code")

This Fixes isn't correct.  That change only affected x86, which doesn't have this
bug.  And looking at commit cd4c71835228 ("KVM: arm64: Convert to the gfn-based MMU
notifier callbacks"), arm64 did NOT skip invalid slots

        slots = kvm_memslots(kvm);

        /* we only care about the pages that the guest sees */
        kvm_for_each_memslot(memslot, slots) {
                unsigned long hva_start, hva_end;
                gfn_t gpa;

                hva_start = max(start, memslot->userspace_addr);
                hva_end = min(end, memslot->userspace_addr +
                                        (memslot->npages << PAGE_SHIFT));
                if (hva_start >= hva_end)
                        continue;

                gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT;
                ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
        }

#define kvm_for_each_memslot(memslot, slots)                            \
        for (memslot = &slots->memslots[0];                             \
             memslot < slots->memslots + slots->used_slots; memslot++)  \
                if (WARN_ON_ONCE(!memslot->npages)) {                   \
                } else

Unless I'm missing something, this goes all the way back to initial arm64 support
added by commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup").

> Reported-by: Shuai Hu <hshuai@redhat.com>
> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  virt/kvm/kvm_main.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 479802a892d4..7f81a3a209b6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -598,6 +598,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  			unsigned long hva_start, hva_end;
>  
>  			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
> +			if (slot->flags & KVM_MEMSLOT_INVALID)
> +				continue;

Skipping the memslot will lead to use-after-free.  E.g. if an invalidate_range_start()
comes along between installing the invalid slot and zapping SPTEs, KVM will
return from kvm_mmu_notifier_invalidate_range_start() without having dropped all
references to the range.

I.e.

	kvm_copy_memslot(invalid_slot, old);
	invalid_slot->flags |= KVM_MEMSLOT_INVALID;
	kvm_replace_memslot(kvm, old, invalid_slot);

	/*
	 * Activate the slot that is now marked INVALID, but don't propagate
	 * the slot to the now inactive slots. The slot is either going to be
	 * deleted or recreated as a new slot.
	 */
	kvm_swap_active_memslots(kvm, old->as_id);


==> invalidate_range_start()

	/*
	 * From this point no new shadow pages pointing to a deleted, or moved,
	 * memslot will be created.  Validation of sp->gfn happens in:
	 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
	 *	- kvm_is_visible_gfn (mmu_check_root)
	 */
	kvm_arch_flush_shadow_memslot(kvm, old);

And even for change_pte(), skipping the memslot is wrong, as KVM would then fail
unmap the prior SPTE.  Of course, that can't happen in the current code base
because change_pte() is wrapped with invalidate_range_{start,end}(), but that's
more of a bug than a design choice (see c13fda237f08 "KVM: Assert that notifier
count is elevated in .change_pte()" for details).  That's also why this doesn't
show up on x86; x86 installs a SPTE for the change_pte() notifier iff an existing
SPTE is present, which is never true due to the invalidation.

I'd honestly love to just delete the change_pte() callback, but my opinion is more
than a bit biased since we don't use KSM.  Assuming we keep change_pte(), the best
option is probably to provide a wrapper around kvm_set_spte_gfn() to skip the
memslot, but with a sanity check and comment explaining the dependency on there
being no SPTEs due to the invalidation.  E.g.

---
 virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..b4987b49fac3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
 
 	return __kvm_handle_hva_range(kvm, &range);
 }
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	/*
+	 * Skipping invalid memslots is correct if and only change_pte() is
+	 * surrounded by invalidate_range_{start,end}(), which is currently
+	 * guaranteed by the primary MMU.  If that ever changes, KVM needs to
+	 * unmap the memslot instead of skipping the memslot to ensure that KVM
+	 * doesn't hold references to the old PFN.
+	 */
+	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+        if (range->slot->flags & KVM_MEMSLOT_INVALID)
+		return false;
+
+	return kvm_set_spte_gfn(kvm, range);
+}
+
 static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 	if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
 		return;
 
-	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
 }
 
 void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,

base-commit: 94cdeebd82111d7b7da5bd4da053eed9e0f65d72
--

next prev parent reply	other threads:[~2023-06-12 14:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-08  9:03 [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot Gavin Shan
2023-06-08 14:31 ` Marc Zyngier
2023-06-08 23:17   ` Gavin Shan
2023-06-09  8:06     ` Marc Zyngier
2023-06-09  9:02       ` Gavin Shan
2023-06-12 14:41 ` Sean Christopherson [this message]
2023-06-13  1:06   ` Gavin Shan
2023-06-13 14:58     ` Sean Christopherson
2023-06-14  0:01       ` Gavin Shan

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:479802a892d dfblob:b4987b49fac )
 OR (
bs:"Re: [PATCH] KVM: Avoid illegal stage2 mapping on invalid memory slot" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZIcujNKVosLeQEqR@google.com \
    --to=seanjc@google.com \
    --cc=gshan@redhat.com \
    --cc=hshuai@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=shan.gavin@gmail.com \
    --cc=zhenyzha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.