All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wanpeng Li <wanpeng.li@linux.intel.com>
To: David Matlack <dmatlack@google.com>
Cc: Gleb Natapov <gleb@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	kvm@vger.kernel.org, x86@kernel.org,
	Eric Northup <digitaleric@google.com>,
	David Matlack <dmatlack@google.com>
Subject: Re: [PATCH v2] kvm: x86: fix stale mmio cache bug
Date: Tue, 5 Aug 2014 08:31:13 +0800	[thread overview]
Message-ID: <20140805003113.GA14438@kernel> (raw)
In-Reply-To: <1407186620-1999-1-git-send-email-dmatlack@google.com>

Hi David,
On Mon, Aug 04, 2014 at 02:10:20PM -0700, David Matlack wrote:
>The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
>up to userspace:
>
>(1) Guest accesses gpa X without a memory slot. The gfn is cached in
>struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
>the SPTE write-execute-noread so that future accesses cause
>EPT_MISCONFIGs.
>
>(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
>covering the page just accessed.
>

One question:

Who trigger host userspace creates a mmio memslot? It will be created
just after first mmio #PF? 

Regards,
Wanpeng Li 

>(3) Guest attempts to read or write to gpa X again. On Intel, this
>generates an EPT_MISCONFIG. The memory slot generation number that
>was incremented in (2) would normally take care of this but we fast
>path mmio faults through quickly_check_mmio_pf(), which only checks
>the per-vcpu mmio cache. Since we hit the cache, KVM passes a
>KVM_EXIT_MMIO up to userspace.
>
>This patch fixes the issue by doing the following:
>  - Tag the mmio cache with the memslot generation and use it to
>    validate mmio cache lookups.
>  - Extend vcpu_clear_mmio_info to clear mmio_gfn in addition to
>    mmio_gva, since both can be used to fast path mmio faults.
>  - In mmu_sync_roots, unconditionally clear the mmio cache since
>    even direct_map (e.g. tdp) hosts use it.
>
>Signed-off-by: David Matlack <dmatlack@google.com>
>---
>Changes in v2:
>  - Use memslot generation to invalidate the mmio cache rather than
>    actively invalidating the cache.
>  - Update patch description with new cache invalidation technique.
>  - Pull mmio cache/clear code up out of x86.h and mmu.c and into
>    mmu.h.
>
> arch/x86/include/asm/kvm_host.h |  1 +
> arch/x86/kvm/mmu.c              | 16 ++--------
> arch/x86/kvm/mmu.h              | 70 +++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.h              | 36 ---------------------
> 4 files changed, 73 insertions(+), 50 deletions(-)
>
>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>index 49205d0..f518d14 100644
>--- a/arch/x86/include/asm/kvm_host.h
>+++ b/arch/x86/include/asm/kvm_host.h
>@@ -479,6 +479,7 @@ struct kvm_vcpu_arch {
> 	u64 mmio_gva;
> 	unsigned access;
> 	gfn_t mmio_gfn;
>+	unsigned int mmio_gen;
> 
> 	struct kvm_pmu pmu;
> 
>diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>index 9314678..43f1c18 100644
>--- a/arch/x86/kvm/mmu.c
>+++ b/arch/x86/kvm/mmu.c
>@@ -206,11 +206,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
> #define MMIO_SPTE_GEN_LOW_SHIFT		3
> #define MMIO_SPTE_GEN_HIGH_SHIFT	52
> 
>-#define MMIO_GEN_SHIFT			19
> #define MMIO_GEN_LOW_SHIFT		9
> #define MMIO_GEN_LOW_MASK		((1 << MMIO_GEN_LOW_SHIFT) - 1)
>-#define MMIO_GEN_MASK			((1 << MMIO_GEN_SHIFT) - 1)
>-#define MMIO_MAX_GEN			((1 << MMIO_GEN_SHIFT) - 1)
> 
> static u64 generation_mmio_spte_mask(unsigned int gen)
> {
>@@ -234,16 +231,6 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> 	return gen;
> }
> 
>-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
>-{
>-	/*
>-	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
>-	 * code of handling generation number wrap-around.
>-	 */
>-	return (kvm_memslots(kvm)->generation +
>-		      MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
>-}
>-
> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> 			   unsigned access)
> {
>@@ -3157,13 +3144,14 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
> 	int i;
> 	struct kvm_mmu_page *sp;
> 
>+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
>+
> 	if (vcpu->arch.mmu.direct_map)
> 		return;
> 
> 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
> 		return;
> 
>-	vcpu_clear_mmio_info(vcpu, ~0ul);
> 	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
> 	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
> 		hpa_t root = vcpu->arch.mmu.root_hpa;
>diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>index b982112..058651a 100644
>--- a/arch/x86/kvm/mmu.h
>+++ b/arch/x86/kvm/mmu.h
>@@ -82,6 +82,76 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
> void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> 		bool ept);
> 
>+#define MMIO_GEN_SHIFT                 19
>+#define MMIO_GEN_MASK                  ((1 << MMIO_GEN_SHIFT) - 1)
>+#define MMIO_MAX_GEN                   ((1 << MMIO_GEN_SHIFT) - 1)
>+static inline unsigned int kvm_current_mmio_generation(struct kvm *kvm)
>+{
>+	/*
>+	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
>+	 * code of handling generation number wrap-around.
>+	 */
>+	return (kvm_memslots(kvm)->generation + MMIO_MAX_GEN - 150) &
>+	       MMIO_GEN_MASK;
>+}
>+
>+static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>+					gva_t gva, gfn_t gfn, unsigned access)
>+{
>+	vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
>+
>+	/*
>+	 * Ensure that the mmio_gen is set before the rest of the cache entry.
>+	 * Otherwise we might see a new generation number attached to an old
>+	 * cache entry if creating/deleting a memslot races with mmio caching.
>+	 * The inverse case is possible (old generation number with new cache
>+	 * info), but that is safe. The next access will just miss the cache
>+	 * when it should have hit.
>+	 */
>+	smp_wmb();
>+
>+	vcpu->arch.mmio_gva = gva & PAGE_MASK;
>+	vcpu->arch.access = access;
>+	vcpu->arch.mmio_gfn = gfn;
>+}
>+
>+/*
>+ * Clear the mmio cache info for the given gva. If gva is MMIO_GVA_ANY,
>+ * unconditionally clear the mmio cache.
>+ */
>+#define MMIO_GVA_ANY ~((gva_t) 0)
>+static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
>+{
>+	if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
>+		return;
>+
>+	vcpu->arch.mmio_gva = 0;
>+	vcpu->arch.mmio_gfn = 0;
>+}
>+
>+static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
>+{
>+	return vcpu->arch.mmio_gen == kvm_current_mmio_generation(vcpu->kvm);
>+}
>+
>+static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
>+{
>+	u64 mmio_gva = vcpu->arch.mmio_gva;
>+
>+	return vcpu_match_mmio_gen(vcpu) &&
>+	       mmio_gva &&
>+	       mmio_gva == (gva & PAGE_MASK);
>+}
>+
>+static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>+{
>+	gfn_t mmio_gfn = vcpu->arch.mmio_gfn;
>+
>+	return vcpu_match_mmio_gen(vcpu) &&
>+	       mmio_gfn &&
>+	       mmio_gfn == (gpa >> PAGE_SHIFT);
>+}
>+
> static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
> {
> 	if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
>diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>index 8c97bac..c686c91 100644
>--- a/arch/x86/kvm/x86.h
>+++ b/arch/x86/kvm/x86.h
>@@ -72,42 +72,6 @@ static inline u32 bit(int bitno)
> 	return 1 << (bitno & 31);
> }
> 
>-static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
>-					gva_t gva, gfn_t gfn, unsigned access)
>-{
>-	vcpu->arch.mmio_gva = gva & PAGE_MASK;
>-	vcpu->arch.access = access;
>-	vcpu->arch.mmio_gfn = gfn;
>-}
>-
>-/*
>- * Clear the mmio cache info for the given gva,
>- * specially, if gva is ~0ul, we clear all mmio cache info.
>- */
>-static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
>-{
>-	if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
>-		return;
>-
>-	vcpu->arch.mmio_gva = 0;
>-}
>-
>-static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
>-{
>-	if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
>-		return true;
>-
>-	return false;
>-}
>-
>-static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>-{
>-	if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
>-		return true;
>-
>-	return false;
>-}
>-
> void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
> void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
> int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
>-- 
>2.0.0.526.g5318336
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2014-08-05  0:29 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-04 21:10 [PATCH v2] kvm: x86: fix stale mmio cache bug David Matlack
2014-08-05  0:31 ` Wanpeng Li [this message]
2014-08-05 18:56   ` David Matlack
2014-08-05  3:36 ` Xiao Guangrong
2014-08-05 22:39   ` David Matlack
2014-08-06  3:26     ` Xiao Guangrong
2014-08-07  4:20       ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140805003113.GA14438@kernel \
    --to=wanpeng.li@linux.intel.com \
    --cc=digitaleric@google.com \
    --cc=dmatlack@google.com \
    --cc=gleb@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=x86@kernel.org \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.