* [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
@ 2014-08-12  5:02 Xiao Guangrong
  2014-08-12 21:18 ` David Matlack
  0 siblings, 1 reply; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-12  5:02 UTC (permalink / raw)
  To: gleb
  Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong,
	stable, David Matlack

We may cache the current mmio generation number together with stale memslot
info in the spte, as in this scenario:

       CPU 0                                  CPU 1
page fault:                                add a new memslot
read memslot and detect it is an mmio access
                                           update memslots
                                           update generation number
read generation number
cache the gpa and current gen number into spte

So, if the guest accesses the gpa later, it will generate an incorrect
mmio exit.

This patch fixes it by updating the generation number after
synchronize_srcu_expedited(), which makes sure the generation
number is updated only after the memslots update has finished.

Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..ca3cdac 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new, u64 last_generation);
+			    struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new,
-			    u64 last_generation)
+			    struct kvm_memory_slot *new)
 {
 	if (new) {
 		int id = new->id;
@@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
 		if (new->npages != npages)
 			sort_memslots(slots);
 	}
-
-	slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;
 
-	update_memslots(slots, new, kvm->memslots->generation);
+	update_memslots(slots, new);
 	rcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
+	slots->generation++;
 
 	kvm_arch_memslots_updated(kvm);
 
-- 
1.8.3.1


* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-12  5:02 Xiao Guangrong
@ 2014-08-12 21:18 ` David Matlack
  2014-08-14  5:41   ` Xiao Guangrong
  0 siblings, 1 reply; 29+ messages in thread
From: David Matlack @ 2014-08-12 21:18 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: gleb, avi.kivity, mtosatti, Paolo Bonzini, linux-kernel, kvm,
	stable

On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>  {
>         struct kvm_memslots *old_memslots = kvm->memslots;
>

I think you want

  slots->generation = old_memslots->generation;

here.

On the KVM_MR_DELETE path, install_new_memslots is called twice so this
patch introduces a short window of time where the generation number
actually decreases.
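
For reference, a simplified sketch of that DELETE path (from memory, not the
exact __kvm_set_memory_region code), with G being the generation before the
ioctl and with this patch applied:

	slots = kmemdup(kvm->memslots, ...);	/* the copy inherits generation G */
	slot = id_to_memslot(slots, mem->slot);
	slot->flags |= KVM_MEMSLOT_INVALID;
	old_memslots = install_new_memslots(kvm, slots, NULL);
		/* publishes G, bumps the copy to G+1 after the srcu sync */

	kvm_arch_flush_shadow_memslot(kvm, slot);
	slots = old_memslots;			/* reuses the old struct: still G */
	...
	old_memslots = install_new_memslots(kvm, slots, &new);
		/* publishes G again: readers briefly see G+1 -> G before the
		   post-sync increment brings it back to G+1 */

Seeding slots->generation from old_memslots->generation at the top of
install_new_memslots() keeps the published value monotonic.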

> -       update_memslots(slots, new, kvm->memslots->generation);
> +       update_memslots(slots, new);
>         rcu_assign_pointer(kvm->memslots, slots);
>         synchronize_srcu_expedited(&kvm->srcu);
> +       slots->generation++;
>
>         kvm_arch_memslots_updated(kvm);
>
> --
> 1.8.3.1
>

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-12 21:18 ` David Matlack
@ 2014-08-14  5:41   ` Xiao Guangrong
  0 siblings, 0 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-14  5:41 UTC (permalink / raw)
  To: David Matlack
  Cc: gleb, avi.kivity, mtosatti, Paolo Bonzini, linux-kernel, kvm,
	stable

On 08/13/2014 05:18 AM, David Matlack wrote:
> On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>>  {
>>         struct kvm_memslots *old_memslots = kvm->memslots;
>>
> 
> I think you want
> 
>   slots->generation = old_memslots->generation;
> 
> here.
> 
> On the KVM_MR_DELETE path, install_new_memslots is called twice so this
> patch introduces a short window of time where the generation number
> actually decreases.

Yes, indeed. Thank you for pointing it out, will update.


* [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
@ 2014-08-14  7:01 Xiao Guangrong
  2014-08-14  7:01 ` [PATCH 2/2] kvm: x86: fix stale mmio cache bug Xiao Guangrong
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-14  7:01 UTC (permalink / raw)
  To: gleb
  Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong,
	stable, David Matlack

We may cache the current mmio generation number together with stale memslot
info in the spte, as in this scenario:

       CPU 0                                  CPU 1
page fault:                                add a new memslot
read memslot and detect it is an mmio access
                                           update memslots
                                           update generation number
read generation number
cache the gpa and current gen number into spte

So, if the guest accesses the gpa later, it will generate an incorrect
mmio exit.

This patch fixes it by updating the generation number after
synchronize_srcu_expedited(), which makes sure the generation
number is updated only after the memslots update has finished.

Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 virt/kvm/kvm_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..bb40df3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new, u64 last_generation);
+			    struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new,
-			    u64 last_generation)
+			    struct kvm_memory_slot *new)
 {
 	if (new) {
 		int id = new->id;
@@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
 		if (new->npages != npages)
 			sort_memslots(slots);
 	}
-
-	slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;
 
-	update_memslots(slots, new, kvm->memslots->generation);
+	/* ensure generation number is always increased. */
+	slots->generation = old_memslots->generation;
+	update_memslots(slots, new);
 	rcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
+	slots->generation++;
 
 	kvm_arch_memslots_updated(kvm);
 
-- 
1.8.1.4


* [PATCH 2/2] kvm: x86: fix stale mmio cache bug
  2014-08-14  7:01 [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
@ 2014-08-14  7:01 ` Xiao Guangrong
  2014-08-14 16:25   ` David Matlack
  2014-08-18 21:24   ` Paolo Bonzini
  2014-08-14  7:06 ` [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
  2014-08-18 13:57 ` Paolo Bonzini
  2 siblings, 2 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-14  7:01 UTC (permalink / raw)
  To: gleb
  Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, David Matlack,
	stable, Xiao Guangrong

From: David Matlack <dmatlack@google.com>

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

[ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]

Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c              |  4 ++--
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/x86.h              | 19 +++++++++++++++----
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..58fa3ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,7 @@ struct kvm_vcpu_arch {
 	u64 mmio_gva;
 	unsigned access;
 	gfn_t mmio_gfn;
+	unsigned int mmio_gen;
 
 	struct kvm_pmu pmu;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..e00fbfe 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,7 +234,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 	return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
 	/*
 	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
@@ -3163,7 +3163,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	vcpu_clear_mmio_info(vcpu, ~0ul);
+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
 	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b982112..e2d902a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -76,6 +76,8 @@ enum {
 };
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
+unsigned int kvm_current_mmio_generation(struct kvm *kvm);
+
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
 		bool execonly);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 306a1b7..ffd03b7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -3,6 +3,7 @@
 
 #include <linux/kvm_host.h>
 #include "kvm_cache_regs.h"
+#include "mmu.h"
 
 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
@@ -88,15 +89,23 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 	vcpu->arch.mmio_gva = gva & PAGE_MASK;
 	vcpu->arch.access = access;
 	vcpu->arch.mmio_gfn = gfn;
+	vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
+}
+
+static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.mmio_gen == kvm_current_mmio_generation(vcpu->kvm);
 }
 
 /*
  * Clear the mmio cache info for the given gva,
- * specially, if gva is ~0ul, we clear all mmio cache info.
+ * specially, if gva is ~MMIO_GVA_ANY, we clear all mmio cache info.
  */
+#define MMIO_GVA_ANY	~((gva_t)0)
+
 static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 {
-	if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
+	if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
 		return;
 
 	vcpu->arch.mmio_gva = 0;
@@ -104,7 +113,8 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 
 static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
 {
-	if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
+	      vcpu->arch.mmio_gva == (gva & PAGE_MASK))
 		return true;
 
 	return false;
@@ -112,7 +122,8 @@ static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
 
 static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
-	if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
+	if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gfn &&
+	      vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
 		return true;
 
 	return false;
-- 
1.8.1.4


* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-14  7:01 [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
  2014-08-14  7:01 ` [PATCH 2/2] kvm: x86: fix stale mmio cache bug Xiao Guangrong
@ 2014-08-14  7:06 ` Xiao Guangrong
  2014-08-18 13:57 ` Paolo Bonzini
  2 siblings, 0 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-14  7:06 UTC (permalink / raw)
  To: gleb
  Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, stable,
	David Matlack


Sorry, the title is not clear enough.

This is v2, which fixes the issue pointed out by David:
"the generation number actually decreases."

Please review.

On 08/14/2014 03:01 PM, Xiao Guangrong wrote:
> We may cache the current mmio generation number and stale memslot info
> into spte, like this scenario:
> 
>        CPU 0                                  CPU 1
> page fault:                                add a new memslot
> read memslot and detecting its a mmio access
>                                            update memslots
>                                            update generation number
> read generation number
> cache the gpa and current gen number into spte
> 
> So, if guest accesses the gpa later, it will generate a incorrect
> mmio exit
> 
> This patch fixes it by updating the generation number after
> synchronize_srcu_expedited() that makes sure the generation
> number updated only if memslots update is finished
> 
> Cc: stable@vger.kernel.org
> Cc: David Matlack <dmatlack@google.com>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  virt/kvm/kvm_main.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 33712fb..bb40df3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
> 
>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>  static void update_memslots(struct kvm_memslots *slots,
> -			    struct kvm_memory_slot *new, u64 last_generation);
> +			    struct kvm_memory_slot *new);
> 
>  static void kvm_release_pfn_dirty(pfn_t pfn);
>  static void mark_page_dirty_in_slot(struct kvm *kvm,
> @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>  }
> 
>  static void update_memslots(struct kvm_memslots *slots,
> -			    struct kvm_memory_slot *new,
> -			    u64 last_generation)
> +			    struct kvm_memory_slot *new)
>  {
>  	if (new) {
>  		int id = new->id;
> @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
>  		if (new->npages != npages)
>  			sort_memslots(slots);
>  	}
> -
> -	slots->generation = last_generation + 1;
>  }
> 
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>  {
>  	struct kvm_memslots *old_memslots = kvm->memslots;
> 
> -	update_memslots(slots, new, kvm->memslots->generation);
> +	/* ensure generation number is always increased. */
> +	slots->generation = old_memslots->generation;
> +	update_memslots(slots, new);
>  	rcu_assign_pointer(kvm->memslots, slots);
>  	synchronize_srcu_expedited(&kvm->srcu);
> +	slots->generation++;
> 
>  	kvm_arch_memslots_updated(kvm);
> 


* Re: [PATCH 2/2] kvm: x86: fix stale mmio cache bug
  2014-08-14  7:01 ` [PATCH 2/2] kvm: x86: fix stale mmio cache bug Xiao Guangrong
@ 2014-08-14 16:25   ` David Matlack
  2014-08-18 21:24   ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: David Matlack @ 2014-08-14 16:25 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Gleb Natapov, Avi Kivity, mtosatti, Paolo Bonzini, linux-kernel,
	kvm, stable

On Thu, Aug 14, 2014 at 12:01 AM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> From: David Matlack <dmatlack@google.com>
>
> The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
> up to userspace:
>
> (1) Guest accesses gpa X without a memory slot. The gfn is cached in
> struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
> the SPTE write-execute-noread so that future accesses cause
> EPT_MISCONFIGs.
>
> (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
> covering the page just accessed.
>
> (3) Guest attempts to read or write to gpa X again. On Intel, this
> generates an EPT_MISCONFIG. The memory slot generation number that
> was incremented in (2) would normally take care of this but we fast
> path mmio faults through quickly_check_mmio_pf(), which only checks
> the per-vcpu mmio cache. Since we hit the cache, KVM passes a
> KVM_EXIT_MMIO up to userspace.
>
> This patch fixes the issue by using the memslot generation number
> to validate the mmio cache.
>
> [ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]

The adjustments look good. Thanks!

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-14  7:01 [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
  2014-08-14  7:01 ` [PATCH 2/2] kvm: x86: fix stale mmio cache bug Xiao Guangrong
  2014-08-14  7:06 ` [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
@ 2014-08-18 13:57 ` Paolo Bonzini
  2014-08-18 16:35   ` Xiao Guangrong
  2 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-18 13:57 UTC (permalink / raw)
  To: Xiao Guangrong, gleb
  Cc: avi.kivity, mtosatti, linux-kernel, kvm, stable, David Matlack

On 14/08/2014 09:01, Xiao Guangrong wrote:
> -	update_memslots(slots, new, kvm->memslots->generation);
> +	/* ensure generation number is always increased. */
> +	slots->generation = old_memslots->generation;
> +	update_memslots(slots, new);
>  	rcu_assign_pointer(kvm->memslots, slots);
>  	synchronize_srcu_expedited(&kvm->srcu);
> +	slots->generation++;

I don't trust my brain enough to review this patch.

kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
patch I trust myself reviewing would change a bunch of functions in
kvm_main.c to take a memslots struct.  This would make it easy to
respect the hard and fast rule of not dereferencing the same pointer
twice.  But it would be a tedious change.

Another alternative could be to use the low bit to mark an in-progress
change, and skip the caching if the low bit is set.  Similar to a
seqcount (except if read_seqcount_retry fails, we just punt and not
retry anything), you could use it even though the memory barriers
provided by write_seqcount_begin/end are not too useful in this case.
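
Roughly, on the reader side (an untested sketch; the helper name is made up):

	static bool mmio_update_in_progress(struct kvm_memslots *slots)
	{
		/*
		 * The writer would bump the generation to an odd value
		 * before publishing the new memslots and bump it again
		 * (back to even) after synchronize_srcu_expedited().
		 */
		return slots->generation & 1;
	}

and the mmio cache code would just skip the caching while this returns true,
instead of retrying the way a real seqcount reader would.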

Paolo

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 13:57 ` Paolo Bonzini
@ 2014-08-18 16:35   ` Xiao Guangrong
  2014-08-18 18:20     ` David Matlack
  2014-08-18 18:47     ` Paolo Bonzini
  0 siblings, 2 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-18 16:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: gleb, avi.kivity, mtosatti, linux-kernel, kvm, stable,
	David Matlack


Hi Paolo,

Thank you for reviewing the patch!

On Aug 18, 2014, at 9:57 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 14/08/2014 09:01, Xiao Guangrong wrote:
>> -	update_memslots(slots, new, kvm->memslots->generation);
>> +	/* ensure generation number is always increased. */
>> +	slots->generation = old_memslots->generation;
>> +	update_memslots(slots, new);
>> 	rcu_assign_pointer(kvm->memslots, slots);
>> 	synchronize_srcu_expedited(&kvm->srcu);
>> +	slots->generation++;
> 
> I don't trust my brain enough to review this patch.

Sorry for the confusion. I should explain it more clearly.

What this patch tries to fix is: kvm will generate wrong mmio-exits forever
if no lucky event cleans the mmio spte (e.g. if there is no memory pressure
and no context-sync on that spte).

Note, it is hard to do a precise sync between kvm_vm_ioctl_set_memory_region
and mmio-exit - that means userspace is able to get an mmio-exit even after
kvm_vm_ioctl_set_memory_region has finished; for example, kvm may identify
an mmio access before userspace issues the ioctl and inject the mmio-exit to
userspace after the ioctl returns. So checking whether an mmio-exit is a real
mmio access in userspace is needed anyway.
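
For example, the userspace side could look roughly like this on KVM_EXIT_MMIO
(only the kvm_run fields are real; the helpers are hypothetical):

	case KVM_EXIT_MMIO: {
		__u64 gpa = run->mmio.phys_addr;
		void *ram = vmm_find_ram_ptr(gpa);	/* hypothetical lookup in
							   the VMM's memory map */
		if (ram) {
			/* a slot was registered meanwhile: not mmio anymore */
			if (run->mmio.is_write)
				memcpy(ram, run->mmio.data, run->mmio.len);
			else
				memcpy(run->mmio.data, ram, run->mmio.len);
		} else {
			vmm_handle_mmio(gpa, run->mmio.data,	/* hypothetical */
					run->mmio.len, run->mmio.is_write);
		}
		break;
	}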

> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
> patch I trust myself reviewing would change a bunch of functions in
> kvm_main.c to take a memslots struct.  This would make it easy to
> respect the hard and fast rule of not dereferencing the same pointer
> twice.  But it would be a tedious change.

kvm_set_memory_region is the only place that updates memslots, and
kvm_current_mmio_generation accesses memslots via rcu-dereference;
I do not know why other places need to be taken into account.

I think this patch is auditable: page-fault is always called while holding the
srcu-lock, so a page fault can’t go across synchronize_srcu_expedited.
Only these cases can happen:

1)  page fault occurs before synchronize_srcu_expedited.
    In this case, the vcpu will generate an mmio-exit for the memslot being
    registered by the ioctl. That’s ok since the ioctl has not finished.

2) page fault occurs after synchronize_srcu_expedited and during the
   generation-number increase.
   In this case, userspace may get a wrong mmio-exit (that happens if handling
   the page fault is slower than the ioctl); that’s ok too since userspace
   needs to do the check anyway, as I said above.

3) page fault occurs after the generation-number update.
   That’s definitely correct. :)

> Another alternative could be to use the low bit to mark an in-progress
> change, and skip the caching if the low bit is set.  Similar to a
> seqcount (except if read_seqcount_retry fails, we just punt and not
> retry anything), you could use it even though the memory barriers
> provided by write_seqcount_begin/end are not too useful in this case.

I do not know how the bit would work: a page fault could cache the memslot
before the bit is set and cache the generation-number after the bit is set.

Maybe I missed your idea, could you please detail it?


* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 16:35   ` Xiao Guangrong
@ 2014-08-18 18:20     ` David Matlack
  2014-08-18 18:47     ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: David Matlack @ 2014-08-18 18:20 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Mon, Aug 18, 2014 at 9:35 AM, Xiao Guangrong
<xiaoguangrong.eric@gmail.com> wrote:
>
> Hi Paolo,
>
> Thank you to review the patch!
>
> On Aug 18, 2014, at 9:57 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 14/08/2014 09:01, Xiao Guangrong wrote:
>>> -    update_memslots(slots, new, kvm->memslots->generation);
>>> +    /* ensure generation number is always increased. */
>>> +    slots->generation = old_memslots->generation;
>>> +    update_memslots(slots, new);
>>>      rcu_assign_pointer(kvm->memslots, slots);
>>>      synchronize_srcu_expedited(&kvm->srcu);
>>> +    slots->generation++;
>>
>> I don't trust my brain enough to review this patch.

Xiao, I thought about your approach a lot and I can't think of a bad race
that isn't already possible due to the fact that kvm allows memslot
mutation to race with vm exits. That being said, it's hard to reason about
all the other "clients" of memslots and what weirdness (or badness) will
be caused by updating generation after srcu_synch. I like Paolo's two
approaches because they fix the bug without any side-effects.

> Sorry to make you confused. I should expain it more clearly.
>
> What this patch tried to fix is:  kvm will generate wrong mmio-exit forever
> if no luck event cleans mmio spte. (eg. if no memory pressure or  no
> context-sync on that spte.)
>
> Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region
> and mmio-exit - that means userspace is able to get mmio-exit even if
> kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies
> a mmio access before issuing the ioctl and injects mmio-exit to userspace after
> ioctl return. So checking if mmio-exit is a real mmio access in userspace is
> needed anyway.
>
>> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
>> patch I trust myself reviewing would change a bunch of functions in
>> kvm_main.c to take a memslots struct.  This would make it easy to
>> respect the hard and fast rule of not dereferencing the same pointer
>> twice.  But it would be a tedious change.
>
> kvm_set_memory_region is the only place updating memslot and
> kvm_current_mmio_generation accesses memslot by rcu-dereference,
> i do not know why other places need to take into account.

If you rcu_dereference() more than once, you can't trust decisions based on
earlier rcu_dereference() calls. If the mmio cache code only did one
rcu_dereference() per vm exit, this bug would be gone.
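
Schematically (not actual call sites):

	/* racy: each kvm_memslots() call may return a different copy */
	mmio = !gfn_to_memslot(kvm, gfn);		/* dereference #1 */
	gen  = kvm_current_mmio_generation(kvm);	/* dereference #2 */

	/* safe: dereference once per exit and pass the snapshot around */
	struct kvm_memslots *slots = kvm_memslots(kvm);
	mmio = !__gfn_to_memslot(slots, gfn);
	gen  = slots->generation;	/* or whatever masked form of it */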

> I think this patch is auditable, page-fault is always called by holding
> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
> Only these cases can happen:
>
> 1)  page fault occurs before synchronize_srcu_expedited.
>     In this case, vcpu will generate mmio-exit for the memslot being registered
>     by the ioctl. That’s ok since the ioctl have not finished.
>
> 2) page fault occurs after synchronize_srcu_expedited and during
>    increasing generation-number.
>    In this case, userspace may get wrong mmio-exit (that happen if handing
>    page-fault is slower that the ioctl), that’s ok too since userspace need do
>   the check anyway like i said above.
>
> 3) page fault occurs after generation-number update
>    that’s definitely correct. :)
>
>> Another alternative could be to use the low bit to mark an in-progress
>> change, and skip the caching if the low bit is set.  Similar to a
>> seqcount (except if read_seqcount_retry fails, we just punt and not
>> retry anything), you could use it even though the memory barriers
>> provided by write_seqcount_begin/end are not too useful in this case.

I like this approach best. It would have the least code changes and provide
the same guarantees.

> I do not know how the bit works, page fault will cache the memslot before
> the bit set and cache the generation-number after the bit set.
>
> Maybe i missed your idea, could you please detail it?

In vcpu_cache_mmio_info() if generation is odd, just don't do the caching
because memslots were changed while we were running and we just assume the
worst case.
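
I.e. something like this on top of the vcpu_cache_mmio_info() from patch 2/2
(a sketch only; it glosses over the exact generation masking):

	static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
						gva_t gva, gfn_t gfn,
						unsigned access)
	{
		/*
		 * Odd generation: a memslot update is in flight, so whatever
		 * we cache here may already be stale.  Skip the cache and
		 * let the next access take the slow path.
		 */
		if (kvm_memslots(vcpu->kvm)->generation & 1)
			return;

		vcpu->arch.mmio_gva = gva & PAGE_MASK;
		vcpu->arch.access = access;
		vcpu->arch.mmio_gfn = gfn;
		vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
	}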

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 16:35   ` Xiao Guangrong
  2014-08-18 18:20     ` David Matlack
@ 2014-08-18 18:47     ` Paolo Bonzini
  2014-08-18 19:56       ` Xiao Guangrong
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-18 18:47 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: gleb, avi.kivity, mtosatti, linux-kernel, kvm, stable,
	David Matlack

On 18/08/2014 18:35, Xiao Guangrong wrote:
> 
> Hi Paolo,
> 
> Thank you to review the patch!
> 
> On Aug 18, 2014, at 9:57 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 14/08/2014 09:01, Xiao Guangrong wrote:
>>> -	update_memslots(slots, new, kvm->memslots->generation);
>>> +	/* ensure generation number is always increased. */
>>> +	slots->generation = old_memslots->generation;
>>> +	update_memslots(slots, new);
>>> 	rcu_assign_pointer(kvm->memslots, slots);
>>> 	synchronize_srcu_expedited(&kvm->srcu);
>>> +	slots->generation++;
>>
>> I don't trust my brain enough to review this patch.
> 
> Sorry to make you confused. I should expain it more clearly.

Don't worry, it's not your fault. :)

>> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
>> patch I trust myself reviewing would change a bunch of functions in
>> kvm_main.c to take a memslots struct.  This would make it easy to
>> respect the hard and fast rule of not dereferencing the same pointer
>> twice.  But it would be a tedious change.
> 
> kvm_set_memory_region is the only place updating memslot and
> kvm_current_mmio_generation accesses memslot by rcu-dereference,
> i do not know why other places need to take into account.

The race occurs because gfn_to_pfn_many_atomic or some other function
has already used kvm_memslots().  Calling kvm_memslots() twice is the
root cause of the bug.

> I think this patch is auditable, page-fault is always called by holding
> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
> Only these cases can happen:
> 
> 1)  page fault occurs before synchronize_srcu_expedited.
>     In this case, vcpu will generate mmio-exit for the memslot being registered
>     by the ioctl. That’s ok since the ioctl have not finished.
> 
> 2) page fault occurs after synchronize_srcu_expedited and during
>    increasing generation-number.
>    In this case, userspace may get wrong mmio-exit (that happen if handing
>    page-fault is slower that the ioctl), that’s ok too since userspace need do
>   the check anyway like i said above.
> 
> 3) page fault occurs after generation-number update
>    that’s definitely correct. :)
> 
>> Another alternative could be to use the low bit to mark an in-progress
>> change, and skip the caching if the low bit is set.  Similar to a
>> seqcount (except if read_seqcount_retry fails, we just punt and not
>> retry anything), you could use it even though the memory barriers
>> provided by write_seqcount_begin/end are not too useful in this case.
> 
> I do not know how the bit works, page fault will cache the memslot before
> the bit set and cache the generation-number after the bit set.
> 
> Maybe i missed your idea, could you please detail it?

Something like this:

-	update_memslots(slots, new, kvm->memslots->generation);
+	/* ensure generation number is always increased. */
+	slots->generation = old_memslots->generation + 1;
+	update_memslots(slots, new);
	rcu_assign_pointer(kvm->memslots, slots);
	synchronize_srcu_expedited(&kvm->srcu);
+	slots->generation++;

Then case 1 and 2 will just have a cache miss.

The "low bit" is really just because each slot update does 2 generation
increases.
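
I.e. the generation timeline for one memslot update would be:

	/*
	 *   G   (even)  stable, caching allowed
	 *   G+1 (odd)   new memslots published, srcu sync still running
	 *   G+2 (even)  update finished, caching allowed again
	 */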

Paolo

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 18:47     ` Paolo Bonzini
@ 2014-08-18 19:56       ` Xiao Guangrong
  2014-08-18 21:15         ` David Matlack
  0 siblings, 1 reply; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-18 19:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: gleb, avi.kivity, mtosatti, linux-kernel, kvm, stable,
	David Matlack


On Aug 19, 2014, at 2:47 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> 
>> I think this patch is auditable, page-fault is always called by holding
>> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
>> Only these cases can happen:
>> 
>> 1)  page fault occurs before synchronize_srcu_expedited.
>>    In this case, vcpu will generate mmio-exit for the memslot being registered
>>    by the ioctl. That’s ok since the ioctl have not finished.
>> 
>> 2) page fault occurs after synchronize_srcu_expedited and during
>>   increasing generation-number.
>>   In this case, userspace may get wrong mmio-exit (that happen if handing
>>   page-fault is slower that the ioctl), that’s ok too since userspace need do
>>  the check anyway like i said above.
>> 
>> 3) page fault occurs after generation-number update
>>   that’s definitely correct. :)
>> 
>>> Another alternative could be to use the low bit to mark an in-progress
>>> change, and skip the caching if the low bit is set.  Similar to a
>>> seqcount (except if read_seqcount_retry fails, we just punt and not
>>> retry anything), you could use it even though the memory barriers
>>> provided by write_seqcount_begin/end are not too useful in this case.
>> 
>> I do not know how the bit works, page fault will cache the memslot before
>> the bit set and cache the generation-number after the bit set.
>> 
>> Maybe i missed your idea, could you please detail it?
> 
> Something like this:
> 
> -	update_memslots(slots, new, kvm->memslots->generation);
> +	/* ensure generation number is always increased. */
> +	slots->generation = old_memslots->generation + 1;
> +	update_memslots(slots, new);
> 	rcu_assign_pointer(kvm->memslots, slots);
> 	synchronize_srcu_expedited(&kvm->srcu);
> +	slots->generation++;
> 
> Then case 1 and 2 will just have a cache miss.

So, case 2 is what concerns you? :) I still think it is ok, but I do not have
a strong opinion on that. How about simplifying it like this:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..9fabf6a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,16 +234,22 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 	return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+static unsigned int __kvm_current_mmio_generation(struct kvm_memslots *slots)
 {
+
 	/*
 	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
 	 * code of handling generation number wrap-around.
 	 */
-	return (kvm_memslots(kvm)->generation +
+	return (slots->generation +
 		      MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
 }
 
+static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+{
+	return __kvm_current_mmio_generation(kvm_memslots(kvm));
+}
+
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
 			   unsigned access)
 {
@@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
 
 static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 {
+	struct kvm_memslots *slots = kvm_memslots(kvm);
 	unsigned int kvm_gen, spte_gen;
 
-	kvm_gen = kvm_current_mmio_generation(kvm);
+	if (slots->updated)
+		return false;
+
+	smp_rmb();
+	
+	kvm_gen = __kvm_current_mmio_generation(slots);
 	spte_gen = get_mmio_spte_generation(spte);
 
 	trace_check_mmio_spte(spte, kvm_gen, spte_gen);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..1d4e78f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new, u64 last_generation);
+			    struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new,
-			    u64 last_generation)
+			    struct kvm_memory_slot *new)
 {
 	if (new) {
 		int id = new->id;
@@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 		if (new->npages != npages)
 			sort_memslots(slots);
 	}
-
-	slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 {
 	struct kvm_memslots *old_memslots = kvm->memslots;
 
-	update_memslots(slots, new, kvm->memslots->generation);
+	/* ensure generation number is always increased. */
+	slots->updated = true;
+	slots->generation = old_memslots->generation;
+	update_memslots(slots, new);
 	rcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
 
+	slots->generation++;
+	smp_wmb();
+	slots->updated = false;
+
 	kvm_arch_memslots_updated(kvm);
 
 	return old_memslots;


* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 19:56       ` Xiao Guangrong
@ 2014-08-18 21:15         ` David Matlack
  2014-08-18 21:24           ` Paolo Bonzini
  2014-08-19  3:50           ` Xiao Guangrong
  0 siblings, 2 replies; 29+ messages in thread
From: David Matlack @ 2014-08-18 21:15 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
<xiaoguangrong.eric@gmail.com> wrote:
> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>
>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>  {
> +       struct kvm_memslots *slots = kvm_memslots(kvm);
>         unsigned int kvm_gen, spte_gen;
>
> -       kvm_gen = kvm_current_mmio_generation(kvm);
> +       if (slots->updated)
> +               return false;
> +
> +       smp_rmb();
> +
> +       kvm_gen = __kvm_current_mmio_generation(slots);
>         spte_gen = get_mmio_spte_generation(spte);
>

What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
block during memslot updates, which I don't think we should :).

>         trace_check_mmio_spte(spte, kvm_gen, spte_gen);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4b6c01b..1d4e78f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>
>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>  static void update_memslots(struct kvm_memslots *slots,
> -                           struct kvm_memory_slot *new, u64 last_generation);
> +                           struct kvm_memory_slot *new);
>
>  static void kvm_release_pfn_dirty(pfn_t pfn);
>  static void mark_page_dirty_in_slot(struct kvm *kvm,
> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>  }
>
>  static void update_memslots(struct kvm_memslots *slots,
> -                           struct kvm_memory_slot *new,
> -                           u64 last_generation)
> +                           struct kvm_memory_slot *new)
>  {
>         if (new) {
>                 int id = new->id;
> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>                 if (new->npages != npages)
>                         sort_memslots(slots);
>         }
> -
> -       slots->generation = last_generation + 1;
>  }
>
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>  {
>         struct kvm_memslots *old_memslots = kvm->memslots;
>
> -       update_memslots(slots, new, kvm->memslots->generation);
> +       /* ensure generation number is always increased. */
> +       slots->updated = true;
> +       slots->generation = old_memslots->generation;
> +       update_memslots(slots, new);
>         rcu_assign_pointer(kvm->memslots, slots);
>         synchronize_srcu_expedited(&kvm->srcu);
>
> +       slots->generation++;
> +       smp_wmb();
> +       slots->updated = false;
> +
>         kvm_arch_memslots_updated(kvm);
>
>         return old_memslots;
>

This is effectively the same as the first approach.

I just realized how simple Paolo's idea is. I think it can be a one line
patch (without comments):

[...]
        update_memslots(slots, new, kvm->memslots->generation);
        rcu_assign_pointer(kvm->memslots, slots);
        synchronize_srcu_expedited(&kvm->srcu);
+       slots->generation++;

        kvm_arch_memslots_updated(kvm);
[...]

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 21:15         ` David Matlack
@ 2014-08-18 21:24           ` Paolo Bonzini
  2014-08-18 21:33             ` David Matlack
  2014-08-19  3:50           ` Xiao Guangrong
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-18 21:24 UTC (permalink / raw)
  To: David Matlack, Xiao Guangrong
  Cc: Gleb Natapov, Avi Kivity, mtosatti, linux-kernel, kvm, stable

On 18/08/2014 23:15, David Matlack wrote:
> I just realized how simple Paolo's idea is. I think it can be a one line
> patch (without comments):
> 
> [...]
>         update_memslots(slots, new, kvm->memslots->generation);
>         rcu_assign_pointer(kvm->memslots, slots);
>         synchronize_srcu_expedited(&kvm->srcu);
> +       slots->generation++;
> 
>         kvm_arch_memslots_updated(kvm);
> [...]

Yeah, you're right.  I think at this point it makes sense to put all
generation handling in install_new_memslots, but with proper comments
the above can do as well.

Would you like to send it?  Patch 2 still applies on top.

* Re: [PATCH 2/2] kvm: x86: fix stale mmio cache bug
  2014-08-14  7:01 ` [PATCH 2/2] kvm: x86: fix stale mmio cache bug Xiao Guangrong
  2014-08-14 16:25   ` David Matlack
@ 2014-08-18 21:24   ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-18 21:24 UTC (permalink / raw)
  To: Xiao Guangrong, gleb
  Cc: avi.kivity, mtosatti, linux-kernel, kvm, David Matlack, stable

On 14/08/2014 09:01, Xiao Guangrong wrote:
>   * Clear the mmio cache info for the given gva,
> - * specially, if gva is ~0ul, we clear all mmio cache info.
> + * specially, if gva is ~MMIO_GVA_ANY, we clear all mmio cache info.

Extra ~.

>   */
> +#define MMIO_GVA_ANY	~((gva_t)0)
> +

Better: (~(gva_t)0).

Paolo

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 21:24           ` Paolo Bonzini
@ 2014-08-18 21:33             ` David Matlack
  0 siblings, 0 replies; 29+ messages in thread
From: David Matlack @ 2014-08-18 21:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Mon, Aug 18, 2014 at 2:24 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 18/08/2014 23:15, David Matlack wrote:
>> I just realized how simple Paolo's idea is. I think it can be a one line
>> patch (without comments):
>>
>> [...]
>>         update_memslots(slots, new, kvm->memslots->generation);
>>         rcu_assign_pointer(kvm->memslots, slots);
>>         synchronize_srcu_expedited(&kvm->srcu);
>> +       slots->generation++;
>>
>>         kvm_arch_memslots_updated(kvm);
>> [...]
>
> Yeah, you're right.  I think at this point it makes sense to put all
> generation handling in install_new_memslots, but with proper comments
> the above can do as well.
>
> Would you like to send it?  Patch 2 still applies on top.

Sure, I like doing everything in install_new_memslots() as well so I'll fix
that.

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-18 21:15         ` David Matlack
  2014-08-18 21:24           ` Paolo Bonzini
@ 2014-08-19  3:50           ` Xiao Guangrong
  2014-08-19  4:31             ` David Matlack
  2014-08-19  8:28             ` Paolo Bonzini
  1 sibling, 2 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-19  3:50 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On 08/19/2014 05:15 AM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
> <xiaoguangrong.eric@gmail.com> wrote:
>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>
>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>  {
>> +       struct kvm_memslots *slots = kvm_memslots(kvm);
>>         unsigned int kvm_gen, spte_gen;
>>
>> -       kvm_gen = kvm_current_mmio_generation(kvm);
>> +       if (slots->updated)
>> +               return false;
>> +
>> +       smp_rmb();
>> +
>> +       kvm_gen = __kvm_current_mmio_generation(slots);
>>         spte_gen = get_mmio_spte_generation(spte);
>>
> 
> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
> block during memslot updates, which I don't think we should :).

This exactly fixes case 2; slots->updated just acts as the "low bit"
but avoids generation-number wrap-around and tricky handling of the number.
For more details please see below.

> 
>>         trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 4b6c01b..1d4e78f 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>
>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>  static void update_memslots(struct kvm_memslots *slots,
>> -                           struct kvm_memory_slot *new, u64 last_generation);
>> +                           struct kvm_memory_slot *new);
>>
>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>  }
>>
>>  static void update_memslots(struct kvm_memslots *slots,
>> -                           struct kvm_memory_slot *new,
>> -                           u64 last_generation)
>> +                           struct kvm_memory_slot *new)
>>  {
>>         if (new) {
>>                 int id = new->id;
>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>>                 if (new->npages != npages)
>>                         sort_memslots(slots);
>>         }
>> -
>> -       slots->generation = last_generation + 1;
>>  }
>>
>>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
>> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>>  {
>>         struct kvm_memslots *old_memslots = kvm->memslots;
>>
>> -       update_memslots(slots, new, kvm->memslots->generation);
>> +       /* ensure generation number is always increased. */
>> +       slots->updated = true;
>> +       slots->generation = old_memslots->generation;
>> +       update_memslots(slots, new);
>>         rcu_assign_pointer(kvm->memslots, slots);
>>         synchronize_srcu_expedited(&kvm->srcu);
>>
>> +       slots->generation++;
>> +       smp_wmb();
>> +       slots->updated = false;
>> +
>>         kvm_arch_memslots_updated(kvm);
>>
>>         return old_memslots;
>>
> 
> This is effectively the same as the first approach.
> 
> I just realized how simple Paolo's idea is. I think it can be a one line
> patch (without comments):
> 
> [...]
>         update_memslots(slots, new, kvm->memslots->generation);
>         rcu_assign_pointer(kvm->memslots, slots);
>         synchronize_srcu_expedited(&kvm->srcu);
> +       slots->generation++;
> 
>         kvm_arch_memslots_updated(kvm);
> [...]

Really? Unfortunately no. :)

See this scenario:

CPU 0                                  CPU 1
ioctl registering a new memslot which
contains GPA:
                           page-fault handler:
                             see it's an mmio access on GPA;

 assign the new memslots with generation number increased
                             cache the generation-number into spte;
                             fix the access and come back to the guest;
SRCU-sync
                             page-fault again and check the spte is a valid mmio-spte(*)
generation-number++;
return to userspace;
                             do mmio-emulation and inject mmio-exit;

!!! userspace receives an unexpected mmio-exit; that is exactly case 2, as I
said in the last mail.


Note that in step (*), my approach detects the invalid generation-number, which
will invalidate the mmio spte properly.


* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  3:50           ` Xiao Guangrong
@ 2014-08-19  4:31             ` David Matlack
  2014-08-19  4:41               ` Xiao Guangrong
  2014-08-19  8:28             ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: David Matlack @ 2014-08-19  4:31 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> On 08/19/2014 05:15 AM, David Matlack wrote:
>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>> <xiaoguangrong.eric@gmail.com> wrote:
>>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>>
>>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>  {
>>> +       struct kvm_memslots *slots = kvm_memslots(kvm);
>>>         unsigned int kvm_gen, spte_gen;
>>>
>>> -       kvm_gen = kvm_current_mmio_generation(kvm);
>>> +       if (slots->updated)
>>> +               return false;
>>> +
>>> +       smp_rmb();
>>> +
>>> +       kvm_gen = __kvm_current_mmio_generation(slots);
>>>         spte_gen = get_mmio_spte_generation(spte);
>>>
>>
>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
>> block during memslot updates, which I don't think we should :).
>
> This exactly fixes case 2, slots->updated just acts as the "low bit"
> but avoid generation number wrap-around and trick handling of the number.
> More details please see below.
>
>>
>>>         trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 4b6c01b..1d4e78f 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>>
>>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>>  static void update_memslots(struct kvm_memslots *slots,
>>> -                           struct kvm_memory_slot *new, u64 last_generation);
>>> +                           struct kvm_memory_slot *new);
>>>
>>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>>  }
>>>
>>>  static void update_memslots(struct kvm_memslots *slots,
>>> -                           struct kvm_memory_slot *new,
>>> -                           u64 last_generation)
>>> +                           struct kvm_memory_slot *new)
>>>  {
>>>         if (new) {
>>>                 int id = new->id;
>>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>>>                 if (new->npages != npages)
>>>                         sort_memslots(slots);
>>>         }
>>> -
>>> -       slots->generation = last_generation + 1;
>>>  }
>>>
>>>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
>>> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>>>  {
>>>         struct kvm_memslots *old_memslots = kvm->memslots;
>>>
>>> -       update_memslots(slots, new, kvm->memslots->generation);
>>> +       /* ensure generation number is always increased. */
>>> +       slots->updated = true;
>>> +       slots->generation = old_memslots->generation;
>>> +       update_memslots(slots, new);
>>>         rcu_assign_pointer(kvm->memslots, slots);
>>>         synchronize_srcu_expedited(&kvm->srcu);
>>>
>>> +       slots->generation++;
>>> +       smp_wmb();
>>> +       slots->updated = false;
>>> +
>>>         kvm_arch_memslots_updated(kvm);
>>>
>>>         return old_memslots;
>>>
>>
>> This is effectively the same as the first approach.
>>
>> I just realized how simple Paolo's idea is. I think it can be a one line
>> patch (without comments):
>>
>> [...]
>>         update_memslots(slots, new, kvm->memslots->generation);
>>         rcu_assign_pointer(kvm->memslots, slots);
>>         synchronize_srcu_expedited(&kvm->srcu);
>> +       slots->generation++;
>>
>>         kvm_arch_memslots_updated(kvm);
>> [...]
>
> Really? Unfortunately no. :)
>
> See this scenario:
>
> CPU 0                                  CPU 1
> ioctl registering a new memslot which
> contains GPA:
>                            page-fault handler:
>                              see it'a mmio access on GPA;
>
>  assign the new memslots with generation number increased
>                              cache the generation-number into spte;
>                              fix the access and comeback to guest;
> SRCU-sync
>                              page-fault again and check the spte is a valid mmio-spte(*)
> generation-number++;
> return to userspace;
>                              do mmio-emulation and inject mmio-exit;
>
> !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
> said in the last mail.
>
>
> Note in the step *, my approach detects the invalid generation-number which
> will invalidate the mmio spte properly .

Sorry I didn't explain myself very well: Since we can get a single wrong
mmio exit no matter what, it has to be handled in userspace. So my point
was, it doesn't really help to fix that one very specific way that it can
happen, because it can just happen in other ways. (E.g. update memslots
occurs after is_noslot_pfn() and before mmio exit).

But it looks like you basically said the same thing earlier, so I think
we're on the same page.

The single-line patch I suggested was only intended to fix the "wrong
mmio-exit forever" problem.

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  4:31             ` David Matlack
@ 2014-08-19  4:41               ` Xiao Guangrong
  2014-08-19  5:00                 ` David Matlack
  0 siblings, 1 reply; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-19  4:41 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On 08/19/2014 12:31 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> On 08/19/2014 05:15 AM, David Matlack wrote:
>>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>>> <xiaoguangrong.eric@gmail.com> wrote:
>>>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>>>
>>>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>>  {
>>>> +       struct kvm_memslots *slots = kvm_memslots(kvm);
>>>>         unsigned int kvm_gen, spte_gen;
>>>>
>>>> -       kvm_gen = kvm_current_mmio_generation(kvm);
>>>> +       if (slots->updated)
>>>> +               return false;
>>>> +
>>>> +       smp_rmb();
>>>> +
>>>> +       kvm_gen = __kvm_current_mmio_generation(slots);
>>>>         spte_gen = get_mmio_spte_generation(spte);
>>>>
>>>
>>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
>>> block during memslot updates, which I don't think we should :).
>>
>> This exactly fixes case 2, slots->updated just acts as the "low bit"
>> but avoid generation number wrap-around and trick handling of the number.
>> More details please see below.
>>
>>>
>>>>         trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 4b6c01b..1d4e78f 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>>>
>>>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>>>  static void update_memslots(struct kvm_memslots *slots,
>>>> -                           struct kvm_memory_slot *new, u64 last_generation);
>>>> +                           struct kvm_memory_slot *new);
>>>>
>>>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>>>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>>>  }
>>>>
>>>>  static void update_memslots(struct kvm_memslots *slots,
>>>> -                           struct kvm_memory_slot *new,
>>>> -                           u64 last_generation)
>>>> +                           struct kvm_memory_slot *new)
>>>>  {
>>>>         if (new) {
>>>>                 int id = new->id;
>>>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>>>>                 if (new->npages != npages)
>>>>                         sort_memslots(slots);
>>>>         }
>>>> -
>>>> -       slots->generation = last_generation + 1;
>>>>  }
>>>>
>>>>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
>>>> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>>>>  {
>>>>         struct kvm_memslots *old_memslots = kvm->memslots;
>>>>
>>>> -       update_memslots(slots, new, kvm->memslots->generation);
>>>> +       /* ensure generation number is always increased. */
>>>> +       slots->updated = true;
>>>> +       slots->generation = old_memslots->generation;
>>>> +       update_memslots(slots, new);
>>>>         rcu_assign_pointer(kvm->memslots, slots);
>>>>         synchronize_srcu_expedited(&kvm->srcu);
>>>>
>>>> +       slots->generation++;
>>>> +       smp_wmb();
>>>> +       slots->updated = false;
>>>> +
>>>>         kvm_arch_memslots_updated(kvm);
>>>>
>>>>         return old_memslots;
>>>>
>>>
>>> This is effectively the same as the first approach.
>>>
>>> I just realized how simple Paolo's idea is. I think it can be a one line
>>> patch (without comments):
>>>
>>> [...]
>>>         update_memslots(slots, new, kvm->memslots->generation);
>>>         rcu_assign_pointer(kvm->memslots, slots);
>>>         synchronize_srcu_expedited(&kvm->srcu);
>>> +       slots->generation++;
>>>
>>>         kvm_arch_memslots_updated(kvm);
>>> [...]
>>
>> Really? Unfortunately no. :)
>>
>> See this scenario:
>>
>> CPU 0                                  CPU 1
>> ioctl registering a new memslot which
>> contains GPA:
>>                            page-fault handler:
>>                              see it'a mmio access on GPA;
>>
>>  assign the new memslots with generation number increased
>>                              cache the generation-number into spte;
>>                              fix the access and comeback to guest;
>> SRCU-sync
>>                              page-fault again and check the spte is a valid mmio-spte(*)
>> generation-number++;
>> return to userspace;
>>                              do mmio-emulation and inject mmio-exit;
>>
>> !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
>> said in the last mail.
>>
>>
>> Note in the step *, my approach detects the invalid generation-number which
>> will invalidate the mmio spte properly .
> 
> Sorry I didn't explain myself very well: Since we can get a single wrong
> mmio exit no matter what, it has to be handled in userspace. So my point
> was, it doesn't really help to fix that one very specific way that it can
> happen, because it can just happen in other ways. (E.g. update memslots
> occurs after is_noslot_pfn() and before mmio exit).
> 
> But it looks like you basically said the same thing earlier, so I think
> we're on the same page.
> 

Yes, that is what I tried to explain in the previous mails. :(

> The single line patch I suggested was only intended to fix the "forever
> incorrectly exit mmio".

My patch also fixes this case and that does not doubly increase the
number. I think this is the better one.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  4:41               ` Xiao Guangrong
@ 2014-08-19  5:00                 ` David Matlack
  2014-08-19  5:19                   ` Xiao Guangrong
  0 siblings, 1 reply; 29+ messages in thread
From: David Matlack @ 2014-08-19  5:00 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> On 08/19/2014 12:31 PM, David Matlack wrote:
>> But it looks like you basically said the same thing earlier, so I think
>> we're on the same page.
>>
>
> Yes, that is what i try to explain in previous mails. :(

I'm glad we understand each other now! Sorry again for my confusion.

>> The single line patch I suggested was only intended to fix the "forever
>> incorrectly exit mmio".
>
> My patch also fixes this case and that does not doubly increase the
> number. I think this is the better one.

I prefer doubly increasing the generation for this reason: the updated boolean
requires extra code on the "client-side" to check if there's an update in
progress. And that makes it easy to get wrong. In fact, your patch
forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
generation requires no "client-side" code to work.
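
For illustration, here is a minimal sketch of the kind of client-side check
I mean (my guess at the missing hunk, not code from your patch):

static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
			   unsigned access)
{
	struct kvm_memslots *slots = kvm_memslots(kvm);
	unsigned int gen;

	/* A memslot update is in flight: skip caching the mmio spte. */
	if (slots->updated)
		return;

	/* Pairs with the smp_wmb() in install_new_memslots(). */
	smp_rmb();
	gen = __kvm_current_mmio_generation(slots);

	/* ... build and install the mmio spte using 'gen' as before ... */
}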

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  5:00                 ` David Matlack
@ 2014-08-19  5:19                   ` Xiao Guangrong
  2014-08-19  5:40                     ` David Matlack
  0 siblings, 1 reply; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-19  5:19 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On 08/19/2014 01:00 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> On 08/19/2014 12:31 PM, David Matlack wrote:
>>> But it looks like you basically said the same thing earlier, so I think
>>> we're on the same page.
>>>
>>
>> Yes, that is what i try to explain in previous mails. :(
> 
> I'm glad we understand each other now! Sorry again for my confusion.

Yup, me too. :)

> 
>>> The single line patch I suggested was only intended to fix the "forever
>>> incorrectly exit mmio".
>>
>> My patch also fixes this case and that does not doubly increase the
>> number. I think this is the better one.
> 
> I prefer doubly increasing the generation for this reason: the updated boolean
> requires extra code on the "client-side" to check if there's an update in
> progress. And that makes it easy to get wrong. In fact, your patch
> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
> generation requires no "client-side" code to work.

No, the updated patch is meant to fix case 2, the scenario I drew in
the last mail. Here I mean the original patch in this patchset, which just
increases the number after the srcu-sync.

Then could you tell me what your approach can do that my original patch cannot?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  5:19                   ` Xiao Guangrong
@ 2014-08-19  5:40                     ` David Matlack
  2014-08-19  5:55                       ` Xiao Guangrong
  0 siblings, 1 reply; 29+ messages in thread
From: David Matlack @ 2014-08-19  5:40 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> On 08/19/2014 01:00 PM, David Matlack wrote:
>> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
>> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>>> On 08/19/2014 12:31 PM, David Matlack wrote:
>>>> The single line patch I suggested was only intended to fix the "forever
>>>> incorrectly exit mmio".
>>>
>>> My patch also fixes this case and that does not doubly increase the
>>> number. I think this is the better one.
>>
>> I prefer doubly increasing the generation for this reason: the updated boolean
>> requires extra code on the "client-side" to check if there's an update in
>> progress. And that makes it easy to get wrong. In fact, your patch
>> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
>> generation requires no "client-side" code to work.
>
> No, the updated patch is used to fix case 2 which i draw the scenario in
> the last mail. I mean the original patch in this patchset which just
> increase the number after srcu-sync.
>
> Then could you tell me that your approach can do but my original patch can not?

It avoids publishing new memslots with an old generation number attached to
them (even if it only lasts for a short period of time). Do you have a reason
why you don't want to doubly increase the generation?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  5:40                     ` David Matlack
@ 2014-08-19  5:55                       ` Xiao Guangrong
  0 siblings, 0 replies; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-19  5:55 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On 08/19/2014 01:40 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> On 08/19/2014 01:00 PM, David Matlack wrote:
>>> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
>>> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>>>> On 08/19/2014 12:31 PM, David Matlack wrote:
>>>>> The single line patch I suggested was only intended to fix the "forever
>>>>> incorrectly exit mmio".
>>>>
>>>> My patch also fixes this case and that does not doubly increase the
>>>> number. I think this is the better one.
>>>
>>> I prefer doubly increasing the generation for this reason: the updated boolean
>>> requires extra code on the "client-side" to check if there's an update in
>>> progress. And that makes it easy to get wrong. In fact, your patch
>>> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
>>> generation requires no "client-side" code to work.
>>
>> No, the updated patch is used to fix case 2 which i draw the scenario in
>> the last mail. I mean the original patch in this patchset which just
>> increase the number after srcu-sync.
>>
>> Then could you tell me that your approach can do but my original patch can not?
> 
> It avoids publishing new memslots with an old generation number attached to
> them (even if it only lasts for a short period of time). 

I cannot see the problem if that happens; could you please draw the scenario?

> Do you have a reason
> why you don't want to doubly increase the generation?

That makes the generation number wrap around more easily.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  3:50           ` Xiao Guangrong
  2014-08-19  4:31             ` David Matlack
@ 2014-08-19  8:28             ` Paolo Bonzini
  2014-08-19  8:50               ` Xiao Guangrong
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-19  8:28 UTC (permalink / raw)
  To: Xiao Guangrong, David Matlack
  Cc: Gleb Natapov, Avi Kivity, mtosatti, linux-kernel, kvm, stable

Il 19/08/2014 05:50, Xiao Guangrong ha scritto:
> 
> Note in the step *, my approach detects the invalid generation-number which
> will invalidate the mmio spte properly .

You are right, in fact my mail included another part: "Another 
alternative could be to use the low bit to mark an in-progress change, 
*and skip the caching if the low bit is set*."

I think if you always treat the low bit as zero in mmio sptes, you can 
do that without losing a bit of the generation.

Something like this (untested/uncompiled):

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 931467881da7..3a56d377c6d7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -199,16 +199,20 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
 /*
- * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number,
- * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation
- * number.
+ * the low bit of the generation number is always presumed to be zero.
+ * This disables mmio caching during memslot updates.  The concept is
+ * similar to a seqcount but instead of retrying the access we just punt
+ * and ignore the cache.
+ *
+ * spte bits 3-11 are used as bits 1-9 of the generation number,
+ * the bits 52-61 are used as bits 10-19 of the generation number.
  */
-#define MMIO_SPTE_GEN_LOW_SHIFT		3
+#define MMIO_SPTE_GEN_LOW_SHIFT		2
 #define MMIO_SPTE_GEN_HIGH_SHIFT	52
 
-#define MMIO_GEN_SHIFT			19
-#define MMIO_GEN_LOW_SHIFT		9
-#define MMIO_GEN_LOW_MASK		((1 << MMIO_GEN_LOW_SHIFT) - 1)
+#define MMIO_GEN_SHIFT			20
+#define MMIO_GEN_LOW_SHIFT		10
+#define MMIO_GEN_LOW_MASK		((1 << MMIO_GEN_LOW_SHIFT) - 2)
 #define MMIO_GEN_MASK			((1 << MMIO_GEN_SHIFT) - 1)
 #define MMIO_MAX_GEN			((1 << MMIO_GEN_SHIFT) - 1)
 
@@ -236,12 +240,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 
 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
-	/*
-	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
-	 * code of handling generation number wrap-around.
-	 */
-	return (kvm_memslots(kvm)->generation +
-		      MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
+	return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
 }
 
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a69a623938b8..c7e2800313b8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -474,6 +476,13 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!kvm->memslots)
 		goto out_err_no_srcu;
+
+	/*
+	 * Init kvm generation close to MMIO_MAX_GEN to easily test the
+	 * code of handling generation number wrap-around.
+	 */
+	kvm->memslots->generation = -150;
+
 	kvm_init_memslots_id(kvm);
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err_no_srcu;
@@ -725,6 +732,8 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	synchronize_srcu_expedited(&kvm->srcu);
 
 	kvm_arch_memslots_updated(kvm);
+	slots->generation++;
+	WARN_ON(slots->generation & 1);
 
 	return old_memslots;
 }

(modulo the changes to always set the generation in install_new_memslots, which
I'm eliding for clarity).

Moving the initialization to kvm_create_vm ensures that the low bit is untouched
between install_new_memslots and kvm_current_mmio_generation.
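
For completeness, the elided install_new_memslots change could look roughly
like this (sketch only, equally untested); the slots->generation++ in the hunk
above then makes the value even again once the update is visible:

	/* Set the low bit for the duration of the update. */
	WARN_ON(old_memslots->generation & 1);
	slots->generation = old_memslots->generation + 1;

	update_memslots(slots, new);
	rcu_assign_pointer(kvm->memslots, slots);
	synchronize_srcu_expedited(&kvm->srcu);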

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  8:28             ` Paolo Bonzini
@ 2014-08-19  8:50               ` Xiao Guangrong
  2014-08-19  9:03                 ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-19  8:50 UTC (permalink / raw)
  To: Paolo Bonzini, David Matlack
  Cc: Gleb Natapov, Avi Kivity, mtosatti, linux-kernel, kvm, stable

On 08/19/2014 04:28 PM, Paolo Bonzini wrote:
> Il 19/08/2014 05:50, Xiao Guangrong ha scritto:
>>
>> Note in the step *, my approach detects the invalid generation-number which
>> will invalidate the mmio spte properly .
> 
> You are right, in fact my mail included another part: "Another 
> alternative could be to use the low bit to mark an in-progress change, 
> *and skip the caching if the low bit is set*."

Okay, what confused me is that it seemed the single-line patch was OK
to you. :)

Now, do we really need to care about case 2? As David said:
"Sorry I didn't explain myself very well: Since we can get a single wrong
mmio exit no matter what, it has to be handled in userspace. So my point
was, it doesn't really help to fix that one very specific way that it can
happen, because it can just happen in other ways. (E.g. update memslots
occurs after is_noslot_pfn() and before mmio exit)."

What's your idea?

> 
> I think if you always treat the low bit as zero in mmio sptes, you can 
> do that without losing a bit of the generation.

What you did is avoid caching an invalid generation number into the spte; but
actually, if we can figure it out when we check the mmio access, it's OK. The
updated patch I posted fixes it that way, and it avoids doubly increasing the number.

Okay, if you're interested in increasing the number doubly, here is a simpler
one:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..bf49170 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -236,6 +236,9 @@ static unsigned int get_mmio_spte_generation(u64 spte)

 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
+       /* The initialized generation number should be even. */
+       BUILD_BUG_ON((MMIO_MAX_GEN - 150) & 0x1);
+
        /*
         * Init kvm generation close to MMIO_MAX_GEN to easily test the
         * code of handling generation number wrap-around.
@@ -292,6 +295,14 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
        kvm_gen = kvm_current_mmio_generation(kvm);
        spte_gen = get_mmio_spte_generation(spte);

+       /*
+        * Aha: either we cached a generation number that was being
+        * updated, or the generation number is currently being updated.
+        * Fall back to the real check for the mmio access.
+        */
+       if ((kvm_gen | spte_gen) & 0x1)
+               return false;
+
        trace_check_mmio_spte(spte, kvm_gen, spte_gen);
        return likely(kvm_gen == spte_gen);
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..5c3f7b7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -725,7 +725,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
        update_memslots(slots, new, kvm->memslots->generation);
        rcu_assign_pointer(kvm->memslots, slots);
        synchronize_srcu_expedited(&kvm->srcu);
-
+       kvm->memslots->generation++;
        kvm_arch_memslots_updated(kvm);

        return old_memslots;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  8:50               ` Xiao Guangrong
@ 2014-08-19  9:03                 ` Paolo Bonzini
  2014-08-20  0:29                   ` Xiao Guangrong
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-19  9:03 UTC (permalink / raw)
  To: Xiao Guangrong, David Matlack
  Cc: Gleb Natapov, Avi Kivity, mtosatti, linux-kernel, kvm, stable

Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
> Okay, what confused me it that it seems that the single line patch
> is ok to you. :)

No, it was late and I was confused. :)

> Now, do we really need to care the case 2? like David said:
> "Sorry I didn't explain myself very well: Since we can get a single wrong
> mmio exit no matter what, it has to be handled in userspace. So my point
> was, it doesn't really help to fix that one very specific way that it can
> happen, because it can just happen in other ways. (E.g. update memslots
> occurs after is_noslot_pfn() and before mmio exit)."
> 
> What's your idea?
> 
> > I think if you always treat the low bit as zero in mmio sptes, you can 
> > do that without losing a bit of the generation.
> 
> What's you did is avoiding cache a invalid generation number into spte, but
> actually if we can figure it out when we check mmio access, it's ok. Like the
> updated patch i posted should fix it, that way avoids doubly increase the number.

Yes.

> Okay, if you're interested increasing the number doubly, there is the simpler
> one:

This wastes a bit in the mmio spte though.  My idea is to increase the
memslots generation twice, but drop the low bit in the mmio spte.
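
To make the arithmetic concrete, here is a tiny standalone model of that
scheme (not kernel code; the constants are only illustrative):

#include <assert.h>

#define GEN_MASK	((1u << 20) - 1)	/* 20 generation bits        */
#define SPTE_GEN(g)	((g) & GEN_MASK & ~1u)	/* low bit is never stored   */

int main(void)
{
	unsigned int gen = 100;				/* even: no update in flight */
	unsigned int during = gen + 1;			/* first bump, update starts */
	unsigned int cached = SPTE_GEN(during);		/* spte cached in the window */
	unsigned int after = during + 1;		/* second bump, update done  */

	assert(cached != (during & GEN_MASK));		/* ignored during the window */
	assert(cached != (after & GEN_MASK));		/* and still stale afterwards */
	return 0;
}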

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-19  9:03                 ` Paolo Bonzini
@ 2014-08-20  0:29                   ` Xiao Guangrong
  2014-08-20  1:03                     ` David Matlack
  0 siblings, 1 reply; 29+ messages in thread
From: Xiao Guangrong @ 2014-08-20  0:29 UTC (permalink / raw)
  To: Paolo Bonzini, David Matlack
  Cc: Gleb Natapov, Avi Kivity, mtosatti, linux-kernel, kvm, stable

On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
> Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
>> Okay, what confused me it that it seems that the single line patch
>> is ok to you. :)
> 
> No, it was late and I was confused. :)
> 
>> Now, do we really need to care the case 2? like David said:
>> "Sorry I didn't explain myself very well: Since we can get a single wrong
>> mmio exit no matter what, it has to be handled in userspace. So my point
>> was, it doesn't really help to fix that one very specific way that it can
>> happen, because it can just happen in other ways. (E.g. update memslots
>> occurs after is_noslot_pfn() and before mmio exit)."
>>
>> What's your idea?
>>
>>> I think if you always treat the low bit as zero in mmio sptes, you can 
>>> do that without losing a bit of the generation.
>>
>> What's you did is avoiding cache a invalid generation number into spte, but
>> actually if we can figure it out when we check mmio access, it's ok. Like the
>> updated patch i posted should fix it, that way avoids doubly increase the number.
> 
> Yes.
> 
>> Okay, if you're interested increasing the number doubly, there is the simpler
>> one:
> 
> This wastes a bit in the mmio spte though.  My idea is to increase the
> memslots generation twice, but drop the low bit in the mmio spte.

Yeah, really smart idea. :)

Paolo/David, would you mind making a patch for this (+ the comments in David's
patch)?

Please feel free to add my:
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-20  0:29                   ` Xiao Guangrong
@ 2014-08-20  1:03                     ` David Matlack
  2014-08-20  8:38                       ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: David Matlack @ 2014-08-20  1:03 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, Gleb Natapov, Avi Kivity, mtosatti, linux-kernel,
	kvm, stable

On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
>> Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
>>> Okay, what confused me it that it seems that the single line patch
>>> is ok to you. :)
>>
>> No, it was late and I was confused. :)
>>
>>> Now, do we really need to care the case 2? like David said:
>>> "Sorry I didn't explain myself very well: Since we can get a single wrong
>>> mmio exit no matter what, it has to be handled in userspace. So my point
>>> was, it doesn't really help to fix that one very specific way that it can
>>> happen, because it can just happen in other ways. (E.g. update memslots
>>> occurs after is_noslot_pfn() and before mmio exit)."
>>>
>>> What's your idea?
>>>
>>>> I think if you always treat the low bit as zero in mmio sptes, you can
>>>> do that without losing a bit of the generation.
>>>
>>> What's you did is avoiding cache a invalid generation number into spte, but
>>> actually if we can figure it out when we check mmio access, it's ok. Like the
>>> updated patch i posted should fix it, that way avoids doubly increase the number.
>>
>> Yes.
>>
>>> Okay, if you're interested increasing the number doubly, there is the simpler
>>> one:
>>
>> This wastes a bit in the mmio spte though.  My idea is to increase the
>> memslots generation twice, but drop the low bit in the mmio spte.
>
> Yeah, really smart idea. :)
>
> Paolo/David, would you mind making a patch for this (+ the comments in David's
> patch)?

Paolo, since it was your idea would you like to write it? I don't mind either
way.

>
> Please feel free to add my:
> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
  2014-08-20  1:03                     ` David Matlack
@ 2014-08-20  8:38                       ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2014-08-20  8:38 UTC (permalink / raw)
  To: David Matlack, Xiao Guangrong
  Cc: Gleb Natapov, Avi Kivity, mtosatti, linux-kernel, kvm, stable

Il 20/08/2014 03:03, David Matlack ha scritto:
> On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
>>> Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
>>>> Okay, what confused me it that it seems that the single line patch
>>>> is ok to you. :)
>>>
>>> No, it was late and I was confused. :)
>>>
>>>> Now, do we really need to care the case 2? like David said:
>>>> "Sorry I didn't explain myself very well: Since we can get a single wrong
>>>> mmio exit no matter what, it has to be handled in userspace. So my point
>>>> was, it doesn't really help to fix that one very specific way that it can
>>>> happen, because it can just happen in other ways. (E.g. update memslots
>>>> occurs after is_noslot_pfn() and before mmio exit)."
>>>>
>>>> What's your idea?
>>>>
>>>>> I think if you always treat the low bit as zero in mmio sptes, you can
>>>>> do that without losing a bit of the generation.
>>>>
>>>> What's you did is avoiding cache a invalid generation number into spte, but
>>>> actually if we can figure it out when we check mmio access, it's ok. Like the
>>>> updated patch i posted should fix it, that way avoids doubly increase the number.
>>>
>>> Yes.
>>>
>>>> Okay, if you're interested increasing the number doubly, there is the simpler
>>>> one:
>>>
>>> This wastes a bit in the mmio spte though.  My idea is to increase the
>>> memslots generation twice, but drop the low bit in the mmio spte.
>>
>> Yeah, really smart idea. :)
>>
>> Paolo/David, would you mind making a patch for this (+ the comments in David's
>> patch)?
> 
> Paolo, since it was your idea would you like to write it? I don't mind either
> way.

Sure, I'll post the patch for review.

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-08-20  8:39 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-14  7:01 [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
2014-08-14  7:01 ` [PATCH 2/2] kvm: x86: fix stale mmio cache bug Xiao Guangrong
2014-08-14 16:25   ` David Matlack
2014-08-18 21:24   ` Paolo Bonzini
2014-08-14  7:06 ` [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number Xiao Guangrong
2014-08-18 13:57 ` Paolo Bonzini
2014-08-18 16:35   ` Xiao Guangrong
2014-08-18 18:20     ` David Matlack
2014-08-18 18:47     ` Paolo Bonzini
2014-08-18 19:56       ` Xiao Guangrong
2014-08-18 21:15         ` David Matlack
2014-08-18 21:24           ` Paolo Bonzini
2014-08-18 21:33             ` David Matlack
2014-08-19  3:50           ` Xiao Guangrong
2014-08-19  4:31             ` David Matlack
2014-08-19  4:41               ` Xiao Guangrong
2014-08-19  5:00                 ` David Matlack
2014-08-19  5:19                   ` Xiao Guangrong
2014-08-19  5:40                     ` David Matlack
2014-08-19  5:55                       ` Xiao Guangrong
2014-08-19  8:28             ` Paolo Bonzini
2014-08-19  8:50               ` Xiao Guangrong
2014-08-19  9:03                 ` Paolo Bonzini
2014-08-20  0:29                   ` Xiao Guangrong
2014-08-20  1:03                     ` David Matlack
2014-08-20  8:38                       ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2014-08-12  5:02 Xiao Guangrong
2014-08-12 21:18 ` David Matlack
2014-08-14  5:41   ` Xiao Guangrong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).