kvm.vger.kernel.org archive mirror
* [PATCH v4 00/11] KVM: x86: track guest page access
@ 2016-02-24  9:51 Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
                   ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

Changelog in v4:
thanks to Paolo's review, the changes in this version are:
- comment struct kvm_mmu_page_role to explain that the role bits used by
  indirect shadow pages can not exceed 15 bits
- simplify overflow detection of the track count
- remove the wrapper functions kvm_page_track_{add,remove}_page
- rename kvm_slot_page_track_*_nolock to kvm_slot_page_track_*
- rename the function check_mode to page_track_mode_is_valid and inline it
- rename kvm_page_track_check_mode to kvm_page_track_is_active
- check if the notifier list is empty before holding the srcu lock
- move the patch adjusting write-flooding earlier in the series
 
Changelog in v3:
- refine the code of mmu_need_write_protect() based on Huang Kai's suggestion
- rebase the patchset against current code

Changelog in v2:
- fix an issue where the tracking memory of a memslot was freed if we
  only moved the memslot or changed its flags
- do not track gfns that are not mapped in memslots
- introduce the nolock APIs at the beginning of the patchset
- use 'unsigned short' as the track counter to reduce memory usage; it
  should be enough for the shadow page table and KVMGT

This patchset introduces a feature which allows us to track page
access in the guest. Currently, only write access tracking is
implemented.

Four APIs are introduced (a short usage sketch follows this list):
- kvm_slot_page_track_add_page(kvm, slot, gfn, mode): the guest page @gfn
  in memslot @slot is added into the track pool of the guest instance
  represented by @kvm; @mode specifies which kind of access to @gfn is
  tracked

- kvm_slot_page_track_remove_page(kvm, slot, gfn, mode): the opposite
  operation of kvm_slot_page_track_add_page(), which removes @gfn from
  the tracking pool. The gfn is no longer tracked after its last user is
  gone

- kvm_page_track_register_notifier(kvm, n): register a notifier so that
  the events triggered by page tracking will be received; at that point
  the n->track_write() callback will be invoked

- kvm_page_track_unregister_notifier(kvm, n): the opposite operation of
  kvm_page_track_register_notifier(), which unlinks the notifier and
  stops the delivery of tracked events
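
A hypothetical end-to-end usage sketch (the my_* names are illustrative
only and are not part of this series; locking is elided, see the rules
in patch 05):

	static void my_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
				   const u8 *new, int bytes)
	{
		/* re-validate the shadow structures that cover @gpa */
	}

	static struct kvm_page_track_notifier_node my_notifier = {
		.track_write = my_track_write,
	};

	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);

	kvm_page_track_register_notifier(kvm, &my_notifier);
	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
	/* writes to @gfn are now emulated and my_track_write() is called */
	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
	kvm_page_track_unregister_notifier(kvm, &my_notifier);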

The first user of page tracking is non-leaf shadow page tables, as they
are always write protected. This also gains a performance improvement,
because page tracking speeds up the page fault handler for tracked
pages. The performance result of a kernel build is as follows:

        before        after
real    461.63        455.48
user   4529.55       4557.88
sys    1995.39       1922.57
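
(That is about 1.3% less wall-clock time and 3.6% less system time, at
the cost of about 0.6% more user time.)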

Furthermore, it provides the infrastructure for other kinds of shadow
page tables, such as the GPU shadow page table introduced by KVMGT (1)
and native nested IOMMU.

(1): http://lkml.iu.edu/hypermail/linux/kernel/1510.3/01562.html

Xiao Guangrong (11):
  KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
  KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage
  KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect
  KVM: page track: add the framework of guest page tracking
  KVM: page track: introduce kvm_slot_page_track_{add,remove}_page
  KVM: MMU: let page fault handler be aware tracked page
  KVM: MMU: clear write-flooding on the fast path of tracked page
  KVM: page track: add notifier support
  KVM: MMU: use page track for non-leaf shadow pages
  KVM: MMU: simplify mmu_need_write_protect
  KVM: MMU: apply page track notifier

 Documentation/virtual/kvm/mmu.txt     |   6 +-
 arch/x86/include/asm/kvm_host.h       |  20 ++-
 arch/x86/include/asm/kvm_page_track.h |  61 ++++++++++
 arch/x86/kvm/Makefile                 |   3 +-
 arch/x86/kvm/mmu.c                    | 209 ++++++++++++++++++++++----------
 arch/x86/kvm/mmu.h                    |   5 +
 arch/x86/kvm/page_track.c             | 222 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/paging_tmpl.h            |   5 +
 arch/x86/kvm/x86.c                    |  27 +++--
 9 files changed, 479 insertions(+), 79 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm_page_track.h
 create mode 100644 arch/x86/kvm/page_track.c

-- 
1.8.3.1

* [PATCH 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage Xiao Guangrong
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

kvm_lpage_info->write_count is used to detect whether the large page
mapping for the gfn on the specified level is allowed; rename it to
disallow_lpage to reflect its purpose. We also rename
has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code
clearer.

Later we will extend this mechanism for page tracking: if the gfn is
tracked then large mappings for that gfn on any level are not allowed.
The new name is more straightforward.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 Documentation/virtual/kvm/mmu.txt |  6 +++---
 arch/x86/include/asm/kvm_host.h   |  2 +-
 arch/x86/kvm/mmu.c                | 25 +++++++++++++------------
 arch/x86/kvm/x86.c                | 14 ++++++++------
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f..dda2e93 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -391,11 +391,11 @@ To instantiate a large spte, four constraints must be satisfied:
   write-protected pages
 - the guest page must be wholly contained by a single memory slot
 
-To check the last two conditions, the mmu maintains a ->write_count set of
+To check the last two conditions, the mmu maintains a ->disallow_lpage set of
 arrays for each memory slot and large page size.  Every write protected page
-causes its write_count to be incremented, thus preventing instantiation of
+causes its disallow_lpage to be incremented, thus preventing instantiation of
 a large spte.  The frames at the end of an unaligned memory slot have
-artificially inflated ->write_counts so they can never be instantiated.
+artificially inflated ->disallow_lpages so they can never be instantiated.
 
 Zapping all pages (page generation count)
 =========================================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b54599..1f7fed5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -644,7 +644,7 @@ struct kvm_vcpu_arch {
 };
 
 struct kvm_lpage_info {
-	int write_count;
+	int disallow_lpage;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07f4c26..06dd4ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -789,7 +789,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slot = __gfn_to_memslot(slots, gfn);
 	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->write_count += 1;
+		linfo->disallow_lpage += 1;
 	}
 	kvm->arch.indirect_shadow_pages++;
 }
@@ -807,31 +807,32 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slot = __gfn_to_memslot(slots, gfn);
 	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->write_count -= 1;
-		WARN_ON(linfo->write_count < 0);
+		linfo->disallow_lpage -= 1;
+		WARN_ON(linfo->disallow_lpage < 0);
 	}
 	kvm->arch.indirect_shadow_pages--;
 }
 
-static int __has_wrprotected_page(gfn_t gfn, int level,
-				  struct kvm_memory_slot *slot)
+static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
+					  struct kvm_memory_slot *slot)
 {
 	struct kvm_lpage_info *linfo;
 
 	if (slot) {
 		linfo = lpage_info_slot(gfn, slot, level);
-		return linfo->write_count;
+		return !!linfo->disallow_lpage;
 	}
 
-	return 1;
+	return true;
 }
 
-static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn,
+					int level)
 {
 	struct kvm_memory_slot *slot;
 
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	return __has_wrprotected_page(gfn, level, slot);
+	return __mmu_gfn_lpage_is_disallowed(gfn, level, slot);
 }
 
 static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
@@ -897,7 +898,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
 
 	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
-		if (__has_wrprotected_page(large_gfn, level, slot))
+		if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot))
 			break;
 
 	return level - 1;
@@ -2503,7 +2504,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * be fixed if guest refault.
 		 */
 		if (level > PT_PAGE_TABLE_LEVEL &&
-		    has_wrprotected_page(vcpu, gfn, level))
+		    mmu_gfn_lpage_is_disallowed(vcpu, gfn, level))
 			goto done;
 
 		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
@@ -2768,7 +2769,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
 	    level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompound(pfn_to_page(pfn)) &&
-	    !has_wrprotected_page(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
+	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
 		/*
 		 * mmu_notifier_retry was successful and we hold the
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2fb92c0..668625e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7879,6 +7879,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+		struct kvm_lpage_info *linfo;
 		unsigned long ugfn;
 		int lpages;
 		int level = i + 1;
@@ -7893,15 +7894,16 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 		if (i == 0)
 			continue;
 
-		slot->arch.lpage_info[i - 1] = kvm_kvzalloc(lpages *
-					sizeof(*slot->arch.lpage_info[i - 1]));
-		if (!slot->arch.lpage_info[i - 1])
+		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
+		if (!linfo)
 			goto out_free;
 
+		slot->arch.lpage_info[i - 1] = linfo;
+
 		if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
-			slot->arch.lpage_info[i - 1][0].write_count = 1;
+			linfo[0].disallow_lpage = 1;
 		if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
-			slot->arch.lpage_info[i - 1][lpages - 1].write_count = 1;
+			linfo[lpages - 1].disallow_lpage = 1;
 		ugfn = slot->userspace_addr >> PAGE_SHIFT;
 		/*
 		 * If the gfn and userspace address are not aligned wrt each
@@ -7913,7 +7915,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 			unsigned long j;
 
 			for (j = 0; j < lpages; ++j)
-				slot->arch.lpage_info[i - 1][j].write_count = 1;
+				linfo[j].disallow_lpage = 1;
 		}
 	}
 
-- 
1.8.3.1

* [PATCH 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect Xiao Guangrong
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

Abstract the common operations out of account_shadowed() and
unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage()
and kvm_mmu_gfn_allow_lpage().

These two functions will be used by page tracking in a later patch.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 38 +++++++++++++++++++++++++-------------
 arch/x86/kvm/mmu.h |  3 +++
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 06dd4ab..566639d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -776,21 +776,39 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 	return &slot->arch.lpage_info[level - 2][idx];
 }
 
+static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
+					    gfn_t gfn, int count)
+{
+	struct kvm_lpage_info *linfo;
+	int i;
+
+	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+		linfo = lpage_info_slot(gfn, slot, i);
+		linfo->disallow_lpage += count;
+		WARN_ON(linfo->disallow_lpage < 0);
+	}
+}
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	update_gfn_disallow_lpage_count(slot, gfn, 1);
+}
+
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	update_gfn_disallow_lpage_count(slot, gfn, -1);
+}
+
 static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
-	struct kvm_lpage_info *linfo;
 	gfn_t gfn;
-	int i;
 
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
-	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->disallow_lpage += 1;
-	}
+	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 	kvm->arch.indirect_shadow_pages++;
 }
 
@@ -798,18 +816,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
-	struct kvm_lpage_info *linfo;
 	gfn_t gfn;
-	int i;
 
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
-	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->disallow_lpage -= 1;
-		WARN_ON(linfo->disallow_lpage < 0);
-	}
+	kvm_mmu_gfn_allow_lpage(slot, gfn);
 	kvm->arch.indirect_shadow_pages--;
 }
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 55ffb7b..de92bed 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -174,4 +174,7 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 #endif
-- 
1.8.3.1

* [PATCH 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 04/11] KVM: page track: add the framework of guest page tracking Xiao Guangrong
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

Split rmap_write_protect() and introduce a function that abstracts the
write protection based on the slot.

This function will be used in a later patch.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 16 +++++++++++-----
 arch/x86/kvm/mmu.h |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 566639d..7184218 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1336,23 +1336,29 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, u64 gfn)
 {
-	struct kvm_memory_slot *slot;
 	struct kvm_rmap_head *rmap_head;
 	int i;
 	bool write_protected = false;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-
 	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		rmap_head = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true);
+		write_protected |= __rmap_write_protect(kvm, rmap_head, true);
 	}
 
 	return write_protected;
 }
 
+static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
+{
+	struct kvm_memory_slot *slot;
+
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
+}
+
 static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index de92bed..58fe98a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -177,4 +177,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, u64 gfn);
 #endif
-- 
1.8.3.1

* [PATCH 04/11] KVM: page track: add the framework of guest page tracking
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (2 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 05/11] KVM: page track: introduce kvm_slot_page_track_{add,remove}_page Xiao Guangrong
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

The array gfn_track[mode][gfn] is introduced in the memory slot for
every guest page; it is the tracking count for the guest page in the
given mode. If the page is tracked then the count is incremented; the
page is no longer tracked after the count reaches zero.

We use 'unsigned short' as the tracking count, which should be enough:
the shadow page table can use at most 2^14 distinct roles per gfn (2^3
for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for
nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp,
and 2^1 for smm), so there is enough room for other trackers.
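
A quick check of that bit budget (my annotation, not part of the
original message): 3 + 1 + 2 + 3 + 1 + 1 + 1 + 1 + 1 = 14 role bits, so
at most 2^14 = 16384 indirect shadow pages can reference a single gfn,
which fits comfortably in the 16-bit counter (USHRT_MAX = 65535).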

Two functions, kvm_page_track_create_memslot() and
kvm_page_track_free_memslot(), are implemented in this patch; they are
internally used to initialize and reclaim the memory of the array.

Currently, only write track mode is supported

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h       | 10 +++++++
 arch/x86/include/asm/kvm_page_track.h | 13 +++++++++
 arch/x86/kvm/Makefile                 |  3 +-
 arch/x86/kvm/page_track.c             | 52 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  5 ++++
 5 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/kvm_page_track.h
 create mode 100644 arch/x86/kvm/page_track.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f7fed5..71e43fe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #include <asm/mtrr.h>
 #include <asm/msr-index.h>
 #include <asm/asm.h>
+#include <asm/kvm_page_track.h>
 
 #define KVM_MAX_VCPUS 255
 #define KVM_SOFT_MAX_VCPUS 160
@@ -214,6 +215,14 @@ struct kvm_mmu_memory_cache {
 	void *objects[KVM_NR_MEM_OBJS];
 };
 
+/*
+ * the pages used as guest page table on soft mmu are tracked by
+ * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
+ * by indirect shadow page can not be more than 15 bits.
+ *
+ * Currently, we use 14 bits that are @level, @cr4_pae, @quadrant, @access,
+ * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
+ */
 union kvm_mmu_page_role {
 	unsigned word;
 	struct {
@@ -650,6 +659,7 @@ struct kvm_lpage_info {
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
+	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
new file mode 100644
index 0000000..55200406
--- /dev/null
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_X86_KVM_PAGE_TRACK_H
+#define _ASM_X86_KVM_PAGE_TRACK_H
+
+enum kvm_page_track_mode {
+	KVM_PAGE_TRACK_WRITE,
+	KVM_PAGE_TRACK_MAX,
+};
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+				 struct kvm_memory_slot *dont);
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+				  unsigned long npages);
+#endif
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a1ff508..464fa47 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -13,9 +13,10 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-			   hyperv.o
+			   hyperv.o page_track.o
 
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= assigned-dev.o iommu.o
+
 kvm-intel-y		+= vmx.o pmu_intel.o
 kvm-amd-y		+= svm.o pmu_amd.o
 
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
new file mode 100644
index 0000000..8c396d0
--- /dev/null
+++ b/arch/x86/kvm/page_track.c
@@ -0,0 +1,52 @@
+/*
+ * Support KVM guest page tracking
+ *
+ * This feature allows us to track page access in guest. Currently, only
+ * write access is tracked.
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *   Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_page_track.h>
+
+#include "mmu.h"
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+				 struct kvm_memory_slot *dont)
+{
+	int i;
+
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
+		if (!dont || free->arch.gfn_track[i] !=
+		      dont->arch.gfn_track[i]) {
+			kvfree(free->arch.gfn_track[i]);
+			free->arch.gfn_track[i] = NULL;
+		}
+}
+
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+				  unsigned long npages)
+{
+	int  i;
+
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
+					    sizeof(*slot->arch.gfn_track[i]));
+		if (!slot->arch.gfn_track[i])
+			goto track_free;
+	}
+
+	return 0;
+
+track_free:
+	kvm_page_track_free_memslot(slot, NULL);
+	return -ENOMEM;
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 668625e..7b4cfea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7871,6 +7871,8 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 			free->arch.lpage_info[i - 1] = NULL;
 		}
 	}
+
+	kvm_page_track_free_memslot(free, dont);
 }
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -7919,6 +7921,9 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 		}
 	}
 
+	if (kvm_page_track_create_memslot(slot, npages))
+		goto out_free;
+
 	return 0;
 
 out_free:
-- 
1.8.3.1

* [PATCH 05/11] KVM: page track: introduce kvm_slot_page_track_{add,remove}_page
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (3 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 04/11] KVM: page track: add the framework of guest page tracking Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page Xiao Guangrong
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

These two functions are the user APIs:
- kvm_slot_page_track_add_page(): add the page to the tracking pool;
  after that, the specified access to the page will be tracked

- kvm_slot_page_track_remove_page(): remove the page from the tracking
  pool; the specified access to the page is no longer tracked after the
  last user is gone

Both of these are called under the protection of the mmu-lock and of
either kvm->srcu or kvm->slots_lock, as sketched below.
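
A minimal sketch of that calling convention (illustrative only; the
surrounding caller, error handling and the remove path are elided):

	struct kvm_memory_slot *slot;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	slot = gfn_to_memslot(kvm, gfn);
	spin_lock(&kvm->mmu_lock);
	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
	spin_unlock(&kvm->mmu_lock);
	srcu_read_unlock(&kvm->srcu, idx);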

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_page_track.h |  7 +++
 arch/x86/kvm/page_track.c             | 85 +++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 55200406..e363e30 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -10,4 +10,11 @@ void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
 				 struct kvm_memory_slot *dont);
 int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 				  unsigned long npages);
+
+void kvm_slot_page_track_add_page(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  enum kvm_page_track_mode mode);
+void kvm_slot_page_track_remove_page(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     enum kvm_page_track_mode mode);
 #endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 8c396d0..cd76bc3 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -50,3 +50,88 @@ track_free:
 	kvm_page_track_free_memslot(slot, NULL);
 	return -ENOMEM;
 }
+
+static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
+{
+	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
+		return false;
+
+	return true;
+}
+
+static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
+			     enum kvm_page_track_mode mode, short count)
+{
+	int index, val;
+
+	index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+	val = slot->arch.gfn_track[mode][index];
+
+	if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
+		return;
+
+	slot->arch.gfn_track[mode][index] += count;
+}
+
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * It should be called under the protection both of mmu-lock and kvm->srcu
+ * or kvm->slots_lock.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @slot: the @gfn belongs to.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_slot_page_track_add_page(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  enum kvm_page_track_mode mode)
+{
+
+	if (WARN_ON(!page_track_mode_is_valid(mode)))
+		return;
+
+	update_gfn_track(slot, gfn, mode, 1);
+
+	/*
+	 * new track stops large page mapping for the
+	 * tracked page.
+	 */
+	kvm_mmu_gfn_disallow_lpage(slot, gfn);
+
+	if (mode == KVM_PAGE_TRACK_WRITE)
+		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+}
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page. It is the opposite operation of
+ * kvm_slot_page_track_add_page().
+ *
+ * It should be called under the protection both of mmu-lock and kvm->srcu
+ * or kvm->slots_lock.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @slot: the @gfn belongs to.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_slot_page_track_remove_page(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     enum kvm_page_track_mode mode)
+{
+	if (WARN_ON(!page_track_mode_is_valid(mode)))
+		return;
+
+	update_gfn_track(slot, gfn, mode, -1);
+
+	/*
+	 * allow large page mapping for the tracked page
+	 * after the tracker is gone.
+	 */
+	kvm_mmu_gfn_allow_lpage(slot, gfn);
+}
-- 
1.8.3.1

* [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (4 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 05/11] KVM: page track: introduce kvm_slot_page_track_{add,remove}_page Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 07/11] KVM: MMU: clear write-flooding on the fast path of " Xiao Guangrong
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

A page fault caused by a write access to a write-tracked page can not
be fixed; it always needs to be emulated. page_fault_handle_page_track()
is the fast path we introduce here to skip holding the mmu-lock and
walking the shadow page table.

However, if the page table is not present, it is worth making the page
table entry present and read-only, so that read accesses stay fast.

mmu_need_write_protect() needs to be adjusted so that the page does not
become writable when making the page table present or when
syncing/prefetching shadow page table entries.
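
For reference, the #PF error code bits checked by the fast path are
architectural x86 definitions, not something introduced here:

	PFERR_PRESENT_MASK = 1U << 0	/* the faulting page was present */
	PFERR_WRITE_MASK   = 1U << 1	/* the fault was caused by a write */
	PFERR_RSVD_MASK    = 1U << 3	/* reserved bit set, handled by the
					   MMIO fast path instead */

so page_fault_handle_page_track() only claims present, non-MMIO write
faults.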

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_page_track.h |  2 ++
 arch/x86/kvm/mmu.c                    | 44 +++++++++++++++++++++++++++++------
 arch/x86/kvm/page_track.c             | 15 ++++++++++++
 arch/x86/kvm/paging_tmpl.h            |  3 +++
 4 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index e363e30..5f16e28 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -17,4 +17,6 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
+bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
+			      enum kvm_page_track_mode mode);
 #endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7184218..dd8e3ca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -41,6 +41,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
 #include <asm/vmx.h>
+#include <asm/kvm_page_track.h>
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -2448,25 +2449,29 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 	}
 }
 
-static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
-				  bool can_unsync)
+static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
+				   bool can_unsync)
 {
 	struct kvm_mmu_page *s;
 	bool need_unsync = false;
 
+	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+		return true;
+
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
 		if (!can_unsync)
-			return 1;
+			return true;
 
 		if (s->role.level != PT_PAGE_TABLE_LEVEL)
-			return 1;
+			return true;
 
 		if (!s->unsync)
 			need_unsync = true;
 	}
 	if (need_unsync)
 		kvm_unsync_pages(vcpu, gfn);
-	return 0;
+
+	return false;
 }
 
 static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
@@ -3381,21 +3386,43 @@ int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 }
 EXPORT_SYMBOL_GPL(handle_mmio_page_fault);
 
+static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
+					 u32 error_code, gfn_t gfn)
+{
+	if (unlikely(error_code & PFERR_RSVD_MASK))
+		return false;
+
+	if (!(error_code & PFERR_PRESENT_MASK) ||
+	      !(error_code & PFERR_WRITE_MASK))
+		return false;
+
+	/*
+	 * the guest is writing a page which is write tracked, and this
+	 * can not be fixed by the page fault handler.
+	 */
+	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+		return true;
+
+	return false;
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code, bool prefault)
 {
-	gfn_t gfn;
+	gfn_t gfn = gva >> PAGE_SHIFT;
 	int r;
 
 	pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
 
+	if (page_fault_handle_page_track(vcpu, error_code, gfn))
+		return 1;
+
 	r = mmu_topup_memory_caches(vcpu);
 	if (r)
 		return r;
 
 	MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
-	gfn = gva >> PAGE_SHIFT;
 
 	return nonpaging_map(vcpu, gva & PAGE_MASK,
 			     error_code, gfn, prefault);
@@ -3472,6 +3499,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 
 	MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
+	if (page_fault_handle_page_track(vcpu, error_code, gfn))
+		return 1;
+
 	r = mmu_topup_memory_caches(vcpu);
 	if (r)
 		return r;
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index cd76bc3..f127f6d 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -135,3 +135,18 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
+
+/*
+ * check if the corresponding access on the specified guest page is tracked.
+ */
+bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
+			      enum kvm_page_track_mode mode)
+{
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+	if (WARN_ON(!page_track_mode_is_valid(mode)))
+		return false;
+
+	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
+}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 05827ff..52ae2d9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -728,6 +728,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 		return 0;
 	}
 
+	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
+		return 1;
+
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-- 
1.8.3.1

* [PATCH 07/11] KVM: MMU: clear write-flooding on the fast path of tracked page
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (5 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 06/11] KVM: MMU: let page fault handler be aware tracked page Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 08/11] KVM: page track: add notifier support Xiao Guangrong
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

If the page fault is caused by a write access to a write-tracked page,
the real shadow page walk is skipped, so we lose the chance to clear the
write-flooding count for the page structures the current vcpu is using.

Fix it by locklessly walking the shadow page table to clear the
write-flooding count on the shadow page structures, outside of the
mmu-lock. To make that safe, we change the count to atomic_t.
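
For example (my illustration): one vcpu can now run
clear_sp_write_flooding_count() on a shadow page without holding the
mmu-lock while another vcpu increments the same counter in
detect_write_flooding() under the mmu-lock; with a plain int that is a
data race, while atomic_set()/atomic_inc() keep the counter coherent.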

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.c              | 22 ++++++++++++++++++++--
 arch/x86/kvm/paging_tmpl.h      |  4 +++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71e43fe..e2fc5c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -285,7 +285,7 @@ struct kvm_mmu_page {
 #endif
 
 	/* Number of writes since the last time traversal visited this page.  */
-	int write_flooding_count;
+	atomic_t write_flooding_count;
 };
 
 struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dd8e3ca..58c067d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2063,7 +2063,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
-	sp->write_flooding_count = 0;
+	atomic_set(&sp->write_flooding_count, 0);
 }
 
 static void clear_sp_write_flooding_count(u64 *spte)
@@ -3406,6 +3406,23 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	u64 spte;
+
+	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+		return;
+
+	walk_shadow_page_lockless_begin(vcpu);
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+		clear_sp_write_flooding_count(iterator.sptep);
+		if (!is_shadow_present_pte(spte))
+			break;
+	}
+	walk_shadow_page_lockless_end(vcpu);
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code, bool prefault)
 {
@@ -4221,7 +4238,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
 	if (sp->role.level == PT_PAGE_TABLE_LEVEL)
 		return false;
 
-	return ++sp->write_flooding_count >= 3;
+	atomic_inc(&sp->write_flooding_count);
+	return atomic_read(&sp->write_flooding_count) >= 3;
 }
 
 /*
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 52ae2d9..4174cf2 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -728,8 +728,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 		return 0;
 	}
 
-	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
+	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
+		shadow_page_table_clear_flood(vcpu, addr);
 		return 1;
+	}
 
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
-- 
1.8.3.1

* [PATCH 08/11] KVM: page track: add notifier support
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (6 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 07/11] KVM: MMU: clear write-flooding on the fast path of " Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 09/11] KVM: MMU: use page track for non-leaf shadow pages Xiao Guangrong
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

A notifier list is introduced so that any node that wants to receive
track events can register on the list.

Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive
  track events

- kvm_page_track_unregister_notifier(): stop receiving track events by
  unregistering the notifier

The callback node->track_write() is called when a write access to a
write-tracked page happens.
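
Note (my reading of the code, not stated explicitly in the patch):
because kvm_page_track_unregister_notifier() calls synchronize_srcu()
after hlist_del_rcu(), no CPU can still be executing n->track_write()
once unregister returns, so the caller may free the node immediately
afterwards.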

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h       |  1 +
 arch/x86/include/asm/kvm_page_track.h | 39 +++++++++++++++++++
 arch/x86/kvm/page_track.c             | 70 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  4 ++
 4 files changed, 114 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e2fc5c0..eb68e6a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -704,6 +704,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct kvm_page_track_notifier_head track_notifier_head;
 
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 5f16e28..c2b8d24 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -6,6 +6,36 @@ enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_MAX,
 };
 
+/*
+ * The notifier represented by @kvm_page_track_notifier_node is linked into
+ * the head which will be notified when guest is triggering the track event.
+ *
+ * Write access on the head is protected by kvm->mmu_lock, read access
+ * is protected by track_srcu.
+ */
+struct kvm_page_track_notifier_head {
+	struct srcu_struct track_srcu;
+	struct hlist_head track_notifier_list;
+};
+
+struct kvm_page_track_notifier_node {
+	struct hlist_node node;
+
+	/*
+	 * It is called when guest is writing the write-tracked page
+	 * and write emulation is finished at that time.
+	 *
+	 * @vcpu: the vcpu where the write access happened.
+	 * @gpa: the physical address written by guest.
+	 * @new: the data was written to the address.
+	 * @bytes: the written length.
+	 */
+	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			    int bytes);
+};
+
+void kvm_page_track_init(struct kvm *kvm);
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
 				 struct kvm_memory_slot *dont);
 int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
@@ -19,4 +49,13 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     enum kvm_page_track_mode mode);
 bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
 			      enum kvm_page_track_mode mode);
+
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+				 struct kvm_page_track_notifier_node *n);
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+				   struct kvm_page_track_notifier_node *n);
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
 #endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index f127f6d..11f7643 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -150,3 +150,73 @@ bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
 }
+
+void kvm_page_track_init(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+	init_srcu_struct(&head->track_srcu);
+	INIT_HLIST_HEAD(&head->track_notifier_list);
+}
+
+/*
+ * register the notifier so that event interception for the tracked guest
+ * pages can be received.
+ */
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+				 struct kvm_page_track_notifier_node *n)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+
+	spin_lock(&kvm->mmu_lock);
+	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
+	spin_unlock(&kvm->mmu_lock);
+}
+
+/*
+ * stop receiving the event interception. It is the opposite operation of
+ * kvm_page_track_register_notifier().
+ */
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+				   struct kvm_page_track_notifier_node *n)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+
+	spin_lock(&kvm->mmu_lock);
+	hlist_del_rcu(&n->node);
+	spin_unlock(&kvm->mmu_lock);
+	synchronize_srcu(&head->track_srcu);
+}
+
+/*
+ * Notify the node that write access is intercepted and write emulation is
+ * finished at this time.
+ *
+ * The node should figure out if the written page is the one that node is
+ * interested in by itself.
+ */
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_write)
+			n->track_write(vcpu, gpa, new, bytes);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7b4cfea..b81c14e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4346,6 +4346,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 	if (ret < 0)
 		return 0;
 	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
+	kvm_page_track_write(vcpu, gpa, val, bytes);
 	return 1;
 }
 
@@ -4604,6 +4605,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 
 	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
 	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
+	kvm_page_track_write(vcpu, gpa, new, bytes);
 
 	return X86EMUL_CONTINUE;
 
@@ -7724,6 +7726,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
+	kvm_page_track_init(kvm);
+
 	return 0;
 }
 
-- 
1.8.3.1

* [PATCH 09/11] KVM: MMU: use page track for non-leaf shadow pages
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (7 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 08/11] KVM: page track: add notifier support Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-29 15:22   ` Paolo Bonzini
  2016-02-24  9:51 ` [PATCH 10/11] KVM: MMU: simplify mmu_need_write_protect Xiao Guangrong
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

Non-leaf shadow pages are always write protected, so they can be users
of page track.
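
Note the ordering change in kvm_mmu_get_page() (per the comment the
patch adds): account_shadowed(), which now write protects non-leaf
pages, must run before kvm_sync_pages(); otherwise a guest write landing
between the sync and the write protection could leave the synced shadow
page inconsistent with the guest page table.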

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58c067d..74684b2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -806,11 +806,17 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	struct kvm_memory_slot *slot;
 	gfn_t gfn;
 
+	kvm->arch.indirect_shadow_pages++;
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
+
+	/* the non-leaf shadow pages are kept readonly. */
+	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+		return kvm_slot_page_track_add_page(kvm, slot, gfn,
+						    KVM_PAGE_TRACK_WRITE);
+
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
-	kvm->arch.indirect_shadow_pages++;
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -819,11 +825,15 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	struct kvm_memory_slot *slot;
 	gfn_t gfn;
 
+	kvm->arch.indirect_shadow_pages--;
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
+	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
+						       KVM_PAGE_TRACK_WRITE);
+
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
-	kvm->arch.indirect_shadow_pages--;
 }
 
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
@@ -2132,12 +2142,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	hlist_add_head(&sp->hash_link,
 		&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
 	if (!direct) {
-		if (rmap_write_protect(vcpu, gfn))
+		/*
+		 * we should do write protection before syncing pages
+		 * otherwise the content of the synced shadow page may
+		 * be inconsistent with guest page table.
+		 */
+		account_shadowed(vcpu->kvm, sp);
+
+		if (level == PT_PAGE_TABLE_LEVEL &&
+		      rmap_write_protect(vcpu, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
 		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
 			kvm_sync_pages(vcpu, gfn);
-
-		account_shadowed(vcpu->kvm, sp);
 	}
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	clear_page(sp->spt);
-- 
1.8.3.1

* [PATCH 10/11] KVM: MMU: simplify mmu_need_write_protect
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (8 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 09/11] KVM: MMU: use page track for non-leaf shadow pages Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24  9:51 ` [PATCH 11/11] KVM: MMU: apply page track notifier Xiao Guangrong
  2016-02-24 13:42 ` [PATCH v4 00/11] KVM: x86: track guest page access Paolo Bonzini
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

Now all non-leaf shadow pages are page tracked, so if a gfn is not
tracked then no non-leaf shadow page for that gfn exists, and we can
directly make the shadow pages of the gfn unsync.
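
Concretely (my reading of the invariant from the previous patch): a
non-leaf shadow page for a gfn exists only if the gfn is write-tracked,
so once the kvm_page_track_is_active() check fails, every valid indirect
shadow page for the gfn must be at PT_PAGE_TABLE_LEVEL; that is exactly
what the new WARN_ON in the loop asserts.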

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 74684b2..ddf67e1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2444,7 +2444,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
 
-static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	trace_kvm_mmu_unsync_page(sp);
 	++vcpu->kvm->stat.mmu_unsync;
@@ -2453,39 +2453,24 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	kvm_mmu_mark_parents_unsync(sp);
 }
 
-static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
-{
-	struct kvm_mmu_page *s;
-
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
-		if (s->unsync)
-			continue;
-		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
-		__kvm_unsync_page(vcpu, s);
-	}
-}
-
 static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 				   bool can_unsync)
 {
-	struct kvm_mmu_page *s;
-	bool need_unsync = false;
+	struct kvm_mmu_page *sp;
 
 	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
 		return true;
 
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
+	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
 		if (!can_unsync)
 			return true;
 
-		if (s->role.level != PT_PAGE_TABLE_LEVEL)
-			return true;
+		if (sp->unsync)
+			continue;
 
-		if (!s->unsync)
-			need_unsync = true;
+		WARN_ON(sp->role.level != PT_PAGE_TABLE_LEVEL);
+		kvm_unsync_page(vcpu, sp);
 	}
-	if (need_unsync)
-		kvm_unsync_pages(vcpu, gfn);
 
 	return false;
 }
-- 
1.8.3.1

* [PATCH 11/11] KVM: MMU: apply page track notifier
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (9 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 10/11] KVM: MMU: simplify mmu_need_write_protect Xiao Guangrong
@ 2016-02-24  9:51 ` Xiao Guangrong
  2016-02-24 13:42 ` [PATCH v4 00/11] KVM: x86: track guest page access Paolo Bonzini
  11 siblings, 0 replies; 14+ messages in thread
From: Xiao Guangrong @ 2016-02-24  9:51 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Xiao Guangrong

Register the notifier to receive write track events so that we can
update our shadow page table.

kvm_mmu_pte_write() becomes the callback of the notifier; no
functionality is changed.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/kvm/mmu.c              | 19 +++++++++++++++++--
 arch/x86/kvm/x86.c              |  4 ++--
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eb68e6a..1c3e390 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -704,6 +704,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 
 	struct list_head assigned_dev_head;
@@ -1001,6 +1002,8 @@ void kvm_mmu_module_exit(void);
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
 void kvm_mmu_setup(struct kvm_vcpu *vcpu);
+void kvm_mmu_init_vm(struct kvm *kvm);
+void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask);
 
@@ -1140,8 +1143,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		       const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddf67e1..09a59e3c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4302,8 +4302,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		       const u8 *new, int bytes)
+static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+			      const u8 *new, int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -4517,6 +4517,21 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 	init_kvm_mmu(vcpu);
 }
 
+void kvm_mmu_init_vm(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	node->track_write = kvm_mmu_pte_write;
+	kvm_page_track_register_notifier(kvm, node);
+}
+
+void kvm_mmu_uninit_vm(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	kvm_page_track_unregister_notifier(kvm, node);
+}
+
 /* The return value indicates if tlb flush on all vcpus is needed. */
 typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b81c14e..f4654e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4345,7 +4345,6 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 	ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
 	if (ret < 0)
 		return 0;
-	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
 	kvm_page_track_write(vcpu, gpa, val, bytes);
 	return 1;
 }
@@ -4604,7 +4603,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 		return X86EMUL_CMPXCHG_FAILED;
 
 	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
-	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
 	kvm_page_track_write(vcpu, gpa, new, bytes);
 
 	return X86EMUL_CONTINUE;
@@ -7727,6 +7725,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
 	kvm_page_track_init(kvm);
+	kvm_mmu_init_vm(kvm);
 
 	return 0;
 }
@@ -7854,6 +7853,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
 	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
+	kvm_mmu_uninit_vm(kvm);
 }
 
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-- 
1.8.3.1

* Re: [PATCH v4 00/11] KVM: x86: track guest page access
  2016-02-24  9:51 [PATCH v4 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (10 preceding siblings ...)
  2016-02-24  9:51 ` [PATCH 11/11] KVM: MMU: apply page track notifier Xiao Guangrong
@ 2016-02-24 13:42 ` Paolo Bonzini
  11 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-02-24 13:42 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 24/02/2016 10:51, Xiao Guangrong wrote:
> Changelog in v4:
> thanks to Paolo's review, the changes in this version are:
> - comment struct kvm_mmu_page_role to explain that the role bits used by
>   indirect shadow pages can not exceed 15 bits
> - simplify overflow detection of the track count
> - remove the wrapper functions kvm_page_track_{add,remove}_page
> - rename kvm_slot_page_track_*_nolock to kvm_slot_page_track_*
> - rename the function check_mode to page_track_mode_is_valid and inline it
> - rename kvm_page_track_check_mode to kvm_page_track_is_active
> - check if the notifier list is empty before holding the srcu lock
> - move the patch adjusting write-flooding earlier in the series

Looks good, thanks!

Paolo

* Re: [PATCH 09/11] KVM: MMU: use page track for non-leaf shadow pages
  2016-02-24  9:51 ` [PATCH 09/11] KVM: MMU: use page track for non-leaf shadow pages Xiao Guangrong
@ 2016-02-29 15:22   ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-02-29 15:22 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 24/02/2016 10:51, Xiao Guangrong wrote:
> Non-leaf shadow pages are always write protected, so they can be users
> of page track.
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/kvm/mmu.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 58c067d..74684b2 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -806,11 +806,17 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	struct kvm_memory_slot *slot;
>  	gfn_t gfn;
>  
> +	kvm->arch.indirect_shadow_pages++;
>  	gfn = sp->gfn;
>  	slots = kvm_memslots_for_spte_role(kvm, sp->role);
>  	slot = __gfn_to_memslot(slots, gfn);
> +
> +	/* the non-leaf shadow pages are keeping readonly. */
> +	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
> +		return kvm_slot_page_track_add_page(kvm, slot, gfn,
> +						    KVM_PAGE_TRACK_WRITE);
> +
>  	kvm_mmu_gfn_disallow_lpage(slot, gfn);
> -	kvm->arch.indirect_shadow_pages++;
>  }
>  
>  static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
> @@ -819,11 +825,15 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	struct kvm_memory_slot *slot;
>  	gfn_t gfn;
>  
> +	kvm->arch.indirect_shadow_pages--;
>  	gfn = sp->gfn;
>  	slots = kvm_memslots_for_spte_role(kvm, sp->role);
>  	slot = __gfn_to_memslot(slots, gfn);
> +	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
> +		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
> +						       KVM_PAGE_TRACK_WRITE);
> +
>  	kvm_mmu_gfn_allow_lpage(slot, gfn);
> -	kvm->arch.indirect_shadow_pages--;
>  }
>  
>  static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
> @@ -2132,12 +2142,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	hlist_add_head(&sp->hash_link,
>  		&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
>  	if (!direct) {
> -		if (rmap_write_protect(vcpu, gfn))
> +		/*
> +		 * we should do write protection before syncing pages
> +		 * otherwise the content of the synced shadow page may
> +		 * be inconsistent with guest page table.
> +		 */
> +		account_shadowed(vcpu->kvm, sp);
> +
> +		if (level == PT_PAGE_TABLE_LEVEL &&
> +		      rmap_write_protect(vcpu, gfn))
>  			kvm_flush_remote_tlbs(vcpu->kvm);
>  		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>  			kvm_sync_pages(vcpu, gfn);
> -
> -		account_shadowed(vcpu->kvm, sp);
>  	}
>  	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>  	clear_page(sp->spt);
> 

Hi,

I'm applying this series with just a very simple change on top of this patch:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 74684b2b7e69..fe03d2a1d4d3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2148,10 +2148,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		 * be inconsistent with guest page table.
 		 */
 		account_shadowed(vcpu->kvm, sp);
-
 		if (level == PT_PAGE_TABLE_LEVEL &&
 		      rmap_write_protect(vcpu, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
+
 		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
 			kvm_sync_pages(vcpu, gfn);
 	}

Yes, it's just moving around an empty line. :)

Paolo
