* [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support
@ 2015-01-28 2:54 Kai Huang
2015-01-28 2:54 ` [PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty Kai Huang
` (5 more replies)
0 siblings, 6 replies; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
This patch series adds Page Modification Logging (PML) support in VMX.
1) Introduction
PML is a new feature on Intel's Broadwell server platform, targeted at reducing
the overhead of the dirty logging mechanism.
The specification can be found at:
http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html
Currently, dirty logging is done by write protection: guest memory is write
protected, and the dirty GFN is recorded in dirty_bitmap on the subsequent write
fault. This works fine, but it costs an additional write fault for logging each
dirty GFN. The overhead can be large if the guest's writes are intensive.
PML is a hardware-assisted, efficient way to do dirty logging. PML automatically
logs a dirty GPA to a 4K PML memory buffer when the CPU changes an EPT entry's
D-bit from 0 to 1. To support this, a new 4K PML buffer base address and a PML
index were added to the VMCS. The buffer holds 512 8-byte GPA entries; the PML
index is initially set to 511, the CPU decrements it after logging each GPA, and
eventually a PML buffer full VMEXIT happens when the buffer is fully logged.
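As an illustration only (the logging below is done entirely by the CPU; KVM
never executes such code), the hardware behaviour can be modelled roughly like
this:

    #include <stdint.h>

    #define PML_ENTRIES 512                 /* a 4K buffer of 8-byte GPA entries */

    struct pml_model {
            uint64_t buffer[PML_ENTRIES];   /* corresponds to the PML address in the VMCS */
            uint16_t index;                 /* corresponds to the PML index, initially 511 */
    };

    /*
     * Models what the CPU does when it changes an EPT D-bit from 0 to 1.
     * Returns 1 once the buffer is exhausted, i.e. the next write that needs
     * logging triggers a "PML buffer full" VMEXIT so the VMM can drain it.
     */
    static int pml_log_dirty_gpa(struct pml_model *pml, uint64_t gpa)
    {
            pml->buffer[pml->index] = gpa & ~0xfffULL;  /* 4K-aligned GPA */
            return pml->index-- == 0;                   /* 511 -> ... -> 0 -> 0xffff */
    }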
With PML, we don't have to use write protection, so the intensive write-fault
EPT violations can be avoided, at the cost of one PML buffer full VMEXIT per 512
dirty GPAs. Theoretically, this reduces hypervisor overhead when the guest is in
dirty logging mode, so more CPU cycles can be allocated to the guest, and
benchmarks in the guest are expected to perform better compared to non-PML.
2) Design
a. Enable/Disable PML
PML is per-vcpu (per-VMCS), while the EPT tables can be shared by vcpus, so we
need to enable/disable PML for all vcpus of the guest. A dedicated 4K page is
allocated for each vcpu when PML is enabled for that vcpu.
Currently, we choose to always enable PML for the guest, which means PML is
enabled when the vcpu is created and never disabled during the guest's lifetime.
This avoids the complicated logic of enabling PML on demand while the guest is
running. To eliminate potentially unnecessary GPA logging in non-dirty-logging
mode, we set the D-bit manually for the slots with dirty logging disabled.
b. Flush PML buffer
When userspace queries dirty_bitmap, it's possible that GPAs are logged in a
vcpu's PML buffer but, since the buffer is not full, no VMEXIT has happened. In
this case, we have to manually flush the PML buffers of all vcpus and update the
dirty GPAs into dirty_bitmap.
We flush the PML buffer at the beginning of each VMEXIT. This keeps dirty_bitmap
more up to date, and also simplifies the logic of flushing the PML buffers of
all vcpus -- we only need to kick all vcpus out of guest mode, and each vcpu's
PML buffer is then flushed automatically.
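(Continuing the illustrative sketch from section 1 -- the real code is
vmx_flush_pml_buffer() in patch 6 -- draining one vcpu's buffer then looks
roughly like this, with the dirty bitmap reduced to a callback:)

    /* Reuses struct pml_model and PML_ENTRIES from the sketch in section 1. */
    static void pml_model_flush(struct pml_model *pml,
                                void (*mark_dirty)(uint64_t gfn))
    {
            uint16_t i = pml->index;

            if (i == PML_ENTRIES - 1)
                    return;                         /* nothing was logged */

            /*
             * The index points at the next free entry, so the valid entries
             * start right after it; 0xffff (buffer full) wraps to entry 0.
             */
            for (i = (i >= PML_ENTRIES) ? 0 : i + 1; i < PML_ENTRIES; i++)
                    mark_dirty(pml->buffer[i] >> 12);

            pml->index = PML_ENTRIES - 1;           /* rearm for the next round */
    }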
3) Tests and benchmark results
I used the specjbb benchmark, which is memory intensive, to measure PML. All
tests were done in the configuration below:
Machine (Broadwell server): 16 CPUs (1.4GHz) + 4G memory
Host Kernel: KVM queue branch. Transparent Hugepage disabled. C-state, P-state,
S-state disabled. Swap disabled.
Guest: Ubuntu 14.04 with kernel 3.13.0-36-generic
Guest: 4 vcpus + 1G memory. All vcpus are pinned.
a. Compare score with and without PML enabled.
This is to make sure PML won't bring any performance regression, as it's always
enabled for the guest.
Booting guest with graphic window (no --nographic)
NOPML PML
109755 109379
108786 109300
109234 109663
109257 107471
108514 108904
109740 107623
avg: 109214 108723
performance regression: (109214 - 108723) / 109214 = 0.45%
Booting guest without graphic window (--nographic)
NOPML PML
109090 109686
109461 110533
110523 108550
109960 110775
109090 109802
110787 109192
avg: 109818 109756
performance regression: (109818 - 109756) / 109818 = 0.06%
So there's no noticeable performance regression from leaving PML always enabled.
b. Compare specjbb score between PML and Write Protection.
This is used to see how much performance gain PML can bring when the guest is
in dirty logging mode.
I modified qemu by adding an additional "monitoring thread" that queries
dirty_bitmap periodically (once per second). With this thread, we can measure
the performance gain of PML by comparing the specjbb score under the PML code
path and the write protection code path; a rough sketch of such a thread is
shown right below.
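(Illustration only -- this is not the actual qemu change, just a sketch of what
such a monitoring thread can look like at the raw KVM API level. vm_fd, the slot
number and the 1G memory size are assumptions, and dirty logging is assumed to
have already been enabled on the slot via KVM_MEM_LOG_DIRTY_PAGES.)

    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define GUEST_MEM_SIZE  (1ULL << 30)                /* 1G guest memory */
    #define BITMAP_BYTES    (GUEST_MEM_SIZE / 4096 / 8) /* one bit per 4K page */

    static void *dirty_log_monitor(void *opaque)
    {
            int vm_fd = *(int *)opaque;     /* KVM VM file descriptor */
            struct kvm_dirty_log log = {
                    .slot = 0,              /* the slot backing guest RAM */
                    .dirty_bitmap = malloc(BITMAP_BYTES),
            };

            for (;;) {
                    /*
                     * Each call exercises the dirty logging path: with PML it
                     * flushes the per-vcpu PML buffers, without PML it
                     * re-write-protects the pages dirtied since the last call.
                     */
                    ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
                    sleep(1);
            }
            return NULL;
    }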
Again, I got scores both with and without the guest's graphic window.
Booting guest with graphic window (no --nographic)
PML WP No monitoring thread
104748 101358
102934 99895
103525 98832
105331 100678
106038 99476
104776 99851
avg: 104558 100015 108723 (== PML score in test a)
percent: 96.17% 91.99% 100%
performance gain: 96.17% - 91.99% = 4.18%
Booting guest without graphic window (--nographic)
PML WP No monitoring thread
104778 98967
104856 99380
103783 99406
105210 100638
106218 99763
105475 99287
avg: 105053 99573 109756 (== PML score in test a)
percent: 95.72% 90.72% 100%
performance gain: 95.72% - 90.72% = 5%
So there's a noticeable performance gain (around 4%-5%) from PML compared to
Write Protection.
Kai Huang (6):
KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic
for log dirty
KVM: MMU: Add mmu help functions to support PML
KVM: MMU: Explicitly set D-bit for writable spte.
KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
KVM: x86: Add new dirty logging kvm_x86_ops for PML
KVM: VMX: Add PML support in VMX
arch/arm/kvm/mmu.c | 18 ++-
arch/x86/include/asm/kvm_host.h | 37 +++++-
arch/x86/include/asm/vmx.h | 4 +
arch/x86/include/uapi/asm/vmx.h | 1 +
arch/x86/kvm/mmu.c | 243 +++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/trace.h | 18 +++
arch/x86/kvm/vmx.c | 195 +++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 78 +++++++++++--
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 2 +-
10 files changed, 577 insertions(+), 21 deletions(-)
--
2.1.0
* [PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
@ 2015-01-28 2:54 ` Kai Huang
2015-01-28 2:54 ` [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML Kai Huang
` (4 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
We don't have to write protect guest memory for dirty logging if the architecture
supports hardware dirty logging, such as PML on VMX, so rename the function to be
more generic.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
arch/arm/kvm/mmu.c | 18 ++++++++++++++++--
arch/x86/kvm/mmu.c | 21 +++++++++++++++++++--
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 2 +-
4 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 74aeaba..6034697 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1081,7 +1081,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
}
/**
- * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages
+ * kvm_mmu_write_protect_pt_masked() - write protect dirty pages
* @kvm: The KVM pointer
* @slot: The memory slot associated with mask
* @gfn_offset: The gfn offset in memory slot
@@ -1091,7 +1091,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
* Walks bits set in mask write protects the associated pte's. Caller must
* acquire kvm_mmu_lock.
*/
-void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
{
@@ -1102,6 +1102,20 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
stage2_wp_range(kvm, start, end);
}
+/*
+ * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
+ * dirty pages.
+ *
+ * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
+ * enable dirty logging for them.
+ */
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset, unsigned long mask)
+{
+ kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_memory_slot *memslot, unsigned long hva,
unsigned long fault_status)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0ed9f79..b18e65c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1216,7 +1216,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
}
/**
- * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
* @kvm: kvm instance
* @slot: slot to protect
* @gfn_offset: start of the BITS_PER_LONG pages we care about
@@ -1225,7 +1225,7 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
* Used when we do not need to care about huge page mappings: e.g. during dirty
* logging we do not have any such mappings.
*/
-void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
{
@@ -1241,6 +1241,23 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
}
}
+/**
+ * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
+ * PT level pages.
+ *
+ * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
+ * enable dirty logging for them.
+ *
+ * Used when we do not need to care about huge page mappings: e.g. during dirty
+ * logging we do not have any such mappings.
+ */
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset, unsigned long mask)
+{
+ kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+}
+
static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
{
struct kvm_memory_slot *slot;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d67195..32d0575 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -615,7 +615,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
int kvm_get_dirty_log_protect(struct kvm *kvm,
struct kvm_dirty_log *log, bool *is_dirty);
-void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset,
unsigned long mask);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a8490f0..0c28176 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1059,7 +1059,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
dirty_bitmap_buffer[i] = mask;
offset = i * BITS_PER_LONG;
- kvm_arch_mmu_write_protect_pt_masked(kvm, memslot, offset,
+ kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset,
mask);
}
--
2.1.0
* [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
2015-01-28 2:54 ` [PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty Kai Huang
@ 2015-01-28 2:54 ` Kai Huang
2015-02-03 17:34 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte Kai Huang
` (3 subsequent siblings)
5 siblings, 1 reply; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
This patch adds new MMU layer functions to clear/set the D-bit for a memory
slot, and to write protect superpages for a memory slot.
In case of PML, the CPU logs the dirty GPA automatically to the PML buffer when
it updates the D-bit from 0 to 1, therefore we don't have to write protect 4K
pages; instead, we only need to clear the D-bit in order to log that GPA.
For superpages, we still write protect them and let the page fault code handle
dirty page logging, as we still need to split superpages into 4K pages for PML.
As PML is always enabled during the guest's lifetime, to eliminate unnecessary
PML GPA logging, we set the D-bit manually for slots with dirty logging disabled.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
arch/x86/include/asm/kvm_host.h | 9 ++
arch/x86/kvm/mmu.c | 195 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 204 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 843bea0..4f6369b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -835,6 +835,15 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_set_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
+void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b18e65c..c438224 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1215,6 +1215,60 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
return flush;
}
+static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
+{
+ u64 spte = *sptep;
+
+ rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
+
+ spte &= ~shadow_dirty_mask;
+
+ return mmu_spte_update(sptep, spte);
+}
+
+static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+ u64 *sptep;
+ struct rmap_iterator iter;
+ bool flush = false;
+
+ for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+ BUG_ON(!(*sptep & PT_PRESENT_MASK));
+
+ flush |= spte_clear_dirty(kvm, sptep);
+ sptep = rmap_get_next(&iter);
+ }
+
+ return flush;
+}
+
+static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
+{
+ u64 spte = *sptep;
+
+ rmap_printk("rmap_set_dirty: spte %p %llx\n", sptep, *sptep);
+
+ spte |= shadow_dirty_mask;
+
+ return mmu_spte_update(sptep, spte);
+}
+
+static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+ u64 *sptep;
+ struct rmap_iterator iter;
+ bool flush = false;
+
+ for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
+ BUG_ON(!(*sptep & PT_PRESENT_MASK));
+
+ flush |= spte_set_dirty(kvm, sptep);
+ sptep = rmap_get_next(&iter);
+ }
+
+ return flush;
+}
+
/**
* kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
* @kvm: kvm instance
@@ -1242,6 +1296,32 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
}
/**
+ * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages
+ * @kvm: kvm instance
+ * @slot: slot to clear D-bit
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should clear D-bit
+ *
+ * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
+ */
+void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset, unsigned long mask)
+{
+ unsigned long *rmapp;
+
+ while (mask) {
+ rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+ PT_PAGE_TABLE_LEVEL, slot);
+ __rmap_clear_dirty(kvm, rmapp);
+
+ /* clear the first set bit */
+ mask &= mask - 1;
+ }
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
+
+/**
* kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
* PT level pages.
*
@@ -4368,6 +4448,121 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
kvm_flush_remote_tlbs(kvm);
}
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
+{
+ gfn_t last_gfn;
+ unsigned long *rmapp;
+ unsigned long last_index, index;
+ bool flush = false;
+
+ last_gfn = memslot->base_gfn + memslot->npages - 1;
+
+ spin_lock(&kvm->mmu_lock);
+
+ rmapp = memslot->arch.rmap[PT_PAGE_TABLE_LEVEL - 1];
+ last_index = gfn_to_index(last_gfn, memslot->base_gfn,
+ PT_PAGE_TABLE_LEVEL);
+
+ for (index = 0; index <= last_index; ++index, ++rmapp) {
+ if (*rmapp)
+ flush |= __rmap_clear_dirty(kvm, rmapp);
+
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+ cond_resched_lock(&kvm->mmu_lock);
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+
+ lockdep_assert_held(&kvm->slots_lock);
+
+ /*
+ * It's also safe to flush TLBs out of mmu lock here as currently this
+ * function is only used for dirty logging, in which case flushing TLB
+ * out of mmu lock also guarantees no dirty pages will be lost in
+ * dirty_bitmap.
+ */
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
+
+void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
+{
+ gfn_t last_gfn;
+ int i;
+ bool flush = false;
+
+ last_gfn = memslot->base_gfn + memslot->npages - 1;
+
+ spin_lock(&kvm->mmu_lock);
+
+ for (i = PT_PAGE_TABLE_LEVEL + 1; /* skip rmap for 4K page */
+ i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ unsigned long *rmapp;
+ unsigned long last_index, index;
+
+ rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
+ last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
+
+ for (index = 0; index <= last_index; ++index, ++rmapp) {
+ if (*rmapp)
+ flush |= __rmap_write_protect(kvm, rmapp,
+ false);
+
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+ cond_resched_lock(&kvm->mmu_lock);
+ }
+ }
+ spin_unlock(&kvm->mmu_lock);
+
+ /* see kvm_mmu_slot_remove_write_access */
+ lockdep_assert_held(&kvm->slots_lock);
+
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);
+
+void kvm_mmu_slot_set_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
+{
+ gfn_t last_gfn;
+ int i;
+ bool flush = false;
+
+ last_gfn = memslot->base_gfn + memslot->npages - 1;
+
+ spin_lock(&kvm->mmu_lock);
+
+ for (i = PT_PAGE_TABLE_LEVEL;
+ i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ unsigned long *rmapp;
+ unsigned long last_index, index;
+
+ rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
+ last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
+
+ for (index = 0; index <= last_index; ++index, ++rmapp) {
+ if (*rmapp)
+ flush |= __rmap_set_dirty(kvm, rmapp);
+
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+ cond_resched_lock(&kvm->mmu_lock);
+ }
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+
+ lockdep_assert_held(&kvm->slots_lock);
+
+ /* see kvm_mmu_slot_leaf_clear_dirty */
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
+
#define BATCH_ZAP_PAGES 10
static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
--
2.1.0
* [PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte.
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
2015-01-28 2:54 ` [PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty Kai Huang
2015-01-28 2:54 ` [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML Kai Huang
@ 2015-01-28 2:54 ` Kai Huang
2015-01-28 2:54 ` [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access Kai Huang
` (2 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
This patch avoids unnecessary dirty GPA logging to the PML buffer in the EPT
violation path by setting the D-bit manually prior to the write from the guest.
We only set the D-bit manually in set_spte, and leave the fast_page_fault path
unchanged, as fast_page_fault is very unlikely to happen in case of PML.
For the hva <-> pa change case, the spte is updated to either read-only (host
pte is read-only) or dropped (host pte is writeable), and both cases are handled
by the changes above, therefore no further change is necessary.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
arch/x86/kvm/mmu.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c438224..fb35535 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2597,8 +2597,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}
}
- if (pte_access & ACC_WRITE_MASK)
+ if (pte_access & ACC_WRITE_MASK) {
mark_page_dirty(vcpu->kvm, gfn);
+ /*
+ * Explicitly set dirty bit. It is used to eliminate unnecessary
+ * dirty GPA logging in case of PML is enabled on VMX.
+ */
+ spte |= shadow_dirty_mask;
+ }
set_pte:
if (mmu_spte_update(sptep, spte))
@@ -2914,6 +2920,16 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
*/
gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+ /*
+ * Theoretically we could also set dirty bit (and flush TLB) here in
+ * order to eliminate the unnecessary PML logging. See comments in
+ * set_spte. But as in case of PML, fast_page_fault is very unlikely to
+ * happen so we leave it unchanged. This might result in the same GPA
+ * to be logged in PML buffer again when the write really happens, and
+ * eventually to be called by mark_page_dirty twice. But it's also no
+ * harm. This also avoids the TLB flush needed after setting dirty bit
+ * so non-PML cases won't be impacted.
+ */
if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
mark_page_dirty(vcpu->kvm, gfn);
--
2.1.0
* [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
` (2 preceding siblings ...)
2015-01-28 2:54 ` [PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte Kai Huang
@ 2015-01-28 2:54 ` Kai Huang
2015-02-03 16:28 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML Kai Huang
2015-01-28 2:54 ` [PATCH 6/6] KVM: VMX: Add PML support in VMX Kai Huang
5 siblings, 1 reply; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
This patch changes the second parameter of kvm_mmu_slot_remove_write_access from
'slot id' to 'struct kvm_memory_slot *' to align with the kvm_x86_ops dirty
logging hooks, which will be introduced in a later patch.
A better way would be to change the second parameter of
kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region *' to
'struct kvm_memory_slot *new', but that requires changes to other non-x86
architectures too, so it is avoided for now.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/mmu.c | 5 ++---
arch/x86/kvm/x86.c | 10 +++++++---
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f6369b..67a98d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -834,7 +834,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask);
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot);
void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fb35535..6c24af3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4410,14 +4410,13 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
init_kvm_mmu(vcpu);
}
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
{
- struct kvm_memory_slot *memslot;
gfn_t last_gfn;
int i;
bool flush = false;
- memslot = id_to_memslot(kvm->memslots, slot);
last_gfn = memslot->base_gfn + memslot->npages - 1;
spin_lock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e10e3f..3a7fcff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7538,7 +7538,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
const struct kvm_memory_slot *old,
enum kvm_mr_change change)
{
-
+ struct kvm_memory_slot *new;
int nr_mmu_pages = 0;
if ((mem->slot >= KVM_USER_MEM_SLOTS) && (change == KVM_MR_DELETE)) {
@@ -7557,6 +7557,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
if (nr_mmu_pages)
kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
+
+ /* It's OK to get 'new' slot here as it has already been installed */
+ new = id_to_memslot(kvm->memslots, mem->slot);
+
/*
* Write protect all pages for dirty logging.
*
@@ -7566,8 +7570,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
*
* See the comments in fast_page_fault().
*/
- if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
- kvm_mmu_slot_remove_write_access(kvm, mem->slot);
+ if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+ kvm_mmu_slot_remove_write_access(kvm, new);
}
void kvm_arch_flush_shadow_all(struct kvm *kvm)
--
2.1.0
* [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
` (3 preceding siblings ...)
2015-01-28 2:54 ` [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access Kai Huang
@ 2015-01-28 2:54 ` Kai Huang
2015-02-03 15:53 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 6/6] KVM: VMX: Add PML support in VMX Kai Huang
5 siblings, 1 reply; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
logging for a particular memory slot, and to flush potentially logged dirty GPAs
before reporting slot->dirty_bitmap to userspace.
KVM x86 common code calls these hooks when they are available, so the PML logic
can be kept VMX-specific. Other architectures won't be impacted, as these hooks
are NULL for them.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
arch/x86/include/asm/kvm_host.h | 25 +++++++++++++++
arch/x86/kvm/mmu.c | 6 +++-
arch/x86/kvm/x86.c | 71 ++++++++++++++++++++++++++++++++++++-----
3 files changed, 93 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 67a98d7..57916ec 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -802,6 +802,31 @@ struct kvm_x86_ops {
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
+
+ /*
+ * Arch-specific dirty logging hooks. These hooks are only supposed to
+ * be valid if the specific arch has hardware-accelerated dirty logging
+ * mechanism. Currently only for PML on VMX.
+ *
+ * - slot_enable_log_dirty:
+ * called when enabling log dirty mode for the slot.
+ * - slot_disable_log_dirty:
+ * called when disabling log dirty mode for the slot.
+ * also called when slot is created with log dirty disabled.
+ * - flush_log_dirty:
+ * called before reporting dirty_bitmap to userspace.
+ * - enable_log_dirty_pt_masked:
+ * called when reenabling log dirty for the GFNs in the mask after
+ * corresponding bits are cleared in slot->dirty_bitmap.
+ */
+ void (*slot_enable_log_dirty)(struct kvm *kvm,
+ struct kvm_memory_slot *slot);
+ void (*slot_disable_log_dirty)(struct kvm *kvm,
+ struct kvm_memory_slot *slot);
+ void (*flush_log_dirty)(struct kvm *kvm);
+ void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t offset, unsigned long mask);
};
struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6c24af3..c5833ca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1335,7 +1335,11 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
{
- kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+ if (kvm_x86_ops->enable_log_dirty_pt_masked)
+ kvm_x86_ops->enable_log_dirty_pt_masked(kvm, slot, gfn_offset,
+ mask);
+ else
+ kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}
static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a7fcff..442ee7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3780,6 +3780,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
mutex_lock(&kvm->slots_lock);
+ /*
+ * Flush potentially hardware-cached dirty pages to dirty_bitmap.
+ */
+ if (kvm_x86_ops->flush_log_dirty)
+ kvm_x86_ops->flush_log_dirty(kvm);
+
r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
/*
@@ -7533,6 +7539,56 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return 0;
}
+static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
+ struct kvm_memory_slot *new)
+{
+ /* Still write protect RO slot */
+ if (new->flags & KVM_MEM_READONLY) {
+ kvm_mmu_slot_remove_write_access(kvm, new);
+ return;
+ }
+
+ /*
+ * Call kvm_x86_ops dirty logging hooks when they are valid.
+ *
+ * kvm_x86_ops->slot_disable_log_dirty is called when:
+ *
+ * - KVM_MR_CREATE with dirty logging is disabled
+ * - KVM_MR_FLAGS_ONLY with dirty logging is disabled in new flag
+ *
+ * The reason is, in case of PML, we need to set D-bit for any slots
+ * with dirty logging disabled in order to eliminate unnecessary GPA
+ * logging in PML buffer (and potential PML buffer full VMEXIT). This
+ * guarantees leaving PML enabled during guest's lifetime won't have
+ * any additional overhead from PML when guest is running with dirty
+ * logging disabled for memory slots.
+ *
+ * kvm_x86_ops->slot_enable_log_dirty is called when switching new slot
+ * to dirty logging mode.
+ *
+ * If kvm_x86_ops dirty logging hooks are invalid, use write protect.
+ *
+ * In case of write protect:
+ *
+ * Write protect all pages for dirty logging.
+ *
+ * All the sptes including the large sptes which point to this
+ * slot are set to readonly. We can not create any new large
+ * spte on this slot until the end of the logging.
+ *
+ * See the comments in fast_page_fault().
+ */
+ if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+ if (kvm_x86_ops->slot_enable_log_dirty)
+ kvm_x86_ops->slot_enable_log_dirty(kvm, new);
+ else
+ kvm_mmu_slot_remove_write_access(kvm, new);
+ } else {
+ if (kvm_x86_ops->slot_disable_log_dirty)
+ kvm_x86_ops->slot_disable_log_dirty(kvm, new);
+ }
+}
+
void kvm_arch_commit_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
const struct kvm_memory_slot *old,
@@ -7562,16 +7618,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
new = id_to_memslot(kvm->memslots, mem->slot);
/*
- * Write protect all pages for dirty logging.
+ * Set up write protection and/or dirty logging for the new slot.
*
- * All the sptes including the large sptes which point to this
- * slot are set to readonly. We can not create any new large
- * spte on this slot until the end of the logging.
- *
- * See the comments in fast_page_fault().
+ * For KVM_MR_DELETE and KVM_MR_MOVE, the shadow pages of old slot have
+ * been zapped so no dirty logging work is needed for the old slot. For
+ * KVM_MR_FLAGS_ONLY, the old slot is essentially the same one as the
+ * new and it's also covered when dealing with the new slot.
*/
- if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
- kvm_mmu_slot_remove_write_access(kvm, new);
+ if (change != KVM_MR_DELETE)
+ kvm_mmu_slot_apply_flags(kvm, new);
}
void kvm_arch_flush_shadow_all(struct kvm *kvm)
--
2.1.0
* [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
` (4 preceding siblings ...)
2015-01-28 2:54 ` [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML Kai Huang
@ 2015-01-28 2:54 ` Kai Huang
2015-02-03 15:18 ` Radim Krčmář
5 siblings, 1 reply; 22+ messages in thread
From: Kai Huang @ 2015-01-28 2:54 UTC (permalink / raw)
To: pbonzini, gleb, linux, kvm; +Cc: Kai Huang
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow user to enable/disable it manually.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
arch/x86/include/asm/vmx.h | 4 +
arch/x86/include/uapi/asm/vmx.h | 1 +
arch/x86/kvm/trace.h | 18 ++++
arch/x86/kvm/vmx.c | 195 +++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 1 +
5 files changed, 218 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 45afaee..da772ed 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -69,6 +69,7 @@
#define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400
#define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000
#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000
+#define SECONDARY_EXEC_ENABLE_PML 0x00020000
#define SECONDARY_EXEC_XSAVES 0x00100000
@@ -121,6 +122,7 @@ enum vmcs_field {
GUEST_LDTR_SELECTOR = 0x0000080c,
GUEST_TR_SELECTOR = 0x0000080e,
GUEST_INTR_STATUS = 0x00000810,
+ GUEST_PML_INDEX = 0x00000812,
HOST_ES_SELECTOR = 0x00000c00,
HOST_CS_SELECTOR = 0x00000c02,
HOST_SS_SELECTOR = 0x00000c04,
@@ -140,6 +142,8 @@ enum vmcs_field {
VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009,
VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a,
VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b,
+ PML_ADDRESS = 0x0000200e,
+ PML_ADDRESS_HIGH = 0x0000200f,
TSC_OFFSET = 0x00002010,
TSC_OFFSET_HIGH = 0x00002011,
VIRTUAL_APIC_PAGE_ADDR = 0x00002012,
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index ff2b8e2..c5f1a1d 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -73,6 +73,7 @@
#define EXIT_REASON_XSETBV 55
#define EXIT_REASON_APIC_WRITE 56
#define EXIT_REASON_INVPCID 58
+#define EXIT_REASON_PML_FULL 62
#define EXIT_REASON_XSAVES 63
#define EXIT_REASON_XRSTORS 64
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 587149b..a139977 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -846,6 +846,24 @@ TRACE_EVENT(kvm_track_tsc,
__print_symbolic(__entry->host_clock, host_clocks))
);
+/*
+ * Tracepoint for PML full VMEXIT.
+ */
+TRACE_EVENT(kvm_pml_full,
+ TP_PROTO(unsigned int vcpu_id),
+ TP_ARGS(vcpu_id),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, vcpu_id )
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ ),
+
+ TP_printk("vcpu %d: PML full", __entry->vcpu_id)
+);
+
#endif /* CONFIG_X86_64 */
TRACE_EVENT(kvm_ple_window,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c987374..de5ce82 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -101,6 +101,9 @@ module_param(nested, bool, S_IRUGO);
static u64 __read_mostly host_xss;
+static bool __read_mostly enable_pml = 1;
+module_param_named(pml, enable_pml, bool, S_IRUGO);
+
#define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST (X86_CR0_WP | X86_CR0_NE)
#define KVM_VM_CR0_ALWAYS_ON \
@@ -516,6 +519,10 @@ struct vcpu_vmx {
/* Dynamic PLE window. */
int ple_window;
bool ple_window_dirty;
+
+ /* Support for PML */
+#define PML_ENTITY_NUM 512
+ struct page *pml_pg;
};
enum segment_cache_field {
@@ -1068,6 +1075,11 @@ static inline bool cpu_has_vmx_shadow_vmcs(void)
SECONDARY_EXEC_SHADOW_VMCS;
}
+static inline bool cpu_has_vmx_pml(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
+}
+
static inline bool report_flexpriority(void)
{
return flexpriority_enabled;
@@ -2924,7 +2936,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_SHADOW_VMCS |
- SECONDARY_EXEC_XSAVES;
+ SECONDARY_EXEC_XSAVES |
+ SECONDARY_EXEC_ENABLE_PML;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -4355,6 +4368,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
a current VMCS12
*/
exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+ /* PML is enabled/disabled in creating/destroying vcpu */
+ exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+
return exec_control;
}
@@ -5942,6 +5958,20 @@ static __init int hardware_setup(void)
update_ple_window_actual_max();
+ /*
+ * Only enable PML when hardware supports PML feature, and both EPT
+ * and EPT A/D bit features are enabled -- PML depends on them to work.
+ */
+ if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
+ enable_pml = 0;
+
+ if (!enable_pml) {
+ kvm_x86_ops->slot_enable_log_dirty = NULL;
+ kvm_x86_ops->slot_disable_log_dirty = NULL;
+ kvm_x86_ops->flush_log_dirty = NULL;
+ kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
+ }
+
return alloc_kvm_area();
out7:
@@ -6971,6 +7001,31 @@ static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
return pi_test_pir(vector, &vmx->pi_desc);
}
+static int handle_pml_full(struct kvm_vcpu *vcpu)
+{
+ unsigned long exit_qualification;
+
+ trace_kvm_pml_full(vcpu->vcpu_id);
+
+ exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+ /*
+ * PML buffer FULL happened while executing iret from NMI,
+ * "blocked by NMI" bit has to be set before next VM entry.
+ */
+ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+ cpu_has_virtual_nmis() &&
+ (exit_qualification & INTR_INFO_UNBLOCK_NMI))
+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+
+ /*
+ * PML buffer already flushed at beginning of VMEXIT. Nothing to do
+ * here, and there's no userspace involvement needed for PML.
+ */
+ return 1;
+}
+
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -7019,6 +7074,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_INVVPID] = handle_invvpid,
[EXIT_REASON_XSAVES] = handle_xsaves,
[EXIT_REASON_XRSTORS] = handle_xrstors,
+ [EXIT_REASON_PML_FULL] = handle_pml_full,
};
static const int kvm_vmx_max_exit_handlers =
@@ -7325,6 +7381,89 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
}
+static int vmx_enable_pml(struct vcpu_vmx *vmx)
+{
+ struct page *pml_pg;
+ u32 exec_control;
+
+ pml_pg = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!pml_pg)
+ return -ENOMEM;
+
+ vmx->pml_pg = pml_pg;
+
+ vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
+ vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+
+ exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+ exec_control |= SECONDARY_EXEC_ENABLE_PML;
+ vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+
+ return 0;
+}
+
+static void vmx_disable_pml(struct vcpu_vmx *vmx)
+{
+ u32 exec_control;
+
+ ASSERT(vmx->pml_pg);
+ __free_page(vmx->pml_pg);
+ vmx->pml_pg = NULL;
+
+ exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+ exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+ vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+}
+
+static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
+{
+ struct kvm *kvm = vmx->vcpu.kvm;
+ u64 *pml_buf;
+ u16 pml_idx;
+
+ pml_idx = vmcs_read16(GUEST_PML_INDEX);
+
+ /* Do nothing if PML buffer is empty */
+ if (pml_idx == (PML_ENTITY_NUM - 1))
+ return;
+
+ /* PML index always points to next available PML buffer entity */
+ if (pml_idx >= PML_ENTITY_NUM)
+ pml_idx = 0;
+ else
+ pml_idx++;
+
+ pml_buf = page_address(vmx->pml_pg);
+ for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
+ u64 gpa;
+
+ gpa = pml_buf[pml_idx];
+ WARN_ON(gpa & (PAGE_SIZE - 1));
+ mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
+ }
+
+ /* reset PML index */
+ vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+}
+
+/*
+ * Flush all vcpus' PML buffer and update logged GPAs to dirty_bitmap.
+ * Called before reporting dirty_bitmap to userspace.
+ */
+static void kvm_flush_pml_buffers(struct kvm *kvm)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+ /*
+ * We only need to kick each vcpu out of guest mode here, as the PML
+ * buffer is flushed at the beginning of every VMEXIT, so it's clear
+ * that only vcpus running in guest mode can have unflushed GPAs in
+ * their PML buffers.
+ */
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvm_vcpu_kick(vcpu);
+}
+
/*
* The guest has exited. See if we can fix it or if we need userspace
* assistance.
@@ -7335,6 +7474,16 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
+ /*
+ * Flush logged GPAs from the PML buffer; this keeps dirty_bitmap more
+ * up to date. Another benefit is that, in kvm_vm_ioctl_get_dirty_log,
+ * before querying dirty_bitmap we only need to kick all vcpus out of
+ * guest mode: once a vcpu is in root mode, its PML buffer must have
+ * been flushed already.
+ */
+ if (enable_pml)
+ vmx_flush_pml_buffer(vmx);
+
/* If guest state is invalid, start emulating */
if (vmx->emulation_required)
return handle_invalid_guest_state(vcpu);
@@ -7981,6 +8130,8 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ if (enable_pml)
+ vmx_disable_pml(vmx);
free_vpid(vmx);
leave_guest_mode(vcpu);
vmx_load_vmcs01(vcpu);
@@ -8051,6 +8202,18 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx->nested.current_vmptr = -1ull;
vmx->nested.current_vmcs12 = NULL;
+ /*
+ * If PML is turned on, failure to enable PML just results in failure
+ * to create the vcpu, therefore we can simplify the PML logic (by
+ * avoiding dealing with cases such as enabling PML partially on vcpus
+ * for the guest, etc.).
+ */
+ if (enable_pml) {
+ err = vmx_enable_pml(vmx);
+ if (err)
+ goto free_vmcs;
+ }
+
return &vmx->vcpu;
free_vmcs:
@@ -9492,6 +9655,31 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
shrink_ple_window(vcpu);
}
+static void vmx_slot_enable_log_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+ kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
+ kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
+}
+
+static void vmx_slot_disable_log_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+ kvm_mmu_slot_set_dirty(kvm, slot);
+}
+
+static void vmx_flush_log_dirty(struct kvm *kvm)
+{
+ kvm_flush_pml_buffers(kvm);
+}
+
+static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ gfn_t offset, unsigned long mask)
+{
+ kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -9601,6 +9789,11 @@ static struct kvm_x86_ops vmx_x86_ops = {
.check_nested_events = vmx_check_nested_events,
.sched_in = vmx_sched_in,
+
+ .slot_enable_log_dirty = vmx_slot_enable_log_dirty,
+ .slot_disable_log_dirty = vmx_slot_disable_log_dirty,
+ .flush_log_dirty = vmx_flush_log_dirty,
+ .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
};
static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 442ee7d..1373e04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7880,3 +7880,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
--
2.1.0
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-01-28 2:54 ` [PATCH 6/6] KVM: VMX: Add PML support in VMX Kai Huang
@ 2015-02-03 15:18 ` Radim Krčmář
2015-02-03 15:39 ` Paolo Bonzini
2015-02-05 6:23 ` Kai Huang
0 siblings, 2 replies; 22+ messages in thread
From: Radim Krčmář @ 2015-02-03 15:18 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-01-28 10:54+0800, Kai Huang:
> This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
(+module_param_named(pml, enable_pml, bool, S_IRUGO);)
> to allow user to enable/disable it manually.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 587149b..a139977 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -846,6 +846,24 @@ TRACE_EVENT(kvm_track_tsc,
> +TRACE_EVENT(kvm_pml_full,
> + TP_PROTO(unsigned int vcpu_id),
> + TP_ARGS(vcpu_id),
> +
> + TP_STRUCT__entry(
> + __field( unsigned int, vcpu_id )
> + ),
> +
> + TP_fast_assign(
> + __entry->vcpu_id = vcpu_id;
> + ),
> +
> + TP_printk("vcpu %d: PML full", __entry->vcpu_id)
> +);
> +
> #endif /* CONFIG_X86_64 */
You have it protected by CONFIG_X86_64, but use it unconditionally.
(Also, we can get all this information from kvm_exit tracepoint;
I'd just drop it.)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c987374..de5ce82 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -516,6 +519,10 @@ struct vcpu_vmx {
> + /* Support for PML */
> +#define PML_ENTITY_NUM 512
(Readers of this struct likely aren't interested in this number, so it
would be nicer to define it outside. I thought it is here to hint the
amount of allocated pages, but PML_ENTITY_NUM / 512 isn't obvious.)
> + struct page *pml_pg;
> @@ -4355,6 +4368,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
> a current VMCS12
> */
> exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
> + /* PML is enabled/disabled in creating/destroying vcpu */
> + exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
What is the harm of enabling it here?
(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)
> +
> return exec_control;
> }
> @@ -6971,6 +7001,31 @@ static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
> +static int handle_pml_full(struct kvm_vcpu *vcpu)
> +{
> + unsigned long exit_qualification;
> +
> + trace_kvm_pml_full(vcpu->vcpu_id);
> +
> + exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +
> + /*
> + * PML buffer FULL happened while executing iret from NMI,
> + * "blocked by NMI" bit has to be set before next VM entry.
> + */
> + if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
> + cpu_has_virtual_nmis() &&
> + (exit_qualification & INTR_INFO_UNBLOCK_NMI))
> + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
> + GUEST_INTR_STATE_NMI);
Relevant part of the specification is pasted from 27.2.2 (Information
for VM Exits Due to Vectored Events), which makes me think that this bit
is mirrored to IDT-vectoring information field ...
Isn't vmx_recover_nmi_blocking() enough?
(I see the same code in handle_ept_violation(), but wasn't that needed
just because of a hardware error?)
> +
> + /*
> + * PML buffer already flushed at beginning of VMEXIT. Nothing to do
> + * here, and there's no userspace involvement needed for PML.
> + */
> + return 1;
> +}
> @@ -7325,6 +7381,89 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
> +static int vmx_enable_pml(struct vcpu_vmx *vmx)
> +{
> + struct page *pml_pg;
> + u32 exec_control;
> +
> + pml_pg = alloc_page(GFP_KERNEL | __GFP_ZERO);
> + if (!pml_pg)
> + return -ENOMEM;
> +
> + vmx->pml_pg = pml_pg;
(It's safe to use vmx->pml_pg directly.)
> +
> + vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
> + vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
> +
> + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> + exec_control |= SECONDARY_EXEC_ENABLE_PML;
> + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> +
> + return 0;
> +}
> +static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
> +{
> + struct kvm *kvm = vmx->vcpu.kvm;
> + u64 *pml_buf;
> + u16 pml_idx;
> +
> + pml_idx = vmcs_read16(GUEST_PML_INDEX);
> +
> + /* Do nothing if PML buffer is empty */
> + if (pml_idx == (PML_ENTITY_NUM - 1))
> + return;
> +
> + /* PML index always points to next available PML buffer entity */
> + if (pml_idx >= PML_ENTITY_NUM)
> + pml_idx = 0;
> + else
> + pml_idx++;
If the pml_idx is >= PML_ENTITY_NUM and < 0xffff, the log is empty,
so it would be better to use just 'pml_idx++' and unsigned modulo.
(Could also 'assert(pml_idx < PML_ENTITY_NUM)' then.)
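(For illustration, an untested sketch of that suggestion -- pml_idx is a u16,
so the "buffer full" value 0xffff wraps to 0 on increment and the unsigned
modulo is implicit:

	pml_idx++;
	WARN_ON(pml_idx >= PML_ENTITY_NUM);	/* the assert mentioned above */

which would replace the if/else block above.)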
> +
> + pml_buf = page_address(vmx->pml_pg);
> + for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
> + u64 gpa;
> +
> + gpa = pml_buf[pml_idx];
> + WARN_ON(gpa & (PAGE_SIZE - 1));
> + mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
> + }
> +
> + /* reset PML index */
> + vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
> +}
> @@ -9492,6 +9655,31 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> + struct kvm_memory_slot *slot)
> +{
> + kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
(New slot contains dirty pages?)
> + kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
> +}
Thanks.
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-03 15:18 ` Radim Krčmář
@ 2015-02-03 15:39 ` Paolo Bonzini
2015-02-03 16:02 ` Radim Krčmář
2015-02-05 6:23 ` Kai Huang
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2015-02-03 15:39 UTC (permalink / raw)
To: Radim Krčmář, Kai Huang; +Cc: gleb, linux, kvm
On 03/02/2015 16:18, Radim Krčmář wrote:
> (I see the same code in handle_ept_violation(), but wasn't that needed
> just because of a hardware error?)
That was how I read it initially, but actually that means: "this
statement could be broken if the processor has that erratum".
>> +static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>> + struct kvm_memory_slot *slot)
>> +{
>> + kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
>
> (New slot contains dirty pages?)
New slots contain clean pages as far as the KVM dirty log is concerned.
In the case of PML, note that D=1 does not mean the page is dirty. It
only means that writes will not be logged by PML. The page may thus
also have logging disabled.
Paolo
* Re: [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML
2015-01-28 2:54 ` [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML Kai Huang
@ 2015-02-03 15:53 ` Radim Krčmář
2015-02-05 6:29 ` Kai Huang
0 siblings, 1 reply; 22+ messages in thread
From: Radim Krčmář @ 2015-02-03 15:53 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-01-28 10:54+0800, Kai Huang:
> This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
> logging for a particular memory slot, and to flush potentially logged dirty GPAs
> before reporting slot->dirty_bitmap to userspace.
>
> KVM x86 common code calls these hooks when they are available, so the PML logic
> can be kept VMX-specific. Other architectures won't be impacted, as these hooks
> are NULL for them.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -802,6 +802,31 @@ struct kvm_x86_ops {
> +
> + /*
> + * Arch-specific dirty logging hooks. These hooks are only supposed to
> + * be valid if the specific arch has hardware-accelerated dirty logging
> + * mechanism. Currently only for PML on VMX.
> + *
> + * - slot_enable_log_dirty:
> + * called when enabling log dirty mode for the slot.
(I guess that "log dirty mode" isn't the meaning that people will think
after seeing 'log_dirty' ...
I'd at least change 'log_dirty' to 'dirty_log' in these names.)
> + * - slot_disable_log_dirty:
> + * called when disabling log dirty mode for the slot.
> + * also called when slot is created with log dirty disabled.
> + * - flush_log_dirty:
> + * called before reporting dirty_bitmap to userspace.
> + * - enable_log_dirty_pt_masked:
> + * called when reenabling log dirty for the GFNs in the mask after
> + * corresponding bits are cleared in slot->dirty_bitmap.
This name is very confusing ... I think we should hint that this is
called after we learn that the page has been written to and would like
to monitor it again.
Using something like collected/refresh? (I'd have to do horrible things
to come up with a good name, sorry.)
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3780,6 +3780,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>
> mutex_lock(&kvm->slots_lock);
>
> + /*
> + * Flush potentially hardware-cached dirty pages to dirty_bitmap.
> + */
> + if (kvm_x86_ops->flush_log_dirty)
> + kvm_x86_ops->flush_log_dirty(kvm);
(Flushing would make more sense in kvm_get_dirty_log_protect().)
> +
> r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
>
> /*
> @@ -7533,6 +7539,56 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> return 0;
> }
>
> +static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
> + struct kvm_memory_slot *new)
> +{
> + /* Still write protect RO slot */
> + if (new->flags & KVM_MEM_READONLY) {
> + kvm_mmu_slot_remove_write_access(kvm, new);
We didn't write protect RO slots before, does this patch depend on it?
> @@ -7562,16 +7618,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> - if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> - kvm_mmu_slot_remove_write_access(kvm, new);
> + if (change != KVM_MR_DELETE)
> + kvm_mmu_slot_apply_flags(kvm, new);
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-03 15:39 ` Paolo Bonzini
@ 2015-02-03 16:02 ` Radim Krčmář
0 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2015-02-03 16:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kai Huang, gleb, linux, kvm
2015-02-03 16:39+0100, Paolo Bonzini:
>
>
> On 03/02/2015 16:18, Radim Krčmář wrote:
> > (I see the same code in handle_ept_violation(), but wasn't that needed
> > just because of a hardware error?)
>
> That was how I read it initially, but actually that means: "this
> statement could be broken if the processor has that erratum".
Thanks, that was a nice ruse for the original bug :)
> >> +static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> >> + struct kvm_memory_slot *slot)
> >> +{
> >> + kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> >
> > (New slot contains dirty pages?)
>
> New slots contain clean pages as far as the KVM dirty log is concerned.
>
> In the case of PML, note that D=1 does not mean the page is dirty. It
> only means that writes will not be logged by PML. The page may thus
> also have logging disabled.
Yeah, it would be a problem if we had dirty pages at the beginning, but
I don't think it is possible; I was too lazy to check.
(It's not important and I wanted to do this review today :)
* Re: [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
2015-01-28 2:54 ` [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access Kai Huang
@ 2015-02-03 16:28 ` Radim Krčmář
0 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2015-02-03 16:28 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-01-28 10:54+0800, Kai Huang:
> This patch changes the second parameter of kvm_mmu_slot_remove_write_access from
> 'slot id' to 'struct kvm_memory_slot *' to align with the kvm_x86_ops dirty
> logging hooks, which will be introduced in a later patch.
>
> A better way would be to change the second parameter of
> kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region *' to
> 'struct kvm_memory_slot *new', but that requires changes to other non-x86
> architectures too, so it is avoided for now.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7557,6 +7557,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>
> if (nr_mmu_pages)
> kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
> +
> + /* It's OK to get 'new' slot here as it has already been installed */
> + new = id_to_memslot(kvm->memslots, mem->slot);
(Doing it early doesn't hurt.)
> @@ -7566,8 +7570,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> *
> * See the comments in fast_page_fault().
> */
> - if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
> - kvm_mmu_slot_remove_write_access(kvm, mem->slot);
> + if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> + kvm_mmu_slot_remove_write_access(kvm, new);
> }
>
> void kvm_arch_flush_shadow_all(struct kvm *kvm)
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML
2015-01-28 2:54 ` [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML Kai Huang
@ 2015-02-03 17:34 ` Radim Krčmář
2015-02-05 5:59 ` Kai Huang
0 siblings, 1 reply; 22+ messages in thread
From: Radim Krčmář @ 2015-02-03 17:34 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-01-28 10:54+0800, Kai Huang:
> This patch adds new mmu layer functions to clear/set D-bit for memory slot, and
> to write protect superpages for memory slot.
>
> In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU
> updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages,
> instead, we only need to clear D-bit in order to log that GPA.
>
> For superpages, we still write protect it and let page fault code to handle
> dirty page logging, as we still need to split superpage to 4K pages in PML.
>
> As PML is always enabled during guest's lifetime, to eliminate unnecessary PML
> GPA logging, we set D-bit manually for the slot with dirty logging disabled.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
This message contains just several stylistic tips, I wasn't able to
learn enough about large pages to review today.
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b18e65c..c438224 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4368,6 +4448,121 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
> +void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
> + struct kvm_memory_slot *memslot)
> +{
> + gfn_t last_gfn;
> + int i;
> + bool flush = false;
> +
> + last_gfn = memslot->base_gfn + memslot->npages - 1;
> +
> + spin_lock(&kvm->mmu_lock);
> +
> + for (i = PT_PAGE_TABLE_LEVEL + 1; /* skip rmap for 4K page */
("+ 1" is the only difference from kvm_mmu_slot_remove_write_access();
new argument, initial level, would be better.
Btw, PT_PAGE_TABLE_LEVEL + 1 == PT_DIRECTORY_LEVEL)
> + i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> + unsigned long *rmapp;
> + unsigned long last_index, index;
> +
> + rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
> + last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
> +
> + for (index = 0; index <= last_index; ++index, ++rmapp) {
> + if (*rmapp)
> + flush |= __rmap_write_protect(kvm, rmapp,
> + false);
> +
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> + cond_resched_lock(&kvm->mmu_lock);
> + }
> + }
> + spin_unlock(&kvm->mmu_lock);
> +
> + /* see kvm_mmu_slot_remove_write_access */
> + lockdep_assert_held(&kvm->slots_lock);
> +
> + if (flush)
> + kvm_flush_remote_tlbs(kvm);
> +}
> +void kvm_mmu_slot_set_dirty(struct kvm *kvm,
> + struct kvm_memory_slot *memslot)
> +{
> + gfn_t last_gfn;
> + int i;
> + bool flush = false;
> +
> + last_gfn = memslot->base_gfn + memslot->npages - 1;
> +
> + spin_lock(&kvm->mmu_lock);
> +
> + for (i = PT_PAGE_TABLE_LEVEL;
> + i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> + unsigned long *rmapp;
> + unsigned long last_index, index;
> +
> + rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
> + last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
> +
> + for (index = 0; index <= last_index; ++index, ++rmapp) {
> + if (*rmapp)
> + flush |= __rmap_set_dirty(kvm, rmapp);
(And yet another similar walker ... We can either pass a worker function
accepting 'kvm' and 'rmapp', use a 'switch' with a new operations enum,
or have code duplication. Paolo?)
> +
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> + cond_resched_lock(&kvm->mmu_lock);
> + }
> + }
> +
> + spin_unlock(&kvm->mmu_lock);
> +
> + lockdep_assert_held(&kvm->slots_lock);
> +
> + /* see kvm_mmu_slot_leaf_clear_dirty */
> + if (flush)
> + kvm_flush_remote_tlbs(kvm);
> +}
> +EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
> +void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> + struct kvm_memory_slot *memslot)
> +{
> + gfn_t last_gfn;
> + unsigned long *rmapp;
> + unsigned long last_index, index;
> + bool flush = false;
> +
> + last_gfn = memslot->base_gfn + memslot->npages - 1;
> +
> + spin_lock(&kvm->mmu_lock);
> +
(If we abstracted with switch or function, we could also add max level
argument to cover this function.)
> + rmapp = memslot->arch.rmap[PT_PAGE_TABLE_LEVEL - 1];
> + last_index = gfn_to_index(last_gfn, memslot->base_gfn,
> + PT_PAGE_TABLE_LEVEL);
> +
> + for (index = 0; index <= last_index; ++index, ++rmapp) {
> + if (*rmapp)
> + flush |= __rmap_clear_dirty(kvm, rmapp);
> +
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> + cond_resched_lock(&kvm->mmu_lock);
> + }
> +
> + spin_unlock(&kvm->mmu_lock);
> +
> + lockdep_assert_held(&kvm->slots_lock);
> +
> + /*
> + * It's also safe to flush TLBs out of mmu lock here as currently this
> + * function is only used for dirty logging, in which case flushing TLB
> + * out of mmu lock also guarantees no dirty pages will be lost in
> + * dirty_bitmap.
> + */
> + if (flush)
> + kvm_flush_remote_tlbs(kvm);
> +}
> +EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
> @@ -1215,6 +1215,60 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
(After looking at following two pairs of functions, you'll hopefully
understand why I don't like C very much.)
> +static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
> +{
> + u64 spte = *sptep;
> +
> + rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
> +
> + spte &= ~shadow_dirty_mask;
> +
> + return mmu_spte_update(sptep, spte);
> +}
> +static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
> +{
> + u64 spte = *sptep;
> +
> + rmap_printk("rmap_set_dirty: spte %p %llx\n", sptep, *sptep);
> +
> + spte |= shadow_dirty_mask;
> +
> + return mmu_spte_update(sptep, spte);
> +}
> +static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
> +{
> + u64 *sptep;
> + struct rmap_iterator iter;
> + bool flush = false;
> +
> + for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
> + BUG_ON(!(*sptep & PT_PRESENT_MASK));
> +
> + flush |= spte_clear_dirty(kvm, sptep);
> + sptep = rmap_get_next(&iter);
> + }
> +
> + return flush;
> +}
> +static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
> +{
> + u64 *sptep;
> + struct rmap_iterator iter;
> + bool flush = false;
> +
> + for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
> + BUG_ON(!(*sptep & PT_PRESENT_MASK));
> +
> + flush |= spte_set_dirty(kvm, sptep);
> + sptep = rmap_get_next(&iter);
> + }
> +
> + return flush;
> +}
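To illustrate the "pass a worker function" option mentioned above, here is a rough sketch (the names are made up, and __rmap_write_protect would need a two-argument wrapper because of its extra pt_protect parameter):

    typedef bool (*slot_rmap_fn)(struct kvm *kvm, unsigned long *rmapp);

    static void slot_handle_rmaps(struct kvm *kvm, struct kvm_memory_slot *memslot,
                                  int min_level, int max_level, slot_rmap_fn fn)
    {
            gfn_t last_gfn = memslot->base_gfn + memslot->npages - 1;
            bool flush = false;
            int i;

            spin_lock(&kvm->mmu_lock);
            for (i = min_level; i <= max_level; ++i) {
                    unsigned long *rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
                    unsigned long last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
                    unsigned long index;

                    for (index = 0; index <= last_index; ++index, ++rmapp) {
                            if (*rmapp)
                                    flush |= fn(kvm, rmapp);

                            if (need_resched() || spin_needbreak(&kvm->mmu_lock))
                                    cond_resched_lock(&kvm->mmu_lock);
                    }
            }
            spin_unlock(&kvm->mmu_lock);

            /* as in the functions above, flushing outside mmu_lock is fine for dirty logging */
            if (flush)
                    kvm_flush_remote_tlbs(kvm);
    }

kvm_mmu_slot_leaf_clear_dirty() would then reduce to slot_handle_rmaps(kvm, memslot, PT_PAGE_TABLE_LEVEL, PT_PAGE_TABLE_LEVEL, __rmap_clear_dirty), and the other walkers would differ only in the level range and the worker passed in.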
* Re: [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML
2015-02-03 17:34 ` Radim Krčmář
@ 2015-02-05 5:59 ` Kai Huang
2015-02-05 14:51 ` Radim Krčmář
0 siblings, 1 reply; 22+ messages in thread
From: Kai Huang @ 2015-02-05 5:59 UTC (permalink / raw)
To: Radim Krčmář; +Cc: pbonzini, gleb, linux, kvm
On 02/04/2015 01:34 AM, Radim Krčmář wrote:
> 2015-01-28 10:54+0800, Kai Huang:
>> This patch adds new mmu layer functions to clear/set D-bit for memory slot, and
>> to write protect superpages for memory slot.
>>
>> In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU
>> updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages,
>> instead, we only need to clear D-bit in order to log that GPA.
>>
>> For superpages, we still write protect it and let page fault code to handle
>> dirty page logging, as we still need to split superpage to 4K pages in PML.
>>
>> As PML is always enabled during guest's lifetime, to eliminate unnecessary PML
>> GPA logging, we set D-bit manually for the slot with dirty logging disabled.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> This message contains just several stylistic tips, I wasn't able to
> learn enough about large pages to review today.
Hi Radim,
Thanks for your review and sorry for the late reply. I was working on
something else these days.
There are two reasons we still write protect large pages in the case of
PML. One is that we still need to split large pages into 4K pages in dirty
logging mode, and PML doesn't split large pages automatically. The second
is that PML only logs the dirty GPA and doesn't distinguish whether that GPA
lies in a 4K page or a large page.
As for the rest of your comments, since this patch series is already in
Paolo's queue branch, I think I can send further patches to enhance
them, if Paolo agrees the enhancement is needed.
Thanks,
-Kai
>
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index b18e65c..c438224 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -4368,6 +4448,121 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
>> +void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
>> + struct kvm_memory_slot *memslot)
>> +{
>> + gfn_t last_gfn;
>> + int i;
>> + bool flush = false;
>> +
>> + last_gfn = memslot->base_gfn + memslot->npages - 1;
>> +
>> + spin_lock(&kvm->mmu_lock);
>> +
>> + for (i = PT_PAGE_TABLE_LEVEL + 1; /* skip rmap for 4K page */
> ("+ 1" is the only difference from kvm_mmu_slot_remove_write_access();
> new argument, initial level, would be better.
>
> Btw, PT_PAGE_TABLE_LEVEL + 1 == PT_DIRECTORY_LEVEL)
>
>> + i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
>> + unsigned long *rmapp;
>> + unsigned long last_index, index;
>> +
>> + rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
>> + last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
>> +
>> + for (index = 0; index <= last_index; ++index, ++rmapp) {
>> + if (*rmapp)
>> + flush |= __rmap_write_protect(kvm, rmapp,
>> + false);
>> +
>> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> + cond_resched_lock(&kvm->mmu_lock);
>> + }
>> + }
>> + spin_unlock(&kvm->mmu_lock);
>> +
>> + /* see kvm_mmu_slot_remove_write_access */
>> + lockdep_assert_held(&kvm->slots_lock);
>> +
>> + if (flush)
>> + kvm_flush_remote_tlbs(kvm);
>> +}
>> +void kvm_mmu_slot_set_dirty(struct kvm *kvm,
>> + struct kvm_memory_slot *memslot)
>> +{
>> + gfn_t last_gfn;
>> + int i;
>> + bool flush = false;
>> +
>> + last_gfn = memslot->base_gfn + memslot->npages - 1;
>> +
>> + spin_lock(&kvm->mmu_lock);
>> +
>> + for (i = PT_PAGE_TABLE_LEVEL;
>> + i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
>> + unsigned long *rmapp;
>> + unsigned long last_index, index;
>> +
>> + rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
>> + last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
>> +
>> + for (index = 0; index <= last_index; ++index, ++rmapp) {
>> + if (*rmapp)
>> + flush |= __rmap_set_dirty(kvm, rmapp);
> (And yet another similar walker ... We can either pass a worker function
> accepting 'kvm' and 'rmapp', use a 'switch' with a new operations enum,
> or have code duplication. Paolo?)
>
>> +
>> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> + cond_resched_lock(&kvm->mmu_lock);
>> + }
>> + }
>> +
>> + spin_unlock(&kvm->mmu_lock);
>> +
>> + lockdep_assert_held(&kvm->slots_lock);
>> +
>> + /* see kvm_mmu_slot_leaf_clear_dirty */
>> + if (flush)
>> + kvm_flush_remote_tlbs(kvm);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>> +void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
>> + struct kvm_memory_slot *memslot)
>> +{
>> + gfn_t last_gfn;
>> + unsigned long *rmapp;
>> + unsigned long last_index, index;
>> + bool flush = false;
>> +
>> + last_gfn = memslot->base_gfn + memslot->npages - 1;
>> +
>> + spin_lock(&kvm->mmu_lock);
>> +
> (If we abstracted with switch or function, we could also add max level
> argument to cover this function.)
>
>> + rmapp = memslot->arch.rmap[PT_PAGE_TABLE_LEVEL - 1];
>> + last_index = gfn_to_index(last_gfn, memslot->base_gfn,
>> + PT_PAGE_TABLE_LEVEL);
>> +
>> + for (index = 0; index <= last_index; ++index, ++rmapp) {
>> + if (*rmapp)
>> + flush |= __rmap_clear_dirty(kvm, rmapp);
>> +
>> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> + cond_resched_lock(&kvm->mmu_lock);
>> + }
>> +
>> + spin_unlock(&kvm->mmu_lock);
>> +
>> + lockdep_assert_held(&kvm->slots_lock);
>> +
>> + /*
>> + * It's also safe to flush TLBs out of mmu lock here as currently this
>> + * function is only used for dirty logging, in which case flushing TLB
>> + * out of mmu lock also guarantees no dirty pages will be lost in
>> + * dirty_bitmap.
>> + */
>> + if (flush)
>> + kvm_flush_remote_tlbs(kvm);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
>> @@ -1215,6 +1215,60 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> (After looking at following two pairs of functions, you'll hopefully
> understand why I don't like C very much.)
>
>> +static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
>> +{
>> + u64 spte = *sptep;
>> +
>> + rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
>> +
>> + spte &= ~shadow_dirty_mask;
>> +
>> + return mmu_spte_update(sptep, spte);
>> +}
>> +static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
>> +{
>> + u64 spte = *sptep;
>> +
>> + rmap_printk("rmap_set_dirty: spte %p %llx\n", sptep, *sptep);
>> +
>> + spte |= shadow_dirty_mask;
>> +
>> + return mmu_spte_update(sptep, spte);
>> +}
>> +static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
>> +{
>> + u64 *sptep;
>> + struct rmap_iterator iter;
>> + bool flush = false;
>> +
>> + for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
>> + BUG_ON(!(*sptep & PT_PRESENT_MASK));
>> +
>> + flush |= spte_clear_dirty(kvm, sptep);
>> + sptep = rmap_get_next(&iter);
>> + }
>> +
>> + return flush;
>> +}
>> +static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
>> +{
>> + u64 *sptep;
>> + struct rmap_iterator iter;
>> + bool flush = false;
>> +
>> + for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
>> + BUG_ON(!(*sptep & PT_PRESENT_MASK));
>> +
>> + flush |= spte_set_dirty(kvm, sptep);
>> + sptep = rmap_get_next(&iter);
>> + }
>> +
>> + return flush;
>> +}
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-03 15:18 ` Radim Krčmář
2015-02-03 15:39 ` Paolo Bonzini
@ 2015-02-05 6:23 ` Kai Huang
2015-02-05 15:04 ` Radim Krčmář
2015-02-06 16:00 ` Paolo Bonzini
1 sibling, 2 replies; 22+ messages in thread
From: Kai Huang @ 2015-02-05 6:23 UTC (permalink / raw)
To: Radim Krčmář; +Cc: pbonzini, gleb, linux, kvm
On 02/03/2015 11:18 PM, Radim Krčmář wrote:
> 2015-01-28 10:54+0800, Kai Huang:
>> This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
> (+module_param_named(pml, enable_pml, bool, S_IRUGO);)
>
>> to allow user to enable/disable it manually.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>> index 587149b..a139977 100644
>> --- a/arch/x86/kvm/trace.h
>> +++ b/arch/x86/kvm/trace.h
>> @@ -846,6 +846,24 @@ TRACE_EVENT(kvm_track_tsc,
>> +TRACE_EVENT(kvm_pml_full,
>> + TP_PROTO(unsigned int vcpu_id),
>> + TP_ARGS(vcpu_id),
>> +
>> + TP_STRUCT__entry(
>> + __field( unsigned int, vcpu_id )
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->vcpu_id = vcpu_id;
>> + ),
>> +
>> + TP_printk("vcpu %d: PML full", __entry->vcpu_id)
>> +);
>> +
>> #endif /* CONFIG_X86_64 */
> You have it protected by CONFIG_X86_64, but use it unconditionally.
Thanks for catching. This has been fixed by another patch, and the fix
has also been merged by Paolo.
Thanks,
-Kai
>
> (Also, we can get all this information from kvm_exit tracepoint;
> I'd just drop it.)
>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c987374..de5ce82 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -516,6 +519,10 @@ struct vcpu_vmx {
>> + /* Support for PML */
>> +#define PML_ENTITY_NUM 512
> (Readers of this struct likely aren't interested in this number, so it
> would be nicer to define it outside. I thought it is here to hint the
> amount of allocated pages, but PML_ENTITY_NUM / 512 isn't obvious.)
>
>> + struct page *pml_pg;
>> @@ -4355,6 +4368,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>> a current VMCS12
>> */
>> exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
>> + /* PML is enabled/disabled in creating/destroying vcpu */
>> + exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> What is the harm of enabling it here?
>
> (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)
Because PML feature detection is unconditional (meaning
SECONDARY_EXEC_ENABLE_PML is always in
vmcs_config.cpu_based_2nd_exec_ctrl), while the PML buffer is only created
when the vcpu is created, and that is controlled by the 'enable_pml' module
parameter. If we always enabled SECONDARY_EXEC_ENABLE_PML here, the bit
could be set with no PML buffer allocated when PML is disabled by
'enable_pml', so it's better to enable it along with creating the PML buffer.
>
>> +
>> return exec_control;
>> }
>> @@ -6971,6 +7001,31 @@ static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
>> +static int handle_pml_full(struct kvm_vcpu *vcpu)
>> +{
>> + unsigned long exit_qualification;
>> +
>> + trace_kvm_pml_full(vcpu->vcpu_id);
>> +
>> + exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>> +
>> + /*
>> + * PML buffer FULL happened while executing iret from NMI,
>> + * "blocked by NMI" bit has to be set before next VM entry.
>> + */
>> + if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
>> + cpu_has_virtual_nmis() &&
>> + (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>> + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>> + GUEST_INTR_STATE_NMI);
> Relevant part of the specification is pasted from 27.2.2 (Information
> for VM Exits Due to Vectored Events), which makes me think that this bit
> is mirrored to IDT-vectoring information field ...
>
> Isn't vmx_recover_nmi_blocking() enough?
>
> (I see the same code in handle_ept_violation(), but wasn't that needed
> just because of a hardware error?)
This needs to be handled in both EPT violation and PML. If you look at
the PML specification (the link is in the cover letter), you can see the
definition of bit 12 of exit_qualification (section 1.5), which explains
the above code. The same definition of exit_qualification appears in the
EPT violation part of Intel's SDM.
>> +
>> + /*
>> + * PML buffer already flushed at beginning of VMEXIT. Nothing to do
>> + * here.., and there's no userspace involvement needed for PML.
>> + */
>> + return 1;
>> +}
>> @@ -7325,6 +7381,89 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>> +static int vmx_enable_pml(struct vcpu_vmx *vmx)
>> +{
>> + struct page *pml_pg;
>> + u32 exec_control;
>> +
>> + pml_pg = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> + if (!pml_pg)
>> + return -ENOMEM;
>> +
>> + vmx->pml_pg = pml_pg;
> (It's safe to use vmx->pml_pg directly.)
>
>> +
>> + vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
>> + vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>> +
>> + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> + exec_control |= SECONDARY_EXEC_ENABLE_PML;
>> + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
>> +
>> + return 0;
>> +}
>> +static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
>> +{
>> + struct kvm *kvm = vmx->vcpu.kvm;
>> + u64 *pml_buf;
>> + u16 pml_idx;
>> +
>> + pml_idx = vmcs_read16(GUEST_PML_INDEX);
>> +
>> + /* Do nothing if PML buffer is empty */
>> + if (pml_idx == (PML_ENTITY_NUM - 1))
>> + return;
>> +
>> + /* PML index always points to next available PML buffer entity */
>> + if (pml_idx >= PML_ENTITY_NUM)
>> + pml_idx = 0;
>> + else
>> + pml_idx++;
> If the pml_idx is >= PML_ENTITY_NUM and < 0xffff, the log is empty,
In this case the log is actually full. The PML index is set to
PML_ENTITY_NUM - 1 initially, and hardware decrements it after a GPA is logged.
Below is the pseudocode of PML hardware logic.
IF (PML Index[15:9] ≠ 0)
THEN VM exit;
FI;
set accessed and dirty flags for EPT;
IF (a dirty flag was updated from 0 to 1)
THEN
PML address[PML index] ← 4-KByte-aligned guest-physical address;
PML index is decremented;
FI;
Thanks,
-Kai
> so it would be better to use just 'pml_idx++' and unsigned modulo.
>
> (Could also 'assert(pml_idx < PML_ENTITY_NUM)' then.)
>
>> +
>> + pml_buf = page_address(vmx->pml_pg);
>> + for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
>> + u64 gpa;
>> +
>> + gpa = pml_buf[pml_idx];
>> + WARN_ON(gpa & (PAGE_SIZE - 1));
>> + mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
>> + }
>> +
>> + /* reset PML index */
>> + vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
>> +}
>> @@ -9492,6 +9655,31 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>> +static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>> + struct kvm_memory_slot *slot)
>> +{
>> + kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> (New slot contains dirty pages?)
>
>> + kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
>> +}
> Thanks.
* Re: [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML
2015-02-03 15:53 ` Radim Krčmář
@ 2015-02-05 6:29 ` Kai Huang
2015-02-05 14:52 ` Radim Krčmář
0 siblings, 1 reply; 22+ messages in thread
From: Kai Huang @ 2015-02-05 6:29 UTC (permalink / raw)
To: Radim Krčmář; +Cc: pbonzini, gleb, linux, kvm
On 02/03/2015 11:53 PM, Radim Krčmář wrote:
> 2015-01-28 10:54+0800, Kai Huang:
>> This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
>> logging for particular memory slot, and to flush potentially logged dirty GPAs
>> before reporting slot->dirty_bitmap to userspace.
>>
>> kvm x86 common code calls these hooks when they are available so PML logic can
>> be hidden to VMX specific. Other ARCHs won't be impacted as these hooks are NULL
>> for them.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -802,6 +802,31 @@ struct kvm_x86_ops {
>> +
>> + /*
>> + * Arch-specific dirty logging hooks. These hooks are only supposed to
>> + * be valid if the specific arch has hardware-accelerated dirty logging
>> + * mechanism. Currently only for PML on VMX.
>> + *
>> + * - slot_enable_log_dirty:
>> + * called when enabling log dirty mode for the slot.
> (I guess that "log dirty mode" isn't the meaning that people will think
> after seeing 'log_dirty' ...
> I'd at least change 'log_dirty' to 'dirty_log' in these names.)
>
>> + * - slot_disable_log_dirty:
>> + * called when disabling log dirty mode for the slot.
>> + * also called when slot is created with log dirty disabled.
>> + * - flush_log_dirty:
>> + * called before reporting dirty_bitmap to userspace.
>> + * - enable_log_dirty_pt_masked:
>> + * called when reenabling log dirty for the GFNs in the mask after
>> + * corresponding bits are cleared in slot->dirty_bitmap.
> This name is very confusing ... I think we should hint that this is
> called after we learn that the page has been written to and would like
> to monitor it again.
>
> Using something like collected/refresh? (I'd have to do horrible things
> to come up with a good name, sorry.)
>
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3780,6 +3780,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>
>> mutex_lock(&kvm->slots_lock);
>>
>> + /*
>> + * Flush potentially hardware-cached dirty pages to dirty_bitmap.
>> + */
>> + if (kvm_x86_ops->flush_log_dirty)
>> + kvm_x86_ops->flush_log_dirty(kvm);
> (Flushing would make more sense in kvm_get_dirty_log_protect().)
>
>> +
>> r = kvm_get_dirty_log_protect(kvm, log, &is_dirty);
>>
>> /*
>> @@ -7533,6 +7539,56 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>> return 0;
>> }
>>
>> +static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>> + struct kvm_memory_slot *new)
>> +{
>> + /* Still write protect RO slot */
>> + if (new->flags & KVM_MEM_READONLY) {
>> + kvm_mmu_slot_remove_write_access(kvm, new);
> We didn't write protect RO slots before, does this patch depend on it?
No, PML doesn't depend on it to work. It was suggested by Paolo.
Thanks,
-Kai
>
>> @@ -7562,16 +7618,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>> - if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> - kvm_mmu_slot_remove_write_access(kvm, new);
>> + if (change != KVM_MR_DELETE)
>> + kvm_mmu_slot_apply_flags(kvm, new);
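For context, this is roughly how the hook dispatch in kvm_mmu_slot_apply_flags() reads when pieced together from the hunks quoted in this thread (a sketch, not necessarily the exact merged code):

    static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
                                         struct kvm_memory_slot *new)
    {
            /* Still write protect RO slot */
            if (new->flags & KVM_MEM_READONLY) {
                    kvm_mmu_slot_remove_write_access(kvm, new);
                    return;
            }

            if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
                    /* let the arch hook (PML on VMX) take over when present */
                    if (kvm_x86_ops->slot_enable_log_dirty)
                            kvm_x86_ops->slot_enable_log_dirty(kvm, new);
                    else
                            kvm_mmu_slot_remove_write_access(kvm, new);
            } else {
                    /* also hit when a slot is created with dirty logging disabled */
                    if (kvm_x86_ops->slot_disable_log_dirty)
                            kvm_x86_ops->slot_disable_log_dirty(kvm, new);
            }
    }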
* Re: [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML
2015-02-05 5:59 ` Kai Huang
@ 2015-02-05 14:51 ` Radim Krčmář
0 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2015-02-05 14:51 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-02-05 13:59+0800, Kai Huang:
> Thanks for your review and sorry for the late reply. I was working on
> something else these days.
(No problem, a reply within a week is instantaneous for me :)
> For rest of your comments, as this patch series have already been in Paolo's
> queue branch, I think I can send further patches to enhance them, if Paolo
> agrees the enhancement is needed.
Ah, sorry, I didn't notice the patches were already in ...
it's better to wait for a bug in the common part now.
* Re: [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML
2015-02-05 6:29 ` Kai Huang
@ 2015-02-05 14:52 ` Radim Krčmář
0 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2015-02-05 14:52 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-02-05 14:29+0800, Kai Huang:
> >>+ /* Still write protect RO slot */
> >>+ if (new->flags & KVM_MEM_READONLY) {
> >>+ kvm_mmu_slot_remove_write_access(kvm, new);
> >We didn't write protect RO slots before, does this patch depend on it?
> No PML doesn't depend on it to work. It's suggested by Paolo.
Thanks, it would have deserved a separate patch, IMO.
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-05 6:23 ` Kai Huang
@ 2015-02-05 15:04 ` Radim Krčmář
2015-02-06 0:22 ` Kai Huang
2015-02-06 0:28 ` Kai Huang
2015-02-06 16:00 ` Paolo Bonzini
1 sibling, 2 replies; 22+ messages in thread
From: Radim Krčmář @ 2015-02-05 15:04 UTC (permalink / raw)
To: Kai Huang; +Cc: pbonzini, gleb, linux, kvm
2015-02-05 14:23+0800, Kai Huang:
> On 02/03/2015 11:18 PM, Radim Krčmář wrote:
> >You have it protected by CONFIG_X86_64, but use it unconditionally.
> Thanks for catching. This has been fixed by another patch, and the fix has
> also been merged by Paolo.
(And I haven't noticed the followup either ...)
I don't know of any present bugs in this patch and all questions have
been cleared,
Thanks.
> >>+ exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> >What is the harm of enabling it here?
> >
> >(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)
> Because the PML feature detection is unconditional (meaning
> SECONDARY_EXEC_ENABLE_PML is always in vmcs_config.cpu_based_2nd_exec_ctrl),
> but the PML buffer is only created when vcpu is created, and it is
> controlled by 'enable_pml' module parameter, if we always enable
> SECONDARY_EXEC_ENABLE_PML here, no PML buffer will be created if PML is
> disabled by 'enable_pml' parameter,
I meant
if (!enable_pml)
exec_control &= ~SECONDARY_EXEC_ENABLE_PML
here and no exec_control operations in vmx_create_vcpu().
> so it's better to enable it along with
> creating PML buffer.
I think the reason why KVM split the setup into vmx_create_vcpu() and
vmx_secondary_exec_control() is to simplify nested virtualization.
> >>+ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
> >>+ cpu_has_virtual_nmis() &&
> >>+ (exit_qualification & INTR_INFO_UNBLOCK_NMI))
> >>+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
> >>+ GUEST_INTR_STATE_NMI);
> >Relevant part of the specification is pasted from 27.2.2 (Information
> >for VM Exits Due to Vectored Events), which makes me think that this bit
> >is mirrored to IDT-vectoring information field ...
> >
> >Isn't vmx_recover_nmi_blocking() enough?
> >
> >(I see the same code in handle_ept_violation(), but wasn't that needed
> > just because of a hardware error?)
> This needs to be handled in both EPT violation and PML. If you look at the
> PML specification (the link is in cover letter), you can see the definition
> of bit 12 of exit_qualification (section 1.5), which explains above code.
> The same definition of exit_qualification is in EPT violation part in
> Intel's SDM.
True ... IDT-vectoring doesn't set bit 12 because PML exit isn't a fault
and 27.2.2: "— For all other relevant VM exits, bit 12 is cleared to 0."
(This was humourously pasted to PML as well.)
> >>+ /* PML index always points to next available PML buffer entity */
> >>+ if (pml_idx >= PML_ENTITY_NUM)
> >>+ pml_idx = 0;
> >>+ else
> >>+ pml_idx++;
> >If the pml_idx is >= PML_ENTITY_NUM and < 0xffff, the log is empty,
> In this case, the log is full, actually. PML index is set to PML_ENTITY_NUM
> - 1 initially, and hardware decrease it after GPA is logged.
>
> Below is the pseudocode of PML hardware logic.
>
> IF (PML Index[15:9] ≠ 0)
> THEN VM exit;
> FI;
pml_idx >= PML_ENTITY_NUM exits without modifying the buffer,
> set accessed and dirty flags for EPT;
> IF (a dirty flag was updated from 0 to 1)
> THEN
> PML address[PML index] ← 4-KByte-aligned guest-physical address;
> PML index is decremented;
0xffff is the only value that specifies full buffer, the rest means
that we incorrectly set up the initial index and VMX exited without
changing the buffer => we shouldn't read it.
(I should have said "untouched" instead of "empty".)
> FI;
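In code, Radim's alternative amounts to something like the sketch below (context elided); vmx_enable_pml() would then only allocate the buffer and program PML_ADDRESS and GUEST_PML_INDEX, without touching SECONDARY_VM_EXEC_CONTROL:

    static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
    {
            u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;

            /* ... existing adjustments elided ... */

            /* keep the PML bit only when the module parameter allows it */
            if (!enable_pml)
                    exec_control &= ~SECONDARY_EXEC_ENABLE_PML;

            return exec_control;
    }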
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-05 15:04 ` Radim Krčmář
@ 2015-02-06 0:22 ` Kai Huang
2015-02-06 0:28 ` Kai Huang
1 sibling, 0 replies; 22+ messages in thread
From: Kai Huang @ 2015-02-06 0:22 UTC (permalink / raw)
To: Radim Krčmář; +Cc: pbonzini, gleb, linux, kvm
On 02/05/2015 11:04 PM, Radim Krčmář wrote:
> 2015-02-05 14:23+0800, Kai Huang:
>> On 02/03/2015 11:18 PM, Radim Krčmář wrote:
>>> You have it protected by CONFIG_X86_64, but use it unconditionally.
>> Thanks for catching. This has been fixed by another patch, and the fix has
>> also been merged by Paolo.
> (And I haven't noticed the followup either ...)
>
> I don't know of any present bugs in this patch and all questions have
> been cleared,
>
> Thanks.
>
>>>> + exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>>> What is the harm of enabling it here?
>>>
>>> (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)
>> Because the PML feature detection is unconditional (meaning
>> SECONDARY_EXEC_ENABLE_PML is always in vmcs_config.cpu_based_2nd_exec_ctrl),
>> but the PML buffer is only created when vcpu is created, and it is
>> controlled by 'enable_pml' module parameter, if we always enable
>> SECONDARY_EXEC_ENABLE_PML here, no PML buffer will be created if PML is
>> disabled by 'enable_pml' parameter,
> I meant
>
> if (!enable_pml)
> exec_control &= ~SECONDARY_EXEC_ENABLE_PML
>
> here and no exec_control operations in vmx_create_vcpu().
This also works. I originally thought putting the enabling part and the
buffer allocation together was more straightforward.
>
>> so it's better to enable it along with
>> creating PML buffer.
> I think the reason why KVM split the setup into vmx_create_vcpu() and
> vmx_secondary_exec_control() is to simplify nested virtualization.
This is a good point and I'll think about it :)
>
>>>> + if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
>>>> + cpu_has_virtual_nmis() &&
>>>> + (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>>>> + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>>>> + GUEST_INTR_STATE_NMI);
>>> Relevant part of the specification is pasted from 27.2.2 (Information
>>> for VM Exits Due to Vectored Events), which makes me think that this bit
>>> is mirrored to IDT-vectoring information field ...
>>>
>>> Isn't vmx_recover_nmi_blocking() enough?
>>>
>>> (I see the same code in handle_ept_violation(), but wasn't that needed
>>> just because of a hardware error?)
>> This needs to be handled in both EPT violation and PML. If you look at the
>> PML specification (the link is in cover letter), you can see the definition
>> of bit 12 of exit_qualification (section 1.5), which explains above code.
>> The same definition of exit_qualification is in EPT violation part in
>> Intel's SDM.
> True ... IDT-vectoring doesn't set bit 12 because PML exit isn't a fault
> and 27.2.2: "— For all other relevant VM exits, bit 12 is cleared to 0."
> (This was humourously pasted to PML as well.)
>
>>>> + /* PML index always points to next available PML buffer entity */
>>>> + if (pml_idx >= PML_ENTITY_NUM)
>>>> + pml_idx = 0;
>>>> + else
>>>> + pml_idx++;
>>> If the pml_idx is >= PML_ENTITY_NUM and < 0xffff, the log is empty,
>> In this case, the log is full, actually. PML index is set to PML_ENTITY_NUM
>> - 1 initially, and hardware decrease it after GPA is logged.
>>
>> Below is the pseudocode of PML hardware logic.
>>
>> IF (PML Index[15:9] ≠ 0)
>> THEN VM exit;
>> FI;
> pml_idx >= PML_ENTITY_NUM exits without modifying the buffer,
Yes.
>
>> set accessed and dirty flags for EPT;
>> IF (a dirty flag was updated from 0 to 1)
>> THEN
>> PML address[PML index] ← 4-KByte-aligned guest-physical address;
>> PML index is decremented;
> 0xffff is the only value that specifies full buffer, the rest means
> that we incorrectly set up the initial index and VMX exited without
> changing the buffer => we shouldn't read it.
> (I should have said "untouched" instead of "empty".)
Yes, theoretically, according to the pseudocode, 0xffff is the only
possible value that indicates a full buffer. Probably a better way is to
check whether pml_idx is in the range [-1, 511] at the beginning and warn
(or just assert) if it's not. Then we can simply do a ++pml_idx (changing
pml_idx to a signed short). But the current code should also work anyway.
Thanks,
-Kai
>
>> FI;
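Putting the two observations together, a hypothetical variant of the flush path with the range check Kai suggests (the quoted code above stays as-is; this is only a sketch):

    static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
    {
            struct kvm *kvm = vmx->vcpu.kvm;
            u64 *pml_buf;
            u16 pml_idx;

            pml_idx = vmcs_read16(GUEST_PML_INDEX);

            /* Do nothing if the buffer is untouched (index at its initial value) */
            if (pml_idx == PML_ENTITY_NUM - 1)
                    return;

            /*
             * Hardware only decrements the index from PML_ENTITY_NUM - 1 down
             * to 0 and then wraps to 0xffff on a full buffer, so any other
             * out-of-range value means the index was programmed incorrectly.
             */
            WARN_ON_ONCE(pml_idx >= PML_ENTITY_NUM && pml_idx != 0xffff);

            /* PML index always points to the next available PML buffer entity */
            if (pml_idx >= PML_ENTITY_NUM)
                    pml_idx = 0;
            else
                    pml_idx++;

            pml_buf = page_address(vmx->pml_pg);
            for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
                    u64 gpa = pml_buf[pml_idx];

                    WARN_ON(gpa & (PAGE_SIZE - 1));
                    mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
            }

            /* reset PML index to an empty buffer for the next run */
            vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
    }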
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-05 15:04 ` Radim Krčmář
2015-02-06 0:22 ` Kai Huang
@ 2015-02-06 0:28 ` Kai Huang
1 sibling, 0 replies; 22+ messages in thread
From: Kai Huang @ 2015-02-06 0:28 UTC (permalink / raw)
To: Radim Krčmář; +Cc: pbonzini, gleb, linux, kvm
On 02/05/2015 11:04 PM, Radim Krčmář wrote:
> 2015-02-05 14:23+0800, Kai Huang:
>> On 02/03/2015 11:18 PM, Radim Krčmář wrote:
>>> You have it protected by CONFIG_X86_64, but use it unconditionally.
>> Thanks for catching. This has been fixed by another patch, and the fix has
>> also been merged by Paolo.
> (And I haven't noticed the followup either ...)
>
> I don't know of any present bugs in this patch and all questions have
> been cleared,
So far I see no other bug reported :)
Thanks,
-Kai
>
> Thanks.
>
>>>> + exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>>> What is the harm of enabling it here?
>>>
>>> (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)
>> Because the PML feature detection is unconditional (meaning
>> SECONDARY_EXEC_ENABLE_PML is always in vmcs_config.cpu_based_2nd_exec_ctrl),
>> but the PML buffer is only created when vcpu is created, and it is
>> controlled by 'enable_pml' module parameter, if we always enable
>> SECONDARY_EXEC_ENABLE_PML here, no PML buffer will be created if PML is
>> disabled by 'enable_pml' parameter,
> I meant
>
> if (!enable_pml)
> exec_control &= ~SECONDARY_EXEC_ENABLE_PML
>
> here and no exec_control operations in vmx_create_vcpu().
>
>> so it's better to enable it along with
>> creating PML buffer.
> I think the reason why KVM split the setup into vmx_create_vcpu() and
> vmx_secondary_exec_control() is to simplify nested virtualization.
>
>>>> + if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
>>>> + cpu_has_virtual_nmis() &&
>>>> + (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>>>> + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>>>> + GUEST_INTR_STATE_NMI);
>>> Relevant part of the specification is pasted from 27.2.2 (Information
>>> for VM Exits Due to Vectored Events), which makes me think that this bit
>>> is mirrored to IDT-vectoring information field ...
>>>
>>> Isn't vmx_recover_nmi_blocking() enough?
>>>
>>> (I see the same code in handle_ept_violation(), but wasn't that needed
>>> just because of a hardware error?)
>> This needs to be handled in both EPT violation and PML. If you look at the
>> PML specification (the link is in cover letter), you can see the definition
>> of bit 12 of exit_qualification (section 1.5), which explains above code.
>> The same definition of exit_qualification is in EPT violation part in
>> Intel's SDM.
> True ... IDT-vectoring doesn't set bit 12 because PML exit isn't a fault
> and 27.2.2: "— For all other relevant VM exits, bit 12 is cleared to 0."
> (This was humourously pasted to PML as well.)
>
>>>> + /* PML index always points to next available PML buffer entity */
>>>> + if (pml_idx >= PML_ENTITY_NUM)
>>>> + pml_idx = 0;
>>>> + else
>>>> + pml_idx++;
>>> If the pml_idx is >= PML_ENTITY_NUM and < 0xffff, the log is empty,
>> In this case, the log is full, actually. PML index is set to PML_ENTITY_NUM
>> - 1 initially, and hardware decrease it after GPA is logged.
>>
>> Below is the pseudocode of PML hardware logic.
>>
>> IF (PML Index[15:9] ≠ 0)
>> THEN VM exit;
>> FI;
> pml_idx >= PML_ENTITY_NUM exits without modifying the buffer,
>
>> set accessed and dirty flags for EPT;
>> IF (a dirty flag was updated from 0 to 1)
>> THEN
>> PML address[PML index] ← 4-KByte-aligned guest-physical address;
>> PML index is decremented;
> 0xffff is the only value that specifies full buffer, the rest means
> that we incorrectly set up the initial index and VMX exited without
> changing the buffer => we shouldn't read it.
> (I should have said "untouched" instead of "empty".)
>
>> FI;
* Re: [PATCH 6/6] KVM: VMX: Add PML support in VMX
2015-02-05 6:23 ` Kai Huang
2015-02-05 15:04 ` Radim Krčmář
@ 2015-02-06 16:00 ` Paolo Bonzini
1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2015-02-06 16:00 UTC (permalink / raw)
To: Kai Huang, Radim Krčmář; +Cc: gleb, linux, kvm
On 05/02/2015 07:23, Kai Huang wrote:
>>>
>>> + /* PML is enabled/disabled in creating/destroying vcpu */
>>> + exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>> What is the harm of enabling it here?
>>
>> (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES seems similar and does it.)
>
> Because the PML feature detection is unconditional (meaning
> SECONDARY_EXEC_ENABLE_PML is always in
> vmcs_config.cpu_based_2nd_exec_ctrl), but the PML buffer is only created
> when vcpu is created, and it is controlled by 'enable_pml' module
> parameter, if we always enable SECONDARY_EXEC_ENABLE_PML here, no PML
> buffer will be created if PML is disabled by 'enable_pml' parameter, so
> it's better to enable it along with creating PML buffer.
I guess this is the most interesting comment from Radim.
Paolo
Thread overview: 22+ messages
2015-01-28 2:54 [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support Kai Huang
2015-01-28 2:54 ` [PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty Kai Huang
2015-01-28 2:54 ` [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML Kai Huang
2015-02-03 17:34 ` Radim Krčmář
2015-02-05 5:59 ` Kai Huang
2015-02-05 14:51 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte Kai Huang
2015-01-28 2:54 ` [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access Kai Huang
2015-02-03 16:28 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML Kai Huang
2015-02-03 15:53 ` Radim Krčmář
2015-02-05 6:29 ` Kai Huang
2015-02-05 14:52 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 6/6] KVM: VMX: Add PML support in VMX Kai Huang
2015-02-03 15:18 ` Radim Krčmář
2015-02-03 15:39 ` Paolo Bonzini
2015-02-03 16:02 ` Radim Krčmář
2015-02-05 6:23 ` Kai Huang
2015-02-05 15:04 ` Radim Krčmář
2015-02-06 0:22 ` Kai Huang
2015-02-06 0:28 ` Kai Huang
2015-02-06 16:00 ` Paolo Bonzini