From: Kai Huang <kai.huang@linux.intel.com>
To: pbonzini@redhat.com, gleb@kernel.org, linux@arm.linux.org.uk,
kvm@vger.kernel.org
Cc: Kai Huang <kai.huang@linux.intel.com>
Subject: [PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support
Date: Wed, 28 Jan 2015 10:54:22 +0800
Message-ID: <1422413668-3509-1-git-send-email-kai.huang@linux.intel.com>
This patch series adds Page Modification Logging (PML) support in VMX.
1) Introduction
PML is a new feature on Intel's Broadwell server platform, intended to reduce the
overhead of the dirty logging mechanism.
The specification can be found at:
http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html
Currently, dirty logging is done by write protection: guest memory is
write-protected, and the dirty GFN is marked in dirty_bitmap on the subsequent
write fault. This works fine, except for the overhead of an additional write
fault for logging each dirty GFN. The overhead can be large if the guest's write
operations are intensive.
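A minimal userspace model of the write-protection scheme may help show where the overhead comes from. It is a sketch under simplifying assumptions (at most 64 GFNs, one writable bit per entry, a single dirty-logging round); struct wp_vm and the wp_* functions are hypothetical and are not KVM code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not KVM code: NPAGES guest frames, each with a
 * writable bit in its (simplified) EPT entry. */
#define NPAGES 64

struct wp_vm {
    bool writable[NPAGES];      /* EPT write permission per GFN */
    uint64_t dirty_bitmap;      /* one bit per GFN (NPAGES <= 64) */
    unsigned long write_faults; /* exits taken purely for logging */
};

/* Start a dirty-logging round: clear the bitmap and write-protect all GFNs. */
static void wp_start_round(struct wp_vm *vm)
{
    memset(vm->writable, 0, sizeof(vm->writable));
    vm->dirty_bitmap = 0;
}

/* Guest write to gfn: the first write after protection faults to the
 * hypervisor, which records the GFN and restores write access. */
static void wp_guest_write(struct wp_vm *vm, unsigned gfn)
{
    if (!vm->writable[gfn]) {
        vm->write_faults++;               /* EPT violation, i.e. a VMEXIT */
        vm->dirty_bitmap |= 1ULL << gfn;  /* mark the GFN dirty */
        vm->writable[gfn] = true;         /* later writes need no exit */
    }
}
```

Writing GFN 3 twice and GFN 10 once costs two write faults: every distinct GFN dirtied in a round costs one exit, and that per-GFN exit is what PML is meant to remove.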
PML is a hardware-assisted, efficient way of doing dirty logging. PML logs dirty
GPAs automatically to a 4K PML memory buffer when the CPU changes an EPT entry's
D-bit from 0 to 1. To do this, a new 4K PML buffer base address and a PML index
were added to the VMCS. The buffer holds 512 GPAs (8 bytes each); the PML index
initially points at the last entry (511), the CPU decrements it after logging
each GPA, and eventually a PML-buffer-full VMEXIT happens when the buffer has
been fully logged.
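The buffer and index behaviour can be modelled in a few lines. This is a hypothetical sketch, not the VMCS interface: struct pml_vcpu, pml_reset() and pml_log() are illustrative names, and the 16-bit index simply mimics the wrap from 0 to 0xFFFF that signals a full buffer.

```c
#include <stdbool.h>
#include <stdint.h>

#define PML_ENTRIES 512   /* 4K buffer / 8 bytes per GPA */

struct pml_vcpu {
    uint64_t buf[PML_ENTRIES]; /* stands in for the 4K PML page */
    uint16_t index;            /* models the 16-bit PML index */
    unsigned long full_exits;  /* PML-buffer-full VMEXITs */
};

static void pml_reset(struct pml_vcpu *v)
{
    v->index = PML_ENTRIES - 1;  /* next entry to be written is the last one */
}

/* Models a 0 -> 1 EPT D-bit transition for gpa. Returns true if it caused a
 * PML-buffer-full VMEXIT instead of being logged. */
static bool pml_log(struct pml_vcpu *v, uint64_t gpa)
{
    if (v->index >= PML_ENTRIES) {       /* index wrapped past 0: full */
        v->full_exits++;
        return true;
    }
    v->buf[v->index] = gpa & ~0xfffULL;  /* logged GPAs are 4K-aligned */
    v->index--;                          /* 0 - 1 wraps to 0xFFFF */
    return false;
}
```

After 512 logged GPAs the index has wrapped to 0xFFFF, and the 513th D-bit transition reports a full buffer instead of being logged; the hypervisor is then expected to drain the buffer and reset the index to 511.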
With PML, we don't have to use write protection, so the frequent write-fault EPT
violations can be avoided, at the cost of one additional PML-buffer-full VMEXIT
per 512 dirty GPAs. Theoretically, this reduces hypervisor overhead while the
guest is in dirty logging mode, so more CPU cycles can be given to the guest,
and benchmarks in the guest are expected to perform better than without PML.
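The trade-off can be put in numbers. The sketch below counts logging-related exits per dirty-logging round under stated assumptions: each dirtied GFN is written at least once, and no other VMEXIT flushes the PML buffer in between, so the PML figure is a worst case. The function names are hypothetical.

```c
#include <stdint.h>

#define PML_ENTRIES 512

/* Write protection: every GFN dirtied in a round costs one write fault. */
static uint64_t wp_logging_exits(uint64_t dirty_pages)
{
    return dirty_pages;
}

/* PML: the buffer absorbs 512 GPAs; the full-buffer VMEXIT fires only when a
 * 513th GPA needs logging, so n GPAs cost at most (n - 1) / 512 exits.
 * Entries left in a partially filled buffer are flushed on the next exit or
 * dirty-log query. */
static uint64_t pml_logging_exits(uint64_t dirty_pages)
{
    return dirty_pages == 0 ? 0 : (dirty_pages - 1) / PML_ENTRIES;
}
```

For a 1GB guest that dirties all 262144 of its 4K pages in one round, write protection takes 262144 write faults, while PML takes at most 511 full-buffer exits.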
2) Design
a. Enable/Disable PML
PML is per-vcpu (per-VMCS), while EPT tables can be shared by vcpus, so we need
to enable/disable PML for all vcpus of the guest. A dedicated 4K page will be
allocated for each vcpu when PML is enabled for that vcpu.
Currently, we choose to always enable PML for the guest, which means we enable
PML when creating a VCPU and never disable it during the guest's lifetime. This
avoids the complicated logic of enabling PML on demand while the guest is
running. To eliminate potentially unnecessary GPA logging in non-dirty-logging
mode, we set the D-bit manually for slots with dirty logging disabled.
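The D-bit trick relies on PML logging only 0-to-1 transitions of the dirty bit. A hypothetical sketch of that rule: make_writable_spte() and cpu_write_logs() are illustrative names, the entry keeps only the write bit (bit 1) and dirty bit (bit 9) of the EPT format, and the page frame is placed at bit 12.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified EPT leaf entry: only the bits this sketch needs. */
#define EPT_W (1ULL << 1)   /* write permission */
#define EPT_D (1ULL << 9)   /* dirty flag (EPT A/D bits enabled) */

/* Hypothetical helper: build a writable entry. For slots without dirty
 * logging, pre-set D so the CPU never sees a 0 -> 1 transition. */
static uint64_t make_writable_spte(uint64_t pfn, bool slot_dirty_logging)
{
    uint64_t spte = (pfn << 12) | EPT_W;

    if (!slot_dirty_logging)
        spte |= EPT_D;
    return spte;
}

/* Models the CPU on a guest write with PML enabled: returns true if the
 * write would be logged to the PML buffer. */
static bool cpu_write_logs(uint64_t *spte)
{
    if (*spte & EPT_D)
        return false;   /* D already set: nothing to log */
    *spte |= EPT_D;     /* the 0 -> 1 transition ... */
    return true;        /* ... is what PML records */
}
```

A write through an entry built for a slot without dirty logging is never logged, because its D-bit is already 1; an entry in a logging slot is logged once, on its first write.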
b. Flush PML buffer
When userspace queries dirty_bitmap, there may be GPAs logged in a vcpu's PML
buffer that have not been reported yet, because the buffer is not full and so no
VMEXIT has happened. In this case, we need to manually flush the PML buffers of
all vcpus and update dirty_bitmap with the dirty GPAs.
We flush the PML buffer at the beginning of each VMEXIT. This keeps dirty_bitmap
more up to date, and also makes the logic of flushing the PML buffers of all
vcpus easier: we only need to kick all vcpus out of the guest, and each vcpu's
PML buffer will be flushed automatically.
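Draining a vcpu's PML buffer amounts to walking the valid entries and setting the corresponding bits. A minimal sketch, assuming a single flat memory slot starting at GPA 0 and 4K pages; pml_flush() is a hypothetical helper, not the function in this series. Because the CPU writes at the current index and then decrements it, the valid entries are index+1 through 511, and a full buffer (index wrapped to 0xFFFF) makes all 512 valid.

```c
#include <stdint.h>

#define PML_ENTRIES 512
#define PAGE_SHIFT  12

/* Flush one vcpu's PML buffer into a GFN-indexed dirty bitmap.
 * buf:   the 512-entry PML page
 * index: current PML index; any value >= 512 means the buffer is full
 * Returns the index to write back, i.e. an empty buffer. */
static uint16_t pml_flush(const uint64_t buf[PML_ENTRIES], uint16_t index,
                          uint64_t *dirty_bitmap, uint64_t nr_gfns)
{
    /* Valid entries are index+1 .. 511, or all of them if the index wrapped. */
    unsigned int i = (index >= PML_ENTRIES) ? 0 : index + 1u;

    for (; i < PML_ENTRIES; i++) {
        uint64_t gfn = buf[i] >> PAGE_SHIFT;

        if (gfn < nr_gfns)   /* ignore GPAs outside the modelled slot */
            dirty_bitmap[gfn / 64] |= 1ULL << (gfn % 64);
    }
    return PML_ENTRIES - 1;  /* next log goes to the last entry again */
}
```

Calling it on every VMEXIT keeps the bitmap current; a dirty-log query then only needs to force each vcpu to exit once.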
3) Tests and benchmark results
I used the specjbb benchmark, which is memory intensive, to measure PML. All
tests were done in the configuration below:
Machine (Broadwell server): 16 CPUs (1.4GHz) + 4GB memory
Host Kernel: KVM queue branch. Transparent Hugepage disabled. C-state, P-state,
S-state disabled. Swap disabled.
Guest: Ubuntu 14.04 with kernel 3.13.0-36-generic
Guest: 4 vcpus + 1G memory. All vcpus are pinned.
a. Compare score with and without PML enabled.
This is to make sure PML won't bring any performance regression, as it's always
enabled for the guest.
Booting guest with graphic window (no --nographic)
         NOPML     PML
         109755    109379
         108786    109300
         109234    109663
         109257    107471
         108514    108904
         109740    107623
avg:     109214    108723
performance regression: (109214 - 108723) / 109214 = 0.45%
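The averages and the regression figure are plain arithmetic and can be reproduced from the six runs. A small sketch; mean() and regression_pct() are illustrative helpers.

```c
#include <stddef.h>

/* Arithmetic mean of n scores. */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Relative drop of score b versus baseline a, in percent. */
static double regression_pct(double a, double b)
{
    return (a - b) / a * 100.0;
}
```

With the six runs above, the means come out to about 109214.3 (NOPML) and 108723.3 (PML), and the regression to about 0.45%.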
Booting guest without graphic window (--nographic)
         NOPML     PML
         109090    109686
         109461    110533
         110523    108550
         109960    110775
         109090    109802
         110787    109192
avg:     109818    109756
performance regression: (109818 - 109756) / 109818 = 0.06%
So there's no noticeable performance regression from leaving PML always enabled.
b. Compare specjbb score between PML and Write Protection.
This is used to see how much performance gain PML can bring when the guest is in
dirty logging mode.
I modified qemu by adding an additional "Monitoring thread" that queries
dirty_bitmap periodically (once per second). With this thread, we can measure
the performance gain of PML by comparing the specjbb score under the PML code
path and the write protection code path.
Again, I got scores both with and without the guest's graphic window.
Booting guest with graphic window (no --nographic)
         PML       WP        No monitoring thread
         104748    101358
         102934    99895
         103525    98832
         105331    100678
         106038    99476
         104776    99851
avg:     104558    100015    108723 (== PML score in test a)
percent: 96.17%    91.99%    100%
performance gain: 96.17% - 91.99% = 4.18 percentage points
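The percent row normalizes each average to the no-monitoring score, and the gain is the difference of the two percentages. A small sketch; pct_of_baseline() is an illustrative helper.

```c
/* Score as a percentage of the no-monitoring baseline. */
static double pct_of_baseline(double score, double baseline)
{
    return score / baseline * 100.0;
}
```

For the graphic-window runs it gives about 96.17% (PML) and 91.99% (WP), a gap of about 4.18 percentage points. For comparison, the direct ratio of the two averages, 104558 / 100015, is about 1.045.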
Booting guest without graphic window (--nographic)
         PML       WP        No monitoring thread
         104778    98967
         104856    99380
         103783    99406
         105210    100638
         106218    99763
         105475    99287
avg:     105053    99573     109756 (== PML score in test a)
percent: 95.72%    90.72%    100%
performance gain: 95.72% - 90.72% = 5.00 percentage points
So there's a noticeable performance gain (around 4~5 percentage points) for PML
compared to Write Protection.
Kai Huang (6):
  KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic
    for log dirty
  KVM: MMU: Add mmu help functions to support PML
  KVM: MMU: Explicitly set D-bit for writable spte.
  KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
  KVM: x86: Add new dirty logging kvm_x86_ops for PML
  KVM: VMX: Add PML support in VMX
 arch/arm/kvm/mmu.c              |  18 ++-
 arch/x86/include/asm/kvm_host.h |  37 +++++-
 arch/x86/include/asm/vmx.h      |   4 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/mmu.c              | 243 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/trace.h            |  18 +++
 arch/x86/kvm/vmx.c              | 195 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  78 +++++++++++--
 include/linux/kvm_host.h        |   2 +-
 virt/kvm/kvm_main.c             |   2 +-
 10 files changed, 577 insertions(+), 21 deletions(-)
--
2.1.0
Thread overview: 22+ messages
2015-01-28 2:54 Kai Huang [this message]
2015-01-28 2:54 ` [PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty Kai Huang
2015-01-28 2:54 ` [PATCH 2/6] KVM: MMU: Add mmu help functions to support PML Kai Huang
2015-02-03 17:34 ` Radim Krčmář
2015-02-05 5:59 ` Kai Huang
2015-02-05 14:51 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte Kai Huang
2015-01-28 2:54 ` [PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access Kai Huang
2015-02-03 16:28 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML Kai Huang
2015-02-03 15:53 ` Radim Krčmář
2015-02-05 6:29 ` Kai Huang
2015-02-05 14:52 ` Radim Krčmář
2015-01-28 2:54 ` [PATCH 6/6] KVM: VMX: Add PML support in VMX Kai Huang
2015-02-03 15:18 ` Radim Krčmář
2015-02-03 15:39 ` Paolo Bonzini
2015-02-03 16:02 ` Radim Krčmář
2015-02-05 6:23 ` Kai Huang
2015-02-05 15:04 ` Radim Krčmář
2015-02-06 0:22 ` Kai Huang
2015-02-06 0:28 ` Kai Huang
2015-02-06 16:00 ` Paolo Bonzini