public inbox for kvm@vger.kernel.org
* [PATCH 0/4] KVM: nVMX: prepare_vmcs02 optimizations
@ 2017-12-21 12:43 Paolo Bonzini
From: Paolo Bonzini @ 2017-12-21 12:43 UTC
  To: linux-kernel, kvm

These patches easily peel off about 800-1000 more clock cycles by
saving about 60 VMWRITEs on every exit.
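The patch titles (tracking the dirty state of non-shadowed VMCS fields, then moving the descriptor cache and other simple fields into prepare_vmcs02_full) suggest where the saved VMWRITEs come from. With VMCS shadowing, L1's VMWRITEs to shadowed fields complete without an exit, but VMWRITEs to non-shadowed fields trap to L0, so L0 can note that one of them changed and otherwise skip copying the rarely-changing fields. The sketch below is a hypothetical user-space model of that idea, not the code in the patches: every name in it is invented for illustration, and it only counts VMWRITEs, assuming 60 rarely-changing fields (the figure quoted above) and 20 fields copied on every entry (an arbitrary number).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not KVM code: it only counts the VMWRITEs
 * that L0 would issue to vmcs02 on each emulated nested VM entry.
 */
#define NR_RARE_FIELDS	60	/* "about 60 VMWRITEs" per the cover letter */
#define NR_HOT_FIELDS	20	/* assumed count of fields copied on every entry */

struct nested_model {
	bool dirty;		/* L1 changed a non-shadowed vmcs12 field */
	unsigned long vmwrites;	/* VMWRITEs issued to vmcs02 so far */
};

/*
 * L1's VMWRITE to a field that is not in the shadow VMCS exits to L0,
 * so L0 sees every change to such a field and can record it.
 */
static void model_l1_vmwrite_nonshadowed(struct nested_model *n)
{
	n->dirty = true;
}

/* Copies the rarely-changing fields (descriptor caches and the like). */
static void model_prepare_vmcs02_full(struct nested_model *n)
{
	n->vmwrites += NR_RARE_FIELDS;
}

/* Runs on every nested VM entry. */
static void model_prepare_vmcs02(struct nested_model *n)
{
	if (n->dirty) {
		model_prepare_vmcs02_full(n);
		n->dirty = false;
	}
	n->vmwrites += NR_HOT_FIELDS;
}

/*
 * Simulates `entries` nested VM entries.  If `write_period` is nonzero, L1
 * writes a non-shadowed field before entry i whenever i is a multiple of
 * write_period.  Returns the total number of VMWRITEs issued.
 */
static unsigned long model_simulate(unsigned long entries,
				    unsigned long write_period)
{
	/* Start dirty: the first entry must copy everything. */
	struct nested_model n = { .dirty = true, .vmwrites = 0 };

	for (unsigned long i = 0; i < entries; i++) {
		if (write_period && i % write_period == 0)
			model_l1_vmwrite_nonshadowed(&n);
		model_prepare_vmcs02(&n);
	}
	/* Every entry pays at least the per-entry fields. */
	assert(n.vmwrites >= entries * NR_HOT_FIELDS);
	return n.vmwrites;
}
```

With these assumed counts, model_simulate(1000, 0) issues 60 + 1000 × 20 = 20060 VMWRITEs, against 1000 × 80 = 80000 for model_simulate(1000, 1), where L1 dirties a field before every entry and the model degenerates to copying everything each time.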

My numbers so far were collected on a Haswell system, not the Broadwell
that Jim used for his KVM Forum talk.  I am now down from 22000 cycles
(compared to the 18000 that Jim gave as the baseline) to 14000.
Also, the guest is running 4.14, so it doesn't have the XSETBV and
DEBUGCTL patches; those remove two ancillary exits to L1, each costing
about 1000 cycles on my machine.  So we are probably pretty close to
VMware's 6500 cycles on Broadwell.

After these patches, there may still be some low-hanging fruit; the
remaining large deltas between non-nested and nested workloads with lots
of vmexits are:

   4.80%  vmx_set_cr3
   4.35%  native_read_msr
   3.73%  vmcs_load
   3.65%  update_permission_bitmask
   2.49%  _raw_spin_lock
   2.37%  sync_vmcs12
   2.20%  copy_shadow_to_vmcs12
   1.19%  kvm_load_guest_fpu

There is a large cost associated with resetting the MMU.  Making that
smarter could probably be worth a 10-15% improvement.  It is not easy,
but it would be worth even more than that on SMP nested guests, because
that is where the spinlock contention comes from.

The MSR accesses (native_read_msr above) are probably also interesting,
but I haven't tried to see what they are about.  One somewhat crazy idea
in that area is to set CR4.FSGSBASE at vcpu_load/sched_in and clear it
at vcpu_put/sched_out.  Then we could skip the costly setup of the
FS/GS/kernelGS base MSRs.  However, the cost of CR4 writes might make
this less appealing when exits to userspace are frequent; I haven't
benchmarked it.
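The trade-off in the FSGSBASE idea can be written as a back-of-the-envelope cost model. This is a hypothetical sketch, not a measurement and not code from KVM: every cost is a caller-supplied input, all names are invented, it assumes each base-MSR access is replaced one-for-one by an RD/WR{FS,GS}BASE instruction (a simplification, especially for the kernel GS base), and it assumes each exit to userspace costs one vcpu_put/vcpu_load pair, i.e. two CR4 writes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical cost model of the CR4.FSGSBASE idea.  Every cost is a
 * caller-supplied input; nothing here is a measurement.
 */
#define BASE_MSRS_PER_SWITCH	3	/* FS, GS and kernel GS base */

struct switch_costs {
	unsigned long msr_access;	/* cycles per RDMSR/WRMSR of a base MSR */
	unsigned long fsgsbase_insn;	/* cycles per RD/WR{FS,GS}BASE (assumed 1:1 swap) */
	unsigned long cr4_write;	/* cycles per write to CR4 */
};

/* Current scheme: each save/load of the three bases goes through MSRs. */
static unsigned long cost_msr_path(const struct switch_costs *c,
				   unsigned long switches)
{
	return switches * BASE_MSRS_PER_SWITCH * c->msr_access;
}

/*
 * Proposed scheme: set CR4.FSGSBASE in vcpu_load/sched_in and clear it in
 * vcpu_put/sched_out, so every load/put (or sched_in/sched_out) pair costs
 * two CR4 writes on top of the cheaper base accesses.
 */
static unsigned long cost_fsgsbase_path(const struct switch_costs *c,
					unsigned long switches,
					unsigned long load_put_pairs)
{
	return switches * BASE_MSRS_PER_SWITCH * c->fsgsbase_insn +
	       load_put_pairs * 2 * c->cr4_write;
}

/* True if toggling CR4.FSGSBASE would be cheaper under this model. */
static bool fsgsbase_wins(const struct switch_costs *c,
			  unsigned long switches, unsigned long load_put_pairs)
{
	/* The idea only makes sense if the instructions beat the MSRs. */
	assert(c->fsgsbase_insn <= c->msr_access);
	return cost_fsgsbase_path(c, switches, load_put_pairs) <
	       cost_msr_path(c, switches);
}
```

The model does not answer the question, which needs the benchmark mentioned above; it only shows that the idea pays off when the MSR cycles saved across all base switches exceed two CR4 writes per load/put pair, which is why frequent exits to userspace are the worrying case.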

Paolo

Paolo Bonzini (4):
  KVM: VMX: split list of shadowed VMCS field to a separate file
  KVM: nVMX: track dirty state of non-shadowed VMCS fields
  KVM: nVMX: move descriptor cache handling to prepare_vmcs02_full
  KVM: nVMX: move other simple fields to prepare_vmcs02_full

 arch/x86/kvm/vmx.c               | 301 +++++++++++++++++++--------------------
 arch/x86/kvm/vmx_shadow_fields.h |  71 +++++++++
 2 files changed, 214 insertions(+), 158 deletions(-)
 create mode 100644 arch/x86/kvm/vmx_shadow_fields.h
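The cover letter does not say why patch 1 moves the list of shadowed VMCS fields into vmx_shadow_fields.h. A common reason to keep such a list in one place is the X-macro pattern, where a single list is expanded several times to produce different tables that cannot drift apart. The sketch below assumes that pattern; it is not the contents of vmx_shadow_fields.h, the list is kept in a macro rather than a separate header so that the example stays self-contained, and all macro, table, and helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical X-macro sketch.  The field names are real VMCS field names,
 * but the enum values below are arbitrary (not the architectural
 * encodings), and the subset and RO/RW split are illustrative only.
 */
#define SHADOW_FIELDS(RO, RW)		\
	RO(VM_EXIT_REASON)		\
	RO(EXIT_QUALIFICATION)		\
	RW(GUEST_RIP)			\
	RW(GUEST_RSP)			\
	RW(GUEST_RFLAGS)

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
#define FIELD_ENUM(x)	x,
#define FIELD_SKIP(x)
#define FIELD_NAME(x)	[x] = #x,

/* Expansion 1: an identifier for every field in the list. */
enum shadow_field { SHADOW_FIELDS(FIELD_ENUM, FIELD_ENUM) NR_SHADOW_FIELDS };

/* Expansion 2: read-only fields only. */
static const enum shadow_field shadow_read_only_fields[] = {
	SHADOW_FIELDS(FIELD_ENUM, FIELD_SKIP)
};

/* Expansion 3: read/write fields only. */
static const enum shadow_field shadow_read_write_fields[] = {
	SHADOW_FIELDS(FIELD_SKIP, FIELD_ENUM)
};

/* Expansion 4: printable names, indexed by field. */
static const char *const shadow_field_names[NR_SHADOW_FIELDS] = {
	SHADOW_FIELDS(FIELD_NAME, FIELD_NAME)
};

/* Adding a field to the list updates every table at once. */
static_assert(ARRAY_SIZE(shadow_read_only_fields) +
	      ARRAY_SIZE(shadow_read_write_fields) == NR_SHADOW_FIELDS,
	      "every field must be either read-only or read/write");

/* Returns the field with the given name, or -1 if it is not shadowed. */
static int shadow_field_by_name(const char *name)
{
	for (int i = 0; i < NR_SHADOW_FIELDS; i++)
		if (strcmp(shadow_field_names[i], name) == 0)
			return i;
	return -1;
}
```

With a separate header, the same effect comes from defining the per-entry macros and then #including the header once per table.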

-- 
1.8.3.1

Thread overview: 21+ messages
2017-12-21 12:43 [PATCH 0/4] KVM: nVMX: prepare_vmcs02 optimizations Paolo Bonzini
2017-12-21 12:43 ` [PATCH 1/4] KVM: VMX: split list of shadowed VMCS field to a separate file Paolo Bonzini
2017-12-21 22:51   ` Jim Mattson
2017-12-21 12:43 ` [PATCH 2/4] KVM: nVMX: track dirty state of non-shadowed VMCS fields Paolo Bonzini
2017-12-21 22:57   ` Jim Mattson
2017-12-25  3:03   ` Wanpeng Li
2017-12-31  8:08     ` Paolo Bonzini
2017-12-31 22:48       ` Wanpeng Li
2017-12-21 12:43 ` [PATCH 3/4] KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full Paolo Bonzini
2017-12-21 12:43 ` [PATCH 4/4] KVM: nVMX: initialize more non-shadowed " Paolo Bonzini
2017-12-25  3:09   ` Wanpeng Li
2017-12-27  9:54     ` Paolo Bonzini
2017-12-28  2:07       ` Wanpeng Li
2017-12-25 10:07 ` [PATCH 0/4] KVM: nVMX: prepare_vmcs02 optimizations Wanpeng Li
2017-12-25 10:08   ` Wanpeng Li
2017-12-27 14:28     ` Paolo Bonzini
2017-12-28  8:39       ` Wanpeng Li
2018-01-01  9:36         ` Paolo Bonzini
2018-01-01 23:01           ` Paolo Bonzini
2018-01-02  1:05             ` Wanpeng Li
2018-01-02 13:02               ` Paolo Bonzini
