* [PATCH] KVM: X86: Reduce calls to vcpu_load
@ 2023-09-06 6:24 Hao Peng
2023-09-06 6:40 ` Xiaoyao Li
0 siblings, 1 reply; 4+ messages in thread
From: Hao Peng @ 2023-09-06 6:24 UTC (permalink / raw)
To: pbonzini; +Cc: kvm, Sean Christopherson, linux-kernel
From: Peng Hao <flyingpeng@tencent.com>
The call of vcpu_load/put takes about 1-2us. Each
kvm_arch_vcpu_create will call vcpu_load/put
to initialize some fields of vmcs, which can be
delayed until the call of vcpu_ioctl to process
this part of the vmcs field, which can reduce calls
to vcpu_load.
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 21 ++++++++++++++++-----
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9320019708f9..2f2dcd283788 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -984,6 +984,7 @@ struct kvm_vcpu_arch {
/* Flush the L1 Data cache for L1TF mitigation on VMENTER */
bool l1tf_flush_l1d;
+ bool initialized;
/* Host CPU on which VM-entry was most recently attempted */
int last_vmentry_cpu;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fd08a5e0e98..a3671a54e850 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -317,7 +317,20 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
u64 __read_mostly host_xcr0;
static struct kmem_cache *x86_emulator_cache;
+static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz);
+static inline bool kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.initialized;
+}
+
+static void kvm_vcpu_initial_reset(struct kvm_vcpu *vcpu)
+{
+ kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
+ kvm_vcpu_reset(vcpu, false);
+ kvm_init_mmu(vcpu);
+ vcpu->arch.initialized = true;
+}
/*
* When called, it means the previous get/set msr reached an invalid msr.
* Return true if we want to ignore/silent this failed msr access.
@@ -5647,6 +5660,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
vcpu_load(vcpu);
+ if (!kvm_vcpu_initialized(vcpu))
+ kvm_vcpu_initial_reset(vcpu);
+
u.buffer = NULL;
switch (ioctl) {
case KVM_GET_LAPIC: {
@@ -11930,11 +11946,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
kvm_xen_init_vcpu(vcpu);
kvm_vcpu_mtrr_init(vcpu);
- vcpu_load(vcpu);
- kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
- kvm_vcpu_reset(vcpu, false);
- kvm_init_mmu(vcpu);
- vcpu_put(vcpu);
return 0;
free_guest_fpu:
--
2.31.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] KVM: X86: Reduce calls to vcpu_load
2023-09-06 6:24 [PATCH] KVM: X86: Reduce calls to vcpu_load Hao Peng
@ 2023-09-06 6:40 ` Xiaoyao Li
2023-09-06 20:08 ` Sean Christopherson
0 siblings, 1 reply; 4+ messages in thread
From: Xiaoyao Li @ 2023-09-06 6:40 UTC (permalink / raw)
To: Hao Peng, pbonzini; +Cc: kvm, Sean Christopherson, linux-kernel
On 9/6/2023 2:24 PM, Hao Peng wrote:
> From: Peng Hao <flyingpeng@tencent.com>
>
> The call of vcpu_load/put takes about 1-2us. Each
> kvm_arch_vcpu_create will call vcpu_load/put
> to initialize some fields of vmcs, which can be
> delayed until the call of vcpu_ioctl to process
> this part of the vmcs field, which can reduce calls
> to vcpu_load.
what if no vcpu ioctl is called after vcpu creation?
And will the first (it was second before this patch) vcpu_load() becomes
longer? have you measured it?
I don't think it worth the optimization unless a strong reason.
> Signed-off-by: Peng Hao <flyingpeng@tencent.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 21 ++++++++++++++++-----
> 2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9320019708f9..2f2dcd283788 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -984,6 +984,7 @@ struct kvm_vcpu_arch {
> /* Flush the L1 Data cache for L1TF mitigation on VMENTER */
> bool l1tf_flush_l1d;
>
> + bool initialized;
> /* Host CPU on which VM-entry was most recently attempted */
> int last_vmentry_cpu;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4fd08a5e0e98..a3671a54e850 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -317,7 +317,20 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
> u64 __read_mostly host_xcr0;
>
> static struct kmem_cache *x86_emulator_cache;
> +static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz);
>
> +static inline bool kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.initialized;
> +}
> +
> +static void kvm_vcpu_initial_reset(struct kvm_vcpu *vcpu)
> +{
> + kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
> + kvm_vcpu_reset(vcpu, false);
> + kvm_init_mmu(vcpu);
> + vcpu->arch.initialized = true;
> +}
> /*
> * When called, it means the previous get/set msr reached an invalid msr.
> * Return true if we want to ignore/silent this failed msr access.
> @@ -5647,6 +5660,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>
> vcpu_load(vcpu);
>
> + if (!kvm_vcpu_initialized(vcpu))
> + kvm_vcpu_initial_reset(vcpu);
> +
> u.buffer = NULL;
> switch (ioctl) {
> case KVM_GET_LAPIC: {
> @@ -11930,11 +11946,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
> kvm_xen_init_vcpu(vcpu);
> kvm_vcpu_mtrr_init(vcpu);
> - vcpu_load(vcpu);
> - kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
> - kvm_vcpu_reset(vcpu, false);
> - kvm_init_mmu(vcpu);
> - vcpu_put(vcpu);
> return 0;
>
> free_guest_fpu:
> --
> 2.31.1
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] KVM: X86: Reduce calls to vcpu_load
2023-09-06 6:40 ` Xiaoyao Li
@ 2023-09-06 20:08 ` Sean Christopherson
2023-09-07 1:27 ` Hao Peng
0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2023-09-06 20:08 UTC (permalink / raw)
To: Xiaoyao Li; +Cc: Hao Peng, pbonzini, kvm, linux-kernel
On Wed, Sep 06, 2023, Xiaoyao Li wrote:
> On 9/6/2023 2:24 PM, Hao Peng wrote:
> > From: Peng Hao <flyingpeng@tencent.com>
> >
> > The call of vcpu_load/put takes about 1-2us. Each
> > kvm_arch_vcpu_create will call vcpu_load/put
> > to initialize some fields of vmcs, which can be
> > delayed until the call of vcpu_ioctl to process
> > this part of the vmcs field, which can reduce calls
> > to vcpu_load.
>
> what if no vcpu ioctl is called after vcpu creation?
>
> And will the first (it was second before this patch) vcpu_load() becomes
> longer? have you measured it?
I don't think the first vcpu_load() becomes longer, this avoids an entire
load()+put() pair by doing the initialization in the first ioctl().
That said, the patch is obviously buggy, it hooks kvm_arch_vcpu_ioctl() instead
of kvm_vcpu_ioctl(), e.g. doing KVM_RUN, KVM_SET_SREGS, etc. will cause explosions.
It will also break the TSC synchronization logic in kvm_arch_vcpu_postcreate(),
which can "race" with ioctls() as the vCPU file descriptor is accessible by
userspace the instant it's installed into the fd tables, i.e. userspace doesn't
have to wait for KVM_CREATE_VCPU to complete.
And I gotta imagine there are other interactions I haven't thought of off the
top of my head, e.g. the vCPU is also reachable via kvm_for_each_vcpu(). All it
takes is one path that touches a lazily initialized field for this to fall apart.
> I don't think it worth the optimization unless a strong reason.
Yeah, this is a lot of subtle complexity to shave 1-2us.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] KVM: X86: Reduce calls to vcpu_load
2023-09-06 20:08 ` Sean Christopherson
@ 2023-09-07 1:27 ` Hao Peng
0 siblings, 0 replies; 4+ messages in thread
From: Hao Peng @ 2023-09-07 1:27 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Xiaoyao Li, pbonzini, kvm, linux-kernel
On Thu, Sep 7, 2023 at 4:08 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Sep 06, 2023, Xiaoyao Li wrote:
> > On 9/6/2023 2:24 PM, Hao Peng wrote:
> > > From: Peng Hao <flyingpeng@tencent.com>
> > >
> > > The call of vcpu_load/put takes about 1-2us. Each
> > > kvm_arch_vcpu_create will call vcpu_load/put
> > > to initialize some fields of vmcs, which can be
> > > delayed until the call of vcpu_ioctl to process
> > > this part of the vmcs field, which can reduce calls
> > > to vcpu_load.
> >
> > what if no vcpu ioctl is called after vcpu creation?
> >
> > And will the first (it was second before this patch) vcpu_load() becomes
> > longer? have you measured it?
>
> I don't think the first vcpu_load() becomes longer, this avoids an entire
> load()+put() pair by doing the initialization in the first ioctl().
>
> That said, the patch is obviously buggy, it hooks kvm_arch_vcpu_ioctl() instead
> of kvm_vcpu_ioctl(), e.g. doing KVM_RUN, KVM_SET_SREGS, etc. will cause explosions.
>
> It will also break the TSC synchronization logic in kvm_arch_vcpu_postcreate(),
> which can "race" with ioctls() as the vCPU file descriptor is accessible by
> userspace the instant it's installed into the fd tables, i.e. userspace doesn't
> have to wait for KVM_CREATE_VCPU to complete.
>
It works when there are many cores. The hook point problem mentioned
above can still be adjusted,
but the tsc synchronization problem is difficult to deal with.
thanks.
> And I gotta imagine there are other interactions I haven't thought of off the
> top of my head, e.g. the vCPU is also reachable via kvm_for_each_vcpu(). All it
> takes is one path that touches a lazily initialized field for this to fall apart.
>
> > I don't think it worth the optimization unless a strong reason.
>
> Yeah, this is a lot of subtle complexity to shave 1-2us.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-09-07 1:27 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-06 6:24 [PATCH] KVM: X86: Reduce calls to vcpu_load Hao Peng
2023-09-06 6:40 ` Xiaoyao Li
2023-09-06 20:08 ` Sean Christopherson
2023-09-07 1:27 ` Hao Peng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox