From: Radim Krčmář
Subject: Re: [PATCH v2] KVM: nVMX: Track active shadow VMCSs
Date: Thu, 28 Jul 2016 16:01:35 +0200
Message-ID: <20160728140134.GB29177@potion>
References: <1469211903-21781-1-git-send-email-jmattson@google.com> <20160727204533.GA29177@potion>
To: Jim Mattson
Cc: Bandan Das , kvm@vger.kernel.org, Paolo Bonzini , David Matlack , Peter Feiner

2016-07-28 06:30-0700, Jim Mattson:
> I'm going to be on vacation next week, but I'll address these concerns
> when I return. If time permits, I'll send out a patch this week that
> just addresses the issue of doing a VMPTRLD before putting the VMCS on
> the loaded VMCSs list.

Ah, I missed the point of that hunk.  It is not related to nVMX, so
fixing it in a separate patch would be best.

Thanks.

> On Wed, Jul 27, 2016 at 2:53 PM, Bandan Das wrote:
>> Radim Krčmář writes:
>>
>>> 2016-07-22 11:25-0700, Jim Mattson:
>>>> If a shadow VMCS is referenced by the VMCS link pointer in the
>>>> current VMCS, then VM-entry makes the shadow VMCS active on the
>>>> current processor. No VMCS should ever be active on more than one
>>>> processor. If a VMCS is to be migrated from one processor to
>>>> another, software should execute a VMCLEAR for the VMCS on the
>>>> first processor before the VMCS is made active on the second
>>>> processor.
>>>>
>>>> We already keep track of ordinary VMCSs that are made active by
>>>> VMPTRLD. Extend that tracking to handle shadow VMCSs that are
>>>> made active by VM-entry to their parents.
>>>>
>>>> Signed-off-by: Jim Mattson
>>>> ---
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> @@ -2113,6 +2113,15 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>>>>  			new.control) != old.control);
>>>>  }
>>>>
>>>> +static void record_loaded_vmcs(struct loaded_vmcs *loaded_vmcs, int cpu)
>>>> +{
>>>> +	if (loaded_vmcs->vmcs) {
>>>> +		list_add(&loaded_vmcs->loaded_vmcss_on_cpu_link,
>>>> +			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>>>> +		loaded_vmcs->cpu = cpu;
>>>> +	}
>>>> +}
>>>> +
>>>>  /*
>>>>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>>>>   * vcpu mutex is already taken.
>>>> @@ -2124,15 +2133,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>
>>>>  	if (!vmm_exclusive)
>>>>  		kvm_cpu_vmxon(phys_addr);
>>>> -	else if (vmx->loaded_vmcs->cpu != cpu)
>>>> +	else if (vmx->loaded_vmcs->cpu != cpu) {
>>>>  		loaded_vmcs_clear(vmx->loaded_vmcs);
>>>> +		if (vmx->nested.current_shadow_vmcs.vmcs)
>>>> +			loaded_vmcs_clear(&vmx->nested.current_shadow_vmcs);
>>>
>>> loaded_vmcs_clear() uses an expensive IPI, so we would want to do both
>>> in one call in future patches, as they are always active on the same CPU.
>>
>> Maybe just move the check for an active shadow vmcs to loaded_vmcs_clear
>> and clear it there unconditionally.
>>
>>> Another possible complication is marking current_shadow_vmcs as active
>>> on a cpu only after a successful vmlaunch.
>>> (We don't seem to ever vmptrld shadow vmcs explicitly.)
>>>
>>>> +	}
>>>
>>> Incorrect whitespace for indentation.
>>>
>>>>
>>>>  	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>>>> -		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>>>> -		vmcs_load(vmx->loaded_vmcs->vmcs);
>>>> -	}
>>>> -
>>>> -	if (vmx->loaded_vmcs->cpu != cpu) {
>>>
>>> This condition is nice for performance because a non-current vmcs is
>>> usually already active on the same CPU, so we skip all the code below.
>>>
>>> (This is the only thing that has to be fixed, as it regresses the
>>> non-nested case; the rest are mostly ideas for simplifications.)
>>
>> I think he wanted to make sure to call vmcs_load after the call
>> to crash_disable_local_vmclear(), but that should be possible without
>> removing the check.
>>
>> Bandan
>>
>>>>  		struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>>>>  		unsigned long sysenter_esp;
>>>>
>>>> @@ -2147,11 +2154,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>  		 */
>>>>  		smp_rmb();
>>>>
>>>> -		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
>>>> -			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>>>> +		record_loaded_vmcs(vmx->loaded_vmcs, cpu);
>>>
>>> Adding an element to a list multiple times seems invalid, which the
>>> condition was also guarding against.
>>>
>>>> +		record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);
>>>
>>> current_shadow_vmcs is always active on the same cpu as loaded_vmcs ...
>>> I think we could skip the list and just clear current_shadow_vmcs when
>>> clearing its loaded_vmcs.
>>>
>>>>  		crash_enable_local_vmclear(cpu);
>>>>  		local_irq_enable();
>>>>
>>>> +		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>>>> +		vmcs_load(vmx->loaded_vmcs->vmcs);
>>>> +
>>>>  		/*
>>>>  		 * Linux uses per-cpu TSS and GDT, so set these when switching
>>>>  		 * processors.
>>>> @@ -2161,8 +2172,6 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>
>>>>  		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>>>>  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>>>> -
>>>> -		vmx->loaded_vmcs->cpu = cpu;
>>>>  	}
>>>>
>>>>  	/* Setup TSC multiplier */
>>>> @@ -6812,6 +6821,34 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason,
>>>>  	return 0;
>>>>  }
>>>>
>>>> +static int setup_shadow_vmcs(struct vcpu_vmx *vmx)
>>>> +{
>>>> +	struct vmcs *shadow_vmcs;
>>>> +	int cpu;
>>>> +
>>>> +	shadow_vmcs = alloc_vmcs();
>>>> +	if (!shadow_vmcs)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/* mark vmcs as shadow */
>>>> +	shadow_vmcs->revision_id |= (1u << 31);
>>>> +	/* init shadow vmcs */
>>>> +	vmx->nested.current_shadow_vmcs.vmcs = shadow_vmcs;
>>>> +	loaded_vmcs_init(&vmx->nested.current_shadow_vmcs);
>>>> +
>>>> +	cpu = get_cpu();
>>>> +	local_irq_disable();
>>>> +	crash_disable_local_vmclear(cpu);
>>>> +
>>>> +	record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);
>>>
>>> This could be avoided if we assumed that the shadow vmcs is always active
>>> on the same vcpu.  The assumption looks rock-solid, because the shadow
>>> vmcs is activated on vmlaunch and its linking vmcs must be active (and
>>> current) on the same CPU.
>>>
>>>> +
>>>> +	crash_enable_local_vmclear(cpu);
>>>> +	local_irq_enable();
>>>> +	put_cpu();
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  /*
>>>>   * Emulate the VMXON instruction.
>>>>   * Currently, we just remember that VMX is active, and do not save or even
>>>> @@ -6867,14 +6903,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
>>>>  	}
>>>>
>>>>  	if (enable_shadow_vmcs) {
>>>> -		shadow_vmcs = alloc_vmcs();
>>>> -		if (!shadow_vmcs)
>>>> -			return -ENOMEM;
>>>> -		/* mark vmcs as shadow */
>>>> -		shadow_vmcs->revision_id |= (1u << 31);
>>>> -		/* init shadow vmcs */
>>>> -		vmcs_clear(shadow_vmcs);
>>>> -		vmx->nested.current_shadow_vmcs = shadow_vmcs;
>>>> +		int ret = setup_shadow_vmcs(vmx);
>>>> +		if (ret < 0)
>>>> +			return ret;
>>>>  	}
>>>>
>>>>  	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));