From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CFFCC433E0 for ; Tue, 19 May 2020 13:11:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 22CE920825 for ; Tue, 19 May 2020 13:11:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728901AbgESNLF (ORCPT ); Tue, 19 May 2020 09:11:05 -0400 Received: from mga02.intel.com ([134.134.136.20]:10145 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728705AbgESNLE (ORCPT ); Tue, 19 May 2020 09:11:04 -0400 IronPort-SDR: c4t4SSwR/JNBF3lCrIwTh+v3Qmhi/MhIMmA9ss2up+76oStqQWOpnw/Mpgdb2lEgZ8wrYYzFVp r2Ga4UrGIi6Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2020 06:11:04 -0700 IronPort-SDR: Q1oJwjhWRHQZVxyRT2XVq6iFqHxtT7QDhVGshWTbMBpzbO6OUFVFWexIReRlCIeAOBMM1iDsnf eRNW9Bzes32g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,410,1583222400"; d="scan'208";a="288951757" Received: from likexu-mobl1.ccr.corp.intel.com (HELO [10.249.171.98]) ([10.249.171.98]) by fmsmga004.fm.intel.com with ESMTP; 19 May 2020 06:10:59 -0700 Reply-To: like.xu@intel.com Subject: Re: [PATCH v11 10/11] KVM: x86/pmu: Check guest LBR availability in case host reclaims them To: Peter Zijlstra , Like Xu Cc: Paolo Bonzini , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , ak@linux.intel.com, wei.w.wang@intel.com References: <20200514083054.62538-1-like.xu@linux.intel.com> <20200514083054.62538-11-like.xu@linux.intel.com> <20200519111559.GJ279861@hirez.programming.kicks-ass.net> From: "Xu, Like" Organization: Intel OTC Message-ID: <3a234754-e103-907f-9b06-44b5e7ae12d3@intel.com> Date: Tue, 19 May 2020 21:10:58 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <20200519111559.GJ279861@hirez.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On 2020/5/19 19:15, Peter Zijlstra wrote: > On Thu, May 14, 2020 at 04:30:53PM +0800, Like Xu wrote: > >> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c >> index ea4faae56473..db185dca903d 100644 >> --- a/arch/x86/kvm/vmx/pmu_intel.c >> +++ b/arch/x86/kvm/vmx/pmu_intel.c >> @@ -646,6 +646,43 @@ static void intel_pmu_lbr_cleanup(struct kvm_vcpu *vcpu) >> intel_pmu_free_lbr_event(vcpu); >> } >> >> +static bool intel_pmu_lbr_is_availabile(struct kvm_vcpu *vcpu) >> +{ >> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); >> + >> + if (!pmu->lbr_event) >> + return false; >> + >> + if (event_is_oncpu(pmu->lbr_event)) { >> + intel_pmu_intercept_lbr_msrs(vcpu, false); >> + } else { >> + intel_pmu_intercept_lbr_msrs(vcpu, true); >> + return false; >> + } >> + >> + return true; >> +} > This is unreadable gunk, what? Abstractly, it is saying "KVM would passthrough the LBR satck MSRs if event_is_oncpu() is true, otherwise cancel the passthrough state if any." I'm using 'event->oncpu != -1' to represent the guest LBR event is scheduled on rather than 'event->state == PERF_EVENT_STATE_ERROR'. For intel_pmu_intercept_lbr_msrs(), false means to passthrough the LBR stack MSRs to the vCPU, and true means to cancel the passthrough state and make LBR MSR accesses trapped by the KVM. > >> +/* >> + * Higher priority host perf events (e.g. cpu pinned) could reclaim the >> + * pmu resources (e.g. LBR) that were assigned to the guest. This is >> + * usually done via ipi calls (more details in perf_install_in_context). >> + * >> + * Before entering the non-root mode (with irq disabled here), double >> + * confirm that the pmu features enabled to the guest are not reclaimed >> + * by higher priority host events. Otherwise, disallow vcpu's access to >> + * the reclaimed features. >> + */ >> +static void intel_pmu_availability_check(struct kvm_vcpu *vcpu) >> +{ >> + lockdep_assert_irqs_disabled(); >> + >> + if (lbr_is_enabled(vcpu) && !intel_pmu_lbr_is_availabile(vcpu) && >> + (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)) >> + pr_warn_ratelimited("kvm: vcpu-%d: LBR is temporarily unavailable.\n", >> + vcpu->vcpu_id); > More unreadable nonsense; when the events go into ERROR state, it's a > permanent fail, they'll not come back. It's not true.  The guest LBR event with 'ERROR state' or 'oncpu != -1' would be lazy released and re-created in the next time the intel_pmu_create_lbr_event() is called and it's supposed to be re-scheduled and re-do availability_check() as well. From the perspective of the guest user, the guest LBR is only temporarily unavailable until the host no longer reclaims the LBR. > >> +} >> + >> struct kvm_pmu_ops intel_pmu_ops = { >> .find_arch_event = intel_find_arch_event, >> .find_fixed_event = intel_find_fixed_event, >> @@ -662,4 +699,5 @@ struct kvm_pmu_ops intel_pmu_ops = { >> .reset = intel_pmu_reset, >> .deliver_pmi = intel_pmu_deliver_pmi, >> .lbr_cleanup = intel_pmu_lbr_cleanup, >> + .availability_check = intel_pmu_availability_check, >> }; >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index 9969d663826a..80d036c5f64a 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -6696,8 +6696,10 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) >> >> pt_guest_enter(vmx); >> >> - if (vcpu_to_pmu(vcpu)->version) >> + if (vcpu_to_pmu(vcpu)->version) { >> atomic_switch_perf_msrs(vmx); >> + kvm_x86_ops.pmu_ops->availability_check(vcpu); >> + } > AFAICT you just did a call out to the kvm_pmu crud in > atomic_switch_perf_msrs(), why do another call? In fact, availability_check() is only called here for just one time. The callchain looks like: - vmx_vcpu_run()     - kvm_x86_ops.pmu_ops->availability_check();         - intel_pmu_availability_check()             - intel_pmu_lbr_is_availabile()                 - event_is_oncpu() ... Thanks, Like Xu > >