Date: Tue, 19 Nov 2024 05:44:09 -0800
In-Reply-To: <67013550-9739-4943-812f-4ba6f01e4fb4@linux.intel.com>
References: <20240801045907.4010984-1-mizhang@google.com>
	<20240801045907.4010984-38-mizhang@google.com>
	<5309ec1b-edc6-4e59-af88-b223dbf2a455@intel.com>
	<67013550-9739-4943-812f-4ba6f01e4fb4@linux.intel.com>
Subject: Re: [RFC PATCH v3 37/58] KVM: x86/pmu: Switch IA32_PERF_GLOBAL_CTRL at VM boundary
From: Sean Christopherson
To: Dapeng Mi
Cc: Zide Chen, Mingwei Zhang, Paolo Bonzini, Xiong Zhang, Kan Liang,
	Zhenyu Wang, Manali Shukla, Sandipan Das, Jim Mattson, Stephane Eranian,
	Ian Rogers, Namhyung Kim, gce-passthrou-pmu-dev@google.com, Samantha Alt,
	Zhiyuan Lv, Yanfei Xu, Like Xu, Peter Zijlstra, Raghavendra Rao Ananta,
	kvm@vger.kernel.org, linux-perf-users@vger.kernel.org

On Tue, Nov 19, 2024, Dapeng Mi wrote:
> 
> On 11/19/2024 9:46 AM, Sean Christopherson wrote:
> > On Fri, Oct 25, 2024, Dapeng Mi wrote:
> >> On 10/25/2024 4:26 AM, Chen, Zide wrote:
> >>> On 7/31/2024 9:58 PM, Mingwei Zhang wrote:
> >>>
> >>>> @@ -7295,6 +7299,46 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> >>>>  					msrs[i].host, false);
> >>>>  }
> >>>>  
> >>>> +static void save_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
> >>>> +{
> >>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
> >>>> +	int i;
> >>>> +
> >>>> +	if (vm_exit_controls_get(vmx) & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL) {
> >>>> +		pmu->global_ctrl = vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL);
> >>> As commented in patch 26, compared with the MSR auto save/store area
> >>> approach, the exec control way needs one relatively expensive VMCS read
> >>> on every VM exit.
> >> Anyway, let us have an evaluation and let the data speak.
> > No, drop the unconditional VMREAD and VMWRITE, one way or another.  No benchmark
> > will notice ~50 extra cycles, but if we write poor code for every feature, those
> > 50 cycles per feature add up.
> >
> > Furthermore, checking to see if the CPU supports the load/save VMCS controls at
> > runtime is beyond ridiculous.  The mediated PMU requires ***VERSION 4***; if a CPU
> > supports PMU version 4 and doesn't support the VMCS controls, KVM should yell and
> > disable the passthrough PMU.  The amount of complexity added here to support a
> > CPU that should never exist is silly.
> >
> >>>> +static void load_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
> >>>> +{
> >>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
> >>>> +	u64 global_ctrl = pmu->global_ctrl;
> >>>> +	int i;
> >>>> +
> >>>> +	if (vm_entry_controls_get(vmx) & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) {
> >>>> +		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, global_ctrl);
> >>> ditto.
> >>>
> >>> We may optimize it by introducing a new flag pmu->global_ctrl_dirty and
> >>> update GUEST_IA32_PERF_GLOBAL_CTRL only when it's needed. But this
> >>> makes the code even more complicated.
> > I haven't looked at surrounding code too much, but I guarantee there's _zero_
> > reason to eat a VMWRITE+VMREAD on every transition.  If, emphasis on *if*, KVM
> > accesses PERF_GLOBAL_CTRL frequently, e.g. on most exits, then add a VCPU_EXREG_XXX
> > and let KVM's caching infrastructure do the heavy lifting.  Don't reinvent the
> > wheel.  But first, convince the world that KVM actually accesses the MSR somewhat
> > frequently.
> 
> Sean, let me give more background here.
> 
> VMX supports two ways to save/restore the PERF_GLOBAL_CTRL MSR: one is to
> leverage VMCS_EXIT_CTRL/VMCS_ENTRY_CTRL to save/restore the guest
> PERF_GLOBAL_CTRL value to/from VMCS guest state, the other is to use the
> VMCS MSR auto-load/restore area to save/restore guest PERF_GLOBAL_CTRL.

I know.

> Currently we prefer to use the former way to save/restore guest
> PERF_GLOBAL_CTRL as long as HW supports it. There is a limitation on the
> MSR auto-load/restore feature. When there are multiple MSRs, the MSRs are
> saved/restored in the order of MSR index. As the SDM suggests,
> PERF_GLOBAL_CTRL should always be written last, after all other PMU MSRs
> are manipulated. So if there are PMU MSRs whose index is larger than
> PERF_GLOBAL_CTRL's (which would be true in arch Perfmon v6+, where all PMU
> MSRs in the new MSR range have a larger index than PERF_GLOBAL_CTRL),

No, the entries in the load/store lists are processed in sequential order as
they appear in the lists.  Ordering them based on their MSR index would be insane
and would make the lists useless.

  VM entries may load MSRs from the VM-entry MSR-load area (see Section 25.8.2).
  Specifically each entry in that area (up to the number specified in the VM-entry
  MSR-load count) is processed in order by loading the MSR indexed by bits 31:0
  with the contents of bits 127:64 as they would be written by WRMSR.
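For reference, each entry in those lists is just a 16-byte index/value record
that is consumed in array order.  A rough conceptual sketch (the struct mirrors
the kernel's struct vmx_msr_entry; the helper name is made up for illustration
and is not code from this series):

struct vmx_msr_entry {
	u32 index;	/* bits 31:0 of the entry: MSR to load */
	u32 reserved;
	u64 value;	/* bits 127:64 of the entry: value, written as if by WRMSR */
};

/* Conceptual model only: the CPU walks the load list front-to-back. */
static void process_vmentry_msr_load_list(struct vmx_msr_entry *list, u32 count)
{
	u32 i;

	for (i = 0; i < count; i++)
		wrmsrl(list[i].index, list[i].value);
}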
> these PMU MSRs would be restored after PERF_GLOBAL_CTRL. That would break
> the rule. Of course, it's good to save/restore PERF_GLOBAL_CTRL right now
> with the VMCS MSR auto-load/restore feature since only one PMU MSR,
> PERF_GLOBAL_CTRL, is saved/restored in the current implementation.

No, it's never good to use the load/store lists.  They're slow as mud, because
they're essentially just wrappers to the standard WRMSR/RDMSR ucode.  Whereas
dedicated VMCS fields have dedicated, streamlined ucode to make loads and stores
as fast as possible.

I haven't measured PERF_GLOBAL_CTRL specifically, at least not in recent memory,
but generally speaking using a load/store entry is 100+ cycles, whereas using a
dedicated VMCS field is <20 cycles (often far less).

So what I am saying is that the mediated PMU should _require_ support for loading
and saving PERF_GLOBAL_CTRL via dedicated fields, and WARN if a CPU with a v4+
PMU doesn't support said fields.  E.g.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a4b2b0b69a68..cab8305e7bf0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8620,6 +8620,15 @@ __init int vmx_hardware_setup(void)
 		enable_sgx = false;
 #endif
 
+	/*
+	 * All CPUs that support a mediated PMU are expected to support loading
+	 * and saving PERF_GLOBAL_CTRL via dedicated VMCS fields.
+	 */
+	if (enable_passthrough_pmu &&
+	    (WARN_ON_ONCE(!cpu_has_load_perf_global_ctrl() ||
+			  !cpu_has_save_perf_global_ctrl())))
+		enable_passthrough_pmu = false;
+
 	/*
 	 * set_apic_access_page_addr() is used to reload apic access
 	 * page upon invalidation.  No need to do anything if not

That will provide better, more consistent performance, and will eliminate a big
pile of non-trivial code.

> The PERF_GLOBAL_CTRL MSR could be frequently accessed by the perf/pmu driver,
> e.g. on each task switch, so the PERF_GLOBAL_CTRL MSR is configured as
> pass-through to reduce the performance impact in the mediated vPMU proposal
> if the guest owns all PMU HW resources. But if the guest only owns part of
> the PMU HW resources, PERF_GLOBAL_CTRL would be set to interception mode.

Again, I know.  What I am saying is that propagating PERF_GLOBAL_CTRL to/from the
VMCS on every entry and exit is extremely wasteful and completely unnecessary.

> I suppose KVM doesn't need to access PERF_GLOBAL_CTRL in passthrough mode.
> This piece of code is intended just for PERF_GLOBAL_CTRL interception mode,

No, it's even more useless if PERF_GLOBAL_CTRL is intercepted, because in that
case the _only_ time KVM needs to move the guest's value to/from the VMCS is when
the guest (or host userspace) is explicitly accessing the field.

> but on second thought, it looks unnecessary to save/restore PERF_GLOBAL_CTRL
> via the VMCS, as KVM would always maintain the guest PERF_GLOBAL_CTRL value?
> Anyway, this part of the code can be optimized.
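For illustration, when PERF_GLOBAL_CTRL is intercepted, the only paths that
need to touch the VMCS field are the MSR read/write handlers, along these lines
(rough sketch with made-up helper names, not code from this series):

/*
 * Illustrative only: the VMCS field is touched on explicit MSR accesses,
 * never on the VM-Enter/VM-Exit fast paths.
 */
static u64 mediated_pmu_get_global_ctrl(struct vcpu_vmx *vmx)
{
	return vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL);
}

static void mediated_pmu_set_global_ctrl(struct vcpu_vmx *vmx, u64 data)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);

	pmu->global_ctrl = data;
	vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, data);
}

The guest's value otherwise lives in pmu->global_ctrl, so nothing needs to be
synchronized on the entry/exit paths.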