From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E0BF9525A for ; Tue, 15 Aug 2023 06:27:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4A65DC433C7; Tue, 15 Aug 2023 06:27:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1692080833; bh=AodI7zIGu1RaCkc0AQzN+Yz5PQHEep5hiT9BLg/KHro=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=CDf5IRfBB1eW1wXf7GRdC0qxK3sY+Yu9bLNhbA5WKKxbQc1IxIbmmvHuZLY3fW8dd BdxZxPgL/0+Rd72V4sq7tCVPX+vJHUTYtjIjAD1hPYfFzN3m6dpLAuB5KETOVXurjg IppA8nG9DU+H3kmhjCoHaxnEwUNcSP6WJPNPUAr5hXic4Eow+9hMaFwxRTzCF5AFa4 87sGQaFe7NDKk+gl08LBXp6E/qvhYwY5F5fgmt6ZqkUEo+4tSSsoBqVNUBZePmTxIE XvAL3PK0D8WAcl7+geJMVIr3ebXq2ukpX2GImxOnNPeIdMITVgxMmXmmdAhjsM7X79 VJyvg8jg+r++g== Received: from [194.182.8.86] (helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1qVnWI-004u14-PS; Tue, 15 Aug 2023 07:27:10 +0100 Date: Tue, 15 Aug 2023 07:27:21 +0100 Message-ID: <87msysq0qe.wl-maz@kernel.org> From: Marc Zyngier To: Shijie Huang Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, James Morse , Suzuki K Poulose , Oliver Upton , Zenghui Yu , Huang Shijie , Mark Rutland , Will Deacon , Frank Wang Subject: Re: [PATCH] KVM: arm64: pmu: Resync EL0 state on counter rotation In-Reply-To: <1ce05000-264e-1fda-d193-8b27c3c293d8@amperemail.onmicrosoft.com> References: <20230811180520.131727-1-maz@kernel.org> <1ce05000-264e-1fda-d193-8b27c3c293d8@amperemail.onmicrosoft.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: kvmarm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-SA-Exim-Connect-IP: 194.182.8.86 X-SA-Exim-Rcpt-To: shijie@amperemail.onmicrosoft.com, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com, shijie@os.amperecomputing.com, mark.rutland@arm.com, will@kernel.org, zwang@amperecomputing.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false On Mon, 14 Aug 2023 03:58:32 +0100, Shijie Huang wrote: >=20 > Hi Marc, >=20 > =E5=9C=A8 2023/8/12 2:05, Marc Zyngier =E5=86=99=E9=81=93: > > Huang Shijie reports that, when profiling a guest from the host > > with a number of events that exceeds the number of available > > counters, the reported counts are wildly inaccurate. Without > > the counter oversubscription, the reported counts are correct. > >=20 > > Their investigation indicates that upon counter rotation (which > > takes place on the back of a timer interrupt), we fail to > > re-apply the guest EL0 enabling, leading to the counting of host > > events instead of guest events. > >=20 > > In order to solve this, add yet another hook between the host PMU > > driver and KVM, re-applying the guest EL0 configuration if the > > right conditions apply (the host is VHE, we are in interrupt > > context, and we interrupted a running vcpu). This triggers a new > > vcpu request which will apply the correct configuration on guest > > reentry. > >=20 > > With this, we have the correct counts, even when the counters are > > oversubscribed. > >=20 > > Reported-by: Huang Shijie > > Suggested-by: Oliver Upton > > Signed-off-by: Marc Zyngier > > Cc: Mark Rutland > > Cc: Will Deacon > > Link: https://lore.kernel.org/r/20230809013953.7692-1-shijie@os.amperec= omputing.com > > --- > > arch/arm64/include/asm/kvm_host.h | 1 + > > arch/arm64/kvm/arm.c | 3 +++ > > arch/arm64/kvm/pmu.c | 18 ++++++++++++++++++ > > drivers/perf/arm_pmuv3.c | 2 ++ > > include/kvm/arm_pmu.h | 2 ++ > > 5 files changed, 26 insertions(+) > >=20 > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm= /kvm_host.h > > index d3dd05bbfe23..553040e0e375 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -49,6 +49,7 @@ > > #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4) > > #define KVM_REQ_RELOAD_PMU KVM_ARCH_REQ(5) > > #define KVM_REQ_SUSPEND KVM_ARCH_REQ(6) > > +#define KVM_REQ_RESYNC_PMU_EL0 KVM_ARCH_REQ(7) > > #define KVM_DIRTY_LOG_MANUAL_CAPS > > (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ > > KVM_DIRTY_LOG_INITIALLY_SET) > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > > index 72dc53a75d1c..978b0411082f 100644 > > --- a/arch/arm64/kvm/arm.c > > +++ b/arch/arm64/kvm/arm.c > > @@ -803,6 +803,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcp= u) > > kvm_pmu_handle_pmcr(vcpu, > > __vcpu_sys_reg(vcpu, PMCR_EL0)); > > + if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu)) > > + kvm_vcpu_pmu_restore_guest(vcpu); > > + > > if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) > > return kvm_vcpu_suspend(vcpu); > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > > index 121f1a14c829..0eea225fd09a 100644 > > --- a/arch/arm64/kvm/pmu.c > > +++ b/arch/arm64/kvm/pmu.c > > @@ -236,3 +236,21 @@ bool kvm_set_pmuserenr(u64 val) > > ctxt_sys_reg(hctxt, PMUSERENR_EL0) =3D val; > > return true; > > } > > + > > +/* > > + * If we interrupted the guest to update the host PMU context, make > > + * sure we re-apply the guest EL0 state. > > + */ > > +void kvm_vcpu_pmu_resync_el0(void) > > +{ > > + struct kvm_vcpu *vcpu; > > + > > + if (!has_vhe() || !in_interrupt()) > > + return; > > + > > + vcpu =3D kvm_get_running_vcpu(); > > + if (!vcpu) > > + return; > > + > > + kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu); > > +} > > diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c > > index 08b3a1bf0ef6..6a3d8176f54a 100644 > > --- a/drivers/perf/arm_pmuv3.c > > +++ b/drivers/perf/arm_pmuv3.c > > @@ -772,6 +772,8 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu) > > /* Enable all counters */ > > armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E); > > + > > + kvm_vcpu_pmu_resync_el0(); > > } >=20 > I read the perf code again, and find it maybe not good to do it in > armv8pmu_start. >=20 > =C2=A0=C2=A0 Assume we install a new perf event to a CPU "x" from CPU 0,= =C2=A0 a VM > guest is running on CPU "x": >=20 > =C2=A0=C2=A0=C2=A0 perf_event_open() --> perf_install_in_context() --> >=20 > =C2=A0=C2=A0=C2=A0 call this function in=C2=A0 IPI interrupt: ___perf_ins= tall_in_context(). >=20 > =C2=A0=C2=A0 armv8pmu_start() will be called in ___perf_install_in_contex= t() in IPI. >=20 > =C2=A0=C2=A0 so kvm_vcpu_pmu_resync_el0() will _make_ a request by meetin= g the > conditions: >=20 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = 1.) in interrupt context. >=20 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = 2.) a guest is running on this CPU. >=20 >=20 > But in actually, the request should not send out. Why shouldn't it be applied? This isn't supposed to be always necessary, but it needs to be applied whenever there is a possibility for counters to be updated behind our back, something that is a pretty event anyway. > Please correct me if I am wrong. >=20 > IMHO, the best solution is add=C2=A0 a hook in the perf/core code, and ma= ke > the request there. I disagree. I'm still completely opposed to anything that will add such a hook in the core perf code, specially as a weak symbol. The interactions must be strictly between the PMUv3 driver and KVM, because they are the only parties involved here. I will *not* take such a patch. M. --=20 Without deviation from the norm, progress is not possible.