public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Eric Hankland <ehankland@google.com>,
	Roman Kagan <rkagan@amazon.de>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
	Like Xu <likexu@tencent.com>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH] KVM: x86: vPMU: truncate counter value to allowed width
Date: Thu, 29 Jun 2023 17:11:06 -0700	[thread overview]
Message-ID: <ZJ4dmrQSduY8aWap@google.com> (raw)
In-Reply-To: <CALMp9eS3F08cwUJbKjTRAEL0KyZ=MC==YSH+DW-qsFkNfMpqEQ@mail.gmail.com>

+Mingwei

On Thu, Jun 29, 2023, Jim Mattson wrote:
> On Mon, Jun 5, 2023 at 5:51 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, May 04, 2023, Roman Kagan wrote:
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 5c7bbf03b599..6a91e1afef5a 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -60,6 +60,12 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
> > >       return counter & pmc_bitmask(pmc);
> > >  }
> > >
> > > +static inline void pmc_set_counter(struct kvm_pmc *pmc, u64 val)
> > > +{
> > > +     pmc->counter += val - pmc_read_counter(pmc);
> >
> > Ugh, not your code, but I don't see how the current code can possibly be correct.
> >
> > The above unpacks to
> >
> >         counter = pmc->counter;
> >         if (pmc->perf_event && !pmc->is_paused)
> >                 counter += perf_event_read_value(pmc->perf_event,
> >                                                  &enabled, &running);
> >         pmc->counter += val - (counter & pmc_bitmask(pmc));
> >
> > which distills down to
> >
> >         counter = 0;
> >         if (pmc->perf_event && !pmc->is_paused)
> >                 counter += perf_event_read_value(pmc->perf_event,
> >                                                  &enabled, &running);
> >         pmc->counter = val - (counter & pmc_bitmask(pmc));
> >
> > or more succinctly
> >
> >         if (pmc->perf_event && !pmc->is_paused)
> >                 val -= perf_event_read_value(pmc->perf_event, &enabled, &running);
> >
> >         pmc->counter = val;
> >
> > which is obviously wrong.  E.g. if the guest writes '0' to an active counter, the
> > adjustment will cause pmc->counter to be loaded with a large (in unsigned terms)
> > value, and thus quickly overflow after a write of '0'.
> 
> This weird construct goes all the way back to commit f5132b01386b
> ("KVM: Expose a version 2 architectural PMU to a guests"). Paolo
> killed it in commit 2924b52117b2 ("KVM: x86/pmu: do not mask the value
> that is written to fixed PMUs"), perhaps by accident. Eric then
> resurrected it in commit 4400cf546b4b ("KVM: x86: Fix perfctr WRMSR
> for running counters").
> 
> It makes no sense to me. WRMSR should just set the new value of the
> counter, regardless of the old value or whether or not it is running.

Heh, didn't stop you from giving Eric's patch a thumbs-up[*] :-)

Thanks to Eric's testcase [Wow, tests do help!  We should try writing more of them!],
I finally figured out what's going on.  I wrongly assumed perf_event_read_value()
is destructive, but it's not, it just reads the current value.  So on a WRMSR,
KVM offsets the value with the current perf event, and then *mostly* adjusts for
it when reading the counter.

But that is obviously super fragile because it means pmc->counter must never be
read directly unless the perf event is paused and the accumulated counter has been
propagated to pmc->counter.  Blech.

I fiddled with a variety of things, but AFAICT the easiest solution is also the
most obviously correct: set perf's count to the guest's count.  Lightly tested
patch below.

On a related topic, Mingwei also appears to have found another bug: prev_counter
needs to be set when the counter is written, i.e. my proposed pmc_write_counter()
also needs to update prev_counter.

Though that also raises the question of whether or not zeroing prev_counter in
reprogram_counter() is correct.  Unless I'm missing something, reprogram_counter()
should also set pmc->prev_counter to pmc->counter when the counter is successfully
(re)enabled.

And Jim also pointed out that prev_counter needs to be set even when KVM fails
to enable a perf event (software counting should still work).

[*] https://lore.kernel.org/all/CALMp9eRfeFFb6n22Uf4R2Pf8WW7BVLX_Vuf04WFwiMtrk14Y-Q@mail.gmail.com

---
 arch/x86/kvm/pmu.h           |  8 ++++++++
 arch/x86/kvm/svm/pmu.c       |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c |  4 ++--
 include/linux/perf_event.h   |  2 ++
 kernel/events/core.c         | 11 +++++++++++
 5 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7d9ba301c090..ba91a78e4dc1 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -74,6 +74,14 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
 	return counter & pmc_bitmask(pmc);
 }
 
+static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
+{
+	if (pmc->perf_event && !pmc->is_paused)
+		perf_event_set_count(pmc->perf_event, val);
+
+	pmc->counter = val;
+}
+
 static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 {
 	if (pmc->perf_event) {
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index cef5a3d0abd0..373ff6a6687b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -160,7 +160,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
 	if (pmc) {
-		pmc->counter += data - pmc_read_counter(pmc);
+		pmc_write_counter(pmc, data);
 		pmc_update_sample_period(pmc);
 		return 0;
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 80c769c58a87..18a658aa2a8d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -406,11 +406,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (!msr_info->host_initiated &&
 			    !(msr & MSR_PMC_FULL_WIDTH_BIT))
 				data = (s64)(s32)data;
-			pmc->counter += data - pmc_read_counter(pmc);
+			pmc_write_counter(pmc, data);
 			pmc_update_sample_period(pmc);
 			break;
 		} else if ((pmc = get_fixed_pmc(pmu, msr))) {
-			pmc->counter += data - pmc_read_counter(pmc);
+			pmc_write_counter(pmc, data);
 			pmc_update_sample_period(pmc);
 			break;
 		} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..8fcd52a87ba2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1677,6 +1677,7 @@ extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
+extern void perf_event_set_count(struct perf_event *event, u64 count);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
@@ -1760,6 +1761,7 @@ static inline int perf_event_period(struct perf_event *event, u64 value)
 {
 	return -EINVAL;
 }
+static inline perf_event_set_count(struct perf_event *event, u64 count) { }
 static inline u64 perf_event_pause(struct perf_event *event, bool reset)
 {
 	return 0;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index db016e418931..d368c283eba5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5646,6 +5646,17 @@ static void _perf_event_reset(struct perf_event *event)
 	perf_event_update_userpage(event);
 }
 
+void perf_event_set_count(struct perf_event *event, u64 count)
+{
+	struct perf_event_context *ctx;
+
+	ctx = perf_event_ctx_lock(event);
+	(void)perf_event_read(event, false);
+	local64_set(&event->count, count);
+	perf_event_ctx_unlock(event, ctx);
+}
+EXPORT_SYMBOL_GPL(perf_event_set_count);
+
 /* Assume it's not an event with inherit set. */
 u64 perf_event_pause(struct perf_event *event, bool reset)
 {

base-commit: 5ae85a1bd17b959796f6cc4c1153ceada2cf8f24
-- 

  reply	other threads:[~2023-06-30  0:11 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-04 12:00 [PATCH] KVM: x86: vPMU: truncate counter value to allowed width Roman Kagan
2023-05-23 12:40 ` Like Xu
2023-05-23 16:42   ` Roman Kagan
2023-06-06  0:26     ` Sean Christopherson
2023-06-06  0:51 ` Sean Christopherson
2023-06-29 21:16   ` Jim Mattson
2023-06-30  0:11     ` Sean Christopherson [this message]
2023-06-30  0:31       ` Jim Mattson
2023-06-30 11:14       ` Roman Kagan
2023-06-30 14:28         ` Sean Christopherson
2023-06-30 15:21           ` Roman Kagan
2023-06-30 15:45             ` Sean Christopherson
2023-06-30 17:07               ` Mingwei Zhang
2023-06-30 17:16                 ` Jim Mattson
2023-06-30 17:32                   ` Mingwei Zhang
2023-06-30 18:03                     ` Mingwei Zhang
2023-06-30 21:26               ` Sean Christopherson
2023-07-01 19:51                 ` Mingwei Zhang
2023-08-11  8:30                 ` Dapeng Mi
2023-08-22  9:29                 ` Like Xu
2023-08-23 18:28                   ` Mingwei Zhang
2023-07-03 13:33               ` Roman Kagan
2023-06-30 16:40             ` Jim Mattson
2023-06-30 23:25               ` Jim Mattson
2023-09-28 16:41 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJ4dmrQSduY8aWap@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=ehankland@google.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=rkagan@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox