All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Eric Hankland <ehankland@google.com>,
	Roman Kagan <rkagan@amazon.de>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
	Like Xu <likexu@tencent.com>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH] KVM: x86: vPMU: truncate counter value to allowed width
Date: Thu, 29 Jun 2023 17:11:06 -0700	[thread overview]
Message-ID: <ZJ4dmrQSduY8aWap@google.com> (raw)
In-Reply-To: <CALMp9eS3F08cwUJbKjTRAEL0KyZ=MC==YSH+DW-qsFkNfMpqEQ@mail.gmail.com>

+Mingwei

On Thu, Jun 29, 2023, Jim Mattson wrote:
> On Mon, Jun 5, 2023 at 5:51 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, May 04, 2023, Roman Kagan wrote:
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 5c7bbf03b599..6a91e1afef5a 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -60,6 +60,12 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
> > >       return counter & pmc_bitmask(pmc);
> > >  }
> > >
> > > +static inline void pmc_set_counter(struct kvm_pmc *pmc, u64 val)
> > > +{
> > > +     pmc->counter += val - pmc_read_counter(pmc);
> >
> > Ugh, not your code, but I don't see how the current code can possibly be correct.
> >
> > The above unpacks to
> >
> >         counter = pmc->counter;
> >         if (pmc->perf_event && !pmc->is_paused)
> >                 counter += perf_event_read_value(pmc->perf_event,
> >                                                  &enabled, &running);
> >         pmc->counter += val - (counter & pmc_bitmask(pmc));
> >
> > which distills down to
> >
> >         counter = 0;
> >         if (pmc->perf_event && !pmc->is_paused)
> >                 counter += perf_event_read_value(pmc->perf_event,
> >                                                  &enabled, &running);
> >         pmc->counter = val - (counter & pmc_bitmask(pmc));
> >
> > or more succinctly
> >
> >         if (pmc->perf_event && !pmc->is_paused)
> >                 val -= perf_event_read_value(pmc->perf_event, &enabled, &running);
> >
> >         pmc->counter = val;
> >
> > which is obviously wrong.  E.g. if the guest writes '0' to an active counter, the
> > adjustment will cause pmc->counter to be loaded with a large (in unsigned terms)
> > value, and thus quickly overflow after a write of '0'.
> 
> This weird construct goes all the way back to commit f5132b01386b
> ("KVM: Expose a version 2 architectural PMU to a guests"). Paolo
> killed it in commit 2924b52117b2 ("KVM: x86/pmu: do not mask the value
> that is written to fixed PMUs"), perhaps by accident. Eric then
> resurrected it in commit 4400cf546b4b ("KVM: x86: Fix perfctr WRMSR
> for running counters").
> 
> It makes no sense to me. WRMSR should just set the new value of the
> counter, regardless of the old value or whether or not it is running.

Heh, didn't stop you from giving Eric's patch a thumbs-up[*] :-)

Thanks to Eric's testcase [Wow, tests do help!  We should try writing more of them!],
I finally figured out what's going on.  I wrongly assumed perf_event_read_value()
is destructive, but it's not, it just reads the current value.  So on a WRMSR,
KVM offsets the value with the current perf event, and then *mostly* adjusts for
it when reading the counter.

But that is obviously super fragile because it means pmc->counter must never be
read directly unless the perf event is paused and the accumulated counter has been
propagated to pmc->counter.  Blech.

I fiddled with a variety of things, but AFAICT the easiest solution is also the
most obviously correct: set perf's count to the guest's count.  Lightly tested
patch below.

On a related topic, Mingwei also appears to have found another bug: prev_counter
needs to be set when the counter is written, i.e. my proposed pmc_write_counter()
also needs to update prev_counter.

Though that also raises the question of whether or not zeroing prev_counter in
reprogram_counter() is correct.  Unless I'm missing something, reprogram_counter()
should also set pmc->prev_counter to pmc->counter when the counter is successfully
(re)enabled.

And Jim also pointed out that prev_counter needs to be set even when KVM fails
to enable a perf event (software counting should still work).

[*] https://lore.kernel.org/all/CALMp9eRfeFFb6n22Uf4R2Pf8WW7BVLX_Vuf04WFwiMtrk14Y-Q@mail.gmail.com

---
 arch/x86/kvm/pmu.h           |  8 ++++++++
 arch/x86/kvm/svm/pmu.c       |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c |  4 ++--
 include/linux/perf_event.h   |  2 ++
 kernel/events/core.c         | 11 +++++++++++
 5 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7d9ba301c090..ba91a78e4dc1 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -74,6 +74,14 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
 	return counter & pmc_bitmask(pmc);
 }
 
+static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
+{
+	if (pmc->perf_event && !pmc->is_paused)
+		perf_event_set_count(pmc->perf_event, val);
+
+	pmc->counter = val;
+}
+
 static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 {
 	if (pmc->perf_event) {
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index cef5a3d0abd0..373ff6a6687b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -160,7 +160,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
 	if (pmc) {
-		pmc->counter += data - pmc_read_counter(pmc);
+		pmc_write_counter(pmc, data);
 		pmc_update_sample_period(pmc);
 		return 0;
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 80c769c58a87..18a658aa2a8d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -406,11 +406,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (!msr_info->host_initiated &&
 			    !(msr & MSR_PMC_FULL_WIDTH_BIT))
 				data = (s64)(s32)data;
-			pmc->counter += data - pmc_read_counter(pmc);
+			pmc_write_counter(pmc, data);
 			pmc_update_sample_period(pmc);
 			break;
 		} else if ((pmc = get_fixed_pmc(pmu, msr))) {
-			pmc->counter += data - pmc_read_counter(pmc);
+			pmc_write_counter(pmc, data);
 			pmc_update_sample_period(pmc);
 			break;
 		} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..8fcd52a87ba2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1677,6 +1677,7 @@ extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
+extern void perf_event_set_count(struct perf_event *event, u64 count);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
@@ -1760,6 +1761,7 @@ static inline int perf_event_period(struct perf_event *event, u64 value)
 {
 	return -EINVAL;
 }
+static inline perf_event_set_count(struct perf_event *event, u64 count) { }
 static inline u64 perf_event_pause(struct perf_event *event, bool reset)
 {
 	return 0;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index db016e418931..d368c283eba5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5646,6 +5646,17 @@ static void _perf_event_reset(struct perf_event *event)
 	perf_event_update_userpage(event);
 }
 
+void perf_event_set_count(struct perf_event *event, u64 count)
+{
+	struct perf_event_context *ctx;
+
+	ctx = perf_event_ctx_lock(event);
+	(void)perf_event_read(event, false);
+	local64_set(&event->count, count);
+	perf_event_ctx_unlock(event, ctx);
+}
+EXPORT_SYMBOL_GPL(perf_event_set_count);
+
 /* Assume it's not an event with inherit set. */
 u64 perf_event_pause(struct perf_event *event, bool reset)
 {

base-commit: 5ae85a1bd17b959796f6cc4c1153ceada2cf8f24
-- 

  reply	other threads:[~2023-06-30  0:11 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-04 12:00 [PATCH] KVM: x86: vPMU: truncate counter value to allowed width Roman Kagan
2023-05-23 12:40 ` Like Xu
2023-05-23 16:42   ` Roman Kagan
2023-06-06  0:26     ` Sean Christopherson
2023-06-06  0:51 ` Sean Christopherson
2023-06-29 21:16   ` Jim Mattson
2023-06-30  0:11     ` Sean Christopherson [this message]
2023-06-30  0:31       ` Jim Mattson
2023-06-30 11:14       ` Roman Kagan
2023-06-30 14:28         ` Sean Christopherson
2023-06-30 15:21           ` Roman Kagan
2023-06-30 15:45             ` Sean Christopherson
2023-06-30 17:07               ` Mingwei Zhang
2023-06-30 17:16                 ` Jim Mattson
2023-06-30 17:32                   ` Mingwei Zhang
2023-06-30 18:03                     ` Mingwei Zhang
2023-06-30 21:26               ` Sean Christopherson
2023-07-01 19:51                 ` Mingwei Zhang
2023-08-11  8:30                 ` Dapeng Mi
2023-08-22  9:29                 ` Like Xu
2023-08-23 18:28                   ` Mingwei Zhang
2023-07-03 13:33               ` Roman Kagan
2023-06-30 16:40             ` Jim Mattson
2023-06-30 23:25               ` Jim Mattson
2023-09-28 16:41 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJ4dmrQSduY8aWap@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=ehankland@google.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=rkagan@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.