All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Like Xu <likexu@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH] perf/x86: Don't enforce minimum period for KVM guest-only events
Date: Fri, 17 Nov 2023 11:32:36 +0100	[thread overview]
Message-ID: <20231117103236.GI3818@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20231107183605.409588-1-seanjc@google.com>

On Tue, Nov 07, 2023 at 10:36:05AM -0800, Sean Christopherson wrote:
> Don't apply minimum period workarounds/requirements to events that are
> being created by KVM to virtualize PMCs for guests, i.e. skip limit
> enforcement for events that exclude the host.  Perf's somewhat arbitrary
> limits prevents KVM from correctly virtualizing counter overflow, e.g. if
> the guest sets a counter to have an effective period of '1', forcing a
> minimum period of '2' results in overflow occurring at the incorrect time.
> 
> Whether or not a "real" profiling use case is affected is debatable, but
> the incorrect behavior is trivially easy to observe and reproduce, and is
> deterministic enough to make the PMU appear to be broken from the guest's
> perspective.
> 
> Furthermore, the "period" set by KVM isn't actually a period, as KVM won't
> automatically reprogram the event with the same period on overflow.  KVM
> will synthesize a PMI into the guest when appropriate, but what the guest
> does in response to the PMI is purely a guest decision.  In other words,
> KVM effectively operates in a one-shot mode, not a periodic mode.
> 
> Letting KVM and/or the guest program "too small" periods is safe for the
> host, as events that exclude the host are atomically disabled with respect
> to VM-Exit, i.e. are guaranteed to stop counting upon transitioning to the
> host.  And whether or not *explicitly* programming a short period is safe
> is somewhat of a moot point, as transitions to/from the guest effectively
> yield the same effect, e.g. an unrelated VM-Exit => VM-Enter transition
> will re-enable guest PMCs with whatever count happened to be in the PMC at
> the time of VM-Exit.
> 
> Cc: Like Xu <likexu@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Mingwei Zhang <mizhang@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> 
> Disclaimer: I've only tested this from KVM's side of things.
> 
>  arch/x86/events/core.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 40ad1425ffa2..f8a8a4ea4d47 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1388,16 +1388,25 @@ int x86_perf_event_set_period(struct perf_event *event)
>  		hwc->last_period = period;
>  		ret = 1;
>  	}
> -	/*
> -	 * Quirk: certain CPUs dont like it if just 1 hw_event is left:
> -	 */
> -	if (unlikely(left < 2))
> -		left = 2;
>  
>  	if (left > x86_pmu.max_period)
>  		left = x86_pmu.max_period;
>  
> -	static_call_cond(x86_pmu_limit_period)(event, &left);
> +	/*
> +	 * Exempt KVM guest events from the minimum period requirements.  It's
> +	 * the guest's responsibility to ensure it can make forward progress,
> +	 * and it's KVM's responsibility to configure an appropriate "period"
> +	 * to correctly virtualize overflow for the guest's PMCs.
> +	 */
> +	if (!event->attr.exclude_host) {
> +		/*
> +		 * Quirk: certain CPUs dont like it if just 1 event is left:
> +		 */
> +		if (unlikely(left < 2))
> +			left = 2;
> +
> +		static_call_cond(x86_pmu_limit_period)(event, &left);
> +	}

Hmm, IIRC we can disable that left < 2 thing for anything that doesn't
have x86_pmu.pebs_no_isolation IIRC.

I'm not sure about taking out the limit_period call, why does it make
sense to allow the guest to program obviously invalid settings?

That is, would something like the below work for you?

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 40ad1425ffa2..5543a0bab1f8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -152,6 +152,14 @@ u64 x86_perf_event_update(struct perf_event *event)
 	return new_raw_count;
 }
 
+static void x86_perf_limit_period(struct perf_event *event, s64 *left)
+{
+	/*
+	 * Quirk: certain CPUs dont like it if just 1 hw_event is left:
+	 */
+	*left = max(*left, 2);
+}
+
 /*
  * Find and validate any extra registers to set up.
  */
@@ -1388,11 +1396,6 @@ int x86_perf_event_set_period(struct perf_event *event)
 		hwc->last_period = period;
 		ret = 1;
 	}
-	/*
-	 * Quirk: certain CPUs dont like it if just 1 hw_event is left:
-	 */
-	if (unlikely(left < 2))
-		left = 2;
 
 	if (left > x86_pmu.max_period)
 		left = x86_pmu.max_period;
@@ -2130,6 +2133,10 @@ static int __init init_hw_perf_events(void)
 	if (!x86_pmu.update)
 		x86_pmu.update = x86_perf_event_update;
 
+	// XXX check non-Intel
+	if (!x86_pmu.limit_period && x86_pmu.pebs_no_isolation)
+		x86_pmu.limit_update = x86_perf_limit_period;
+
 	x86_pmu_static_call_update();
 
 	/*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a08f794a0e79..9fe0f241779e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4471,7 +4471,10 @@ static void bdw_limit_period(struct perf_event *event, s64 *left)
 		if (*left < 128)
 			*left = 128;
 		*left &= ~0x3fULL;
+		return;
 	}
+	if (unlikely(x86_pmu.pebs_no_isolation))
+		*left = max(*left, 2);
 }
 
 static void nhm_limit_period(struct perf_event *event, s64 *left)

  parent reply	other threads:[~2023-11-17 10:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-07 18:36 [PATCH] perf/x86: Don't enforce minimum period for KVM guest-only events Sean Christopherson
2023-11-07 19:38 ` Mingwei Zhang
2023-11-07 23:02   ` Sean Christopherson
2023-11-07 23:47     ` Mingwei Zhang
2023-11-17 10:32 ` Peter Zijlstra [this message]
2023-11-29  1:33   ` Sean Christopherson
2023-11-29 11:20     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231117103236.GI3818@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=jmattson@google.com \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mizhang@google.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.