Re: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: kan.liang@intel.com
Cc: mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, jolsa@redhat.com, eranian@google.com,
	ak@linux.intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload
Date: Wed, 24 Jan 2018 13:26:18 +0100	[thread overview]
Message-ID: <20180124122618.GH2249@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <1515424516-143728-2-git-send-email-kan.liang@intel.com>

On Mon, Jan 08, 2018 at 07:15:13AM -0800, kan.liang@intel.com wrote:

> The formula to calculate the event->count is as below:

>   event->count = period left from last time +
>                  (reload_times - 1) * reload_val +
>                  latency of PMI handler
> 
> prev_count is the last observed hardware counter value. Just the same as
> non-auto-reload, its absolute value is the period of the first record.
> It should not update with each reload. Because it doesn't 'observe' the
> hardware counter for each auto-reload.
> 
> For the second and later records, the period is exactly the reload
> value. Just need to simply add (reload_times - 1) * reload_val to
> event->count.
> 
> The calculation of the latency of PMI handler is a little bit different
> as non-auto-reload. Because the start point is -reload_value. It needs
> to be adjusted by adding reload_value.
> The period_left needs to do the same adjustment.

What's this about the PMI latency, we don't care about that in any other
situation, right? Sure the PMI takes a bit of time, but we're not
correcting for that anywhere, so why start now?

> There is nothing need to do in x86_perf_event_set_period(). Because it
> is fixed period. The period_left is already adjusted.

Fixes tag is missing.

> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 67 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 3674a4b..cc1f373 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1251,17 +1251,82 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
>  	return NULL;
>  }
>  
> +/*
> + * Specific intel_pmu_save_and_restart() for auto-reload.
> + */
> +static int intel_pmu_save_and_restart_reload(struct perf_event *event,
> +					     u64 reload_val,
> +					     int reload_times)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int shift = 64 - x86_pmu.cntval_bits;
> +	u64 prev_raw_count, new_raw_count;
> +	u64 delta;
> +
> +	if ((reload_times == 0) || (reload_val == 0))
> +		return intel_pmu_save_and_restart(event);

Like Jiri, I find this confusing at best. If we need to call that one,
you shouldn't have called this function to begin with.

At best, have a WARN here or something.

> +
> +	/*
> +	 * Careful: an NMI might modify the previous event value.
> +	 *
> +	 * Our tactic to handle this is to first atomically read and
> +	 * exchange a new raw count - then add that new-prev delta
> +	 * count to the generic event atomically:
> +	 */

For now this seems to only get called from *drain_pebs* which afaict
only happens when we've disabled the PMU (either from sched_task or
PMI).

Now, you want to put this in the pmu::read() path, and that does not
disable the PMU, but I don't think we can drain the PEBS buffer while
its active, that's too full of races, so even there you'll have to
disable stuff.

So I don't think this is correct/desired for this case.

> +again:
> +	prev_raw_count = local64_read(&hwc->prev_count);
> +	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> +
> +	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> +					new_raw_count) != prev_raw_count)
> +		goto again;
> +
> +	/*
> +	 * Now we have the new raw value and have updated the prev
> +	 * timestamp already. We can now calculate the elapsed delta
> +	 * (event-)time and add that to the generic event.
> +	 *
> +	 * Careful, not all hw sign-extends above the physical width
> +	 * of the count.
> +	 *
> +	 *   event->count = period left from last time +
> +	 *                  (reload_times - 1) * reload_val +
> +	 *                  latency of PMI handler
         *
> +	 * The period left from last time can be got from -prev_count.
> +	 * The start points of counting is always -reload_val.
> +	 * So the real latency of PMI handler is reload_val + new_raw_count.
> +	 */

That is very confused, the PMI latency is utterly unrelated to anything
you do here.

> +	delta = (reload_val << shift) + (new_raw_count << shift) -
> +		(prev_raw_count << shift);
> +	delta >>= shift;
> +
> +	local64_add(reload_val * (reload_times - 1), &event->count);
> +	local64_add(delta, &event->count);

And this is still wrong I think. Consider the case where !reload_times.

We can easily call pmu::read() twice in one period. In that case we
should increment count with (new - prev).

Only once we get a new sample and are known to have wrapped, do we need
to consider that wrap.

> +	local64_sub(delta, &hwc->period_left);
> +
> +	return x86_perf_event_set_period(event);
> +}
> +
>  static void __intel_pmu_pebs_event(struct perf_event *event,
>  				   struct pt_regs *iregs,
>  				   void *base, void *top,
>  				   int bit, int count)
>  {
> +	struct hw_perf_event *hwc = &event->hw;
>  	struct perf_sample_data data;
>  	struct pt_regs regs;
>  	void *at = get_next_pebs_record_by_bit(base, top, bit);
>  
> -	if (!intel_pmu_save_and_restart(event) &&
> -	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
> +	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
> +		/*
> +		 * Now, auto-reload is only enabled in fixed period mode.
> +		 * The reload value is always hwc->sample_period.
> +		 * May need to change it, if auto-reload is enabled in
> +		 * freq mode later.
> +		 */
> +		intel_pmu_save_and_restart_reload(event, hwc->sample_period,
> +						  count);

Since you pass in @event, hwc->sample_period is already available to it,
no need to pass that in as well.

> +	} else if (!intel_pmu_save_and_restart(event))
>  		return;
>  
>  	while (count > 1) {

next prev parent reply	other threads:[~2018-01-24 12:26 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-08 15:15 [RESEND PATCH V2 0/4] bug fix mmap read and rdpmc read in large PEBS kan.liang
2018-01-08 15:15 ` [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload kan.liang
2018-01-10 10:22   ` Jiri Olsa
2018-01-10 14:31     ` Liang, Kan
2018-01-11 10:54       ` Jiri Olsa
2018-01-24 12:26   ` Peter Zijlstra [this message]
2018-01-24 15:45     ` Liang, Kan
2018-01-08 15:15 ` [RESEND PATCH V2 2/4] perf/x86: introduce read function for x86_pmu kan.liang
2018-01-08 15:15 ` [RESEND PATCH V2 3/4] perf/x86/intel: drain PEBS buffer in event read kan.liang
2018-01-10 10:39   ` Jiri Olsa
2018-01-10 14:31     ` Liang, Kan
2018-01-11 11:10       ` Jiri Olsa
2018-01-11 15:21         ` Liang, Kan
2018-01-11 15:45           ` Jiri Olsa
2018-01-16 18:49             ` Liang, Kan
2018-01-18  9:49               ` Jiri Olsa
2018-01-18 13:30                 ` Liang, Kan
2018-01-18 14:05                   ` Jiri Olsa
2018-01-08 15:15 ` [RESEND PATCH V2 4/4] perf/x86: fix: disable userspace RDPMC usage for large PEBS kan.liang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180124122618.GH2249@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=kan.liang@intel.com \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.