Re: [PATCH V5 4/6] perf, x86: handle multiple records in PEBS buffer

From: Peter Zijlstra <peterz@infradead.org>
To: Kan Liang <kan.liang@intel.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	acme@infradead.org, eranian@google.com, andi@firstfloor.org
Subject: Re: [PATCH V5 4/6] perf, x86: handle multiple records in PEBS buffer
Date: Mon, 30 Mar 2015 15:45:31 +0200
Message-ID: <20150330134531.GV23123@twins.programming.kicks-ass.net>
In-Reply-To: <1424701556-28270-5-git-send-email-kan.liang@intel.com>

On Mon, Feb 23, 2015 at 09:25:54AM -0500, Kan Liang wrote:
> From: Yan, Zheng <zheng.z.yan@intel.com>
> 
> When PEBS interrupt threshold is larger than one, the PEBS buffer
> may include multiple records for each PEBS event. This patch makes
> the code first count how many records each PEBS event has, then
> output the samples in batch.
> 
> One corner case needs to mention is that the PEBS hardware doesn't
> deal well with collisions, when PEBS events happen near to each
> other. The records for the events can be collapsed into a single
> one, and it's not possible to reconstruct all events that caused
> the PEBS record, However in practice collisions are extremely rare,
> as long as different events are used. The periods are typically very
> large, so any collision is unlikely. When collision happens, we drop
> the PEBS record.
> 
> Here are some numbers about collisions.
> Four frequently occurring events
> (cycles:p,instructions:p,branches:p,mem-stores:p) are tested
> 
> Test events which are sampled together                   collision rate
> cycles:p,instructions:p                                  0.25%
> cycles:p,instructions:p,branches:p                       0.30%
> cycles:p,instructions:p,branches:p,mem-stores:p          0.35%
> 
> cycles:p,cycles:p                                        98.52%
> 
> collisions are extremely rare as long as different events are used. The
> only way you can get a lot of collision is when you count the same thing
> multiple times. But it is not a useful configuration.

This fails to mention the other problem the status field has.  You also
did not specify what exact condition you counted as a collision.

The PEBS status field is a copy of the GLOBAL_STATUS MSR at assist time,
this means that:

 - it's possible (and harmless) for the status field to contain set bits
   for !PEBS events -- the proposed code is buggy here.
 - it's possible to have multiple PEBS bits set even though the record
   really only was for a single event -- if you count everything with
   multiple PEBS bits set as a collision you're counting wrong.

So once again, a coherent story here please.
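
To make the two cases above concrete, here is a minimal user-space
sketch, not kernel code. The names classify_status(), naive_is_single()
and pebs_mask are made up for illustration; pebs_mask stands for
whatever bitmask of PEBS-enabled counters the driver tracks. Bit 32 is
used as an example of a !PEBS bit (a fixed counter overflow bit in
GLOBAL_STATUS).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of one PEBS record's status word. */
enum pebs_status_kind {
	PEBS_STATUS_NONE,	/* no PEBS counter bit left after masking */
	PEBS_STATUS_SINGLE,	/* exactly one PEBS counter bit set */
	PEBS_STATUS_MULTI,	/* several PEBS counter bits set; this alone
				 * does not prove a real collision */
};

static enum pebs_status_kind
classify_status(uint64_t status, uint64_t pebs_mask)
{
	/* Drop bits that belong to !PEBS counters before looking further. */
	uint64_t s = status & pebs_mask;

	if (!s)
		return PEBS_STATUS_NONE;
	if (!(s & (s - 1)))	/* power of two: a single bit is set */
		return PEBS_STATUS_SINGLE;
	return PEBS_STATUS_MULTI;
}

/* The comparison as written in the patch: raw status, no masking. */
static int naive_is_single(uint64_t status, int bit)
{
	assert(bit >= 0 && bit < 64);
	return status == (1ULL << bit);
}
```

With pebs_mask = 0xf, a status of (1 | 1ULL << 32) classifies as
PEBS_STATUS_SINGLE, while naive_is_single() rejects it for bit 0; that
is the record the unmasked check would drop for no good reason.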

>  static void __intel_pmu_pebs_event(struct perf_event *event,
> +				   struct pt_regs *iregs,
> +				   void *at, void *top, int count)
>  {
> +	struct perf_output_handle handle;
> +	struct perf_event_header header;
>  	struct perf_sample_data data;
>  	struct pt_regs regs;
>  
> +	if (!intel_pmu_save_and_restart(event) &&
> +	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
>  		return;
>  
> +	setup_pebs_sample_data(event, iregs, at, &data, &regs);
>  
> +	if (perf_event_overflow(event, &data, &regs)) {
>  		x86_pmu_stop(event, 0);
> +		return;
> +	}
> +
> +	if (count <= 1)
> +		return;
> +
> +	at += x86_pmu.pebs_record_size;
> +	count--;
> +
> +	perf_sample_data_init(&data, 0, event->hw.last_period);
> +	perf_prepare_sample(&header, &data, event, &regs);
> +
> +	if (perf_output_begin(&handle, event, header.size * count))
> +		return;
> +
> +	for (; at < top; at += x86_pmu.pebs_record_size) {
> +		struct pebs_record_nhm *p = at;
> +
> +		if (p->status != (1 << event->hw.idx))
> +			continue;
> +
> +		setup_pebs_sample_data(event, iregs, at, &data, &regs);
> +		perf_output_sample(&handle, &header, &data, event);
> +
> +		count--;
> +		if (count == 0)
> +			break;
> +	}
> +
> +	perf_output_end(&handle);
>  }

This could use a comment explaining why it is structured in this odd
way. I have vague memories, but a comment helps everybody who doesn't
have those memories -- which will include me in a year or so.

What I cannot remember is why we call perf_event_overflow() on the
first record, not the last.

>  static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>  {
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	struct debug_store *ds = cpuc->ds;
> +	struct perf_event *event;
> +	void *base, *at, *top;
>  	int bit;
> +	int counts[MAX_PEBS_EVENTS] = {};
>  
>  	if (!x86_pmu.pebs_active)
>  		return;
>  
> +	base = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
>  	top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
>  
>  	ds->pebs_index = ds->pebs_buffer_base;
>  
> +	if (unlikely(base >= top))
>  		return;
>  
> +	for (at = base; at < top; at += x86_pmu.pebs_record_size) {
>  		struct pebs_record_nhm *p = at;
>  
> +		bit = find_first_bit((unsigned long *)&p->status,
> +					x86_pmu.max_pebs_events);
> +		if (bit >= x86_pmu.max_pebs_events)
> +			continue;
> +		/*
> +		 * The PEBS hardware does not deal well with collisions,
> +		 * when the same event happens near to each other. The
> +		 * records for the events can be collapsed into a single
> +		 * one, and it's not possible to reconstruct all events
> +		 * that caused the PEBS record. However in practice, the
> +		 * collisions are extremely rare. If collision happened,
> +		 * we drop the record. its the safest choice.
> +		 */
> +		if (p->status != (1 << bit))
> +			continue;

As per the above, this is buggy. You should start by masking p->status
with the mask of enabled PEBS counters (cpuc->pebs_enabled;
x86_pmu.pebs_active is only a one-bit flag) to clear all !PEBS counter
bits.
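
Roughly what I mean for the counting pass, again as a user-space sketch
under stated assumptions: struct pebs_rec is a simplified stand-in for
the real record layout, MAX_PEBS_EVENTS is given an illustrative value,
and pebs_mask is a hypothetical parameter for the bitmask of
PEBS-enabled counters. Records with several PEBS bits left after
masking are skipped here only to mirror the patch's current policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEBS_EVENTS 8	/* illustrative value for this sketch */

/* Simplified stand-in for a PEBS record; only the status word matters. */
struct pebs_rec {
	uint64_t status;
};

/* Count, per counter bit, the records that belong to exactly one event. */
static void count_pebs_records(const struct pebs_rec *recs, size_t n,
			       uint64_t pebs_mask,
			       int counts[MAX_PEBS_EVENTS])
{
	size_t i;
	int bit;

	assert(recs || n == 0);

	for (bit = 0; bit < MAX_PEBS_EVENTS; bit++)
		counts[bit] = 0;

	for (i = 0; i < n; i++) {
		/* First clear every bit that is not a PEBS counter. */
		uint64_t s = recs[i].status & pebs_mask;

		s &= (1ULL << MAX_PEBS_EVENTS) - 1;

		/* Skip records with no PEBS bit or with more than one. */
		if (!s || (s & (s - 1)))
			continue;

		/* s has exactly one low bit set, so this loop terminates. */
		for (bit = 0; !(s & (1ULL << bit)); bit++)
			;
		counts[bit]++;
	}
}
```

Given records with status 0x1, (0x1 | 1ULL << 32), 0x3, 0x2 and
(1ULL << 33), and pebs_mask = 0x3, this counts two records for bit 0
and one for bit 1; the second record is no longer lost to the stray
fixed-counter bit.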

> +		if (!test_bit(bit, cpuc->active_mask))
> +			continue;
> +		event = cpuc->events[bit];
> +		WARN_ON_ONCE(!event);
> +		if (!event->attr.precise_ip)
> +			continue;
> +		counts[bit]++;
> +	}
>  
> +	for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> +		if (counts[bit] == 0)
>  			continue;
> +		event = cpuc->events[bit];
> +		for (at = base; at < top; at += x86_pmu.pebs_record_size) {
> +			struct pebs_record_nhm *p = at;
>  
> +			if (p->status == (1 << bit))
> +				break;
> +		}
> +		__intel_pmu_pebs_event(event, iregs, at, top, counts[bit]);
>  	}
>  }
