public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org
Cc: acme@kernel.org, tglx@linutronix.de, jolsa@redhat.com,
	eranian@google.com, ak@linux.intel.com,
	Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 1/5] perf/x86/intel: fix event update for auto-reload
Date: Mon, 29 Jan 2018 08:29:29 -0800	[thread overview]
Message-ID: <1517243373-355481-2-git-send-email-kan.liang@linux.intel.com> (raw)
In-Reply-To: <1517243373-355481-1-git-send-email-kan.liang@linux.intel.com>

From: Kan Liang <kan.liang@linux.intel.com>

There is a bug when mmap read event->count with large PEBS enabled.
Here is an example.
 #./read_count
 0x71f0
 0x122c0
 0x1000000001c54
 0x100000001257d
 0x200000000bdc5

In fixed period mode, the auto-reload mechanism could be enabled for
PEBS events. But the calculation of event->count does not take the
auto-reload values into account. Anyone who read the event->count will
get wrong result, e.g x86_pmu_read. The calculation of hwc->period_left
is wrong either, which will impact the calculation of the period for
the first record in PEBS multiple records.

The issue was introduced with the auto-reload mechanism enabled since
commit 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism
when possible")

Introduce intel_pmu_save_and_restart_reload to calculate the event->count
only for auto-reload.

There is a small gap between 'PEBS hardware is armed' and 'the NMI is
handled'. Because of the gap, the first record also needs to be specially
handled.
The formula to calculate the increments of event->count is as below.
The increments = the period for first record +
                 (reload_times - 1) * reload_val +
                 the gap
- The 'the period for first record' is the period left from last PMI,
  which can be got from the previous event value.
- For the second and later records, the period is exactly the reload
  value. Just need to simply add (reload_times - 1) * reload_val
- Because of the auto-reload, the start point of counting is alwyas
  (-reload_val). So the calculation of 'the gap' needs to be corrected by
  adding reload_val.
  The period_left needs to do the same adjustment as well.

There is nothing need to do in x86_perf_event_set_period(). Because it
is fixed period. The period_left is already adjusted.

Fixes: 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism
when possible")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8156e47..6533426 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1303,17 +1303,82 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
 	return NULL;
 }
 
+/*
+ * Specific intel_pmu_save_and_restart() for auto-reload.
+ * It only be called from drain_pebs().
+ */
+static int intel_pmu_save_and_restart_reload(struct perf_event *event,
+					     int reload_times)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int shift = 64 - x86_pmu.cntval_bits;
+	u64 reload_val = hwc->sample_period;
+	u64 prev_raw_count, new_raw_count;
+	u64 delta;
+
+	WARN_ON((reload_times == 0) || (reload_val == 0));
+
+	/*
+	 * drain_pebs() only happens when the PMU is disabled.
+	 * It doesnot need to specially handle the previous event value.
+	 * The hwc->prev_count will be updated in x86_perf_event_set_period().
+	 */
+	prev_raw_count = local64_read(&hwc->prev_count);
+	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+
+	/*
+	 * Now we have the new raw value and have updated the prev
+	 * timestamp already. We can now calculate the elapsed delta
+	 * (event-)time and add that to the generic event.
+	 *
+	 * Careful, not all hw sign-extends above the physical width
+	 * of the count.
+	 *
+	 * There is a small gap between 'PEBS hardware is armed' and 'the NMI
+	 * is handled'. Because of the gap, the first record also needs to be
+	 * specially handled.
+	 * The formula to calculate the increments of event->count is as below.
+	 * The increments = the period for first record +
+	 *                  (reload_times - 1) * reload_val +
+	 *                  the gap
+	 * 'The period for first record' can be got from -prev_raw_count.
+	 *
+	 * 'The gap' = new_raw_count + reload_val. Because the start point of
+	 * counting is always -reload_val for auto-reload.
+	 *
+	 * The period_left needs to do the same adjustment by adding
+	 * reload_val.
+	 */
+	delta = (reload_val << shift) + (new_raw_count << shift) -
+		(prev_raw_count << shift);
+	delta >>= shift;
+
+	local64_add(reload_val * (reload_times - 1), &event->count);
+	local64_add(delta, &event->count);
+	local64_sub(delta, &hwc->period_left);
+
+	return x86_perf_event_set_period(event);
+}
+
 static void __intel_pmu_pebs_event(struct perf_event *event,
 				   struct pt_regs *iregs,
 				   void *base, void *top,
 				   int bit, int count)
 {
+	struct hw_perf_event *hwc = &event->hw;
 	struct perf_sample_data data;
 	struct pt_regs regs;
 	void *at = get_next_pebs_record_by_bit(base, top, bit);
 
-	if (!intel_pmu_save_and_restart(event) &&
-	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
+	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
+		/*
+		 * Now, auto-reload is only enabled in fixed period mode.
+		 * The reload value is always hwc->sample_period.
+		 * May need to change it, if auto-reload is enabled in
+		 * freq mode later.
+		 */
+		intel_pmu_save_and_restart_reload(event, count);
+	} else if (!intel_pmu_save_and_restart(event))
 		return;
 
 	while (count > 1) {
-- 
2.7.4

  reply	other threads:[~2018-01-29 16:30 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-29 16:29 [PATCH V3 0/5] bugs fix for large PEBS mmap read and rdpmc read kan.liang
2018-01-29 16:29 ` kan.liang [this message]
2018-02-06 15:06   ` [PATCH V3 1/5] perf/x86/intel: fix event update for auto-reload Peter Zijlstra
2018-02-06 17:58     ` Liang, Kan
2018-02-09 14:09       ` Peter Zijlstra
2018-02-09 15:49         ` Liang, Kan
2018-02-10 14:09           ` Peter Zijlstra
2018-01-29 16:29 ` [PATCH V3 2/5] perf/x86: introduce read function for x86_pmu kan.liang
2018-01-29 16:29 ` [PATCH V3 3/5] perf/x86/intel/ds: introduce read function for large pebs kan.liang
2018-01-29 16:29 ` [PATCH V3 4/5] perf/x86/intel: fix pmu read for large PEBS kan.liang
2018-01-29 16:29 ` [PATCH V3 5/5] perf/x86: fix: disable userspace RDPMC usage " kan.liang
2018-01-30  9:16 ` [PATCH V3 0/5] bugs fix for large PEBS mmap read and rdpmc read Stephane Eranian
2018-01-30 13:39   ` Jiri Olsa
2018-01-30 14:59     ` Liang, Kan
2018-01-30 15:04       ` Jiri Olsa
2018-01-30 15:25         ` Liang, Kan
2018-01-30 16:36           ` Stephane Eranian
2018-01-30 16:48             ` Liang, Kan
2018-01-30 18:52               ` Jiri Olsa
2018-01-30 19:56                 ` Stephane Eranian
2018-01-31  3:59                   ` Andi Kleen
2018-01-31  9:15                     ` Jiri Olsa
2018-01-31 13:15                       ` Jiri Olsa
2018-01-31 15:15                         ` Liang, Kan
2018-01-31 15:41                           ` Jiri Olsa
2018-01-30 14:41   ` Liang, Kan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1517243373-355481-2-git-send-email-kan.liang@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox