public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Vince Weaver <vince@deater.net>
Cc: mingo@kernel.org, alexander.shishkin@linux.intel.com,
	eranian@google.com, linux-kernel@vger.kernel.org,
	dvyukov@google.com, andi@firstfloor.org, jolsa@redhat.com,
	panand@redhat.com, sasha.levin@oracle.com, oleg@redhat.com,
	Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH 00/12] perf: more fixes
Date: Wed, 16 Mar 2016 23:59:33 +0100	[thread overview]
Message-ID: <20160316225933.GS6375@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160315153830.GA6356@twins.programming.kicks-ass.net>

On Tue, Mar 15, 2016 at 04:38:30PM +0100, Peter Zijlstra wrote:

> Running perf_fuzzer on that AMD box is still producing lots of fail, I
> seen long strings of dazed and confused msgs, indicating we have a
> 'spurious' NMI problem somewhere.
> 
> And occasionally it locks up..
> 
> So we're not there yet.

So the below appears to alleviate some of this; but the hangs are
quicker now, so maybe I just made it worse.

---
Subject: perf, ibs: Fix race with IBS_STARTING state
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Mar 16 23:55:21 CET 2016

While tracing the IBS bits I saw the NMI hitting between clearing
IBS_STARTING and the actual MSR writes to disable the counter.

Since IBS_STARTING was cleared, the handler assumed these were spurious
NMIs and because STOPPING wasn't set yet either, insta-triggered an
"Unknown NMI".

Cure this by clearing IBS_STARTING after disabling the hardware.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/events/amd/ibs.c |   32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -376,7 +376,13 @@ static void perf_ibs_start(struct perf_e
 	hwc->state = 0;
 
 	perf_ibs_set_period(perf_ibs, hwc, &period);
+	/*
+	 * Set STARTED before enabling the hardware, such that
+	 * a subsequent NMI must observe it. Then clear STOPPING
+	 * such that we don't consume NMIs by accident.
+	 */
 	set_bit(IBS_STARTED, pcpu->state);
+	clear_bit(IBS_STOPPING, pcpu->state);
 	perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
 
 	perf_event_update_userpage(event);
@@ -390,7 +396,7 @@ static void perf_ibs_stop(struct perf_ev
 	u64 config;
 	int stopping;
 
-	stopping = test_and_clear_bit(IBS_STARTED, pcpu->state);
+	stopping = test_bit(IBS_STARTED, pcpu->state);
 
 	if (!stopping && (hwc->state & PERF_HES_UPTODATE))
 		return;
@@ -398,8 +404,24 @@ static void perf_ibs_stop(struct perf_ev
 	rdmsrl(hwc->config_base, config);
 
 	if (stopping) {
+		/*
+		 * Set STOPPING before disabling the hardware, such that it
+		 * must be visible to NMIs the moment we clear the EN bit,
+		 * at which point we can generate an !VALID sample which
+		 * we need to consume.
+		 */
 		set_bit(IBS_STOPPING, pcpu->state);
 		perf_ibs_disable_event(perf_ibs, hwc, config);
+		/*
+		 * Clear STARTED after disabling the hardware; if it were
+		 * cleared before an NMI hitting after the clear but before
+		 * clearing the EN bit might think it a spurious NMI and not
+		 * handle it.
+		 *
+		 * Clearing it after, however, creates the problem of the NMI
+		 * handler seeing STARTED but not having a valid sample.
+		 */
+		clear_bit(IBS_STARTED, pcpu->state);
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
 	}
@@ -527,20 +549,24 @@ static int perf_ibs_handle_irq(struct pe
 	u64 *buf, *config, period;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
+fail:
 		/*
 		 * Catch spurious interrupts after stopping IBS: After
 		 * disabling IBS there could be still incoming NMIs
 		 * with samples that even have the valid bit cleared.
 		 * Mark all this NMIs as handled.
 		 */
-		return test_and_clear_bit(IBS_STOPPING, pcpu->state) ? 1 : 0;
+		if (test_and_clear_bit(IBS_STOPPING, pcpu->state))
+			return 1;
+
+		return 0;
 	}
 
 	msr = hwc->config_base;
 	buf = ibs_data.regs;
 	rdmsrl(msr, *buf);
 	if (!(*buf++ & perf_ibs->valid_mask))
-		return 0;
+		goto fail;
 
 	config = &ibs_data.regs[0];
 	perf_ibs_event_update(perf_ibs, event, config);

  reply	other threads:[~2016-03-16 22:59 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-24 17:45 [PATCH 00/12] perf: more fixes Peter Zijlstra
2016-02-24 17:45 ` [PATCH 01/12] perf: Close install vs exit race Peter Zijlstra
2016-02-25  8:03   ` [tip:perf/urgent] perf: Close install vs. " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 02/12] perf: Do not double free Peter Zijlstra
2016-02-25  8:04   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 03/12] perf: Allow perf_release() with !event->ctx Peter Zijlstra
2016-02-25  8:04   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 04/12] perf: Only update context time when active Peter Zijlstra
2016-02-25  8:04   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 05/12] perf: Fix cloning Peter Zijlstra
2016-02-25  8:05   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 06/12] perf: Fix race between event install and jump_labels Peter Zijlstra
2016-02-25  8:05   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 07/12] perf: Cure event->pending_disable race Peter Zijlstra
2016-02-25  8:06   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 08/12] perf: Introduce EVENT_TIME Peter Zijlstra
2016-02-25  8:06   ` [tip:perf/urgent] perf: Fix ctx time tracking by introducing EVENT_TIME tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 09/12] perf: Fix scaling vs enable_on_exec Peter Zijlstra
2016-02-25  8:07   ` [tip:perf/urgent] perf: Fix scaling vs. perf_event_enable_on_exec() tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 10/12] perf: Fix scaling vs perf_event_enable Peter Zijlstra
2016-02-25  8:07   ` [tip:perf/urgent] perf: Fix scaling vs. perf_event_enable() tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 11/12] perf: Fix scaling vs perf_install_in_context Peter Zijlstra
2016-02-25  8:07   ` [tip:perf/urgent] perf: Fix scaling vs. perf_install_in_context() tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 12/12] perf: Robustify task_function_call() Peter Zijlstra
2016-02-25  8:08   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-03-10 14:39 ` [PATCH 00/12] perf: more fixes Peter Zijlstra
2016-03-10 14:44   ` Vince Weaver
2016-03-11 14:23     ` Peter Zijlstra
2016-03-11 15:41       ` Borislav Petkov
2016-03-21  9:49       ` [tip:perf/urgent] perf/x86/ibs: Fix IBS throttle tip-bot for Peter Zijlstra
2016-03-15 15:38     ` [PATCH 00/12] perf: more fixes Peter Zijlstra
2016-03-16 22:59       ` Peter Zijlstra [this message]
2016-03-16 23:10         ` Peter Zijlstra
2016-03-17 11:54         ` Borislav Petkov
2016-03-11 10:12   ` Alexander Shishkin
2016-03-21  9:48   ` [tip:perf/urgent] perf/core: Fix the unthrottle logic tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160316225933.GS6375@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andi@firstfloor.org \
    --cc=bp@alien8.de \
    --cc=dvyukov@google.com \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=panand@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=vince@deater.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox