public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jiri Olsa <jolsa@redhat.com>
Subject: Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC
Date: Fri, 15 Mar 2019 14:06:52 +0000	[thread overview]
Message-ID: <41b30e9d-dcfe-ccae-8cd4-e88df8e4cb9c@amd.com> (raw)
In-Reply-To: <20190315105004.GW5996@hirez.programming.kicks-ass.net>

On 3/15/19 5:50 AM, Peter Zijlstra wrote:
> On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote:
>> On AMD processors, the detection of an overflowed counter in the NMI
>> handler relies on the current value of the counter. So, for example, to
>> check for overflow on a 48 bit counter, bit 47 is checked to see if it
>> is 1 (not overflowed) or 0 (overflowed).
>>
>> There is currently a race condition present when disabling and then
>> updating the PMC. Increased NMI latency in newer AMD processors makes this
>> race condition more pronounced.
> 
> Increased NMI latency also makes the results less useful :/ What amount
> of skid are we talking about, and is there anything AMD is going to do
> about this?

I haven't looked into the amount of skid, but, yes, the hardware team is
looking at this.

> 
>> If the counter value has overflowed, it is
>> possible to update the PMC value before the NMI handler can run.
> 
> Arguably the WRMSR should sync against the PMI. That is the beahviour
> one would expect.
> 
> Isn't that something you can fix in ucode? And could you very please
> tell the hardware people this is disguisting?

Currently, there's nothing they've found that can be done in ucode for
this. But, yes, they know it's a problem and they're looking at what they
can do.

> 
>> The updated PMC value is not an overflowed value, so when the perf NMI
>> handler does run, it will not find an overflowed counter. This may
>> appear as an unknown NMI resulting in either a panic or a series of
>> messages, depending on how the kernel is configured.
>>
>> To eliminate this race condition, the PMC value must be checked after
>> disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
>> wait for the NMI handler to reset the value before continuing. Add a new,
>> optional, callable function that can be used to test for and resolve this
>> condition.
>>
>> Cc: <stable@vger.kernel.org> # 4.14.x-
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> 
>> +static void amd_pmu_wait_on_overflow(int idx, u64 config)
>> +{
>> +	unsigned int i;
>> +	u64 counter;
>> +
>> +	/*
>> +	 * We shouldn't be calling this from NMI context, but add a safeguard
>> +	 * here to return, since if we're in NMI context we can't wait for an
>> +	 * NMI to reset an overflowed counter value.
>> +	 */
>> +	if (in_nmi())
>> +		return;
>> +
>> +	/*
>> +	 * If the interrupt isn't enabled then we won't get the NMI that will
>> +	 * reset the overflow condition, so return.
>> +	 */
>> +	if (!(config & ARCH_PERFMON_EVENTSEL_INT))
>> +		return;
>> +
>> +	/*
>> +	 * Wait for the counter to be reset if it has overflowed. This loop
>> +	 * should exit very, very quickly, but just in case, don't wait
>> +	 * forever...
>> +	 */
>> +	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
>> +		rdmsrl(x86_pmu_event_addr(idx), counter);
>> +		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
>> +			break;
>> +
>> +		/* Might be in IRQ context, so can't sleep */
>> +		udelay(1);
>> +	}
>> +}
> 
> Argh.. that's horrible, as I'm sure you fully appreciate :/

Yeah...

> 
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index b684f0294f35..f1d2f70000cd 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
>>   			continue;
>>   		val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>>   		wrmsrl(x86_pmu_config_addr(idx), val);
>> +
>> +		if (x86_pmu.wait_on_overflow)
>> +			x86_pmu.wait_on_overflow(idx, val);
>>   	}
>>   }
> 
> One alternative is adding amd_pmu_disable_all() to amd/core.c and using
> that. Then you can also change the loop to do the wait after all the
> WRMSRs, if that helps with latency.

I thought about that for both this and the next patch. But since it would
be duplicating all the code I went with the added callbacks. It might be
worth it for this patch to have an AMD disable_all callback since it's
not a lot of code to duplicate compared to handle_irq and I like the idea
of doing the overflow check after all the WRMSRs.

Thanks for the review, Peter.

Tom

> 

  reply	other threads:[~2019-03-15 14:06 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-11 16:48 [RFC PATCH 0/2] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
2019-03-11 16:48 ` [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
2019-03-15 10:50   ` Peter Zijlstra
2019-03-15 14:06     ` Lendacky, Thomas [this message]
2019-03-11 16:48 ` [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active Lendacky, Thomas
2019-03-15 12:03   ` Peter Zijlstra
2019-03-15 14:44     ` Lendacky, Thomas
2019-03-15 15:11       ` Peter Zijlstra
2019-03-15 15:50         ` Lendacky, Thomas
2019-03-15 17:47           ` Lendacky, Thomas
2019-03-18  9:47     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41b30e9d-dcfe-ccae-8cd4-e88df8e4cb9c@amd.com \
    --to=thomas.lendacky@amd.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox