From: Cyrill Gorcunov <gorcunov@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>,
"x86@kernel.org" <x86@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Jiri Olsa <jolsa@redhat.com>, Vince Weaver <vince@deater.net>,
Stephane Eranian <eranian@google.com>
Subject: Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
Date: Tue, 2 Apr 2019 16:22:00 +0300
Message-ID: <20190402132200.GA23501@uranus>
In-Reply-To: <20190402130302.GL12232@hirez.programming.kicks-ass.net>
On Tue, Apr 02, 2019 at 03:03:02PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 01, 2019 at 09:46:33PM +0000, Lendacky, Thomas wrote:
> > This patch series addresses issues with increased NMI latency in newer
> > AMD processors that can result in unknown NMI messages when PMC counters
> > are active.
> >
> > The following fixes are included in this series:
> >
> > - Resolve a race condition when disabling an overflowed PMC counter,
> > specifically when updating the PMC counter with a new value.
> > - Resolve handling of active PMC counter overflows in the perf NMI
> > handler and when to report that the NMI is not related to a PMC.
> > - Remove earlier workaround for spurious NMIs by re-ordering the
> > PMC stop sequence to disable the PMC first and then remove the PMC
> > bit from the active_mask bitmap. As part of disabling the PMC, the
> > code will wait for an overflow to be reset.
> >
> > The last patch re-works the order of when the PMC is removed from the
> > active_mask. There was a comment from a long time ago about having
> > to clear the bit in active_mask before disabling the counter because
> > the perf NMI handler could re-enable the PMC again. Looking at the
> > handler today, I don't see that as possible, hence the reordering. The
> > question will be whether the Intel PMC support will now have issues.
> > There is still support for using x86_pmu_handle_irq() in the Intel
> > core.c file. Did Intel have any issues with spurious NMIs in the past?
> > Peter Z, any thoughts on this?
>
> I can't remember :/ I suppose we'll see if anything pops up after these
> here patches. At least then we get a chance to properly document things.
>
> > Also, I couldn't completely get rid of the "running" bit because it
> > is used by arch/x86/events/intel/p4.c. An old commit comment that
> > seems to indicate the p4 code suffered the spurious interrupts:
> > 03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
> > So maybe that partially answers my previous question...
>
> Yeah, the P4 code is magic, and I don't have any such machines left, nor
> do I think does Cyrill who wrote much of that.
It was so long ago :) What I remember off the top of my head is that some of
the counters were broken at the hardware level, so I had to use only one
counter instead of the two present in the system. And there were spurious
NMIs too. I think we can move this "running" bit into a per-cpu variable
declared inside the p4 code only, and so get rid of it in cpu_hw_events?
> I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
> but maybe I'm wrong.
No, you're correct. P4 was crashing many times before we managed to make
it more or less stable. The main problem, though, is that finding a working
P4 box is really hard.
> Ideally we'd find a willing victim to maintain that thing, or possibly
> just delete it, dunno if anybody still cares.
As for me, I would rather mark this p4pmu code as deprecated until there
is a *real* need for its support.
>
> Anyway, I like these patches, but I cannot apply them since you sent them
> base64 encoded and my script chokes on that.