* [RFC PATCH 0/2] x86/perf/amd: AMD PMC counters and NMI latency
@ 2019-03-11 16:48 Lendacky, Thomas
2019-03-11 16:48 ` [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
2019-03-11 16:48 ` [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active Lendacky, Thomas
From: Lendacky, Thomas @ 2019-03-11 16:48 UTC
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
Jiri Olsa
This patch series addresses issues with increased NMI latency in newer
AMD processors that can result in unknown NMI messages when PMC counters
are active.
The following fixes are included in this series:
- Resolve a race condition when disabling an overflowed PMC counter,
specifically when updating the PMC counter with a new value.
- Resolve handling of multiple active PMC counter overflows in the perf
NMI handler and when to report that the NMI is not related to a PMC.
I'm looking for feedback on the direction taken in these two patches. In
the first patch, the reset check loop typically runs only one iteration,
and very rarely ran 3 or 4 times. I also looked at an alternative to
the first patch where I set a per-CPU, per-PMC flag that can be checked
in the NMI handler when it is found that the PMC hasn't overflowed. That
would mean the sample would be lost, though (on average I was seeing about
a 0.20% sample loss that way).
---
This patch series is based off of the perf/core branch of tip:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
Commit c978b9460fe1 ("Merge tag 'perf-core-for-mingo-5.1-20190225' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core")
* [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC
@ 2019-03-11 16:48 Lendacky, Thomas
From: Lendacky, Thomas @ 2019-03-11 16:48 UTC
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
    Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
    Jiri Olsa

On AMD processors, the detection of an overflowed counter in the NMI
handler relies on the current value of the counter. So, for example, to
check for overflow on a 48-bit counter, bit 47 is checked to see if it
is 1 (not overflowed) or 0 (overflowed).

There is currently a race condition present when disabling and then
updating the PMC. Increased NMI latency in newer AMD processors makes
this race condition more pronounced. If the counter value has
overflowed, it is possible to update the PMC value before the NMI
handler can run. The updated PMC value is not an overflowed value, so
when the perf NMI handler does run, it will not find an overflowed
counter. This may appear as an unknown NMI resulting in either a panic
or a series of messages, depending on how the kernel is configured.

To eliminate this race condition, the PMC value must be checked after
disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
wait for the NMI handler to reset the value before continuing. Add a
new, optional, callable function that can be used to test for and
resolve this condition.
Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/events/amd/core.c   | 39 +++++++++++++++++++++++++++++++++++++++
 arch/x86/events/core.c       |  3 +++
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 43 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 7d2d7c801dba..d989640fa87d 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -3,6 +3,7 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/delay.h>
 #include <asm/apicdef.h>
 #include "../perf_event.h"

@@ -429,6 +430,43 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 }

+#define OVERFLOW_WAIT_COUNT	50
+
+static void amd_pmu_wait_on_overflow(int idx, u64 config)
+{
+	unsigned int i;
+	u64 counter;
+
+	/*
+	 * We shouldn't be calling this from NMI context, but add a safeguard
+	 * here to return, since if we're in NMI context we can't wait for an
+	 * NMI to reset an overflowed counter value.
+	 */
+	if (in_nmi())
+		return;
+
+	/*
+	 * If the interrupt isn't enabled then we won't get the NMI that will
+	 * reset the overflow condition, so return.
+	 */
+	if (!(config & ARCH_PERFMON_EVENTSEL_INT))
+		return;
+
+	/*
+	 * Wait for the counter to be reset if it has overflowed. This loop
+	 * should exit very, very quickly, but just in case, don't wait
+	 * forever...
+	 */
+	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
+		rdmsrl(x86_pmu_event_addr(idx), counter);
+		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
+			break;
+
+		/* Might be in IRQ context, so can't sleep */
+		udelay(1);
+	}
+}
+
 static struct event_constraint *
 amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
			  struct perf_event *event)
@@ -650,6 +688,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.cpu_dead		= amd_pmu_cpu_dead,

 	.amd_nb_constraints	= 1,
+	.wait_on_overflow	= amd_pmu_wait_on_overflow,
 };

 static int __init amd_core_pmu_init(void)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b684f0294f35..f1d2f70000cd 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
 			continue;
 		val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
 		wrmsrl(x86_pmu_config_addr(idx), val);
+
+		if (x86_pmu.wait_on_overflow)
+			x86_pmu.wait_on_overflow(idx, val);
 	}
 }

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7e75f474b076..a37490a26a09 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -636,6 +636,7 @@ struct x86_pmu {
 	 * AMD bits
 	 */
 	unsigned int	amd_nb_constraints : 1;
+	void		(*wait_on_overflow)(int idx, u64 config);

 	/*
 	 * Extra registers for events
* Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC
@ 2019-03-15 10:50 Peter Zijlstra
From: Peter Zijlstra @ 2019-03-15 10:50 UTC
To: Lendacky, Thomas
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo,
    Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim,
    Thomas Gleixner, Jiri Olsa

On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote:
> On AMD processors, the detection of an overflowed counter in the NMI
> handler relies on the current value of the counter. So, for example, to
> check for overflow on a 48-bit counter, bit 47 is checked to see if it
> is 1 (not overflowed) or 0 (overflowed).
>
> There is currently a race condition present when disabling and then
> updating the PMC. Increased NMI latency in newer AMD processors makes this
> race condition more pronounced.

Increased NMI latency also makes the results less useful :/ What amount
of skid are we talking about, and is there anything AMD is going to do
about this?

> If the counter value has overflowed, it is
> possible to update the PMC value before the NMI handler can run.

Arguably the WRMSR should sync against the PMI. That is the behaviour
one would expect.

Isn't that something you can fix in ucode? And could you very please
tell the hardware people this is disgusting?

> The updated PMC value is not an overflowed value, so when the perf NMI
> handler does run, it will not find an overflowed counter. This may
> appear as an unknown NMI resulting in either a panic or a series of
> messages, depending on how the kernel is configured.
>
> To eliminate this race condition, the PMC value must be checked after
> disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
> wait for the NMI handler to reset the value before continuing. Add a new,
> optional, callable function that can be used to test for and resolve this
> condition.
>
> Cc: <stable@vger.kernel.org> # 4.14.x-
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

> +static void amd_pmu_wait_on_overflow(int idx, u64 config)
> +{
> +	unsigned int i;
> +	u64 counter;
> +
> +	/*
> +	 * We shouldn't be calling this from NMI context, but add a safeguard
> +	 * here to return, since if we're in NMI context we can't wait for an
> +	 * NMI to reset an overflowed counter value.
> +	 */
> +	if (in_nmi())
> +		return;
> +
> +	/*
> +	 * If the interrupt isn't enabled then we won't get the NMI that will
> +	 * reset the overflow condition, so return.
> +	 */
> +	if (!(config & ARCH_PERFMON_EVENTSEL_INT))
> +		return;
> +
> +	/*
> +	 * Wait for the counter to be reset if it has overflowed. This loop
> +	 * should exit very, very quickly, but just in case, don't wait
> +	 * forever...
> +	 */
> +	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
> +		rdmsrl(x86_pmu_event_addr(idx), counter);
> +		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
> +			break;
> +
> +		/* Might be in IRQ context, so can't sleep */
> +		udelay(1);
> +	}
> +}

Argh.. that's horrible, as I'm sure you fully appreciate :/

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index b684f0294f35..f1d2f70000cd 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
>  			continue;
>  		val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>  		wrmsrl(x86_pmu_config_addr(idx), val);
> +
> +		if (x86_pmu.wait_on_overflow)
> +			x86_pmu.wait_on_overflow(idx, val);
>  	}
>  }

One alternative is adding amd_pmu_disable_all() to amd/core.c and using
that. Then you can also change the loop to do the wait after all the
WRMSRs, if that helps with latency.
* Re: [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC
@ 2019-03-15 14:06 Lendacky, Thomas
From: Lendacky, Thomas @ 2019-03-15 14:06 UTC
To: Peter Zijlstra
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo,
    Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim,
    Thomas Gleixner, Jiri Olsa

On 3/15/19 5:50 AM, Peter Zijlstra wrote:
> On Mon, Mar 11, 2019 at 04:48:44PM +0000, Lendacky, Thomas wrote:
>> On AMD processors, the detection of an overflowed counter in the NMI
>> handler relies on the current value of the counter. So, for example, to
>> check for overflow on a 48-bit counter, bit 47 is checked to see if it
>> is 1 (not overflowed) or 0 (overflowed).
>>
>> There is currently a race condition present when disabling and then
>> updating the PMC. Increased NMI latency in newer AMD processors makes this
>> race condition more pronounced.
>
> Increased NMI latency also makes the results less useful :/ What amount
> of skid are we talking about, and is there anything AMD is going to do
> about this?

I haven't looked into the amount of skid, but, yes, the hardware team is
looking at this.

>
>> If the counter value has overflowed, it is
>> possible to update the PMC value before the NMI handler can run.
>
> Arguably the WRMSR should sync against the PMI. That is the behaviour
> one would expect.
>
> Isn't that something you can fix in ucode? And could you very please
> tell the hardware people this is disgusting?

Currently, there's nothing they've found that can be done in ucode for
this. But, yes, they know it's a problem and they're looking at what
they can do.

>
>> The updated PMC value is not an overflowed value, so when the perf NMI
>> handler does run, it will not find an overflowed counter. This may
>> appear as an unknown NMI resulting in either a panic or a series of
>> messages, depending on how the kernel is configured.
>>
>> To eliminate this race condition, the PMC value must be checked after
>> disabling the counter in x86_pmu_disable_all(), and, if overflowed, must
>> wait for the NMI handler to reset the value before continuing. Add a new,
>> optional, callable function that can be used to test for and resolve this
>> condition.
>>
>> Cc: <stable@vger.kernel.org> # 4.14.x-
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>
>> +static void amd_pmu_wait_on_overflow(int idx, u64 config)
>> +{
>> +	unsigned int i;
>> +	u64 counter;
>> +
>> +	/*
>> +	 * We shouldn't be calling this from NMI context, but add a safeguard
>> +	 * here to return, since if we're in NMI context we can't wait for an
>> +	 * NMI to reset an overflowed counter value.
>> +	 */
>> +	if (in_nmi())
>> +		return;
>> +
>> +	/*
>> +	 * If the interrupt isn't enabled then we won't get the NMI that will
>> +	 * reset the overflow condition, so return.
>> +	 */
>> +	if (!(config & ARCH_PERFMON_EVENTSEL_INT))
>> +		return;
>> +
>> +	/*
>> +	 * Wait for the counter to be reset if it has overflowed. This loop
>> +	 * should exit very, very quickly, but just in case, don't wait
>> +	 * forever...
>> +	 */
>> +	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
>> +		rdmsrl(x86_pmu_event_addr(idx), counter);
>> +		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
>> +			break;
>> +
>> +		/* Might be in IRQ context, so can't sleep */
>> +		udelay(1);
>> +	}
>> +}
>
> Argh.. that's horrible, as I'm sure you fully appreciate :/

Yeah...

>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index b684f0294f35..f1d2f70000cd 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -606,6 +606,9 @@ void x86_pmu_disable_all(void)
>>  			continue;
>>  		val &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>>  		wrmsrl(x86_pmu_config_addr(idx), val);
>> +
>> +		if (x86_pmu.wait_on_overflow)
>> +			x86_pmu.wait_on_overflow(idx, val);
>>  	}
>>  }
>
> One alternative is adding amd_pmu_disable_all() to amd/core.c and using
> that. Then you can also change the loop to do the wait after all the
> WRMSRs, if that helps with latency.

I thought about that for both this and the next patch. But since it
would be duplicating all the code I went with the added callbacks. It
might be worth it for this patch to have an AMD disable_all callback
since it's not a lot of code to duplicate compared to handle_irq, and I
like the idea of doing the overflow check after all the WRMSRs.

Thanks for the review, Peter.

Tom
* [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active
@ 2019-03-11 16:48 Lendacky, Thomas
From: Lendacky, Thomas @ 2019-03-11 16:48 UTC
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
    Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner,
    Jiri Olsa

On AMD processors, the detection of an overflowed PMC counter in the NMI
handler relies on the current value of the PMC. So, for example, to
check for overflow on a 48-bit counter, bit 47 is checked to see if it
is 1 (not overflowed) or 0 (overflowed).

When the perf NMI handler executes, it does not know in advance which
PMC counters have overflowed. As such, the NMI handler will process all
active PMC counters that have overflowed. NMI latency in newer AMD
processors can result in multiple overflowed PMC counters being
processed in one NMI, and then a subsequent NMI, which does not appear
to be a back-to-back NMI, not finding any PMC counters that have
overflowed. This may appear to be an unhandled NMI resulting in either a
panic or a series of messages, depending on how the kernel was
configured.

To mitigate this issue, a new, optional, callable function is introduced
that is called before returning from x86_pmu_handle_irq(). The AMD perf
support will use this function to indicate whether the NMI has been
handled, or would have been handled had an earlier NMI not handled the
overflowed PMC. When any PMC overflow has been handled and more than one
PMC is active, a per-CPU variable is set to the minimum of (number of
active PMCs - 1) and 2. This indicates the possible number of NMIs that
can still occur. The value of 2 covers the case where an NMI does not
arrive at the APIC in time to be collapsed into an already pending NMI.
Each time the function is called without having handled an overflowed
counter, the per-CPU value is checked. If the value is non-zero, it is
decremented and the NMI handler indicates that it handled the NMI. If
the value is zero, the NMI handler indicates that it did not handle the
NMI.

Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/events/amd/core.c   | 43 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/events/core.c       |  6 ++++++
 arch/x86/events/perf_event.h |  2 ++
 3 files changed, 51 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index d989640fa87d..30457b511e6d 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -5,9 +5,12 @@
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <asm/apicdef.h>
+#include <asm/nmi.h>

 #include "../perf_event.h"

+static DEFINE_PER_CPU(unsigned int, perf_nmi_counter);
+
 static __initconst const u64 amd_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -467,6 +470,45 @@ static void amd_pmu_wait_on_overflow(int idx, u64 config)
 	}
 }

+/*
+ * Because of NMI latency, if multiple PMC counters are active we need to take
+ * into account that multiple PMC overflows can generate multiple NMIs but be
+ * handled by a single invocation of the NMI handler (think PMC overflow while
+ * in the NMI handler). This could result in subsequent unknown NMI messages
+ * being issued.
+ *
+ * Attempt to mitigate this by using the number of active PMCs to determine
+ * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
+ * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of one
+ * less than the number of active PMCs or 2. The value of 2 is used in case the
+ * NMI does not arrive at the APIC in time to be collapsed into an already
+ * pending NMI.
+ */
+static int amd_pmu_mitigate_nmi_latency(unsigned int active, int handled)
+{
+	/* If multiple counters are not active return original handled count */
+	if (active <= 1)
+		return handled;
+
+	/*
+	 * If a counter was handled, record the number of possible remaining
+	 * NMIs that can occur.
+	 */
+	if (handled) {
+		this_cpu_write(perf_nmi_counter,
+			       min_t(unsigned int, 2, active - 1));
+
+		return handled;
+	}
+
+	if (!this_cpu_read(perf_nmi_counter))
+		return NMI_DONE;
+
+	this_cpu_dec(perf_nmi_counter);
+
+	return NMI_HANDLED;
+}
+
 static struct event_constraint *
 amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
			  struct perf_event *event)
@@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = {

 	.amd_nb_constraints	= 1,
 	.wait_on_overflow	= amd_pmu_wait_on_overflow,
+	.mitigate_nmi_latency	= amd_pmu_mitigate_nmi_latency,
 };

 static int __init amd_core_pmu_init(void)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f1d2f70000cd..a59c3fcbae6a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1434,6 +1434,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	struct perf_sample_data data;
 	struct cpu_hw_events *cpuc;
 	struct perf_event *event;
+	unsigned int active = 0;
 	int idx, handled = 0;
 	u64 val;
@@ -1461,6 +1462,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 			continue;
 		}

+		active++;
+
 		event = cpuc->events[idx];

 		val = x86_perf_event_update(event);
@@ -1483,6 +1486,9 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	if (handled)
 		inc_irq_stat(apic_perf_irqs);

+	if (x86_pmu.mitigate_nmi_latency)
+		handled = x86_pmu.mitigate_nmi_latency(active, handled);
+
 	return handled;
 }

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a37490a26a09..619214bda92e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -637,6 +637,8 @@ struct x86_pmu {
 	 */
 	unsigned int	amd_nb_constraints : 1;
 	void		(*wait_on_overflow)(int idx, u64 config);
+	int		(*mitigate_nmi_latency)(unsigned int active,
+						int handled);

 	/*
 	 * Extra registers for events
* Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active
@ 2019-03-15 12:03 Peter Zijlstra
From: Peter Zijlstra @ 2019-03-15 12:03 UTC
To: Lendacky, Thomas
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo,
    Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim,
    Thomas Gleixner, Jiri Olsa

On Mon, Mar 11, 2019 at 04:48:51PM +0000, Lendacky, Thomas wrote:
> @@ -467,6 +470,45 @@ static void amd_pmu_wait_on_overflow(int idx, u64 config)
>  	}
>  }
>
> +/*
> + * Because of NMI latency, if multiple PMC counters are active we need to take
> + * into account that multiple PMC overflows can generate multiple NMIs but be
> + * handled by a single invocation of the NMI handler (think PMC overflow while
> + * in the NMI handler). This could result in subsequent unknown NMI messages
> + * being issued.
> + *
> + * Attempt to mitigate this by using the number of active PMCs to determine
> + * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
> + * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of one
> + * less than the number of active PMCs or 2. The value of 2 is used in case the
> + * NMI does not arrive at the APIC in time to be collapsed into an already
> + * pending NMI.

LAPIC I really do hope?!

> + */
> +static int amd_pmu_mitigate_nmi_latency(unsigned int active, int handled)
> +{
> +	/* If multiple counters are not active return original handled count */
> +	if (active <= 1)
> +		return handled;

Should we not reset perf_nmi_counter in this case?

> +
> +	/*
> +	 * If a counter was handled, record the number of possible remaining
> +	 * NMIs that can occur.
> +	 */
> +	if (handled) {
> +		this_cpu_write(perf_nmi_counter,
> +			       min_t(unsigned int, 2, active - 1));
> +
> +		return handled;
> +	}
> +
> +	if (!this_cpu_read(perf_nmi_counter))
> +		return NMI_DONE;
> +
> +	this_cpu_dec(perf_nmi_counter);
> +
> +	return NMI_HANDLED;
> +}
> +
>  static struct event_constraint *
>  amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
>  			  struct perf_event *event)
> @@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>
>  	.amd_nb_constraints	= 1,
>  	.wait_on_overflow	= amd_pmu_wait_on_overflow,
> +	.mitigate_nmi_latency	= amd_pmu_mitigate_nmi_latency,
>  };

Again, you could just do amd_pmu_handle_irq() and avoid an extra
callback.

Anyway, we already had code to deal with spurious NMIs from AMD; see
commit:

  63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")

And that looks to be doing something very much the same. Why then do you
still need this on top?
* Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active
@ 2019-03-15 14:44 Lendacky, Thomas
From: Lendacky, Thomas @ 2019-03-15 14:44 UTC
To: Peter Zijlstra
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo,
    Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim,
    Thomas Gleixner, Jiri Olsa

On 3/15/19 7:03 AM, Peter Zijlstra wrote:
> On Mon, Mar 11, 2019 at 04:48:51PM +0000, Lendacky, Thomas wrote:
>> +/*
>> + * Because of NMI latency, if multiple PMC counters are active we need to take
>> + * into account that multiple PMC overflows can generate multiple NMIs but be
>> + * handled by a single invocation of the NMI handler (think PMC overflow while
>> + * in the NMI handler). This could result in subsequent unknown NMI messages
>> + * being issued.
>> + *
>> + * Attempt to mitigate this by using the number of active PMCs to determine
>> + * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
>> + * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of one
>> + * less than the number of active PMCs or 2. The value of 2 is used in case the
>> + * NMI does not arrive at the APIC in time to be collapsed into an already
>> + * pending NMI.
>
> LAPIC I really do hope?!

Yes, LAPIC.

>
>> + */
>> +static int amd_pmu_mitigate_nmi_latency(unsigned int active, int handled)
>> +{
>> +	/* If multiple counters are not active return original handled count */
>> +	if (active <= 1)
>> +		return handled;
>
> Should we not reset perf_nmi_counter in this case?

Yes, we should. I'll make that change.

>
>> +	/*
>> +	 * If a counter was handled, record the number of possible remaining
>> +	 * NMIs that can occur.
>> +	 */
>> +	if (handled) {
>> +		this_cpu_write(perf_nmi_counter,
>> +			       min_t(unsigned int, 2, active - 1));
>> +
>> +		return handled;
>> +	}
>> +
>> +	if (!this_cpu_read(perf_nmi_counter))
>> +		return NMI_DONE;
>> +
>> +	this_cpu_dec(perf_nmi_counter);
>> +
>> +	return NMI_HANDLED;
>> +}
>> +
>> @@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>>
>>  	.amd_nb_constraints	= 1,
>>  	.wait_on_overflow	= amd_pmu_wait_on_overflow,
>> +	.mitigate_nmi_latency	= amd_pmu_mitigate_nmi_latency,
>>  };
>
> Again, you could just do amd_pmu_handle_irq() and avoid an extra
> callback.

This is where there would be a bunch of code duplication where I thought
adding the callback at the end would be better. But if it's best to add
an AMD handle_irq callback I can do that. I'm easy, let me know if you'd
prefer that.

>
> Anyway, we already had code to deal with spurious NMIs from AMD; see
> commit:
>
>   63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
>
> And that looks to be doing something very much the same. Why then do you
> still need this on top?

This can happen while perf is handling normal counter overflow as opposed
to covering the disabling of the counter case. When multiple counters
overflow at roughly the same time, but the NMI doesn't arrive in time to
get collapsed into a pending NMI, the back-to-back support in
do_default_nmi() doesn't kick in.

Hmmm... I wonder if the wait on overflow in the disable_all() function
would eliminate the need for 63e6be6d98e1. That would take more testing
on some older hardware to verify. That's something I can look into
separate from this series.

Thanks,
Tom
* Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active
@ 2019-03-15 15:11 Peter Zijlstra
From: Peter Zijlstra @ 2019-03-15 15:11 UTC
To: Lendacky, Thomas
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo,
    Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim,
    Thomas Gleixner, Jiri Olsa

On Fri, Mar 15, 2019 at 02:44:32PM +0000, Lendacky, Thomas wrote:
> >> @@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = {
> >>
> >>  	.amd_nb_constraints	= 1,
> >>  	.wait_on_overflow	= amd_pmu_wait_on_overflow,
> >> +	.mitigate_nmi_latency	= amd_pmu_mitigate_nmi_latency,
> >>  };
> >
> > Again, you could just do amd_pmu_handle_irq() and avoid an extra
> > callback.
>
> This is where there would be a bunch of code duplication where I thought
> adding the callback at the end would be better. But if it's best to add
> an AMD handle_irq callback I can do that. I'm easy, let me know if you'd
> prefer that.

Hmm, the thing that avoids you directly using x86_pmu_handle_irq() is
that added active count, but is that not the same as the POPCNT of
cpuc->active_mask?

Is the latency of POPCNT so bad that we need to avoid it?

That is, I was thinking of something like:

int amd_pmu_handle_irq(struct pt_regs *regs)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int active = hweight_long(cpuc->active_mask);
	int handled = x86_pmu_handle_irq(regs);

+	if (active <= 1) {
		this_cpu_write(perf_nmi_counter, 0);
+		return handled;
	}
+
+	/*
+	 * If a counter was handled, record the number of possible remaining
+	 * NMIs that can occur.
+	 */
+	if (handled) {
+		this_cpu_write(perf_nmi_counter,
+			       min_t(unsigned int, 2, active - 1));
+
+		return handled;
+	}
+
+	if (!this_cpu_read(perf_nmi_counter))
+		return NMI_DONE;
+
+	this_cpu_dec(perf_nmi_counter);
+
+	return NMI_HANDLED;
}

> > Anyway, we already had code to deal with spurious NMIs from AMD; see
> > commit:
> >
> >   63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
> >
> > And that looks to be doing something very much the same. Why then do you
> > still need this on top?
>
> This can happen while perf is handling normal counter overflow as opposed
> to covering the disabling of the counter case. When multiple counters
> overflow at roughly the same time, but the NMI doesn't arrive in time to
> get collapsed into a pending NMI, the back-to-back support in
> do_default_nmi() doesn't kick in.
>
> Hmmm... I wonder if the wait on overflow in the disable_all() function
> would eliminate the need for 63e6be6d98e1. That would take more testing
> on some older hardware to verify. That's something I can look into
> separate from this series.

Yes please, or at least better document the reason for their separate
existence. It's all turning into a bit of magic it seems.
* Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active 2019-03-15 15:11 ` Peter Zijlstra @ 2019-03-15 15:50 ` Lendacky, Thomas 2019-03-15 17:47 ` Lendacky, Thomas 0 siblings, 1 reply; 11+ messages in thread From: Lendacky, Thomas @ 2019-03-15 15:50 UTC (permalink / raw) To: Peter Zijlstra Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa On 3/15/19 10:11 AM, Peter Zijlstra wrote: > On Fri, Mar 15, 2019 at 02:44:32PM +0000, Lendacky, Thomas wrote: > >>>> @@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = { >>>> >>>> .amd_nb_constraints = 1, >>>> .wait_on_overflow = amd_pmu_wait_on_overflow, >>>> + .mitigate_nmi_latency = amd_pmu_mitigate_nmi_latency, >>>> }; >>> >>> Again, you could just do amd_pmu_handle_irq() and avoid an extra >>> callback. >> >> This is where there would be a bunch of code duplication where I thought >> adding the callback at the end would be better. But if it's best to add >> an AMD handle_irq callback I can do that. I'm easy, let me know if you'd >> prefer that. > > Hmm, the thing that avoids you directly using x86_pmu_handle_irq() is > that added active count, but is that not the same as the POPCNT of > cpuc->active_mask? > > Is the latency of POPCNT so bad that we need avoid it? > > That is, I was thinking of something like: > > int amd_pmu_handle_irq(struct pt_regs *regs) > { > struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > int active = hweight_long(cpuc->active_mask); > int handled = x86_pmu_handle_irq(regs); Yup, I had a total brain lapse there of just calling x86_pmu_handle_irq() from the new routine. > > + if (active <= 1) { > this_cpu_write(perf_nmi_counter, 0); > + return handled; > } > + > + /* > + * If a counter was handled, record the number of possible remaining > + * NMIs that can occur. 
> +	 */
> +	if (handled) {
> +		this_cpu_write(perf_nmi_counter,
> +			       min_t(unsigned int, 2, active - 1));
> +
> +		return handled;
> +	}
> +
> +	if (!this_cpu_read(perf_nmi_counter))
> +		return NMI_DONE;
> +
> +	this_cpu_dec(perf_nmi_counter);
> +
> +	return NMI_HANDLED;
> }
>
>>> Anyway, we already had code to deal with spurious NMIs from AMD; see
>>> commit:
>>>
>>>   63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
>>>
>>> And that looks to be doing something very much the same. Why then do you
>>> still need this on top?
>>
>> This can happen while perf is handling normal counter overflow as opposed
>> to covering the disabling of the counter case. When multiple counters
>> overflow at roughly the same time, but the NMI doesn't arrive in time to
>> get collapsed into a pending NMI, the back-to-back support in
>> do_default_nmi() doesn't kick in.
>>
>> Hmmm... I wonder if the wait on overflow in the disable_all() function
>> would eliminate the need for 63e6be6d98e1. That would take more testing
>> on some older hardware to verify. That's something I can look into
>> separate from this series.
>
> Yes please, or at least better document the reason for their separate
> existence. It's all turning into a bit of magic it seems.

Ok, I'll update the commit message with a bit more info and add to the
comment of the new AMD handle_irq function.

Thanks,
Tom
* Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active
  2019-03-15 15:50         ` Lendacky, Thomas
@ 2019-03-15 17:47           ` Lendacky, Thomas
  0 siblings, 0 replies; 11+ messages in thread

From: Lendacky, Thomas @ 2019-03-15 17:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa

On 3/15/19 10:49 AM, Tom Lendacky wrote:
> On 3/15/19 10:11 AM, Peter Zijlstra wrote:
>> On Fri, Mar 15, 2019 at 02:44:32PM +0000, Lendacky, Thomas wrote:
>>
>>>>> @@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>>>>>  	.amd_nb_constraints = 1,
>>>>>  	.wait_on_overflow = amd_pmu_wait_on_overflow,
>>>>> +	.mitigate_nmi_latency = amd_pmu_mitigate_nmi_latency,
>>>>>  };
>>>>
>>>> Again, you could just do amd_pmu_handle_irq() and avoid an extra
>>>> callback.
>>>
>>> This is where there would be a bunch of code duplication where I thought
>>> adding the callback at the end would be better. But if it's best to add
>>> an AMD handle_irq callback I can do that. I'm easy, let me know if you'd
>>> prefer that.
>>
>> Hmm, the thing that avoids you directly using x86_pmu_handle_irq() is
>> that added active count, but is that not the same as the POPCNT of
>> cpuc->active_mask?
>>
>> Is the latency of POPCNT so bad that we need to avoid it?
>>
>> That is, I was thinking of something like:
>>
>> int amd_pmu_handle_irq(struct pt_regs *regs)
>> {
>> 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> 	int active = hweight_long(cpuc->active_mask);
>> 	int handled = x86_pmu_handle_irq(regs);
>
> Yup, I had a total brain lapse there of just calling x86_pmu_handle_irq()
> from the new routine.
>
>>
>> +	if (active <= 1) {

And I wasn't taking into account other sources of NMIs triggering the
running of the handler while perf is running. I was only thinking in
terms of NMIs coming from the PMCs.
So this really needs to be a !active check and the setting of the
perf_nmi_counter below needs to be the min of 2 or active.

Thanks,
Tom

>> 		this_cpu_write(perf_nmi_counter, 0);
>> +		return handled;
>> 	}
>> +
>> +	/*
>> +	 * If a counter was handled, record the number of possible
>> +	 * remaining NMIs that can occur.
>> +	 */
>> +	if (handled) {
>> +		this_cpu_write(perf_nmi_counter,
>> +			       min_t(unsigned int, 2, active - 1));
>> +
>> +		return handled;
>> +	}
>> +
>> +	if (!this_cpu_read(perf_nmi_counter))
>> +		return NMI_DONE;
>> +
>> +	this_cpu_dec(perf_nmi_counter);
>> +
>> +	return NMI_HANDLED;
>> }
>>
>>>> Anyway, we already had code to deal with spurious NMIs from AMD; see
>>>> commit:
>>>>
>>>>   63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
>>>>
>>>> And that looks to be doing something very much the same. Why then do you
>>>> still need this on top?
>>>
>>> This can happen while perf is handling normal counter overflow as opposed
>>> to covering the disabling of the counter case. When multiple counters
>>> overflow at roughly the same time, but the NMI doesn't arrive in time to
>>> get collapsed into a pending NMI, the back-to-back support in
>>> do_default_nmi() doesn't kick in.
>>>
>>> Hmmm... I wonder if the wait on overflow in the disable_all() function
>>> would eliminate the need for 63e6be6d98e1. That would take more testing
>>> on some older hardware to verify. That's something I can look into
>>> separate from this series.
>>
>> Yes please, or at least better document the reason for their separate
>> existence. It's all turning into a bit of magic it seems.
>
> Ok, I'll update the commit message with a bit more info and add to the
> comment of the new AMD handle_irq function.
>
> Thanks,
> Tom
* Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active
  2019-03-15 12:03   ` Peter Zijlstra
  2019-03-15 14:44     ` Lendacky, Thomas
@ 2019-03-18  9:47     ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread

From: Peter Zijlstra @ 2019-03-18 9:47 UTC (permalink / raw)
To: Lendacky, Thomas
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa

On Fri, Mar 15, 2019 at 01:03:11PM +0100, Peter Zijlstra wrote:
> Anyway, we already had code to deal with spurious NMIs from AMD; see
> commit:
>
>   63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
>
> And that looks to be doing something very much the same. Why then do you
> still need this on top?

And I think I've spotted a bug there; consider the case where only PMC3
has an active event left, then the interrupt would consume the running
state for PMC0-2, not leaving it for the spurious interrupt that might
come after it.

Similarly, if there's nothing running anymore, a single spurious
interrupt will clear the entire state. It effectively is a single state,
not a per-pmc one.

Something like the below would cure that... would that help with
something? Or were we going to get rid of this entirely with your
patches...

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e2b1447192a8..a8b5535f7888 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1432,6 +1432,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	struct cpu_hw_events *cpuc;
 	struct perf_event *event;
 	int idx, handled = 0;
+	int ridx = -1;
 	u64 val;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -1453,8 +1454,8 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 		 * might still deliver spurious interrupts still
 		 * in flight. Catch them:
 		 */
-		if (__test_and_clear_bit(idx, cpuc->running))
-			handled++;
+		if (test_bit(idx, cpuc->running))
+			ridx = idx;
 
 		continue;
 	}
@@ -1477,6 +1478,11 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 		x86_pmu_stop(event, 0);
 	}
 
+	if (!handled && ridx >= 0) {
+		__clear_bit(ridx, cpuc->running);
+		handled++;
+	}
+
 	if (handled)
 		inc_irq_stat(apic_perf_irqs);
end of thread, other threads: [~2019-03-18  9:47 UTC | newest]

Thread overview: 11+ messages
2019-03-11 16:48 [RFC PATCH 0/2] x86/perf/amd: AMD PMC counters and NMI latency Lendacky, Thomas
2019-03-11 16:48 ` [RFC PATCH 1/2] x86/perf/amd: Resolve race condition when disabling PMC Lendacky, Thomas
2019-03-15 10:50   ` Peter Zijlstra
2019-03-15 14:06     ` Lendacky, Thomas
2019-03-11 16:48 ` [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active Lendacky, Thomas
2019-03-15 12:03   ` Peter Zijlstra
2019-03-15 14:44     ` Lendacky, Thomas
2019-03-15 15:11       ` Peter Zijlstra
2019-03-15 15:50         ` Lendacky, Thomas
2019-03-15 17:47           ` Lendacky, Thomas
2019-03-18  9:47   ` Peter Zijlstra