* rom 3540f985652f41041e54ee82aa53e7dbd55739ae Mon Sep 17 00:00:00 2001
@ 2023-09-14 14:06 Sandipan Das
2023-09-14 14:27 ` Sandipan Das
2023-09-22 10:03 ` Ingo Molnar
0 siblings, 2 replies; 4+ messages in thread
From: Sandipan Das @ 2023-09-14 14:06 UTC
To: linux-kernel, linux-perf-users
Cc: x86, peterz, leitao, mingo, acme, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter, tglx,
bp, dave.hansen, hpa, leit, dcostantino, jhladky, eranian,
ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das
Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
handler, as shown below, several times while perf runs. A simple
`perf top` run is enough to render the system unusable.
WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
This happens because the Performance Counter Global Status Register
(PerfCntGlobalStatus) has one or more bits set which are considered
reserved according to the "AMD64 Architecture Programmer's Manual,
Volume 2: System Programming, 24593". The document can be found at
https://www.amd.com/system/files/TechDocs/24593.pdf
To make this less intrusive, warn just once if any reserved bit is set
and prompt the user to update the microcode. Also sanitize the value to
what the code is handling, so that the overflow events continue to be
handled for the number of counters that are known to be sane.
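
As a rough userspace illustration of the masking described above (not the kernel code itself), the sketch below splits a status value into reserved bits and counter bits. The helper names cntr_mask() and sanitize_status() are hypothetical, the mask is assumed to cover bits 0..num_counters-1, and the static flag only imitates the once-only behaviour of the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for the global counter mask: bits 0..num_counters-1 set. */
static uint64_t cntr_mask(unsigned int num_counters)
{
	return num_counters >= 64 ? ~0ULL : (1ULL << num_counters) - 1;
}

/*
 * Report (once) any bits outside the mask, then return the status with
 * those bits cleared so only known counters are processed afterwards.
 */
static uint64_t sanitize_status(uint64_t status, uint64_t mask, uint64_t *reserved)
{
	static bool warned; /* imitates a warn-once helper */

	assert(reserved != NULL);

	*reserved = status & ~mask;
	if (*reserved && !warned) {
		warned = true;
		fprintf(stderr, "Reserved status bits are set (0x%llx)\n",
			(unsigned long long)*reserved);
	}

	return status & mask;
}
```

With six counters, a status of 0x8000000000000021 would be reduced to 0x21, and the top bit would be reported as reserved.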
Going forward, the following microcode patch levels are recommended
for Zen 4 processors in order to avoid such issues with reserved bits.
Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e
Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e
Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116
Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212
Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from
the linux-firmware tree has binaries that meet the minimum required
patch levels.
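
For readers who want to compare a system against the table above, here is a minimal sketch of such a check. The struct, table, and function names are hypothetical and not part of any kernel or firmware API. It also assumes that patch levels for the same family, model, and stepping can be compared numerically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the recommended-minimum table from the commit message. */
struct min_ucode {
	uint8_t family;
	uint8_t model;
	uint8_t stepping;
	uint32_t patch;
};

static const struct min_ucode zen4_min_ucode[] = {
	{ 0x19, 0x11, 0x01, 0x0a10113e },
	{ 0x19, 0x11, 0x02, 0x0a10123e },
	{ 0x19, 0xa0, 0x01, 0x0aa00116 },
	{ 0x19, 0xa0, 0x02, 0x0aa00212 },
};

#define ZEN4_MIN_UCODE_COUNT (sizeof(zen4_min_ucode) / sizeof(zen4_min_ucode[0]))

/*
 * Returns 1 if the patch level meets the listed minimum, 0 if it is
 * older, and -1 if the family/model/stepping is not in the table.
 */
static int ucode_meets_minimum(const struct min_ucode *table, size_t count,
			       uint8_t family, uint8_t model, uint8_t stepping,
			       uint32_t patch)
{
	assert(table != NULL);

	for (size_t i = 0; i < count; i++) {
		if (table[i].family == family &&
		    table[i].model == model &&
		    table[i].stepping == stepping)
			return patch >= table[i].patch ? 1 : 0;
	}

	return -1;
}
```

On Linux, the running revision is usually visible in the "microcode" field of /proc/cpuinfo, which could be fed to a check like this one.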
Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
[sandipan: add message to prompt users to update microcode]
[sandipan: rework commit message and call out required microcode levels]
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
v1: https://lore.kernel.org/all/20230616115316.3652155-1-leitao@debian.org/
v2:
- Use pr_warn_once() instead of WARN_ON_ONCE() to prompt users to
update microcode
- Rework commit message and add details of minimum required microcode
patch levels.
---
arch/x86/events/amd/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index abadd5f23425..b04956cbd085 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -884,7 +884,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 	struct hw_perf_event *hwc;
 	struct perf_event *event;
 	int handled = 0, idx;
-	u64 status, mask;
+	u64 reserved, status, mask;
 	bool pmu_enabled;
 
 	/*
@@ -909,6 +909,14 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 		status &= ~GLOBAL_STATUS_LBRS_FROZEN;
 	}
 
+	reserved = status & ~amd_pmu_global_cntr_mask;
+	if (reserved)
+		pr_warn_once("Reserved PerfCntrGlobalStatus bits are set (0x%llx), please consider updating microcode\n",
+			     reserved);
+
+	/* Clear any reserved bits set by buggy microcode */
+	status &= amd_pmu_global_cntr_mask;
+
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
--
2.34.1
* Re: rom 3540f985652f41041e54ee82aa53e7dbd55739ae Mon Sep 17 00:00:00 2001
2023-09-14 14:06 rom 3540f985652f41041e54ee82aa53e7dbd55739ae Mon Sep 17 00:00:00 2001 Sandipan Das
@ 2023-09-14 14:27 ` Sandipan Das
2023-09-22 10:03 ` Ingo Molnar
1 sibling, 0 replies; 4+ messages in thread
From: Sandipan Das @ 2023-09-14 14:27 UTC
To: linux-kernel, linux-perf-users
Cc: x86, peterz, leitao, mingo, acme, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter, tglx,
bp, dave.hansen, hpa, leit, dcostantino, jhladky, eranian,
ananth.narayan, ravi.bangoria, santosh.shukla
On 9/14/2023 7:36 PM, Sandipan Das wrote:
> Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
> handler, as shown below, several times while perf runs. A simple
> `perf top` run is enough to render the system unusable.
>
> ...
The email header has a problem which breaks the subject line. Apologies
for having to resend this.
* Re: rom 3540f985652f41041e54ee82aa53e7dbd55739ae Mon Sep 17 00:00:00 2001
2023-09-14 14:06 rom 3540f985652f41041e54ee82aa53e7dbd55739ae Mon Sep 17 00:00:00 2001 Sandipan Das
2023-09-14 14:27 ` Sandipan Das
@ 2023-09-22 10:03 ` Ingo Molnar
2023-09-22 10:23 ` Sandipan Das
1 sibling, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2023-09-22 10:03 UTC
To: Sandipan Das
Cc: linux-kernel, linux-perf-users, x86, peterz, leitao, mingo, acme,
mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, tglx, bp, dave.hansen, hpa, leit, dcostantino,
jhladky, eranian, ananth.narayan, ravi.bangoria, santosh.shukla
* Sandipan Das <sandipan.das@amd.com> wrote:
> Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
> handler, as shown below, several times while perf runs. A simple
> `perf top` run is enough to render the system unusable.
>
> WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
>
> This happens because the Performance Counter Global Status Register
> (PerfCntGlobalStatus) has one or more bits set which are considered
> reserved according to the "AMD64 Architecture Programmer's Manual,
> Volume 2: System Programming, 24593". The document can be found at
> https://www.amd.com/system/files/TechDocs/24593.pdf
>
> To make this less intrusive, warn just once if any reserved bit is set
> and prompt the user to update the microcode. Also sanitize the value to
> what the code is handling, so that the overflow events continue to be
> handled for the number of counters that are known to be sane.
>
> Going forward, the following microcode patch levels are recommended
> for Zen 4 processors in order to avoid such issues with reserved bits.
>
> Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e
> Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e
> Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116
> Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212
>
> Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from
> the linux-firmware tree has binaries that meet the minimum required
> patch levels.
>
> Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
> Reported-by: Jirka Hladky <jhladky@redhat.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> [sandipan: add message to prompt users to update microcode]
> [sandipan: rework commit message and call out required microcode levels]
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> v2:
> - Use pr_warn_once() instead of WARN_ON_ONCE() to prompt users to
> update microcode
> - Rework commit message and add details of minimum required microcode
> patch levels.
1)
I don't think you ever re-sent this patch with the correct subject line.
( Or at least it's not in my mbox. )
2)
So if the fix is from Breno Leitao originally, then there should be a:
From: Breno Leitao <leitao@debian.org>
at the beginning of the patch to make authorship clear.
You might also want to add:
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
to make your contributions clear.
Thanks,
Ingo
* Re: rom 3540f985652f41041e54ee82aa53e7dbd55739ae Mon Sep 17 00:00:00 2001
2023-09-22 10:03 ` Ingo Molnar
@ 2023-09-22 10:23 ` Sandipan Das
0 siblings, 0 replies; 4+ messages in thread
From: Sandipan Das @ 2023-09-22 10:23 UTC
To: Ingo Molnar
Cc: linux-kernel, linux-perf-users, x86, peterz, leitao, mingo, acme,
mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, tglx, bp, dave.hansen, hpa, leit, dcostantino,
jhladky, eranian, ananth.narayan, ravi.bangoria, santosh.shukla
On 9/22/2023 3:33 PM, Ingo Molnar wrote:
>
> * Sandipan Das <sandipan.das@amd.com> wrote:
>
>> Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI
>> handler, as shown below, several times while perf runs. A simple
>> `perf top` run is enough to render the system unusable.
>>
>> WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
>>
>> This happens because the Performance Counter Global Status Register
>> (PerfCntGlobalStatus) has one or more bits set which are considered
>> reserved according to the "AMD64 Architecture Programmer's Manual,
>> Volume 2: System Programming, 24593". The document can be found at
>> https://www.amd.com/system/files/TechDocs/24593.pdf
>>
>> To make this less intrusive, warn just once if any reserved bit is set
>> and prompt the user to update the microcode. Also sanitize the value to
>> what the code is handling, so that the overflow events continue to be
>> handled for the number of counters that are known to be sane.
>>
>> Going forward, the following microcode patch levels are recommended
>> for Zen 4 processors in order to avoid such issues with reserved bits.
>>
>> Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e
>> Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e
>> Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116
>> Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212
>>
>> Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from
>> the linux-firmware tree has binaries that meet the minimum required
>> patch levels.
>>
>> Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
>> Reported-by: Jirka Hladky <jhladky@redhat.com>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> [sandipan: add message to prompt users to update microcode]
>> [sandipan: rework commit message and call out required microcode levels]
>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>
>> v2:
>> - Use pr_warn_once() instead of WARN_ON_ONCE() to prompt users to
>> update microcode
>> - Rework commit message and add details of minimum required microcode
>> patch levels.
>
> 1)
>
> I don't think you ever re-sent this patch with the correct subject line.
> ( Or at least it's not in my mbox. )
>
> 2)
>
> So if the fix is from Breno Leitao originally, then there should be a:
>
> From: Breno Leitao <leitao@debian.org>
>
> at the beginning of the patch to make authorship clear.
>
> You might also want to add:
>
> Co-developed-by: Sandipan Das <sandipan.das@amd.com>
>
> to make your contributions clear.
>
Sorry for the confusion. I did resend this patch with the correct authorship
and it can be found here:
https://lore.kernel.org/all/3540f985652f41041e54ee82aa53e7dbd55739ae.1694696888.git.sandipan.das@amd.com/