* Small question about reserved bits in MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
@ 2024-09-16 18:54 Maxim Levitsky
2024-09-16 20:41 ` dongli.zhang
0 siblings, 1 reply; 5+ messages in thread
From: Maxim Levitsky @ 2024-09-16 18:54 UTC (permalink / raw)
To: Sandipan Das; +Cc: linux-perf-users, x86, kvm
Hi!
We recently saw a failure on one of the AWS VM instances that causes the following error during guest boot:
[    0.480051] unchecked MSR access error: WRMSR to 0xc0000302 (tried to write 0x040000000000001f) at rIP: 0xffffffff96c093e2 (amd_pmu_cpu_reset.constprop.0+0x42/0x80)
I investigated the issue and found that the hypervisor exposes PerfMonV2 but not LBRv2:
# cpuid -1 -l 0x80000022
CPU:
Extended Performance Monitoring and Debugging (0x80000022):
AMD performance monitoring V2 = true
AMD LBR V2 = false
AMD LBR stack & PMC freezing = false
number of core perf ctrs = 0x5 (5)
number of LBR stack entries = 0x0 (0)
number of avail Northbridge perf ctrs = 0x0 (0)
number of available UMC PMCs = 0x0 (0)
active UMCs bitmask = 0x0
I also verified that I can write 0x1f to 0xc0000302 but not 0x040000000000001f:
# wrmsr 0xc0000302 0x1f
# wrmsr 0xc0000302 0x040000000000001f
wrmsr: CPU 0 cannot set MSR 0xc0000302 to 0x040000000000001f
#
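For reference, the rejected value splits into the low five bits (one per core counter reported by CPUID above) plus a single high bit. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and reading the high bit as the LBR freeze bit is an assumption based on the kernel's "Clear overflow and freeze bits" comment rather than something confirmed in this thread.

```python
def set_bits(value):
    """Return the positions of all set bits in a 64-bit value, lowest first."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def counter_mask(num_counters):
    """Mask with one bit per core counter (hypothetical helper)."""
    return (1 << num_counters) - 1


# Value the guest kernel tried to write in the failing WRMSR.
rejected = 0x040000000000001F

# CPUID reports five core counters, so the overflow bits are the low five.
low = counter_mask(5)

# Whatever remains is the part the hypervisor refuses
# (assumed here to be the LBR freeze bit).
high = rejected & ~low

print("counter bits:", hex(low), set_bits(low))
print("remaining bits:", hex(high), set_bits(high))
```

The write of 0x1f alone succeeds, so the single remaining high bit is what triggers the fault on this instance.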
AMD's APM is not clear on what should happen when software tries to clear unsupported bits
through this MSR.
I also noticed that amd_pmu_v2_handle_irq writes 0xffffffffffffffff to this MSR.
It has the following code:
WARN_ON(status > 0);
/* Clear overflow and freeze bits */
amd_pmu_ack_global_status(~status);
This implies that it is OK to set all bits in this MSR.
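To spell out why that write is all ones: the WARN_ON expects status to be 0 by that point, and the bitwise NOT of a 64-bit zero is 0xffffffffffffffff. A small Python sketch of the same arithmetic, with a hypothetical helper name; Python integers are unbounded, so the result is masked to 64 bits to mimic a C u64.

```python
MASK64 = (1 << 64) - 1


def ack_value(status):
    """Emulate C's ~status on a u64 by masking Python's unbounded result."""
    return ~status & MASK64


# status == 0 is the expected case after all overflow bits were handled.
print(hex(ack_value(0)))

# If some bits were still set, they would be the only ones left out.
print(hex(ack_value(0x1F)))
```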
Can you please take a look?
Thanks in advance,
Best regards,
Maxim Levitsky
* Re: Small question about reserved bits in MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
2024-09-16 18:54 Small question about reserved bits in MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR Maxim Levitsky
@ 2024-09-16 20:41 ` dongli.zhang
2024-09-17 5:52 ` Sandipan Das
0 siblings, 1 reply; 5+ messages in thread
From: dongli.zhang @ 2024-09-16 20:41 UTC (permalink / raw)
To: Maxim Levitsky, Sandipan Das; +Cc: linux-perf-users, x86, kvm
On 9/16/24 11:54 AM, Maxim Levitsky wrote:
> Hi!
>
> We recently saw a failure in one of the aws VM instances that causes the following error during the guest boot:
>
> 0.480051] unchecked MSR access error: WRMSR to 0xc0000302 (tried to write 0x040000000000001f) at rIP: 0xffffffff96c093e2 (amd_pmu_cpu_reset.constprop.0+0x42/0x80)
>
>
> I investigated the issue and I see that the hypervisor does expose PerfmonV2, but not the LBRv2 support:
>
> # cpuid -1 -l 0x80000022
> CPU:
> Extended Performance Monitoring and Debugging (0x80000022):
> AMD performance monitoring V2 = true
> AMD LBR V2 = false
> AMD LBR stack & PMC freezing = false
> number of core perf ctrs = 0x5 (5)
> number of LBR stack entries = 0x0 (0)
> number of avail Northbridge perf ctrs = 0x0 (0)
> number of available UMC PMCs = 0x0 (0)
> active UMCs bitmask = 0x0
>
> I also verified that I can write 0x1f to 0xc0000302 but not 0x040000000000001f:
>
> # wrmsr 0xc0000302 0x1f
> # wrmsr 0xc0000302 0x040000000000001f
> wrmsr: CPU 0 cannot set MSR 0xc0000302 to 0x040000000000001f
> #
>
> The AMD's APM is not clear on what should happen if unsupported bits are attempted to be cleared
> using this MSR.
>
> Also I noticed that amd_pmu_v2_handle_irq writes 0xffffffffffffffff to this msrs.
> It has the following code:
>
>
> WARN_ON(status > 0);
>
> /* Clear overflow and freeze bits */
> amd_pmu_ack_global_status(~status);
>
>
> This implies that it is OK to set all bits in this MSR.
>
To share my data point on QEMU+KVM: I am not able to reproduce this with the most
recent QEMU (not AWS) plus the patch below.
[PATCH v2 2/4] i386/cpu: Add PerfMonV2 feature bit
https://lore.kernel.org/all/69905b486218f8287b9703d1a9001175d04c2f02.1723068946.git.babu.moger@amd.com/
Both my VM and the KVM host are on 6.10.
vm# cpuid -1 -l 0x80000022
CPU:
Extended Performance Monitoring and Debugging (0x80000022):
AMD performance monitoring V2 = true
AMD LBR V2 = false
AMD LBR stack & PMC freezing = false
number of core perf ctrs = 0x6 (6)
number of LBR stack entries = 0x0 (0)
number of avail Northbridge perf ctrs = 0x0 (0)
number of available UMC PMCs = 0x0 (0)
active UMCs bitmask = 0x0
Both writes succeed.
vm# wrmsr 0xc0000302 0x1f
vm# wrmsr 0xc0000302 0x040000000000001f
Here is the bcc trace output. Both writes succeed.
kvm# /usr/share/bcc/tools/trace -t -C 'kvm_pmu_set_msr "%x", retval'
... ...
4.748614 19 43545 43550 CPU 0/KVM kvm_pmu_set_msr 0
10.97396 19 43545 43550 CPU 0/KVM kvm_pmu_set_msr 0
Dongli Zhang
* Re: Small question about reserved bits in MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
2024-09-16 20:41 ` dongli.zhang
@ 2024-09-17 5:52 ` Sandipan Das
2024-09-17 12:54 ` Maxim Levitsky
0 siblings, 1 reply; 5+ messages in thread
From: Sandipan Das @ 2024-09-17 5:52 UTC (permalink / raw)
To: dongli.zhang, Maxim Levitsky; +Cc: linux-perf-users, x86, kvm
On 9/17/2024 2:11 AM, dongli.zhang@oracle.com wrote:
>
> On 9/16/24 11:54 AM, Maxim Levitsky wrote:
>> Hi!
>>
>> We recently saw a failure in one of the aws VM instances that causes the following error during the guest boot:
>>
>> 0.480051] unchecked MSR access error: WRMSR to 0xc0000302 (tried to write 0x040000000000001f) at rIP: 0xffffffff96c093e2 (amd_pmu_cpu_reset.constprop.0+0x42/0x80)
>>
>>
>> I investigated the issue and I see that the hypervisor does expose PerfmonV2, but not the LBRv2 support:
>>
>> # cpuid -1 -l 0x80000022
>> CPU:
>> Extended Performance Monitoring and Debugging (0x80000022):
>> AMD performance monitoring V2 = true
>> AMD LBR V2 = false
>> AMD LBR stack & PMC freezing = false
>> number of core perf ctrs = 0x5 (5)
>> number of LBR stack entries = 0x0 (0)
>> number of avail Northbridge perf ctrs = 0x0 (0)
>> number of available UMC PMCs = 0x0 (0)
>> active UMCs bitmask = 0x0
>>
That's expected. LBRv2 is currently not available to KVM guests. However, PerfMonV2 should be the
only feature bit required to indicate the availability of MSRs 0xc0000300..0xc0000303.
>> I also verified that I can write 0x1f to 0xc0000302 but not 0x040000000000001f:
>>
>> # wrmsr 0xc0000302 0x1f
>> # wrmsr 0xc0000302 0x040000000000001f
>> wrmsr: CPU 0 cannot set MSR 0xc0000302 to 0x040000000000001f
>> #
>>
>> The AMD's APM is not clear on what should happen if unsupported bits are attempted to be cleared
>> using this MSR.
>>
>> Also I noticed that amd_pmu_v2_handle_irq writes 0xffffffffffffffff to this msrs.
>> It has the following code:
>>
>>
>> WARN_ON(status > 0);
>>
>> /* Clear overflow and freeze bits */
>> amd_pmu_ack_global_status(~status);
>>
>>
>> This implies that it is OK to set all bits in this MSR.
>>
It is, but writes to the reserved bits are ignored.
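The two behaviours discussed here can be contrasted with a toy model. This is only an illustrative Python sketch: the function, the exception, and the `strict` switch are all hypothetical, and it is not how KVM or any hypervisor implements the MSR. With `strict=False` reserved bits in the written value are dropped, as described above; with `strict=True` they cause a fault, like the rejected WRMSR in the original report.

```python
MASK64 = (1 << 64) - 1


class ReservedBitFault(Exception):
    """Stands in for the fault a guest sees when a WRMSR is rejected."""


def write_status_clr(status, value, supported, strict=False):
    """Return the new status after a write-1-to-clear of `value`.

    `supported` is the mask of implemented status bits. Reserved bits in
    `value` are ignored unless strict=True, in which case they raise.
    """
    value &= MASK64
    reserved = value & ~supported
    if strict and reserved:
        raise ReservedBitFault(hex(reserved))
    # Only implemented bits can be cleared.
    return status & ~(value & supported)


supported = 0x1F  # five overflow bits, no LBR freeze bit

# Ignoring reserved bits: both the targeted and the all-ones write work.
print(hex(write_status_clr(0x1F, 0x040000000000001F, supported)))
print(hex(write_status_clr(0x05, MASK64, supported)))

# Strict handling: the same write faults.
try:
    write_status_clr(0x1F, 0x040000000000001F, supported, strict=True)
except ReservedBitFault as exc:
    print("fault on reserved bits", exc)
```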
>
> To share my data point on QEMU+KVM: I am not able to reproduce with the most
> recent QEMU (not AWS) + below patch.
>
> [PATCH v2 2/4] i386/cpu: Add PerfMonV2 feature bit
> https://lore.kernel.org/all/69905b486218f8287b9703d1a9001175d04c2f02.1723068946.git.babu.moger@amd.com/
>
> Both my VM and KVM are 6.10.
>
> vm# cpuid -1 -l 0x80000022
> CPU:
> Extended Performance Monitoring and Debugging (0x80000022):
> AMD performance monitoring V2 = true
> AMD LBR V2 = false
> AMD LBR stack & PMC freezing = false
> number of core perf ctrs = 0x6 (6)
> number of LBR stack entries = 0x0 (0)
> number of avail Northbridge perf ctrs = 0x0 (0)
> number of available UMC PMCs = 0x0 (0)
> active UMCs bitmask = 0x0
>
>
> Both writes are passed.
>
> vm# wrmsr 0xc0000302 0x1f
> vm# wrmsr 0xc0000302 0x040000000000001f
>
> Here is bcc output. Both writes are good.
>
> kvm# /usr/share/bcc/tools/trace -t -C 'kvm_pmu_set_msr "%x", retval'
> ... ...
> 4.748614 19 43545 43550 CPU 0/KVM kvm_pmu_set_msr 0
> 10.97396 19 43545 43550 CPU 0/KVM kvm_pmu_set_msr 0
>
Thanks for testing. I cannot replicate this either with an upstream kernel.
- Sandipan
* Re: Small question about reserved bits in MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
2024-09-17 5:52 ` Sandipan Das
@ 2024-09-17 12:54 ` Maxim Levitsky
2024-11-11 16:57 ` Josh Triplett
0 siblings, 1 reply; 5+ messages in thread
From: Maxim Levitsky @ 2024-09-17 12:54 UTC (permalink / raw)
To: Sandipan Das, dongli.zhang; +Cc: linux-perf-users, x86, kvm
On Tue, 2024-09-17 at 11:22 +0530, Sandipan Das wrote:
> On 9/17/2024 2:11 AM, dongli.zhang@oracle.com wrote:
> > On 9/16/24 11:54 AM, Maxim Levitsky wrote:
> > > Hi!
> > >
> > > We recently saw a failure in one of the aws VM instances that causes the following error during the guest boot:
> > >
> > > 0.480051] unchecked MSR access error: WRMSR to 0xc0000302 (tried to write 0x040000000000001f) at rIP: 0xffffffff96c093e2 (amd_pmu_cpu_reset.constprop.0+0x42/0x80)
> > >
> > >
> > > I investigated the issue and I see that the hypervisor does expose PerfmonV2, but not the LBRv2 support:
> > >
> > > # cpuid -1 -l 0x80000022
> > > CPU:
> > > Extended Performance Monitoring and Debugging (0x80000022):
> > > AMD performance monitoring V2 = true
> > > AMD LBR V2 = false
> > > AMD LBR stack & PMC freezing = false
> > > number of core perf ctrs = 0x5 (5)
> > > number of LBR stack entries = 0x0 (0)
> > > number of avail Northbridge perf ctrs = 0x0 (0)
> > > number of available UMC PMCs = 0x0 (0)
> > > active UMCs bitmask = 0x0
> > >
>
> That's expected. LBRv2 is currently not available to KVM guests. However, PerfMonV2 should be the
> only feature bit required to indicate the availability of MSRs 0xc0000300..0xc0000303
>
> > > I also verified that I can write 0x1f to 0xc0000302 but not 0x040000000000001f:
> > >
> > > # wrmsr 0xc0000302 0x1f
> > > # wrmsr 0xc0000302 0x040000000000001f
> > > wrmsr: CPU 0 cannot set MSR 0xc0000302 to 0x040000000000001f
> > > #
> > >
> > > The AMD's APM is not clear on what should happen if unsupported bits are attempted to be cleared
> > > using this MSR.
> > >
> > > Also I noticed that amd_pmu_v2_handle_irq writes 0xffffffffffffffff to this msr.
> > > It has the following code:
> > >
> > >
> > > WARN_ON(status > 0);
> > >
> > > /* Clear overflow and freeze bits */
> > > amd_pmu_ack_global_status(~status);
> > >
> > >
> > > This implies that it is OK to set all bits in this MSR.
> > >
>
> It is, but writes to the reserved bits are ignored.
>
> > To share my data point on QEMU+KVM: I am not able to reproduce with the most
> > recent QEMU (not AWS) + below patch.
> >
> > [PATCH v2 2/4] i386/cpu: Add PerfMonV2 feature bit
> > https://lore.kernel.org/all/69905b486218f8287b9703d1a9001175d04c2f02.1723068946.git.babu.moger@amd.com/
> >
> > Both my VM and KVM are 6.10.
> >
> > vm# cpuid -1 -l 0x80000022
> > CPU:
> > Extended Performance Monitoring and Debugging (0x80000022):
> > AMD performance monitoring V2 = true
> > AMD LBR V2 = false
> > AMD LBR stack & PMC freezing = false
> > number of core perf ctrs = 0x6 (6)
> > number of LBR stack entries = 0x0 (0)
> > number of avail Northbridge perf ctrs = 0x0 (0)
> > number of available UMC PMCs = 0x0 (0)
> > active UMCs bitmask = 0x0
> >
> >
> > Both writes are passed.
> >
> > vm# wrmsr 0xc0000302 0x1f
> > vm# wrmsr 0xc0000302 0x040000000000001f
> >
> > Here is bcc output. Both writes are good.
> >
> > kvm# /usr/share/bcc/tools/trace -t -C 'kvm_pmu_set_msr "%x", retval'
> > ... ...
> > 4.748614 19 43545 43550 CPU 0/KVM kvm_pmu_set_msr 0
> > 10.97396 19 43545 43550 CPU 0/KVM kvm_pmu_set_msr 0
> >
>
> Thanks for testing. I cannot replicate this either with an upstream kernel.
Hi,
I also tested on a bare-metal Zen4 system just now, and there too MSR 0xc0000302 can be set
to any value.
So this is a hypervisor bug; I'll report it to AWS.
Best regards,
Maxim Levitsky
>
> - Sandipan
>
* Re: Small question about reserved bits in MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
2024-09-17 12:54 ` Maxim Levitsky
@ 2024-11-11 16:57 ` Josh Triplett
0 siblings, 0 replies; 5+ messages in thread
From: Josh Triplett @ 2024-11-11 16:57 UTC (permalink / raw)
To: Maxim Levitsky; +Cc: Sandipan Das, dongli.zhang, linux-perf-users, x86, kvm
On Tue, Sep 17, 2024 at 08:54:04AM -0400, Maxim Levitsky wrote:
> I also tested on bare metal Zen4 system just now, and I also see that MSR 0xc0000302 can be set
> to any value.
>
> So this is a hypervisor bug, I'll report it to AWS.
Have you gotten any response from AWS? I'm still seeing this bug on
current c7a instances.
- Josh Triplett