public inbox for kvm@vger.kernel.org
* [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs
@ 2023-09-05 18:07 Jari Ruusu
  2023-09-05 19:27 ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Jari Ruusu @ 2023-09-05 18:07 UTC
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm@vger.kernel.org

I am having trouble booting an old linux-3.10.108 x86_64 guest kernel on
qemu-2.11 (+ Ubuntu patches) running on a linux-5.10.194 x86_64 host kernel.
The same problem occurs with the qemu version that shipped with Debian 11.
This problem is an old regression. This type of setup worked fine on older
linux-4.x hosts but fails on linux-5.10.x hosts. I remember seeing this fail
as early as 2021; I just haven't had time to look into it until now.

Relevant qemu parameters:
  -machine pc-1.0
  -cpu Skylake-Server-IBRS,+md-clear,+pcid,+invpcid,+ssbd,+clflushopt
  -enable-kvm
If I change the CPU model to "Nehalem", it boots OK.

KVM is built into the host kernel, and my kernel boot parameters include:
  kvm-intel.ept=0 l1tf=off kvm.ignore_msrs=1
so, given the ignore_msrs=1 VETO, no invalid RDMSR read should fail, yet at
least the MSR_IA32_PERF_CAPABILITIES RDMSR read does indeed fail.
 
Below is the relevant guest kernel log, captured via the virtual serial port:

[    0.041453] general protection fault: 0000 [#1] 
[    0.046166] Modules linked in:
[    0.046464] CPU: 0 PID: 1 Comm: swapper Not tainted 3.10.108 #1
[    0.046990] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.047509] task: ffff88011f069890 ti: ffff88011f06a000 task.ti: ffff88011f06a000
[    0.048171] RIP: 0010:[<ffffffff816d0c77>]  [<ffffffff816d0c77>] intel_pmu_init+0x2ee/0x7b8
[    0.048943] RSP: 0000:ffff88011f06be68  EFLAGS: 00010202
[    0.049423] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000345
[    0.050000] RDX: 0000000000000003 RSI: 0000000000000001 RDI: ffffffff816c1230
[    0.050000] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007
[    0.050000] R10: 0000000000000000 R11: 00000001816bee00 R12: 0000000000000000
[    0.050000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.050000] FS:  0000000000000000(0000) GS:ffffffff81655000(0000) knlGS:0000000000000000
[    0.050000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.050000] CR2: ffff88011ffff000 CR3: 0000000001648000 CR4: 00000000003406b0
[    0.050000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.050000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.050000] Stack:
[    0.050000]  0000860300000000 0000000007300402 ffffffff816cfc3d ffffffff816cfc72
[    0.050000]  0000000000000000 ffffffff8101c266 0000000000000000 ffff88000009b000
[    0.050000]  ffffffff81744020 ffffffff816cfc3d 0000000000000000 0000000000000000
[    0.050000] Call Trace:
[    0.050000]  [<ffffffff816cfc3d>] ? check_bugs+0x45/0x45
[    0.050000]  [<ffffffff816cfc72>] ? init_hw_perf_events+0x35/0x4cf
[    0.050000]  [<ffffffff8101c266>] ? set_memory_x+0x2b/0x30
[    0.050000]  [<ffffffff816cfc3d>] ? check_bugs+0x45/0x45
[    0.050000]  [<ffffffff8100032d>] ? do_one_initcall+0x73/0xfe
[    0.050000]  [<ffffffff816c9cb0>] ? kernel_init_freeable+0x53/0x180
[    0.050000]  [<ffffffff81445771>] ? rest_init+0x65/0x65
[    0.050000]  [<ffffffff81445777>] ? kernel_init+0x6/0xc9
[    0.050000]  [<ffffffff8144f987>] ? ret_from_fork+0x57/0x90
[    0.050000]  [<ffffffff81445771>] ? rest_init+0x65/0x65
[    0.050000] Code: 00 0f 46 d0 ff ce 89 15 88 05 ff ff 7e 2f 8a 54 24 04 b8 03 00 00 00 b9 45 03 00 00 83 e2 1f 83 fa 02 0f 4f c2 89 05 99 04 ff ff <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 89 15 2f 05 ff ff e8 69 e7 
[    0.050000] RIP  [<ffffffff816d0c77>] intel_pmu_init+0x2ee/0x7b8
[    0.050000]  RSP <ffff88011f06be68>
[    0.050011] ---[ end trace 3baec0b388c1f452 ]---
[    0.050436] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

System.map address:

ffffffff816d0989 T intel_pmu_init

Partial linux-3.10.108/arch/x86/kernel/cpu/perf_event_intel.o disassembly:

00000000000000c3 <intel_pmu_init>:              
  c3:   53                      push   %rbx      
  c4:   48 83 ec 10             sub    $0x10,%rsp
  c8:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # cf <intel_pmu_init+0xc>
[SNIP]
 398:   b8 03 00 00 00          mov    $0x3,%eax
 39d:   b9 45 03 00 00          mov    $0x345,%ecx
 3a2:   83 e2 1f                and    $0x1f,%edx
 3a5:   83 fa 02                cmp    $0x2,%edx
 3a8:   0f 4f c2                cmovg  %edx,%eax
 3ab:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 3b1 <intel_pmu_init+0x2ee>
 3b1:   0f 32                   rdmsr
                                ^^^^^----------FAILS-HERE----------
 3b3:   48 c1 e2 20             shl    $0x20,%rdx
 3b7:   89 c0                   mov    %eax,%eax
 3b9:   48 09 c2                or     %rax,%rdx
 3bc:   48 89 15 00 00 00 00    mov    %rdx,0x0(%rip)        # 3c3 <intel_pmu_init+0x300>

The RDMSR instruction above maps to this line of the C source file
linux-3.10.108/arch/x86/kernel/cpu/perf_event_intel.c (line 2036):

    rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);

The full C source file can be viewed here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/kernel/cpu/perf_event_intel.c?h=linux-3.10.y#n2023

My understanding is that this failure is a combination of many factors,
including:

1) Qemu version is old
2) Qemu guest CPUID flags may be "Frankenstein" 
3) old linux-3.10.108 x86_64 kernel may be doing something questionable
4) newer host linux KVM is not always honoring RDMSR ignore_msrs=1 VETO

My reading of the linux-5.10.194 kernel source identified the following
questionable handling of the ignore_msrs=1 VETO. The same problem appears to
be present in the recently released linux-6.5 too, but I have not yet tested
with linux-6.5.x host kernels.

kvm_get_msr(...)
  kvm_get_msr_ignored_check(...)
    ret = __kvm_get_msr(...)            // returns 1 for some invalid MSRs
    if (ret == KVM_MSR_RET_INVALID) {   // checks for value 2
      if (kvm_msr_ignored_check(...))
        ret = 0;                        // fails to get here with ignore_msrs=1
    }

Below is my quick-and-dirty fix for that one problematic
MSR_IA32_PERF_CAPABILITIES RDMSR case. This patch does not fix
other cases where __kvm_get_msr() returns 1 for invalid MSR reads.
With this patch, the linux-3.10.108 x86_64 guest boots OK with the
Skylake CPU model. The patch is against linux-5.10.194, but it seems
to apply to linux-6.5 too, with some offset.

--- ./arch/x86/kvm/x86.c.OLD
+++ ./arch/x86/kvm/x86.c
@@ -3518,7 +3518,7 @@
 		msr_info->data = vcpu->arch.arch_capabilities;
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
-		if (!msr_info->host_initiated &&
+		if (!msr_info->host_initiated && !ignore_msrs &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
 			return 1;
 		msr_info->data = vcpu->arch.perf_capabilities;

--
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189



* Re: [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs
  2023-09-05 18:07 [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs Jari Ruusu
@ 2023-09-05 19:27 ` Sean Christopherson
  2023-09-05 20:41   ` Jari Ruusu
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2023-09-05 19:27 UTC
  To: Jari Ruusu; +Cc: Paolo Bonzini, kvm@vger.kernel.org

On Tue, Sep 05, 2023, Jari Ruusu wrote:
> This problem is an old regression. This type of setup worked fine on older
> linux-4.x hosts but fails on linux-5.10.x hosts. I remember seeing this fail
> as early as 2021; I just haven't had time to look into it until now.
> 
> Relevant qemu parameters:
>   -machine pc-1.0
>   -cpu Skylake-Server-IBRS,+md-clear,+pcid,+invpcid,+ssbd,+clflushopt
>   -enable-kvm
> If I change the CPU model to "Nehalem", it boots OK.
> 
> KVM is built into the host kernel, and my kernel boot parameters include:
>   kvm-intel.ept=0 l1tf=off kvm.ignore_msrs=1
> so, given the ignore_msrs=1 VETO, no invalid RDMSR read should fail, yet at
> least the MSR_IA32_PERF_CAPABILITIES RDMSR read does indeed fail.

No, as documented in Documentation/admin-guide/kernel-parameters.txt, ignore_msrs
only applies to _unhandled_ MSRs, i.e. MSRs that KVM knows nothing about.

  kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.

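The reason your setup now fails is that KVM didn't have any handling for
MSR_IA32_PERF_CAPABILITIES prior to commit 27461da31089 ("KVM: x86/pmu:
Support full width counting"). Before that commit the MSR was unhandled, so
ignore_msrs=1 covered it; now KVM handles it and rejects the guest's read when
PDCM isn't enumerated in the guest's CPUID.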

> The full C source file can be viewed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/kernel/cpu/perf_event_intel.c?h=linux-3.10.y#n2023
> 
> My understanding is that this failure is a combination of many factors,
> including:
> 
> 1) Qemu version is old
> 2) Qemu guest CPUID flags may be "Frankenstein" 

It's a bit Frankenstein, but architecturally it's completely valid.

> 3) old linux-3.10.108 x86_64 kernel may be doing something questionable

The guest kernel is the real culprit.  It is assuming that an MSR exists based on
the PMU version instead of checking the CPUID feature flag that enumerates the
existence of the MSR.

The bug was fixed almost a decade ago, but that fix obviously didn't make it to
the 3.10 kernel.

commit c9b08884c9c98929ec2d8abafd78e89062d01ee7
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Feb 3 14:29:03 2014 +0100

    perf/x86: Correctly use FEATURE_PDCM
    
    The current code simply assumes Intel Arch PerfMon v2+ to have
    the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
    CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.
    
    This was found by KVM which implements v2+ but didn't provide the
    capabilities MSR. Change the code to DTRT; KVM will also implement the
    MSR and return 0.


> 4) newer host linux KVM is not always honoring RDMSR ignore_msrs=1 VETO
> 
> My reading of the linux-5.10.194 kernel source identified the following
> questionable handling of the ignore_msrs=1 VETO. The same problem appears to
> be present in the recently released linux-6.5 too, but I have not yet tested
> with linux-6.5.x host kernels.

While this is arguably a regression, this isn't going to be addressed in KVM.

ignore_msrs is off by default, and is explicitly documented as applying only to
unhandled MSRs.  The documentation could certainly do a better job of explaining
the potential pitfalls and long-term consequences of enabling ignore_msrs, but
hack-a-fixing this one MSR to fudge around a guest bug isn't going to happen,
and a broad "ignore all RDMSR/WRMSR faults" knob would likely break other guests,
e.g. would make it impossible to probe for MSR existence, and so such a knob would
be unusable.

As for working around this in your setup, assuming you don't actually need a
virtual PMU in the guest, the simplest workaround would be to turn off vPMU
support in KVM, i.e. boot with kvm.enable_pmu=0.  That _should_ cause QEMU to not
advertise a PMU to the guest.  Alternatively, if supported by QEMU, you could try
enumerating a version 1 vPMU to the guest.


* Re: [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs
  2023-09-05 19:27 ` Sean Christopherson
@ 2023-09-05 20:41   ` Jari Ruusu
  2023-09-05 20:55     ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Jari Ruusu @ 2023-09-05 20:41 UTC
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm@vger.kernel.org

On Tuesday, September 5th, 2023 at 22:27, Sean Christopherson <seanjc@google.com> wrote:
> While this is arguably a regression, this isn't going to be addressed in KVM.

OK, I understand.

> As for working around this in your setup, assuming you don't actually need a
> virtual PMU in the guest, the simplest workaround would be to turn off vPMU
> support in KVM, i.e. boot with kvm.enable_pmu=0. That should cause QEMU to not
> advertise a PMU to the guest.

Newer host kernels seem to have the kvm.enable_pmu parameter,
but linux-5.10.y kernels do not have that.

> Alternatively, if supported by QEMU, you could try enumerating a version 1
> vPMU to the guest.

That old version of Qemu does not seem to have that available.

I am perfectly OK patching my kernels with my quick-and-dirty fix until I
upgrade to newer kernel series.

Thank you for your reply. It clarified many things for me.

--
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189



* Re: [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs
  2023-09-05 20:41   ` Jari Ruusu
@ 2023-09-05 20:55     ` Sean Christopherson
  2023-09-05 21:02       ` Jari Ruusu
  2023-09-07 10:55       ` Paolo Bonzini
  0 siblings, 2 replies; 6+ messages in thread
From: Sean Christopherson @ 2023-09-05 20:55 UTC
  To: Jari Ruusu; +Cc: Paolo Bonzini, kvm@vger.kernel.org

On Tue, Sep 05, 2023, Jari Ruusu wrote:
> On Tuesday, September 5th, 2023 at 22:27, Sean Christopherson <seanjc@google.com> wrote:
> > As for working around this in your setup, assuming you don't actually need a
> > virtual PMU in the guest, the simplest workaround would be to turn off vPMU
> > support in KVM, i.e. boot with kvm.enable_pmu=0. That should cause QEMU to not
> > advertise a PMU to the guest.
> 
> Newer host kernels seem to have the kvm.enable_pmu parameter,
> but linux-5.10.y kernels do not have that.

Gah, try kvm.pmu.

Commit 4732f2444acd ("KVM: x86: Making the module parameter of vPMU more common")
renamed the variable to avoid collisions, but it unnecessarily changed the name
exposed to userspace too.  My gut reaction is to revert the param name back to
"pmu".

Paolo, any idea if reverting "enable_pmu" back to "pmu" would be worth the churn?


* Re: [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs
  2023-09-05 20:55     ` Sean Christopherson
@ 2023-09-05 21:02       ` Jari Ruusu
  2023-09-07 10:55       ` Paolo Bonzini
  1 sibling, 0 replies; 6+ messages in thread
From: Jari Ruusu @ 2023-09-05 21:02 UTC
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm@vger.kernel.org

On Tuesday, September 5th, 2023 at 23:55, Sean Christopherson <seanjc@google.com> wrote:
> On Tue, Sep 05, 2023, Jari Ruusu wrote:
> > Newer host kernels seem to have the kvm.enable_pmu parameter,
> > but linux-5.10.y kernels do not have that.
> 
> Gah, try kvm.pmu.

No kvm.pmu parameter in linux-5.10.y either.

--
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189



* Re: [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs
  2023-09-05 20:55     ` Sean Christopherson
  2023-09-05 21:02       ` Jari Ruusu
@ 2023-09-07 10:55       ` Paolo Bonzini
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2023-09-07 10:55 UTC
  To: Sean Christopherson, Jari Ruusu; +Cc: kvm@vger.kernel.org

On 9/5/23 22:55, Sean Christopherson wrote:
> On Tue, Sep 05, 2023, Jari Ruusu wrote:
>> On Tuesday, September 5th, 2023 at 22:27, Sean Christopherson <seanjc@google.com> wrote:
>>> As for working around this in your setup, assuming you don't actually need a
>>> virtual PMU in the guest, the simplest workaround would be to turn off vPMU
>>> support in KVM, i.e. boot with kvm.enable_pmu=0. That should cause QEMU to not
>>> advertise a PMU to the guest.
>>
>> Newer host kernels seem to have the kvm.enable_pmu parameter,
>> but linux-5.10.y kernels do not have that.
> 
> Gah, try kvm.pmu.
> 
> Commit 4732f2444acd ("KVM: x86: Making the module parameter of vPMU more common")
> renamed the variable to avoid collisions, but it unnecessarily changed the name
> exposed to userspace too.  My gut reaction is to revert the param name back to
> "pmu".
> 
> Paolo, any idea if reverting "enable_pmu" back to "pmu" would be worth the churn?

Hmm, 5.17 is almost a couple years old so I'm debating it...  Probably 
not, but I wouldn't complain either way.

Paolo


