public inbox for kvm@vger.kernel.org
* [Bug 218792] New: Guest call trace with mwait enabled
@ 2024-04-30  7:32 bugzilla-daemon
  2024-04-30 11:32 ` [Bug 218792] " bugzilla-daemon
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: bugzilla-daemon @ 2024-04-30  7:32 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

            Bug ID: 218792
           Summary: Guest call trace with mwait enabled
           Product: Virtualization
           Version: unspecified
          Hardware: Intel
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: farrah.chen@intel.com
        Regression: No

Environment:
host/guest kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
e67572cd220(v6.9-rc6)
QEMU: https://gitlab.com/qemu-project/qemu.git master 5c6528dce86d
Host/Guest OS: CentOS Stream 9/Ubuntu 24.04

Bug detail description:
Booting a guest with MWAIT enabled (-overcommit cpu-pm=on) triggers a call trace
in the guest: "unchecked MSR access error"

Reproduce steps:
img=centos9.qcow2
qemu-system-x86_64 \
    -name legacy,debug-threads=on \
    -overcommit cpu-pm=on \
    -accel kvm -smp 8 -m 8G -cpu host \
    -drive file=${img},if=none,id=virtio-disk0 \
    -device virtio-blk-pci,drive=virtio-disk0 \
    -device virtio-net-pci,netdev=nic0 \
    -netdev user,id=nic0,hostfwd=tcp::10023-:22 \
    -vnc :1 -serial stdio

Guest boot with call trace:
[ 0.475344] unchecked MSR access error: RDMSR from 0xe2 at rIP: 0xffffffffb5a966b8 (native_read_msr+0x8/0x40)
[ 0.476465] Call Trace:
[ 0.476763] <TASK>
[ 0.477027] ? ex_handler_msr+0x128/0x140
[ 0.477460] ? fixup_exception+0x166/0x3c0
[ 0.477934] ? exc_general_protection+0xdc/0x3c0
[ 0.478481] ? asm_exc_general_protection+0x26/0x30
[ 0.479052] ? __pfx_intel_idle_init+0x10/0x10
[ 0.479587] ? native_read_msr+0x8/0x40
[ 0.480057] intel_idle_init_cstates_icpu.constprop.0+0x5e/0x560
[ 0.480747] ? __pfx_intel_idle_init+0x10/0x10
[ 0.481275] intel_idle_init+0x161/0x360
[ 0.481742] do_one_initcall+0x45/0x220
[ 0.482209] do_initcalls+0xac/0x130
[ 0.482643] kernel_init_freeable+0x134/0x1e0
[ 0.483159] ? __pfx_kernel_init+0x10/0x10
[ 0.483648] kernel_init+0x1a/0x1c0
[ 0.484087] ret_from_fork+0x31/0x50
[ 0.484541] ? __pfx_kernel_init+0x10/0x10
[ 0.485030] ret_from_fork_asm+0x1a/0x30
[ 0.485462] </TASK>

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
@ 2024-04-30 11:32 ` bugzilla-daemon
  2024-04-30 16:41 ` [Bug 218792] New: " Sean Christopherson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2024-04-30 11:32 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|                            |6.9-rc6

--- Comment #1 from Artem S. Tashkinov (aros@gmx.com) ---
Is this a regression? Could you bisect?


* Re: [Bug 218792] New: Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
  2024-04-30 11:32 ` [Bug 218792] " bugzilla-daemon
@ 2024-04-30 16:41 ` Sean Christopherson
  2025-07-31  8:59   ` Chenyi Qiang
  2024-04-30 16:42 ` [Bug 218792] " bugzilla-daemon
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2024-04-30 16:41 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Tue, Apr 30, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218792
> 
>             Bug ID: 218792
>            Summary: Guest call trace with mwait enabled
>            Product: Virtualization
>            Version: unspecified
>           Hardware: Intel
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: farrah.chen@intel.com
>         Regression: No
> 
> Environment:
> host/guest kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> e67572cd220(v6.9-rc6)
> QEMU: https://gitlab.com/qemu-project/qemu.git master 5c6528dce86d
> Host/Guest OS: Centos stream9/Ubuntu24.04
> 
> Bug detail description: 
> Boot Guest with mwait enabled(-overcommit cpu-pm=on), guest call trace
> "unchecked MSR access error"
> 
> Reproduce steps:
> img=centos9.qcow2
> qemu-system-x86_64 \
>     -name legacy,debug-threads=on \
>     -overcommit cpu-pm=on \
>     -accel kvm -smp 8 -m 8G -cpu host \
>     -drive file=${img},if=none,id=virtio-disk0 \
>     -device virtio-blk-pci,drive=virtio-disk0 \
>     -device virtio-net-pci,netdev=nic0 \
>     -netdev user,id=nic0,hostfwd=tcp::10023-:22 \
>     -vnc :1 -serial stdio
> 
> Guest boot with call trace:
> [ 0.475344] unchecked MSR access error: RDMSR from 0xe2 at rIP:

MSR 0xE2 is MSR_PKG_CST_CONFIG_CONTROL, which hpet_is_pc10_damaged() assumes
exists if PC10 substates are supported. KVM doesn't emulate/support
MSR_PKG_CST_CONFIG_CONTROL, i.e. injects a #GP on the guest RDMSR, hence the
splat.  This isn't a KVM bug as KVM explicitly advertises all zeros for the
MWAIT CPUID leaf, i.e. QEMU is effectively telling the guest that PC10 substates
are supported without KVM's explicit blessing.

That said, this is arguably a kernel bug (guest side), as I don't see anything
in the SDM that _requires_ MSR_PKG_CST_CONFIG_CONTROL to exist if PC10 substates
are supported.

The issue is likely benign, other than the obvious WARN.  The kernel gracefully
handles the #GP and zeros the result, i.e. will always think PC10 is _disabled_,
which may or may not be correct, but is functionally ok if the HPET is being
emulated by the host, which it probably is.

	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
	if ((pcfg & 0xF) < 8)
		return false;
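
As a rough user-space illustration of why the zeroed result is harmless here,
the sketch below models the behaviour described above. It is not kernel code:
fake_rdmsr() and pc10_limit_enabled() are hypothetical stand-ins, and the
faulting access is simulated with a flag. A read that yields 0 makes the
(pcfg & 0xF) < 8 test true, so the caller concludes PC10 is disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an unchecked MSR read: if the access faults,
 * the fixup path leaves the value at zero, as described above. */
static uint64_t fake_rdmsr(bool msr_exists, uint64_t hw_value)
{
	return msr_exists ? hw_value : 0;
}

/* Mirrors the quoted check: a package C-state limit below 8 in the low
 * nibble is treated as "PC10 not enabled". */
static bool pc10_limit_enabled(uint64_t pcfg)
{
	return (pcfg & 0xF) >= 8;
}
```

With a missing MSR the helper always reports PC10 as not enabled, regardless
of what the underlying hardware would have returned.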

The most straightforward fix, and probably the most correct all around, would be
to use rdmsrl_safe() to suppress the WARN, i.e. have the kernel not yell if
MSR_PKG_CST_CONFIG_CONTROL doesn't exist.  Unless HPET is also being passed
through, that'll do the right thing when Linux is a guest.  And if a setup also
passes through HPET, then the VMM can also trap-and-emulate MSR_PKG_CST_CONFIG_CONTROL
as appropriate (doing so in QEMU without KVM support might be impossible, though
again it's unnecessary if QEMU is emulating the HPET).

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index c96ae8fee95e..2afafff18f92 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -980,7 +980,9 @@ static bool __init hpet_is_pc10_damaged(void)
                return false;
 
        /* Check whether PC10 is enabled in PKG C-state limit */
-       rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
+       if (rdmsrl_safe(MSR_PKG_CST_CONFIG_CONTROL, &pcfg))
+               return false;
+
        if ((pcfg & 0xF) < 8)
                return false;
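
To make the calling convention explicit, here is a minimal user-space sketch
of the _safe pattern: the read reports failure through its return value and
stores the result through a pointer, so the caller can bail out quietly.
fake_rdmsr_safe() and hpet_pc10_check() are hypothetical names and the missing
MSR is simulated with a flag; this is an assumption-laden model, not the
kernel implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a "safe" MSR read: returns 0 on success and a
 * negative errno on a faulting access, writing the value via *val. */
static int fake_rdmsr_safe(bool msr_exists, uint64_t hw_value, uint64_t *val)
{
	if (!msr_exists) {
		*val = 0;
		return -EIO;
	}
	*val = hw_value;
	return 0;
}

/* Returns true only when the MSR is readable and the package C-state
 * limit field allows PC10 (low nibble >= 8). */
static bool hpet_pc10_check(bool msr_exists, uint64_t hw_value)
{
	uint64_t pcfg;

	if (fake_rdmsr_safe(msr_exists, hw_value, &pcfg))
		return false;	/* MSR absent: skip silently */

	return (pcfg & 0xF) >= 8;
}
```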

> 0xffffffffb5a966b8 (native_read_msr+0x8/0x40)
> [ 0.476465] Call Trace:
> [ 0.476763] <TASK>
> [ 0.477027] ? ex_handler_msr+0x128/0x140
> [ 0.477460] ? fixup_exception+0x166/0x3c0
> [ 0.477934] ? exc_general_protection+0xdc/0x3c0
> [ 0.478481] ? asm_exc_general_protection+0x26/0x30
> [ 0.479052] ? __pfx_intel_idle_init+0x10/0x10
> [ 0.479587] ? native_read_msr+0x8/0x40
> [ 0.480057] intel_idle_init_cstates_icpu.constprop.0+0x5e/0x560
> [ 0.480747] ? __pfx_intel_idle_init+0x10/0x10
> [ 0.481275] intel_idle_init+0x161/0x360
> [ 0.481742] do_one_initcall+0x45/0x220
> [ 0.482209] do_initcalls+0xac/0x130
> [ 0.482643] kernel_init_freeable+0x134/0x1e0
> [ 0.483159] ? __pfx_kernel_init+0x10/0x10
> [ 0.483648] kernel_init+0x1a/0x1c0
> [ 0.484087] ret_from_fork+0x31/0x50
> [ 0.484541] ? __pfx_kernel_init+0x10/0x10
> [ 0.485030] ret_from_fork_asm+0x1a/0x30
> [ 0.485462] </TASK>
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.


* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
  2024-04-30 11:32 ` [Bug 218792] " bugzilla-daemon
  2024-04-30 16:41 ` [Bug 218792] New: " Sean Christopherson
@ 2024-04-30 16:42 ` bugzilla-daemon
  2024-07-12  8:11 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2024-04-30 16:42 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

--- Comment #2 from Sean Christopherson (seanjc@google.com) ---

* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
                   ` (2 preceding siblings ...)
  2024-04-30 16:42 ` [Bug 218792] " bugzilla-daemon
@ 2024-07-12  8:11 ` bugzilla-daemon
  2024-07-12  8:40 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2024-07-12  8:11 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

Ma Xiangfei (xiangfeix.ma@intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xiangfeix.ma@intel.com

--- Comment #3 from Ma Xiangfei (xiangfeix.ma@intel.com) ---
I have tried this patch, but it can still be reproduced.
Host/Guest OS: CentOS 9
Host kernel: 6.10.0-rc2
Guest kernel: 6.10.0-rc7+
Host commit: 02b0d3b9 (https://git.kernel.org/pub/scm/virt/kvm/kvm.git)
Guest commit: 43db1e03c086ed20cc75808d3f45e780ec4ca26e
QEMU commit: b9ee1387


* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
                   ` (3 preceding siblings ...)
  2024-07-12  8:11 ` bugzilla-daemon
@ 2024-07-12  8:40 ` bugzilla-daemon
  2025-07-31  8:59 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2024-07-12  8:40 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

--- Comment #4 from Ma Xiangfei (xiangfeix.ma@intel.com) ---
(In reply to Ma Xiangfei from comment #3)
> I have tried this patch, but it can still be reproduced.
> Host/Guest OS: CentOS 9
> Host kernel: 6.10.0-rc2
> Guest kernel: 6.10.0-rc7+ (Using Sean patch)
> Host commit: 02b0d3b9 (https://git.kernel.org/pub/scm/virt/kvm/kvm.git)
> Guest commit: 43db1e03c086ed20cc75808d3f45e780ec4ca26e
> QEMU commit: b9ee1387


* Re: [Bug 218792] New: Guest call trace with mwait enabled
  2024-04-30 16:41 ` [Bug 218792] New: " Sean Christopherson
@ 2025-07-31  8:59   ` Chenyi Qiang
  0 siblings, 0 replies; 11+ messages in thread
From: Chenyi Qiang @ 2025-07-31  8:59 UTC (permalink / raw)
  To: Sean Christopherson, bugzilla-daemon, Rafael J. Wysocki,
	Len Brown
  Cc: kvm, Xiaoyao Li



On 5/1/2024 12:41 AM, Sean Christopherson wrote:
> On Tue, Apr 30, 2024, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=218792
>>
>>             Bug ID: 218792
>>            Summary: Guest call trace with mwait enabled
>>            Product: Virtualization
>>            Version: unspecified
>>           Hardware: Intel
>>                 OS: Linux
>>             Status: NEW
>>           Severity: normal
>>           Priority: P3
>>          Component: kvm
>>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>>           Reporter: farrah.chen@intel.com
>>         Regression: No
>>
>> Environment:
>> host/guest kernel:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> e67572cd220(v6.9-rc6)
>> QEMU: https://gitlab.com/qemu-project/qemu.git master 5c6528dce86d
>> Host/Guest OS: Centos stream9/Ubuntu24.04
>>
>> Bug detail description: 
>> Boot Guest with mwait enabled(-overcommit cpu-pm=on), guest call trace
>> "unchecked MSR access error"
>>
>> Reproduce steps:
>> img=centos9.qcow2
>> qemu-system-x86_64 \
>>     -name legacy,debug-threads=on \
>>     -overcommit cpu-pm=on \
>>     -accel kvm -smp 8 -m 8G -cpu host \
>>     -drive file=${img},if=none,id=virtio-disk0 \
>>     -device virtio-blk-pci,drive=virtio-disk0 \
>>     -device virtio-net-pci,netdev=nic0 \
>>     -netdev user,id=nic0,hostfwd=tcp::10023-:22 \
>>     -vnc :1 -serial stdio
>>
>> Guest boot with call trace:
>> [ 0.475344] unchecked MSR access error: RDMSR from 0xe2 at rIP:
> 
> MSR 0xE2 is MSR_PKG_CST_CONFIG_CONTROL, which hpet_is_pc10_damaged() assumes
> exists if PC10 substates are supported. KVM doesn't emulate/support
> MSR_PKG_CST_CONFIG_CONTROL, i.e. injects a #GP on the guest RDMSR, hence the
> splat.  This isn't a KVM bug as KVM explicitly advertises all zeros for the
> MWAIT CPUID leaf, i.e. QEMU is effectively telling the guest that PC10 substates
> are supported without KVM's explicit blessing.
> 
> That said, this is arguably a kernel bug (guest side), as I don't see anything
> in the SDM that _requires_ MSR_PKG_CST_CONFIG_CONTROL to exist if PC10 substates
> are supported.
> 
> The issue is likely benign, other than the obvious WARN.  The kernel gracefully
> handles the #GP and zeros the result, i.e. will always think PC10 is _disabled_,
> which may or may not be correct, but is functionally ok if the HPET is being
> emulated by the host, which it probably is.
> 
> 	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> 	if ((pcfg & 0xF) < 8)
> 		return false;
> 
> The most straightforward fix, and probably the most correct all around, would be
> to use rdmsrl_safe() to suppress the WARN, i.e. have the kernel not yell if
> MSR_PKG_CST_CONFIG_CONTROL doesn't exist.  Unless HPET is also being passed
> through, that'll do the right thing when Linux is a guest.  And if a setup also
> passes through HPET, then the VMM can also trap-and-emulate MSR_PKG_CST_CONFIG_CONTROL
> as appropriate (doing so in QEMU without KVM support might be impossible, though
> again it's unnecessary if QEMU is emulating the HPET).
> 
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index c96ae8fee95e..2afafff18f92 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -980,7 +980,9 @@ static bool __init hpet_is_pc10_damaged(void)
>                 return false;
>  
>         /* Check whether PC10 is enabled in PKG C-state limit */
> -       rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> +       if (rdmsrl_safe(MSR_PKG_CST_CONFIG_CONTROL, &pcfg))
> +               return false;
> +
>         if ((pcfg & 0xF) < 8)
>                 return false;

There are three places that can access MSR_PKG_CST_CONFIG_CONTROL:
1. hpet_is_pc10_damaged() in hpet.c
2. *_idle_state_table_update() in intel_idle.c (the splat in this bug comes from this path in VMs)
3. auto_demotion_disable() in intel_idle.c

This MSR appears to be CPU model specific rather than architectural.

Besides case 1, which was already mentioned, the intel_idle driver also uses the MSR to query the
lowest processor-specific C-state for the package (case 2) and to disable auto demotion
(case 3), depending on the specific model.

I assume cases 2 and 3 are both aimed at improving energy efficiency. For example,
spr_idle_state_table_update() adjusts exit_latency/target_residency to hardcoded values based on
the package C-state limit. That seems unreasonable in VMs, as the hardcoded values were measured on
the host, and the guest CPU model may not match the host's if we only pass through this MSR. Similarly,
for case 3, there is no guarantee that disabling auto demotion improves energy efficiency on an
emulated CPU model.

Since there is no such fine-grained power management virtualization support yet, can we change
all the rdmsr/wrmsr(MSR_PKG_CST_CONFIG_CONTROL) calls to the *_safe() variants so that the related
operations are skipped in VMs?
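
One way to picture the proposal for case 2 is the following user-space sketch,
in which a table update is simply skipped when the MSR cannot be read.
Everything in it is illustrative: struct cstate, fake_rdmsr_safe(),
table_update(), the 0x7 mask, the threshold, and the latency numbers are
assumptions made for the example, not values taken from intel_idle.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idle-state entry. */
struct cstate {
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

/* Hypothetical safe read: fails with -EIO when the MSR is not available,
 * e.g. because the hypervisor does not emulate it. */
static int fake_rdmsr_safe(bool msr_exists, uint64_t hw_value, uint64_t *val)
{
	if (!msr_exists)
		return -EIO;
	*val = hw_value;
	return 0;
}

/* Returns true if the entry was tuned, false if it was left at its
 * defaults. The limit mask, threshold, and numbers are made up. */
static bool table_update(struct cstate *st, bool msr_exists, uint64_t hw_value)
{
	uint64_t msr;

	if (fake_rdmsr_safe(msr_exists, hw_value, &msr))
		return false;	/* no MSR: keep defaults, no warning */

	if ((msr & 0x7) < 2) {
		st->exit_latency = 100;
		st->target_residency = 300;
		return true;
	}
	return false;
}
```

In this sketch, an unchecked read that yields zero on a fault would satisfy
the comparison and modify the table, whereas the safe variant leaves the
defaults untouched when the MSR is absent.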

> 
>> 0xffffffffb5a966b8 (native_read_msr+0x8/0x40)
>> [ 0.476465] Call Trace:
>> [ 0.476763] <TASK>
>> [ 0.477027] ? ex_handler_msr+0x128/0x140
>> [ 0.477460] ? fixup_exception+0x166/0x3c0
>> [ 0.477934] ? exc_general_protection+0xdc/0x3c0
>> [ 0.478481] ? asm_exc_general_protection+0x26/0x30
>> [ 0.479052] ? __pfx_intel_idle_init+0x10/0x10
>> [ 0.479587] ? native_read_msr+0x8/0x40
>> [ 0.480057] intel_idle_init_cstates_icpu.constprop.0+0x5e/0x560
>> [ 0.480747] ? __pfx_intel_idle_init+0x10/0x10
>> [ 0.481275] intel_idle_init+0x161/0x360
>> [ 0.481742] do_one_initcall+0x45/0x220
>> [ 0.482209] do_initcalls+0xac/0x130
>> [ 0.482643] kernel_init_freeable+0x134/0x1e0
>> [ 0.483159] ? __pfx_kernel_init+0x10/0x10
>> [ 0.483648] kernel_init+0x1a/0x1c0
>> [ 0.484087] ret_from_fork+0x31/0x50
>> [ 0.484541] ? __pfx_kernel_init+0x10/0x10
>> [ 0.485030] ret_from_fork_asm+0x1a/0x30
>> [ 0.485462] </TASK>
>>
>> -- 
>> You may reply to this email to add a comment.
>>
>> You are receiving this mail because:
>> You are watching the assignee of the bug.
> 



* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
                   ` (4 preceding siblings ...)
  2024-07-12  8:40 ` bugzilla-daemon
@ 2025-07-31  8:59 ` bugzilla-daemon
  2025-08-08 21:05 ` bugzilla-daemon
  2025-08-08 22:59 ` bugzilla-daemon
  7 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2025-07-31  8:59 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

--- Comment #5 from chenyi.qiang@intel.com ---

* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
                   ` (5 preceding siblings ...)
  2025-07-31  8:59 ` bugzilla-daemon
@ 2025-08-08 21:05 ` bugzilla-daemon
  2025-08-08 22:59   ` Sean Christopherson
  2025-08-08 22:59 ` bugzilla-daemon
  7 siblings, 1 reply; 11+ messages in thread
From: bugzilla-daemon @ 2025-08-08 21:05 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

Len Brown (lenb@kernel.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lenb@kernel.org

--- Comment #6 from Len Brown (lenb@kernel.org) ---
Re: intel_idle

I agree that the SDM doesn't guarantee this MSR exists
based on the presence of PC10.

I'm not opposed to _safe().

but...

Why is this "platform" advertising PC10 (or any MWAIT C-states) to intel_idle
in the first place?

It seems that it should be advertising none, and the intel_idle driver should
not be loading at all, no?


* Re: [Bug 218792] Guest call trace with mwait enabled
  2025-08-08 21:05 ` bugzilla-daemon
@ 2025-08-08 22:59   ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2025-08-08 22:59 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Fri, Aug 08, 2025, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218792
> 
> Len Brown (lenb@kernel.org) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |lenb@kernel.org
> 
> --- Comment #6 from Len Brown (lenb@kernel.org) ---
> Re: intel_idle
> 
> I agree that the SDM doesn't guarantee this MSR exists
> based on the presence of PC10.
> 
> I'm not opposed to _safe().
> 
> but...
> 
> Why is this "platform" advertising PC10 (or any MWAIT C-states) to intel_idle
> in the first place?

Because letting the guest execute MONITOR/MWAIT natively, and thus get into deeper
sleep states, is advantageous for all the same reasons bare metal CPUs want to
get into deep sleep states, e.g. to let active cores hit higher turbo bins.


* [Bug 218792] Guest call trace with mwait enabled
  2024-04-30  7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
                   ` (6 preceding siblings ...)
  2025-08-08 21:05 ` bugzilla-daemon
@ 2025-08-08 22:59 ` bugzilla-daemon
  7 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2025-08-08 22:59 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=218792

--- Comment #7 from Sean Christopherson (seanjc@google.com) ---
