From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 218792] Guest call trace with mwait enabled
Date: Thu, 31 Jul 2025 08:59:52 +0000 [thread overview]
Message-ID: <bug-218792-28872-QaPSh9KPXG@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-218792-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=218792
--- Comment #5 from chenyi.qiang@intel.com ---
On 5/1/2024 12:41 AM, Sean Christopherson wrote:
> On Tue, Apr 30, 2024, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=218792
>>
>> Bug ID: 218792
>> Summary: Guest call trace with mwait enabled
>> Product: Virtualization
>> Version: unspecified
>> Hardware: Intel
>> OS: Linux
>> Status: NEW
>> Severity: normal
>> Priority: P3
>> Component: kvm
>> Assignee: virtualization_kvm@kernel-bugs.osdl.org
>> Reporter: farrah.chen@intel.com
>> Regression: No
>>
>> Environment:
>> host/guest kernel:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> e67572cd220(v6.9-rc6)
>> QEMU: https://gitlab.com/qemu-project/qemu.git master 5c6528dce86d
>> Host/Guest OS: Centos stream9/Ubuntu24.04
>>
>> Bug detail description:
>> Boot Guest with mwait enabled(-overcommit cpu-pm=on), guest call trace
>> "unchecked MSR access error"
>>
>> Reproduce steps:
>> img=centos9.qcow2
>> qemu-system-x86_64 \
>> -name legacy,debug-threads=on \
>> -overcommit cpu-pm=on \
>> -accel kvm -smp 8 -m 8G -cpu host \
>> -drive file=${img},if=none,id=virtio-disk0 \
>> -device virtio-blk-pci,drive=virtio-disk0 \
>> -device virtio-net-pci,netdev=nic0 -netdev
>> user,id=nic0,hostfwd=tcp::10023-:22 \
>> -vnc :1 -serial stdio
>>
>> Guest boot with call trace:
>> [ 0.475344] unchecked MSR access error: RDMSR from 0xe2 at rIP:
>
> MSR 0xE2 is MSR_PKG_CST_CONFIG_CONTROL, which hpet_is_pc10_damaged() assumes
> exists if PC10 substates are supported. KVM doesn't emulate/support
> MSR_PKG_CST_CONFIG_CONTROL, i.e. injects a #GP on the guest RDMSR, hence the
> splat. This isn't a KVM bug as KVM explicitly advertises all zeros for the
> MWAIT CPUID leaf, i.e. QEMU is effectively telling the guest that PC10
> substates
> are support without KVM's explicit blessing.
>
> That said, this is arguably a kernel bug (guest side), as I don't see
> anything
> in the SDM that _requires_ MSR_PKG_CST_CONFIG_CONTROL to exist if PC10
> substates
> are supported.
>
> The issue is likely benign, other that than obvious WARN. The kernel
> gracefully
> handles the #GP and zeros the result, i.e. will always think PC10 is
> _disabled_,
> which may or may not be correct, but is functionally ok if the HPET is being
> emulated by the host, which it probably is.
>
> rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> if ((pcfg & 0xF) < 8)
> return false;
>
> The most straightforward fix, and probably the most correct all around, would
> be
> to use rdmsrl_safe() to suppress the WARN, i.e. have the kernel not yell if
> MSR_PKG_CST_CONFIG_CONTROL doesn't exist. Unless HPET is also being passed
> through, that'll do the right thing when Linux is a guest. And if a setup
> also
> passes through HPET, then the VMM can also trap-and-emulate
> MSR_PKG_CST_CONFIG_CONTROL
> as appropriate (doing so in QEMU without KVM support might be impossible,
> though
> again it's unnecessary if QEMU is emulating the HPET).
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index c96ae8fee95e..2afafff18f92 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -980,7 +980,9 @@ static bool __init hpet_is_pc10_damaged(void)
> return false;
>
> /* Check whether PC10 is enabled in PKG C-state limit */
> - rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> + if (rdmsrl_safe(MSR_PKG_CST_CONFIG_CONTROL, pcfg))
> + return false;
> +
> if ((pcfg & 0xF) < 8)
> return false;
There are three places which could access MSR_PKG_CST_CONFIG_CONTROL.
1. hpet_is_pc10_damaged() in hpet.c
2. *_idle_state_table_update() in intel_idle.c (This BUG comes from this path
in VMs)
3. auto_demotion_disable() in intel_idle.c
This MSR seems not architectural but CPU model specific.
Besides the case 1 as mentioned, the intel_idle driver also uses it to query
the
lowest processor-specific C-state for the package (case 2) and to disable auto
demotion
(case 3) based on the specific model.
I assume both case 2 and 3 are aimed to improve energy-efficiency. For example,
spr_idle_state_table_update() adjusts the exit_latency/target_residency to
hardcoded ones based on
the package C-state limit. It seems unreasonable in VMs as the hardcoded values
are measured in host
and the guest CPU model may not match the host one if we only pass-thru this
MSR. Similarly,
for case 3, there is no guarantee that disabling auto demotion can improve
energy efficiency in a
emulated CPU model.
Since there is no such fine-grained power management virtualization support
yet. Can we change
all the rdmsr/wrmsr(MSR_PKG_CST_CONFIG_CONTROL) to the *_safe() variant to skip
the related operation
in VMs?
>
>> 0xffffffffb5a966b8 (native_read_msr+0x8/0x40)
>> [ 0.476465] Call Trace:
>> [ 0.476763] <TASK>
>> [ 0.477027] ? ex_handler_msr+0x128/0x140
>> [ 0.477460] ? fixup_exception+0x166/0x3c0
>> [ 0.477934] ? exc_general_protection+0xdc/0x3c0
>> [ 0.478481] ? asm_exc_general_protection+0x26/0x30
>> [ 0.479052] ? __pfx_intel_idle_init+0x10/0x10
>> [ 0.479587] ? native_read_msr+0x8/0x40
>> [ 0.480057] intel_idle_init_cstates_icpu.constprop.0+0x5e/0x560
>> [ 0.480747] ? __pfx_intel_idle_init+0x10/0x10
>> [ 0.481275] intel_idle_init+0x161/0x360
>> [ 0.481742] do_one_initcall+0x45/0x220
>> [ 0.482209] do_initcalls+0xac/0x130
>> [ 0.482643] kernel_init_freeable+0x134/0x1e0
>> [ 0.483159] ? __pfx_kernel_init+0x10/0x10
>> [ 0.483648] kernel_init+0x1a/0x1c0
>> [ 0.484087] ret_from_fork+0x31/0x50
>> [ 0.484541] ? __pfx_kernel_init+0x10/0x10
>> [ 0.485030] ret_from_fork_asm+0x1a/0x30
>> [ 0.485462] </TASK>
>>
>> --
>> You may reply to this email to add a comment.
>>
>> You are receiving this mail because:
>> You are watching the assignee of the bug.
>
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
next prev parent reply other threads:[~2025-07-31 8:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-30 7:32 [Bug 218792] New: Guest call trace with mwait enabled bugzilla-daemon
2024-04-30 11:32 ` [Bug 218792] " bugzilla-daemon
2024-04-30 16:41 ` [Bug 218792] New: " Sean Christopherson
2025-07-31 8:59 ` Chenyi Qiang
2024-04-30 16:42 ` [Bug 218792] " bugzilla-daemon
2024-07-12 8:11 ` bugzilla-daemon
2024-07-12 8:40 ` bugzilla-daemon
2025-07-31 8:59 ` bugzilla-daemon [this message]
2025-08-08 21:05 ` bugzilla-daemon
2025-08-08 22:59 ` Sean Christopherson
2025-08-08 22:59 ` bugzilla-daemon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bug-218792-28872-QaPSh9KPXG@https.bugzilla.kernel.org/ \
--to=bugzilla-daemon@kernel.org \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox