From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Hou Wenlong <houwenlong.hwl@antgroup.com>,
Lai Jiangshan <jiangshan.ljs@antgroup.com>
Subject: Re: [PATCH v3 01/10] KVM: VMX: Refresh GUEST_PENDING_DBG_EXCEPTIONS.BS on all injected #DBs
Date: Wed, 20 May 2026 09:11:20 -0700
Message-ID: <ag3dKPXqPTReuLOE@google.com>
In-Reply-To: <20260515222638.1949982-2-seanjc@google.com>

On Fri, May 15, 2026, Sean Christopherson wrote:
> Move KVM's stuffing of GUEST_PENDING_DBG_EXCEPTIONS.BS when RFLAGS.TF=1 and
> MOV/POP SS or STI blocking is active into the exception injection code so
> that KVM fixes up the VMCS for all injected #DBs, not only those that are
> reflected back into the guest after #DB interception. E.g. if KVM queues
> a #DB in the emulator, or more importantly if userspace does save/restore
> exactly on the #DB+shadow boundary, then KVM needs to massage the VMCS to
> avoid the VM-Entry consistency check.
>
> Opportunistically update the wording of the comment to describe the
> behavior as a workaround of flawed CPU behavior/architecture, to make it
> clear that the *only* thing KVM is doing is fudging around a consistency
> check. Per the SDM:
>
>   There are no pending debug exceptions after VM entry if any of the
>   following are true:
>
>   * The VM entry is vectoring with one of the following interruption
>     types: external interrupt, non-maskable interrupt (NMI), hardware
>     exception, or privileged software exception.
>
> I.e. forcing GUEST_PENDING_DBG_EXCEPTIONS.BS does *not* impact guest-
> visible behavior.
>
> Fixes: b9bed78e2fa9 ("KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow")
> Cc: stable@vger.kernel.org
> Reported-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> Closes: https://lore.kernel.org/all/b1a294bc9ed4dae532474a5dc6c8cb6e5962de7c.1757416809.git.houwenlong.hwl@antgroup.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 35 ++++++++++++++++++-----------------
> 1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1701db1b2e18..a0a0ccf342d3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1909,6 +1909,24 @@ void vmx_inject_exception(struct kvm_vcpu *vcpu)
>  	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> +	/*
> +	 * When injecting a #DB, single-stepping is enabled in RFLAGS, and STI
> +	 * or MOV-SS blocking is active, set vmcs.PENDING_DBG_EXCEPTIONS.BS to
> +	 * prevent a false positive from VM-Entry consistency check.  VM-Entry
> +	 * asserts that a single-step #DB _must_ be pending in this scenario,
> +	 * as the previous instruction cannot have toggled RFLAGS.TF 0=>1
> +	 * (because STI and POP/MOV don't modify RFLAGS), therefore the one
> +	 * instruction delay when activating single-step breakpoints must have
> +	 * already expired.  However, the CPU isn't smart enough to peek at
> +	 * vmcs.VM_ENTRY_INTR_INFO_FIELD and so doesn't realize that yes, there
> +	 * is indeed a #DB pending/imminent.
> +	 */
> +	if (ex->vector == DB_VECTOR &&
> +	    (vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> +	    vmx_get_interrupt_shadow(vcpu))
> +		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> +			    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);

Pulling in a Sashiko comment:

: By restricting this workaround to only when a #DB is injected, does this
: leave the VM vulnerable to a VM-Entry failure regression after live migration?
:
: KVM does not export GUEST_PENDING_DBG_EXCEPTIONS to userspace via
: KVM_GET_VCPU_EVENTS. Therefore, upon migration, the destination KVM
: initializes the VMCS with GUEST_PENDING_DBG_EXCEPTIONS=0.
:
: If a live migration occurs when the guest is in an active interrupt shadow
: with RFLAGS.TF=1, but a different event is pending (or no event is pending
: due to a host timer preemption), this DB_VECTOR check is skipped or
: vmx_inject_exception() is never called.
:
: Consequently, KVM will attempt VM-Entry with TF=1, shadow=1, and BS=0.
: The Intel SDM mandates that if RFLAGS.TF=1 and STI or MOV SS blocking is
: active, the VM-Entry consistency check requires
: GUEST_PENDING_DBG_EXCEPTIONS.BS=1. The hardware VM-Entry will fail due to
: invalid guest state.
:
: Since vmx_guest_state_valid() does not check the GUEST_PENDING_DBG_EXCEPTIONS
: field, KVM's emulation_required flag evaluates to false. KVM then falls
: into the error path in __vmx_handle_exit(), dumping the VMCS and crashing
: the guest by returning KVM_EXIT_FAIL_ENTRY to userspace.
:
: Does KVM need to handle the BS bit requirement in a broader context to
: account for live migration when no #DB is being injected?

Yes, but that's a different problem entirely[*], and isn't even solvable on AMD
because SVM lacks an equivalent for GUEST_PENDING_DBG_EXCEPTIONS.  Note, only
MOV/POP-SS blocking matters, because STI blocking doesn't prevent single-step
#DBs, and single-step #DBs have higher priority than IRQs.

[*] https://lore.kernel.org/all/agUgeO5QNenQM9pT@google.com
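
To make the two cases above concrete, here is a minimal user-space sketch of the condition, not kernel code. Everything prefixed with model_ is hypothetical and exists only for illustration; the check is reduced to the rule as stated in this thread (RFLAGS.TF=1 plus an active STI or MOV/POP-SS shadow requires BS=1) and ignores every other input the real VM-Entry check may consider. The constants use the architectural bit positions for RFLAGS.TF (bit 8) and the BS bit (bit 14).

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF	(1ull << 8)	/* RFLAGS.TF: single-step trap flag */
#define DR6_BS		(1ull << 14)	/* BS: single-step #DB pending */
#define DB_VECTOR	1		/* #DB exception vector */

/* Hypothetical, heavily simplified stand-in for the relevant guest state. */
struct model_vcpu {
	uint64_t rflags;
	bool intr_shadow;	/* STI or MOV/POP-SS blocking is active */
	uint64_t pending_dbg;	/* models GUEST_PENDING_DBG_EXCEPTIONS */
};

/*
 * Simplified model of the consistency check as described in this thread:
 * with TF=1 and an active interrupt shadow, BS must be set.
 */
static bool model_vmentry_ok(const struct model_vcpu *v)
{
	if ((v->rflags & X86_EFLAGS_TF) && v->intr_shadow)
		return (v->pending_dbg & DR6_BS) != 0;
	return true;
}

/* Mirrors the condition the patch adds to vmx_inject_exception(). */
static void model_inject_exception(struct model_vcpu *v, int vector)
{
	if (vector == DB_VECTOR &&
	    (v->rflags & X86_EFLAGS_TF) &&
	    v->intr_shadow)
		v->pending_dbg |= DR6_BS;
}
```

In this model, a vCPU with TF=1, an active shadow, and pending_dbg=0 passes model_vmentry_ok() once model_inject_exception() has been called with DB_VECTOR. The same state fails the check if a different vector is injected or if nothing is injected at all, which is the save/restore case raised in the Sashiko comment and tracked separately at [*].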