From: Sean Christopherson <seanjc@google.com>
To: Hou Wenlong <houwenlong.hwl@antgroup.com>
Cc: kvm@vger.kernel.org, Lai Jiangshan <jiangshan.ljs@antgroup.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 7/9] KVM: VMX: Refresh 'PENDING_DBG_EXCEPTIONS.BS' bit during instruction emulation
Date: Thu, 14 May 2026 12:06:01 -0700 [thread overview]
Message-ID: <agYdGRM6a8eKDL0b@google.com> (raw)
In-Reply-To: <5e667a338a27ef2392143962466d77432fcd5441.1766066076.git.houwenlong.hwl@antgroup.com>
On Thu, Dec 18, 2025, Hou Wenlong wrote:
> The VM-Entry has a consistency check that when the 'STI' or 'MOVSS'
> blocking is active, the 'PENDING_DBG_EXCEPTIONS.BS' bit must be set if
> 'RFLAGS.TF' is set; otherwise, a VM-Fail is triggered during VM-entry.
> However, when 'STI' or 'MOV SS' is emulated (e.g., using the 'force
> emulation' prefix), the emulator only refreshes interruptibility state
> but pending debug exception state is not refreshed. Since the force
> emulation prefix causes a VM-Exit due to #UD interception, which clears
> the 'PENDING_DBG_EXCEPTIONS' bits, the emulator should refresh the
> 'PENDING_DBG_EXCEPTIONS.BS' bit when the 'RFLAGS.TF' bit is set to
> ensure the success of VM-Entry.
After (way too) much thought, it's not just (forced) emulation that's flawed;
save/restore also needs similar treatment. E.g. if userspace gains control of
the vCPU on a single-step #DB exit with STI-blocking, then KVM will save/restore
the pending #DB, RFLAGS.TF, and STI-blocking, but not PENDING_DBG_EXCEPTIONS.BS.
When the target resumes, VM-Entry will fail due to the "bad" state.
So AFAICT, we can handle both by moving the fixup logic to vmx_inject_exception()
(because it's just fixup for a consistency check, e.g. the state doesn't need to
be saved/restored). I'm not entirely convinced this fixes *all* the flows that
can run afoul of the consistency check, but I think it gets the legimiate ones?
And if not, we can always continue playing whack-a-mole...
I'll send a v3 of the whole series, because with this approach there's no need to
commit RFLAGS before queuing the #DB in the emulator writeback path, and we can
drop kvm_vcpu_do_singlestep() entirely.
---
arch/x86/kvm/vmx/vmx.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1701db1b2e18..a0a0ccf342d3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1909,6 +1909,24 @@ void vmx_inject_exception(struct kvm_vcpu *vcpu)
u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ /*
+ * When injecting a #DB, single-stepping is enabled in RFLAGS, and STI
+ * or MOV-SS blocking is active, set vmcs.PENDING_DBG_EXCEPTIONS.BS to
+ * prevent a false positive from VM-Entry consistency check. VM-Entry
+ * asserts that a single-step #DB _must_ be pending in this scenario,
+ * as the previous instruction cannot have toggled RFLAGS.TF 0=>1
+ * (because STI and POP/MOV don't modify RFLAGS), therefore the one
+ * instruction delay when activating single-step breakpoints must have
+ * already expired. However, the CPU isn't smart enough to peek at
+ * vmcs.VM_ENTRY_INTR_INFO_FIELD and so doesn't realize that yes, there
+ * is indeed a #DB pending/imminent.
+ */
+ if (ex->vector == DB_VECTOR &&
+ (vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
+ vmx_get_interrupt_shadow(vcpu))
+ vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+ vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
+
kvm_deliver_exception_payload(vcpu, ex);
if (ex->has_error_code) {
@@ -5485,26 +5503,9 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
* avoid single-step #DB and MTF updates, as ICEBP is
* higher priority. Note, skipping ICEBP still clears
* STI and MOVSS blocking.
- *
- * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
- * if single-step is enabled in RFLAGS and STI or MOVSS
- * blocking is active, as the CPU doesn't set the bit
- * on VM-Exit due to #DB interception. VM-Entry has a
- * consistency check that a single-step #DB is pending
- * in this scenario as the previous instruction cannot
- * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV
- * don't modify RFLAGS), therefore the one instruction
- * delay when activating single-step breakpoints must
- * have already expired. Note, the CPU sets/clears BS
- * as appropriate for all other VM-Exits types.
*/
if (is_icebp(intr_info))
WARN_ON(!skip_emulated_instruction(vcpu));
- else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
- (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
- (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
- vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
- vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
return 1;
base-commit: b7fbe9a1bf9ee6c967ef77d366ca58c35fcf1887
--
> +void vmx_refresh_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
> +{
> + if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> + vmx_get_interrupt_shadow(vcpu))
> + vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> + vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
> +}
> +
> static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
> {
> kvm_apic_update_ppr(vcpu);
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index d09abeac2b56..2978b6506ac6 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -74,6 +74,7 @@ void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
> +void vmx_refresh_pending_dbg_exceptions(struct kvm_vcpu *vcpu);
> void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
> void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
> unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7352c2114bab..9167393cc0cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9232,7 +9232,12 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
>
> static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
> {
> - return kvm_inject_emulated_db(vcpu, DR6_BS);
> + int r;
> +
> + r = kvm_inject_emulated_db(vcpu, DR6_BS);
> + if (r)
> + kvm_x86_call(refresh_pending_dbg_exceptions)(vcpu);
> + return r;
> }
>
> int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
> --
> 2.31.1
>
prev parent reply other threads:[~2026-05-14 19:06 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1766066076.git.houwenlong.hwl@antgroup.com>
[not found] ` <9b859ab6a6b59e5ccfdac741459117996fe2da6e.1766066076.git.houwenlong.hwl@antgroup.com>
2026-05-11 15:23 ` [PATCH v2 2/9] KVM: x86: Set guest DR6 by kvm_queue_exception_p() in instruction emulation Sean Christopherson
2026-05-11 15:26 ` Sean Christopherson
2026-05-11 15:42 ` Sean Christopherson
2026-05-11 15:45 ` [PATCH v2 0/9] KVM: x86: Improve the handling of debug exceptions during " Sean Christopherson
[not found] ` <e0df5d6812da0f508cdf697ac7627693199c8293.1766066076.git.houwenlong.hwl@antgroup.com>
2026-05-13 23:14 ` [PATCH v2 9/9] KVM: selftests: Verify 'BS' bit checking in pending debug exception state during VM-Entry Sean Christopherson
2026-05-14 5:31 ` Hou Wenlong
2026-05-14 18:51 ` Sean Christopherson
[not found] ` <5e667a338a27ef2392143962466d77432fcd5441.1766066076.git.houwenlong.hwl@antgroup.com>
2026-05-14 19:06 ` Sean Christopherson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agYdGRM6a8eKDL0b@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=houwenlong.hwl@antgroup.com \
--cc=hpa@zytor.com \
--cc=jiangshan.ljs@antgroup.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox