All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Hou Wenlong <houwenlong.hwl@antgroup.com>
Cc: kvm@vger.kernel.org, Lai Jiangshan <jiangshan.ljs@antgroup.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 7/9] KVM: VMX: Refresh 'PENDING_DBG_EXCEPTIONS.BS' bit during instruction emulation
Date: Thu, 14 May 2026 12:06:01 -0700	[thread overview]
Message-ID: <agYdGRM6a8eKDL0b@google.com> (raw)
In-Reply-To: <5e667a338a27ef2392143962466d77432fcd5441.1766066076.git.houwenlong.hwl@antgroup.com>

On Thu, Dec 18, 2025, Hou Wenlong wrote:
> The VM-Entry has a consistency check that when the 'STI' or 'MOVSS'
> blocking is active, the 'PENDING_DBG_EXCEPTIONS.BS' bit must be set if
> 'RFLAGS.TF' is set; otherwise, a VM-Fail is triggered during VM-entry.
> However, when 'STI' or 'MOV SS' is emulated (e.g., using the 'force
> emulation' prefix), the emulator only refreshes interruptibility state
> but pending debug exception state is not refreshed. Since the force
> emulation prefix causes a VM-Exit due to #UD interception, which clears
> the 'PENDING_DBG_EXCEPTIONS' bits, the emulator should refresh the
> 'PENDING_DBG_EXCEPTIONS.BS' bit when the 'RFLAGS.TF' bit is set to
> ensure the success of VM-Entry.

After (way too) much thought, it's not just (forced) emulation that's flawed;
save/restore also needs similar treatment.  E.g. if userspace gains control of
the vCPU on a single-step #DB exit with STI-blocking, then KVM will save/restore
the pending #DB, RFLAGS.TF, and STI-blocking, but not PENDING_DBG_EXCEPTIONS.BS.
When the target resumes, VM-Entry will fail due to the "bad" state.

So AFAICT, we can handle both by moving the fixup logic to vmx_inject_exception()
(because it's just fixup for a consistency check, e.g. the state doesn't need to
be saved/restored).  I'm not entirely convinced this fixes *all* the flows that
can run afoul of the consistency check, but I think it gets the legimiate ones?
And if not, we can always continue playing whack-a-mole...

I'll send a v3 of the whole series, because with this approach there's no need to
commit RFLAGS before queuing the #DB in the emulator writeback path, and we can
drop kvm_vcpu_do_singlestep() entirely.

---
 arch/x86/kvm/vmx/vmx.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1701db1b2e18..a0a0ccf342d3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1909,6 +1909,24 @@ void vmx_inject_exception(struct kvm_vcpu *vcpu)
 	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	/*
+	 * When injecting a #DB, single-stepping is enabled in RFLAGS, and STI
+	 * or MOV-SS blocking is active, set vmcs.PENDING_DBG_EXCEPTIONS.BS to
+	 * prevent a false positive from VM-Entry consistency check.  VM-Entry
+	 * asserts that a single-step #DB _must_ be pending in this scenario,
+	 * as the previous instruction cannot have toggled RFLAGS.TF 0=>1
+	 * (because STI and POP/MOV don't modify RFLAGS), therefore the one
+	 * instruction delay when activating single-step breakpoints must have
+	 * already expired.  However, the CPU isn't smart enough to peek at
+	 * vmcs.VM_ENTRY_INTR_INFO_FIELD and so doesn't realize that yes, there
+	 * is indeed a #DB pending/imminent.
+	 */
+	if (ex->vector == DB_VECTOR &&
+	    (vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
+	    vmx_get_interrupt_shadow(vcpu))
+		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+			    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
+
 	kvm_deliver_exception_payload(vcpu, ex);
 
 	if (ex->has_error_code) {
@@ -5485,26 +5503,9 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 			 * avoid single-step #DB and MTF updates, as ICEBP is
 			 * higher priority.  Note, skipping ICEBP still clears
 			 * STI and MOVSS blocking.
-			 *
-			 * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
-			 * if single-step is enabled in RFLAGS and STI or MOVSS
-			 * blocking is active, as the CPU doesn't set the bit
-			 * on VM-Exit due to #DB interception.  VM-Entry has a
-			 * consistency check that a single-step #DB is pending
-			 * in this scenario as the previous instruction cannot
-			 * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV
-			 * don't modify RFLAGS), therefore the one instruction
-			 * delay when activating single-step breakpoints must
-			 * have already expired.  Note, the CPU sets/clears BS
-			 * as appropriate for all other VM-Exits types.
 			 */
 			if (is_icebp(intr_info))
 				WARN_ON(!skip_emulated_instruction(vcpu));
-			else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
-				 (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
-				  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
-				vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
-					    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
 
 			kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
 			return 1;

base-commit: b7fbe9a1bf9ee6c967ef77d366ca58c35fcf1887
-- 

> +void vmx_refresh_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
> +{
> +	if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> +	    vmx_get_interrupt_shadow(vcpu))
> +		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> +			    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
> +}
> +
>  static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
>  {
>  	kvm_apic_update_ppr(vcpu);
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index d09abeac2b56..2978b6506ac6 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -74,6 +74,7 @@ void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>  void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>  void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>  void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
> +void vmx_refresh_pending_dbg_exceptions(struct kvm_vcpu *vcpu);
>  void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
>  void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>  unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7352c2114bab..9167393cc0cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9232,7 +9232,12 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
>  
>  static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
>  {
> -	return kvm_inject_emulated_db(vcpu, DR6_BS);
> +	int r;
> +
> +	r = kvm_inject_emulated_db(vcpu, DR6_BS);
> +	if (r)
> +		kvm_x86_call(refresh_pending_dbg_exceptions)(vcpu);
> +	return r;
>  }
>  
>  int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
> -- 
> 2.31.1
> 

  reply	other threads:[~2026-05-14 19:06 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-18 14:00 [PATCH v2 0/9] KVM: x86: Improve the handling of debug exceptions during instruction emulation Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 1/9] KVM: x86: Capture "struct x86_exception" in inject_emulated_exception() Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 2/9] KVM: x86: Set guest DR6 by kvm_queue_exception_p() in instruction emulation Hou Wenlong
2026-05-11 15:23   ` Sean Christopherson
2026-05-11 15:26     ` Sean Christopherson
2026-05-11 15:42       ` Sean Christopherson
2025-12-18 14:00 ` [PATCH v2 3/9] KVM: x86: Check guest debug in DR access " Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 4/9] KVM: x86: Only check effective code breakpoint in emulation Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 5/9] KVM: x86: Consolidate KVM_GUESTDBG_SINGLESTEP check into the kvm_inject_emulated_db() Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 6/9] KVM: x86: Move kvm_set_rflags() up before kvm_vcpu_do_singlestep() Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 7/9] KVM: VMX: Refresh 'PENDING_DBG_EXCEPTIONS.BS' bit during instruction emulation Hou Wenlong
2026-05-14 19:06   ` Sean Christopherson [this message]
2025-12-18 14:00 ` [PATCH v2 8/9] KVM: selftests: Verify guest debug DR7.GD checking " Hou Wenlong
2025-12-18 14:00 ` [PATCH v2 9/9] KVM: selftests: Verify 'BS' bit checking in pending debug exception state during VM-Entry Hou Wenlong
2026-05-13 23:14   ` Sean Christopherson
2026-05-14  5:31     ` Hou Wenlong
2026-05-14 18:51       ` Sean Christopherson
2026-05-11 15:45 ` [PATCH v2 0/9] KVM: x86: Improve the handling of debug exceptions during instruction emulation Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agYdGRM6a8eKDL0b@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=houwenlong.hwl@antgroup.com \
    --cc=hpa@zytor.com \
    --cc=jiangshan.ljs@antgroup.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.