Kernel KVM virtualization development
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Hou Wenlong <houwenlong.hwl@antgroup.com>,
	Lai Jiangshan <jiangshan.ljs@antgroup.com>
Subject: Re: [PATCH v3 01/10] KVM: VMX: Refresh GUEST_PENDING_DBG_EXCEPTIONS.BS on all injected #DBs
Date: Wed, 20 May 2026 09:11:20 -0700
Message-ID: <ag3dKPXqPTReuLOE@google.com>
In-Reply-To: <20260515222638.1949982-2-seanjc@google.com>

On Fri, May 15, 2026, Sean Christopherson wrote:
> Move KVM's stuffing of GUEST_PENDING_DBG_EXCEPTIONS.BS when RFLAGS.TF=1 and
> MOV/POP SS or STI blocking is active into the exception injection code so
> that KVM fixes up the VMCS for all injected #DBs, not only those that are
> reflected back into the guest after #DB interception.  E.g. if KVM queues
> a #DB in the emulator, or more importantly if userspace does save/restore
> exactly on the #DB+shadow boundary, then KVM needs to massage the VMCS to
> avoid the VM-Entry consistency check.
> 
> Opportunistically update the wording of the comment to describe the
> behavior as a workaround of flawed CPU behavior/architecture, to make it
> clear that the *only* thing KVM is doing is fudging around a consistency
> check.  Per the SDM:
> 
>   There are no pending debug exceptions after VM entry if any of the
>   following are true:
> 
>     * The VM entry is vectoring with one of the following interruption
>       types: external interrupt, non-maskable interrupt (NMI), hardware
>       exception, or privileged software exception.
> 
> I.e. forcing GUEST_PENDING_DBG_EXCEPTIONS.BS does *not* impact guest-
> visible behavior.
> 
> Fixes: b9bed78e2fa9 ("KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow")
> Cc: stable@vger.kernel.org
> Reported-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> Closes: https://lore.kernel.org/all/b1a294bc9ed4dae532474a5dc6c8cb6e5962de7c.1757416809.git.houwenlong.hwl@antgroup.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 35 ++++++++++++++++++-----------------
>  1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1701db1b2e18..a0a0ccf342d3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1909,6 +1909,24 @@ void vmx_inject_exception(struct kvm_vcpu *vcpu)
>  	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
> +	/*
> +	 * When injecting a #DB, single-stepping is enabled in RFLAGS, and STI
> +	 * or MOV-SS blocking is active, set vmcs.PENDING_DBG_EXCEPTIONS.BS to
> +	 * prevent a false positive from VM-Entry consistency check.  VM-Entry
> +	 * asserts that a single-step #DB _must_ be pending in this scenario,
> +	 * as the previous instruction cannot have toggled RFLAGS.TF 0=>1
> +	 * (because STI and POP/MOV don't modify RFLAGS), therefore the one
> +	 * instruction delay when activating single-step breakpoints must have
> +	 * already expired.  However, the CPU isn't smart enough to peek at
> +	 * vmcs.VM_ENTRY_INTR_INFO_FIELD and so doesn't realize that yes, there
> +	 * is indeed a #DB pending/imminent.
> +	 */
> +	if (ex->vector == DB_VECTOR &&
> +	    (vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> +	    vmx_get_interrupt_shadow(vcpu))
> +		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> +			    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);

Pulling in a Sashiko comment:

 : By restricting this workaround to only when a #DB is injected, does this
 : leave the VM vulnerable to a VM-Entry failure regression after live migration?
 : 
 : KVM does not export GUEST_PENDING_DBG_EXCEPTIONS to userspace via
 : KVM_GET_VCPU_EVENTS. Therefore, upon migration, the destination KVM
 : initializes the VMCS with GUEST_PENDING_DBG_EXCEPTIONS=0.
 : 
 : If a live migration occurs when the guest is in an active interrupt shadow
 : with RFLAGS.TF=1, but a different event is pending (or no event is pending
 : due to a host timer preemption), this DB_VECTOR check is skipped or
 : vmx_inject_exception() is never called.
 : 
 : Consequently, KVM will attempt VM-Entry with TF=1, shadow=1, and BS=0.
 : The Intel SDM mandates that if RFLAGS.TF=1 and STI or MOV SS blocking is
 : active, the VM-Entry consistency check requires
 : GUEST_PENDING_DBG_EXCEPTIONS.BS=1. The hardware VM-Entry will fail due to
 : invalid guest state.
 : 
 : Since vmx_guest_state_valid() does not check the GUEST_PENDING_DBG_EXCEPTIONS
 : field, KVM's emulation_required flag evaluates to false. KVM then falls
 : into the error path in __vmx_handle_exit(), dumping the VMCS and crashing
 : the guest by returning KVM_EXIT_FAIL_ENTRY to userspace.
 : 
 : Does KVM need to handle the BS bit requirement in a broader context to
 : account for live migration when no #DB is being injected?

Yes, but that's a different problem entirely[*], and isn't even solvable on AMD
because SVM lacks an equivalent for GUEST_PENDING_DBG_EXCEPTIONS.  Note, only
MOV/POP-SS blocking matters, because STI blocking doesn't prevent single-step
#DBs, and single-step #DBs have higher priority than IRQs.

[*] https://lore.kernel.org/all/agUgeO5QNenQM9pT@google.com
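
For anyone following along, here is a minimal user-space sketch of the
interaction discussed above.  It is a toy model, not KVM code: struct toy_vmcs
and the toy_*() helpers are hypothetical names made up for illustration, and
the check only models the one rule at issue here (RFLAGS.TF=1 plus STI or
MOV-SS blocking requires PENDING_DBG_EXCEPTIONS.BS=1).  The SDM lists further
conditions on this field that the sketch deliberately omits.  The bit
positions (TF is RFLAGS bit 8, BS is bit 14, STI/MOV-SS blocking are
interruptibility-state bits 0/1) and the #DB vector number are the
architectural values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF		(1ull << 8)	/* RFLAGS.TF, single-step enable */
#define DR6_BS			(1ull << 14)	/* BS in pending debug exceptions */
#define DB_VECTOR		1
#define PF_VECTOR		14
#define TOY_INTR_STATE_STI	0x1u		/* blocking by STI */
#define TOY_INTR_STATE_MOV_SS	0x2u		/* blocking by MOV/POP SS */

/* Hypothetical stand-in for the three VMCS fields that matter here. */
struct toy_vmcs {
	uint64_t rflags;
	uint32_t intr_state;
	uint64_t pending_dbg;
};

/* Simplified model of the VM-Entry consistency check (one rule only). */
static bool toy_entry_check_passes(const struct toy_vmcs *v)
{
	bool shadow = v->intr_state &
		      (TOY_INTR_STATE_STI | TOY_INTR_STATE_MOV_SS);
	bool tf = v->rflags & X86_EFLAGS_TF;

	return !(shadow && tf && !(v->pending_dbg & DR6_BS));
}

/* Mirrors the fix-up in the patch: only an injected #DB refreshes BS. */
static void toy_inject_exception(struct toy_vmcs *v, int vector)
{
	if (vector == DB_VECTOR &&
	    (v->rflags & X86_EFLAGS_TF) &&
	    (v->intr_state & (TOY_INTR_STATE_STI | TOY_INTR_STATE_MOV_SS)))
		v->pending_dbg |= DR6_BS;
}

static void toy_selfcheck(void)
{
	/* #DB injected in a shadow with TF=1: the fix-up sets BS. */
	struct toy_vmcs a = { X86_EFLAGS_TF, TOY_INTR_STATE_STI, 0 };

	assert(!toy_entry_check_passes(&a));
	toy_inject_exception(&a, DB_VECTOR);
	assert(toy_entry_check_passes(&a));

	/* Same state, but a different event is injected: BS stays clear. */
	struct toy_vmcs b = { X86_EFLAGS_TF, TOY_INTR_STATE_MOV_SS, 0 };

	toy_inject_exception(&b, PF_VECTOR);
	assert(!toy_entry_check_passes(&b));
}
```

The second case in toy_selfcheck() is the save/restore scenario from the
Sashiko comment: nothing in the injection path touches BS when the injected
event isn't a #DB, which is why that problem needs to be handled separately
from this patch.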

