Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Aidan Khoury <aidan@aktech.ai>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Aidan Khoury <aidan@revers.engineering>,
	 Nick Peterson <everdox@gmail.com>
Subject: Re: [PATCH v1 1/1] KVM: x86: Merge pending debug causes when vectoring #DB
Date: Wed, 13 May 2026 18:08:08 -0700	[thread overview]
Message-ID: <agUgeO5QNenQM9pT@google.com> (raw)
In-Reply-To: <20260107235724.28101-2-aidan@aktech.ai>

On Wed, Jan 07, 2026, Aidan Khoury wrote:
>  static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
>  {
>  	/*
> @@ -907,6 +915,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>  	.get_gdt = vt_op(get_gdt),
>  	.set_gdt = vt_op(set_gdt),
>  	.set_dr7 = vt_op(set_dr7),
> +	.get_pending_dbg_exceptions = vt_op(get_pending_dbg_exceptions),
>  	.sync_dirty_debug_regs = vt_op(sync_dirty_debug_regs),
>  	.cache_reg = vt_op(cache_reg),
>  	.get_rflags = vt_op(get_rflags),
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 91b6f2f3edc2..1b2e274fe317 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5300,13 +5300,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  			 * have already expired.  Note, the CPU sets/clears BS
>  			 * as appropriate for all other VM-Exits types.
>  			 */
> +			if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> +			    (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
> +			     (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
> +				vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> +				 vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);

This is wrong.  Per the SDM, and confirmed on bare metal (and for giggles, in a
VM with #DB interception disabled) by generating a coincident single-step + INT1
#DB:

  The INT1 instruction also uses a one-byte opcode (F1) and generates a
  debug exception (#DB) without setting any bits in DR6.

>  			if (is_icebp(intr_info))
>  				WARN_ON(!skip_emulated_instruction(vcpu));
> -			else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> -				 (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
> -				  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
> -				vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> -					    vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
>  
>  			kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
>  			return 1;
> @@ -5613,6 +5613,12 @@ void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
>  	vmcs_writel(GUEST_DR7, val);
>  }
>  
> +unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
> +{
> +	return vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) &
> +		(DR6_RTM | DR6_BS | BIT(12) /*Enabled breakpoint*/ | DR_TRAP_BITS);

Propagating bit 12 is wrong.  It's "Enabled Breakpoint" in PENDING_DBG_EXCEPTIONS,
but it's a reserved bit in DR6, and we have no idea what it might be used for on
modern hardware (probably nothing to avoid confusion with VMX).  For giggles,
before I realized where you were pulling the bit 12 documentation from, it did
used to have meaning in DR6.  Per an SDM from 2007:

  It is not possible to write a 1 to reserved bit 12 in debug status register
  DR6 on the P6 family and Pentium processors; however, it is possible to write
  a 1 in this bit on the Intel486 processor. See Table 9-1 for the different
  setting of this register following a power-up or hardware reset.

AFAIK, bit 12 hasn't (yet) been repurposed, and so shouldn't be touched by KVM.

> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 9697368d65b3..365682799d05 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -75,6 +75,7 @@ void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>  void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>  void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val);
>  void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
> +unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu);
>  void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
>  void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>  unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19d2d6d9e64a..c889dffe4e59 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -765,11 +765,23 @@ static int exception_type(int vector)
>  void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
>  				   struct kvm_queued_exception *ex)
>  {
> +	unsigned long pending_dbg;
> +
>  	if (!ex->has_payload)
>  		return;
>  
>  	switch (ex->vector) {
>  	case DB_VECTOR:
> +		/*
> +		 * VMX records deferred debug causes (B0-B3, enabled breakpoint,
> +		 * BS, RTM) in the vmcs.PENDING_DBG_EXCEPTIONS field.  Merge any
> +		 * pending causes into the exception payload so the guest may
> +		 * see all accumulated reasons in DR6 when the #DB is vectored.
> +		 */
> +		pending_dbg = kvm_x86_call(get_pending_dbg_exceptions)(vcpu);

Ugh.  This isn't going to work (and right now I hate myself for realizing this).
If userspace initiates a save/restore while there is effectively-unsaved state
in GUEST_PENDING_DBG_EXCEPTIONS, then it will be lost if exception_payload_enabled
is enabled because KVM (correctly) migrates the raw payload, and waits to shove
it into DR6 until the exception is actually delivered (likely after restore).

> +		if (pending_dbg)
> +			ex->payload |= pending_dbg;

This is rather silly, a bitwise-OR with 0 is cheaper.

So while I don't exactly love the idea, I think this?  Compile tested only at
this point, I'll try to properly test it tomorrow.

---
 arch/x86/include/asm/kvm-x86-ops.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 1 +
 arch/x86/kvm/vmx/main.c            | 9 +++++++++
 arch/x86/kvm/vmx/vmx.c             | 6 ++++++
 arch/x86/kvm/vmx/x86_ops.h         | 1 +
 arch/x86/kvm/x86.c                 | 8 ++++++++
 6 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index b0269325646c..3e04a7ecbb47 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -51,6 +51,7 @@ KVM_X86_OP(get_gdt)
 KVM_X86_OP(set_gdt)
 KVM_X86_OP(sync_dirty_debug_regs)
 KVM_X86_OP(set_dr7)
+KVM_X86_OP_OPTIONAL_RET0(get_pending_dbg_exceptions)
 KVM_X86_OP(cache_reg)
 KVM_X86_OP(get_rflags)
 KVM_X86_OP(set_rflags)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 271bdd109a98..eaac6ba59591 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1834,6 +1834,7 @@ struct kvm_x86_ops {
 	void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu);
 	void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
+	unsigned long (*get_pending_dbg_exceptions)(struct kvm_vcpu *vcpu);
 	void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 83d9921277ea..a0e67471bff1 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -470,6 +470,14 @@ static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 	vmx_set_dr7(vcpu, val);
 }
 
+static unsigned long vt_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
+{
+	if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
+		return 0;
+
+	return vmx_get_pending_dbg_exceptions(vcpu);
+}
+
 static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -928,6 +936,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.get_gdt = vt_op(get_gdt),
 	.set_gdt = vt_op(set_gdt),
 	.set_dr7 = vt_op(set_dr7),
+	.get_pending_dbg_exceptions = vt_op(get_pending_dbg_exceptions),
 	.sync_dirty_debug_regs = vt_op(sync_dirty_debug_regs),
 	.cache_reg = vt_op(cache_reg),
 	.get_rflags = vt_op(get_rflags),
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3a47a21dd499..7f84644a5c49 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5822,6 +5822,12 @@ void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 	vmcs_writel(GUEST_DR7, val);
 }
 
+unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
+{
+	return vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) &
+               (DR6_RTM | DR6_BS | DR_TRAP_BITS);
+}
+
 static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
 {
 	kvm_apic_update_ppr(vcpu);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 409858074246..5cfbd4e66f08 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -75,6 +75,7 @@ void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
+unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu);
 void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
 void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
 unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 72ba1fc1bed2..8b32ab11f888 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -868,6 +868,14 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned int nr,
 		vcpu->arch.exception.vector = nr;
 		vcpu->arch.exception.error_code = error_code;
 		vcpu->arch.exception.has_payload = has_payload;
+		/*
+		 * VMX records deferred debug causes (B0-B3, enabled breakpoint,
+		 * BS, RTM) in the vmcs.PENDING_DBG_EXCEPTIONS field.  Merge any
+		 * pending causes into the exception payload so the guest may
+		 * see all accumulated reasons in DR6 when the #DB is vectored.
+		 */
+		if (nr == DB_VECTOR && has_payload)
+			payload |= kvm_x86_call(get_pending_dbg_exceptions)(vcpu);
 		vcpu->arch.exception.payload = payload;
 		return;
 	}

base-commit: b72a08c022f2dae4bca6f4edde5fe8012a02aefa
--

      reply	other threads:[~2026-05-14  1:08 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-07 23:57 [PATCH v1 0/1] KVM: x86: Merge pending debug causes when vectoring #DB Aidan Khoury
2026-01-07 23:57 ` [PATCH v1 1/1] " Aidan Khoury
2026-05-14  1:08   ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agUgeO5QNenQM9pT@google.com \
    --to=seanjc@google.com \
    --cc=aidan@aktech.ai \
    --cc=aidan@revers.engineering \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=everdox@gmail.com \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox