From: Sean Christopherson <seanjc@google.com>
To: Aidan Khoury <aidan@aktech.ai>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Aidan Khoury <aidan@revers.engineering>,
Nick Peterson <everdox@gmail.com>
Subject: Re: [PATCH v1 1/1] KVM: x86: Merge pending debug causes when vectoring #DB
Date: Wed, 13 May 2026 18:08:08 -0700 [thread overview]
Message-ID: <agUgeO5QNenQM9pT@google.com> (raw)
In-Reply-To: <20260107235724.28101-2-aidan@aktech.ai>
On Wed, Jan 07, 2026, Aidan Khoury wrote:
> static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
> {
> /*
> @@ -907,6 +915,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .get_gdt = vt_op(get_gdt),
> .set_gdt = vt_op(set_gdt),
> .set_dr7 = vt_op(set_dr7),
> + .get_pending_dbg_exceptions = vt_op(get_pending_dbg_exceptions),
> .sync_dirty_debug_regs = vt_op(sync_dirty_debug_regs),
> .cache_reg = vt_op(cache_reg),
> .get_rflags = vt_op(get_rflags),
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 91b6f2f3edc2..1b2e274fe317 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5300,13 +5300,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> * have already expired. Note, the CPU sets/clears BS
> * as appropriate for all other VM-Exits types.
> */
> + if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> + (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
> + (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
> + vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> + vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
This is wrong. Per the SDM, and confirmed on bare metal (and for giggles, in a
VM with #DB interception disabled) by generating a coincident single-step + INT1
#DB:
The INT1 instruction also uses a one-byte opcode (F1) and generates a
debug exception (#DB) without setting any bits in DR6.
> if (is_icebp(intr_info))
> WARN_ON(!skip_emulated_instruction(vcpu));
> - else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
> - (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
> - (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
> - vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
> - vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
>
> kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
> return 1;
> @@ -5613,6 +5613,12 @@ void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
> vmcs_writel(GUEST_DR7, val);
> }
>
> +unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
> +{
> + return vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) &
> + (DR6_RTM | DR6_BS | BIT(12) /*Enabled breakpoint*/ | DR_TRAP_BITS);
Propagating bit 12 is wrong. It's "Enabled Breakpoint" in PENDING_DBG_EXCEPTIONS,
but it's a reserved bit in DR6, and we have no idea what it might be used for on
modern hardware (probably nothing to avoid confusion with VMX). For giggles,
before I realized where you were pulling the bit 12 documentation from, it did
used to have meaning in DR6. Per an SDM from 2007:
It is not possible to write a 1 to reserved bit 12 in debug status register
DR6 on the P6 family and Pentium processors; however, it is possible to write
a 1 in this bit on the Intel486 processor. See Table 9-1 for the different
setting of this register following a power-up or hardware reset.
AFAIK, bit 12 hasn't (yet) been repurposed, and so shouldn't be touched by KVM.
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 9697368d65b3..365682799d05 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -75,6 +75,7 @@ void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val);
> void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
> +unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu);
> void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
> void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
> unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19d2d6d9e64a..c889dffe4e59 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -765,11 +765,23 @@ static int exception_type(int vector)
> void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
> struct kvm_queued_exception *ex)
> {
> + unsigned long pending_dbg;
> +
> if (!ex->has_payload)
> return;
>
> switch (ex->vector) {
> case DB_VECTOR:
> + /*
> + * VMX records deferred debug causes (B0-B3, enabled breakpoint,
> + * BS, RTM) in the vmcs.PENDING_DBG_EXCEPTIONS field. Merge any
> + * pending causes into the exception payload so the guest may
> + * see all accumulated reasons in DR6 when the #DB is vectored.
> + */
> + pending_dbg = kvm_x86_call(get_pending_dbg_exceptions)(vcpu);
Ugh. This isn't going to work (and right now I hate myself for realizing this).
If userspace initiates a save/restore while there is effectively-unsaved state
in GUEST_PENDING_DBG_EXCEPTIONS, then it will be lost if exception_payload_enabled
is enabled because KVM (correctly) migrates the raw payload, and waits to shove
it into DR6 until the exception is actually delivered (likely after restore).
> + if (pending_dbg)
> + ex->payload |= pending_dbg;
This is rather silly, a bitwise-OR with 0 is cheaper.
So while I don't exactly love the idea, I think this? Compile tested only at
this point, I'll try to properly test it tomorrow.
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/main.c | 9 +++++++++
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/vmx/x86_ops.h | 1 +
arch/x86/kvm/x86.c | 8 ++++++++
6 files changed, 26 insertions(+)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index b0269325646c..3e04a7ecbb47 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -51,6 +51,7 @@ KVM_X86_OP(get_gdt)
KVM_X86_OP(set_gdt)
KVM_X86_OP(sync_dirty_debug_regs)
KVM_X86_OP(set_dr7)
+KVM_X86_OP_OPTIONAL_RET0(get_pending_dbg_exceptions)
KVM_X86_OP(cache_reg)
KVM_X86_OP(get_rflags)
KVM_X86_OP(set_rflags)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 271bdd109a98..eaac6ba59591 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1834,6 +1834,7 @@ struct kvm_x86_ops {
void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu);
void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
+ unsigned long (*get_pending_dbg_exceptions)(struct kvm_vcpu *vcpu);
void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 83d9921277ea..a0e67471bff1 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -470,6 +470,14 @@ static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
vmx_set_dr7(vcpu, val);
}
+static unsigned long vt_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
+{
+ if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
+ return 0;
+
+ return vmx_get_pending_dbg_exceptions(vcpu);
+}
+
static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
{
/*
@@ -928,6 +936,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.get_gdt = vt_op(get_gdt),
.set_gdt = vt_op(set_gdt),
.set_dr7 = vt_op(set_dr7),
+ .get_pending_dbg_exceptions = vt_op(get_pending_dbg_exceptions),
.sync_dirty_debug_regs = vt_op(sync_dirty_debug_regs),
.cache_reg = vt_op(cache_reg),
.get_rflags = vt_op(get_rflags),
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3a47a21dd499..7f84644a5c49 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5822,6 +5822,12 @@ void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
vmcs_writel(GUEST_DR7, val);
}
+unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu)
+{
+ return vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) &
+ (DR6_RTM | DR6_BS | DR_TRAP_BITS);
+}
+
static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
{
kvm_apic_update_ppr(vcpu);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 409858074246..5cfbd4e66f08 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -75,6 +75,7 @@ void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
+unsigned long vmx_get_pending_dbg_exceptions(struct kvm_vcpu *vcpu);
void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 72ba1fc1bed2..8b32ab11f888 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -868,6 +868,14 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned int nr,
vcpu->arch.exception.vector = nr;
vcpu->arch.exception.error_code = error_code;
vcpu->arch.exception.has_payload = has_payload;
+ /*
+ * VMX records deferred debug causes (B0-B3, enabled breakpoint,
+ * BS, RTM) in the vmcs.PENDING_DBG_EXCEPTIONS field. Merge any
+ * pending causes into the exception payload so the guest may
+ * see all accumulated reasons in DR6 when the #DB is vectored.
+ */
+ if (nr == DB_VECTOR && has_payload)
+ payload |= kvm_x86_call(get_pending_dbg_exceptions)(vcpu);
vcpu->arch.exception.payload = payload;
return;
}
base-commit: b72a08c022f2dae4bca6f4edde5fe8012a02aefa
--
next prev parent reply other threads:[~2026-05-14 1:08 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-07 23:57 [PATCH v1 0/1] KVM: x86: Merge pending debug causes when vectoring #DB Aidan Khoury
2026-01-07 23:57 ` [PATCH v1 1/1] " Aidan Khoury
2026-05-14 1:08 ` Sean Christopherson [this message]
2026-05-15 0:50 ` Sean Christopherson
2026-05-15 1:00 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agUgeO5QNenQM9pT@google.com \
--to=seanjc@google.com \
--cc=aidan@aktech.ai \
--cc=aidan@revers.engineering \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=everdox@gmail.com \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.