From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception
Date: Fri, 27 Feb 2026 08:06:21 -0800 [thread overview]
Message-ID: <aaG_o58_0aHT8Xjg@google.com> (raw)
In-Reply-To: <20260227011306.3111731-4-yosry@kernel.org>
On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> On KVM_SET_GUEST_DEBUG, if a #DB or #BP is injected with
> KVM_GUESTDBG_INJECT_DB or KVM_GUESTDBG_INJECT_BP, KVM fails with -EBUSY
> if there is an existing pending exception. This was introduced in
> commit 4f926bf29186 ("KVM: x86: Polish exception injection via
> KVM_SET_GUEST_DEBUG") to avoid a warning in kvm_queue_exception(),
> presumably to avoid overriding a pending exception.
>
> This added another (arguably nice) property, if there's a pending
> exception, KVM_SET_GUEST_DEBUG cannot cause a #DF or triple fault.
> However, if an exception is injected, KVM_SET_GUEST_DEBUG will cause
> a #DF or triple fault in the guest, as kvm_multiple_exception() combines
> them.
First off, this patch looks good irrespective of nested crud. Disallowing injection
of #DB/#BP while there's already an injected exception aligns with architectural
behavior; KVM needs to finish delivering the exception and thus "complete" the
instruction before queueing a new exception.
As for nested, I _was_ going to say that, assuming the original motivation for
this patch is to avoid triggering a nested VM-Exit while nested_run_pending=1,
this is incomplete. E.g. if the #BP or #DB itself is being intercepted by L1,
then queueing the exception will incorrectly deliver the nested VM-Exit before
nested VM-Enter completes. I.e. I _thought_ we would also need:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..fdbd272027ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12526,6 +12526,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
if (vcpu->arch.guest_state_protected)
return -EINVAL;
+ if (vcpu->arch.nested_run_pending)
+ return -EBUSY;
+
vcpu_load(vcpu);
if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
But that isn't actually the case, because {svm,vmx}_check_nested_events() blocks
exceptions and VM-Exits while nested_run_pending=1. Off-list, I had rejected
modifying nested_vmx_triple_fault() to drop the triple fault like so:
@@ -5191,6 +5191,9 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
{
+ if (to_vmx(vcpu)->nested.nested_run_pending)
+ return;
+
kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
}
because "That would largely defeat the entire purpose of the WARN. The whole point
is to detect bugs where KVM synthesizes a VM-Exit before completing nested VM-Enter.
It should be impossible for nested_{vmx,svm}_triple_fault() to be called with
nested_run_pending=true."
Given the above, that argument doesn't hold up at first glance, because blocking
the triple fault if nested_run_pending=1 would be consistent with how other "synchronous"
events are handled by nVMX and nSVM. But after staring at this a bit, unless I'm
forgetting something (entirely possible), I actually think {svm,vmx}_check_nested_events()
are wrong, because I stand behind my quoted statement: it should be impossible for
KVM to synthesize a synchronous event before completing nested VM-Enter.
Blocking pending and injected exceptions was added by bfcf83b1444d ("KVM: nVMX:
Fix trying to cancel vmlauch/vmresume"), and unfortunately neither the changelog
nor Lore[1] provides any details as to what exactly prompted the fix. I _suspect_
it was either syzkaller induced, or the manifestation of other bugs in KVM's
exception handling.
Finally getting to the point, in theory, I _think_ KVM should actually WARN on
all cases where it temporarily blocks a synchronous event due to a pending nested
VM-Enter (see below diff for the basic gist). However, that would open the
floodgates to syzkaller, because KVM_SET_VCPU_EVENTS can obviously stuff events,
KVM_X86_SET_MCE can queue a #MC, etc.
I _think_ we might be able to get away with rejecting KVM_SET_VCPU_EVENTS if
nested_run_pending=1, without breaking userspace? Google's VMM is insane and does
KVM_SET_VCPU_EVENTS before KVM_SET_NESTED_STATE, but in real usage, i.e. outside
of selftests, (I hope) no VMM will restore into a "live" vCPU.
So instead of patch 1, I want to try either (a) blocking KVM_SET_VCPU_EVENTS,
KVM_X86_SET_MCE, and KVM_SET_GUEST_DEBUG if nested_run_pending=1, *and* follow-up
with the below WARN-spree, or (b) add a separate flag, e.g. nested_run_in_progress
or so, that is set with nested_run_pending, but cleared on an exit to userspace,
and then WARN on _that_, i.e. so that we can detect KVM bugs (the whole point of
the WARN) and hopefully stop playing this losing game of whack-a-mole with syzkaller.
I think I'm leaning toward (b)? Except for KVM_SET_GUEST_DEBUG, where userspace
is trying to interpose on the guest, restricting ioctls doesn't really add any
value in practice. Yeah, in theory it could _maybe_ prevent userspace from shooting
itself in the foot, but practically speaking, if userspace is restoring state into
a vCPU with nested_run_pending=1, it's either playing on expert mode or is already
completely broken.
My only hesitation with (b) is that KVM wouldn't be entirely consistent, since
vmx_unhandleable_emulation_required() _does_ explicitly reject a "userspace did
something stupid with nested_run_pending=1" case. So from that perspective, part
of me wants to get greedy and try for (a).
[1] https://lore.kernel.org/all/?q=%22KVM:%20nVMX:%20Fix%20trying%20to%20cancel%20vmlauch%22
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd..d624e4db704a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1605,14 +1605,14 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
}
if (vcpu->arch.exception_vmexit.pending) {
- if (block_nested_exceptions)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
nested_svm_inject_exception_vmexit(vcpu);
return 0;
}
if (vcpu->arch.exception.pending) {
- if (block_nested_exceptions)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
return 0;
}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 248635da6766..a223c5e86188 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4336,7 +4336,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
*/
if (vcpu->arch.exception_vmexit.pending &&
!vmx_is_low_priority_db_trap(&vcpu->arch.exception_vmexit)) {
- if (block_nested_exceptions)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
nested_vmx_inject_exception_vmexit(vcpu);
@@ -4345,13 +4345,13 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
if (vcpu->arch.exception.pending &&
!vmx_is_low_priority_db_trap(&vcpu->arch.exception)) {
- if (block_nested_exceptions)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
goto no_vmexit;
}
if (vmx->nested.mtf_pending) {
- if (block_nested_events)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
nested_vmx_update_pending_dbg(vcpu);
nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
@@ -4359,7 +4359,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
}
if (vcpu->arch.exception_vmexit.pending) {
- if (block_nested_exceptions)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
nested_vmx_inject_exception_vmexit(vcpu);
@@ -4367,7 +4367,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
}
if (vcpu->arch.exception.pending) {
- if (block_nested_exceptions)
+ if (WARN_ON_ONCE(block_nested_exceptions))
return -EBUSY;
goto no_vmexit;
}
next prev parent reply other threads:[~2026-02-27 16:06 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-27 1:13 [PATCH 0/3] KVM: x86: Fix incorrect handling of triple faults Yosry Ahmed
2026-02-27 1:13 ` [PATCH 1/3] KVM: x86: Move nested_run_pending to kvm_vcpu_arch Yosry Ahmed
2026-02-27 1:13 ` [PATCH 2/3] KVM: x86: Do not inject triple faults into an L2 with a pending run Yosry Ahmed
2026-02-27 1:13 ` [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception Yosry Ahmed
2026-02-27 16:06 ` Sean Christopherson [this message]
2026-02-27 16:34 ` Sean Christopherson
2026-02-27 17:31 ` Yosry Ahmed
2026-02-27 18:18 ` Sean Christopherson
2026-02-27 18:34 ` Yosry Ahmed
2026-03-02 23:22 ` Sean Christopherson
2026-03-02 23:36 ` Yosry Ahmed
2026-03-02 23:47 ` Sean Christopherson
2026-03-05 17:26 ` [PATCH 0/3] KVM: x86: Fix incorrect handling of triple faults Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aaG_o58_0aHT8Xjg@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=yosry@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.