public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception
Date: Fri, 27 Feb 2026 08:06:21 -0800	[thread overview]
Message-ID: <aaG_o58_0aHT8Xjg@google.com> (raw)
In-Reply-To: <20260227011306.3111731-4-yosry@kernel.org>

On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> On KVM_SET_GUEST_DEBUG, if a #DB or #BP is injected with
> KVM_GUESTDBG_INJECT_DB or KVM_GUESTDBG_INJECT_BP, KVM fails with -EBUSY
> if there is an existing pending exception. This was introduced in
> commit 4f926bf29186 ("KVM: x86: Polish exception injection via
> KVM_SET_GUEST_DEBUG") to avoid a warning in kvm_queue_exception(),
> presumably to avoid overriding a pending exception.
> 
> This added another (arguably nice) property, if there's a pending
> exception, KVM_SET_GUEST_DEBUG cannot cause a #DF or triple fault.
> However, if an exception is injected, KVM_SET_GUEST_DEBUG will cause
> a #DF or triple fault in the guest, as kvm_multiple_exception() combines
> them.

First off, this patch looks good irrespective of nested crud.  Disallowing injection
of #DB/#BP while there's already an injected exception aligns with architectural
behavior; KVM needs to finish delivering the exception and thus "complete" the
instruction before queueing a new exception.

As for nested, I _was_ going to say that, assuming the original motivation for
this patch is to avoid triggering a nested VM-Exit while nested_run_pending=1,
this is incomplete.  E.g. if the #BP or #DB itself is being intercepted by L1,
then queueing the exception will incorrectly deliver the nested VM-Exit before
nested VM-Enter completes.  I.e. I _thought_ we would also need:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..fdbd272027ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12526,6 +12526,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
        if (vcpu->arch.guest_state_protected)
                return -EINVAL;
 
+       if (vcpu->arch.nested_run_pending)
+               return -EBUSY;
+
        vcpu_load(vcpu);
 
        if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {

But that isn't actually the case, because {svm,vmx}_check_nested_events() blocks
exceptions and VM-Exits while nested_run_pending=1.  Off-list, I had rejected
modifying nested_vmx_triple_fault() to drop the triple fault like so:

@@ -5191,6 +5191,9 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 
 static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
 {
+       if (to_vmx(vcpu)->nested.nested_run_pending)
+               return;
+
        kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
        nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
 }

because "That would largely defeat the entire purpose of the WARN. The whole point
is to detect bugs where KVM synthesizes a VM-Exit before completing nested VM-Enter.
It should be impossible for nested_{vmx,svm}_triple_fault() to be called with
nested_run_pending=true."

Given the above, that argument doesn't hold up at first glance, because blocking
the triple fault if nested_run_pending=1 would be consistent with how other "synchronous"
events are handled by nVMX and nSVM.  But after staring at this a bit, unless I'm
forgetting something (entirely possible), I actually think {svm,vmx}_check_nested_events()
are wrong, because I stand behind my quoted statement: it should be impossible for
KVM to synthesize a synchronous event before completing nested VM-Enter.

Blocking pending and injected exceptions was added by bfcf83b1444d ("KVM: nVMX:
Fix trying to cancel vmlauch/vmresume"), and unfortunately neither the changelog
nor Lore[1] provides any details as to what exactly prompted the fix.  I _suspect_
it was either syzkaller induced, or the manifestation of other bugs in KVM's
exception handling.

Finally getting to the point, in theory, I _think_ KVM should actually WARN on
all cases where it temporarily blocks a synchronous event due to a pending nested
VM-Enter (see below diff for the basic gist).  However, that would open the
floodgates to syzkaller, because KVM_SET_VCPU_EVENTS can obviously stuff events,
KVM_X86_SET_MCE can queue a #MC, etc.

I _think_ we might be able to get away with rejecting KVM_SET_VCPU_EVENTS if
nested_run_pending=1, without breaking userspace?  Google's VMM is insane and does
KVM_SET_VCPU_EVENTS before KVM_SET_NESTED_STATE, but in real usage, i.e. outside
of selftests, (I hope) no VMM will restore into a "live" vCPU.

So instead of patch 1, I want to try either (a) blocking KVM_SET_VCPU_EVENTS,
KVM_X86_SET_MCE, and KVM_SET_GUEST_DEBUG if nested_run_pending=1, *and* follow-up
with the below WARN-spree, or (b) add a separate flag, e.g. nested_run_in_progress
or so, that is set with nested_run_pending, but cleared on an exit to userspace,
and then WARN on _that_, i.e. so that we can detect KVM bugs (the whole point of
the WARN) and hopefully stop playing this losing game of whack-a-mole with syzkaller.

I think I'm leaning toward (b)?  Except for KVM_SET_GUEST_DEBUG, where userspace
is trying to interpose on the guest, restricting ioctls doesn't really add any
value in practice.  Yeah, in theory it could _maybe_ prevent userspace from shooting
itself in the foot, but practically speaking, if userspace is restoring state into
a vCPU with nested_run_pending=1, it's either playing on expert mode or is already
completely broken.

My only hesitation with (b) is that KVM wouldn't be entirely consistent, since
vmx_unhandleable_emulation_required() _does_ explicitly reject a "userspace did
something stupid with nested_run_pending=1" case.  So from that perspective, part
of me wants to get greedy and try for (a).

[1] https://lore.kernel.org/all/?q=%22KVM:%20nVMX:%20Fix%20trying%20to%20cancel%20vmlauch%22

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd..d624e4db704a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1605,14 +1605,14 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
        }
 
        if (vcpu->arch.exception_vmexit.pending) {
-               if (block_nested_exceptions)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                         return -EBUSY;
                nested_svm_inject_exception_vmexit(vcpu);
                return 0;
        }
 
        if (vcpu->arch.exception.pending) {
-               if (block_nested_exceptions)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                        return -EBUSY;
                return 0;
        }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 248635da6766..a223c5e86188 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4336,7 +4336,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
         */
        if (vcpu->arch.exception_vmexit.pending &&
            !vmx_is_low_priority_db_trap(&vcpu->arch.exception_vmexit)) {
-               if (block_nested_exceptions)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                        return -EBUSY;
 
                nested_vmx_inject_exception_vmexit(vcpu);
@@ -4345,13 +4345,13 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 
        if (vcpu->arch.exception.pending &&
            !vmx_is_low_priority_db_trap(&vcpu->arch.exception)) {
-               if (block_nested_exceptions)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                        return -EBUSY;
                goto no_vmexit;
        }
 
        if (vmx->nested.mtf_pending) {
-               if (block_nested_events)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                        return -EBUSY;
                nested_vmx_update_pending_dbg(vcpu);
                nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
@@ -4359,7 +4359,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
        }
 
        if (vcpu->arch.exception_vmexit.pending) {
-               if (block_nested_exceptions)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                        return -EBUSY;
 
                nested_vmx_inject_exception_vmexit(vcpu);
@@ -4367,7 +4367,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
        }
 
        if (vcpu->arch.exception.pending) {
-               if (block_nested_exceptions)
+               if (WARN_ON_ONCE(block_nested_exceptions))
                        return -EBUSY;
                goto no_vmexit;
        }


  reply	other threads:[~2026-02-27 16:06 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-27  1:13 [PATCH 0/3] KVM: x86: Fix incorrect handling of triple faults Yosry Ahmed
2026-02-27  1:13 ` [PATCH 1/3] KVM: x86: Move nested_run_pending to kvm_vcpu_arch Yosry Ahmed
2026-02-27  1:13 ` [PATCH 2/3] KVM: x86: Do not inject triple faults into an L2 with a pending run Yosry Ahmed
2026-02-27  1:13 ` [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception Yosry Ahmed
2026-02-27 16:06   ` Sean Christopherson [this message]
2026-02-27 16:34     ` Sean Christopherson
2026-02-27 17:31       ` Yosry Ahmed
2026-02-27 18:18         ` Sean Christopherson
2026-02-27 18:34           ` Yosry Ahmed
2026-03-02 23:22             ` Sean Christopherson
2026-03-02 23:36               ` Yosry Ahmed
2026-03-02 23:47                 ` Sean Christopherson
2026-03-05 17:26 ` [PATCH 0/3] KVM: x86: Fix incorrect handling of triple faults Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaG_o58_0aHT8Xjg@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yosry@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox