All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception
Date: Fri, 27 Feb 2026 10:18:28 -0800	[thread overview]
Message-ID: <aaHf9Lxx8ap_3DRI@google.com> (raw)
In-Reply-To: <CAO9r8zMdyvAJUvnxH0Scb6z3L51Djb1qpMAzX3M9g7hOkB=ZOQ@mail.gmail.com>

On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> On Fri, Feb 27, 2026 at 8:34 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Feb 27, 2026, Sean Christopherson wrote:
> > > So instead of patch 1, I want to try either (a) blocking KVM_SET_VCPU_EVENTS,
> > > KVM_X86_SET_MCE, and KVM_SET_GUEST_DEBUG if nested_run_pending=1, *and* follow-up
> > > with the below WARN-spree, or (b) add a separate flag, e.g. nested_run_in_progress
> > > or so, that is set with nested_run_pending, but cleared on an exit to userspace,
> > > and then WARN on _that_, i.e. so that we can detect KVM bugs (the whole point of
> > > the WARN) and hopefully stop playing this losing game of whack-a-mole with syzkaller.
> 
> I like the idea of the WARN there, although something in the back of
> my mind tells me I went through this code before with an exception in
> mind that could be injected with nested_run_pending=1, but I can't
> remember it. Maybe it was injected by userspace and all is good.

If there is such a flow, it's likely a bug, i.e. we'd want the WARN.  AFAIK,
every single time the WARN has been hit in the last ~2-3 years has been due to
syzkaller.

> That being said, I hate nested_run_in_progress. It's too close to
> nested_run_pending and I am pretty sure they will be mixed up.

Agreed, though the fact that name is _too_ close means that, aside from the
potential for disaster (minor detail), it's accurate.

One thought is to hide nested_run_in_progress beyond a KConfig, so that attempts
to use it for anything but the sanity check(s) would fail the build.  I don't
really want to create yet another KVM_PROVE_xxx though, but unlike KVM_PROVE_MMU,
I think we want to this enabled in production.

I'll chew on this a bit...

> exception_from_userspace's name made me think this is something we
> could key off to WARN, but it's meant to morph queued exceptions from
> userspace into an "exception_vmexit" if needed. The field name is
> generic but its functionality isn't, maybe it should have been called
> exception_check_vmexit or something. Anyway..

No?  It's not a "check", it's literally an pending exception that has been morphed
to a VM-Exit.

Hmm, though looking at all that code again, I bet we can dedup a _lot_ code by
adding kvm_queued_exception.is_vmexit instead of tracking a completely separate
exception.  The only potential hiccup I can think of is if there's some wrinkle
with the interaction with already pending/injected exceptions.  Pending should be
fine, as the VM-Exit has priority.

Ah, scratch that idea, injected exceptions need to be tracked separate, e.g. see
vmcs12_save_pending_event().  It's correct for vmx_check_nested_events() to
deliver a VM-Exit even if there is an already-injected exception, e.g. if an EPT
Violation in L1's purview triggers when vectoring an injected exception, but in
that case, KVM needs to save the injected exception information into vmc{b,s}12.

> That gave me an idea though, can we add a field to
> kvm_queued_exception to identify the origin of the exception
> (userspace vs. KVM)? Then we can key the warning off of that.

That would incur non-trivial maintenance costs, and it would be tricky to get the
broader protection of the existing WARNing "right".  E.g. how would KVM know that
the VM-Exit was originally induced by an exception that was queued by userspace?

> We can potentially also avoid adding the field and just plumb the
> argument through to kvm_multiple_exception(), and WARN there if
> nested_run_pending is set and the origin is not userspace?

Not really, because kvm_vcpu_ioctl_x86_set_vcpu_events() doesn't use
kvm_queued_exception(), it stuffs things directly.

That said, if you want to try and code it up, I say go for it.  Worst case scenario
you'll have wasted a bit of time.

> > > I think I'm leaning toward (b)?  Except for KVM_SET_GUEST_DEBUG, where userspace
> > > is trying to interpose on the guest, restricting ioctls doesn't really add any
> > > value in practice.  Yeah, in theory it could _maybe_ prevent userspace from shooting
> > > itself in the foot, but practically speaking, if userspace is restoring state into
> > > a vCPU with nested_run_pending=1, it's either playing on expert mode or is already
> > > completely broken.
> > >
> > > My only hesitation with (b) is that KVM wouldn't be entirely consistent, since
> > > vmx_unhandleable_emulation_required() _does_ explicitly reject a "userspace did
> > > something stupid with nested_run_pending=1" case.  So from that perspective, part
> > > of me wants to get greedy and try for (a).
> >
> > On second (fifth?) thought, I don't think (a) is a good idea.  In addition to
> > potentially breaking userspace, it also risks preventing genuinely useful sequences.
> > E.g. even if no VMM does so today, it's entirely plausible that a VMM could want
> > to asynchronously inject an #MC to mimic a broadcast, and that the injection could
> > collide with a pending nested VM-Enter.
> >
> > I'll send a separate (maybe RFC?) series for (b) using patch 1 as a starting point.
> > I want to fiddle around with some ideas, and it'll be faster to sketch things out
> > in code versus trying to describe things in text.
> 
> So you'll apply patch 3 as-is, drop patch 2, and (potentially) take
> patch 1 and build another series on top of it?

Yeah, that's where I'm trending.

  reply	other threads:[~2026-02-27 18:18 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-27  1:13 [PATCH 0/3] KVM: x86: Fix incorrect handling of triple faults Yosry Ahmed
2026-02-27  1:13 ` [PATCH 1/3] KVM: x86: Move nested_run_pending to kvm_vcpu_arch Yosry Ahmed
2026-02-27  1:13 ` [PATCH 2/3] KVM: x86: Do not inject triple faults into an L2 with a pending run Yosry Ahmed
2026-02-27  1:13 ` [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception Yosry Ahmed
2026-02-27 16:06   ` Sean Christopherson
2026-02-27 16:34     ` Sean Christopherson
2026-02-27 17:31       ` Yosry Ahmed
2026-02-27 18:18         ` Sean Christopherson [this message]
2026-02-27 18:34           ` Yosry Ahmed
2026-03-02 23:22             ` Sean Christopherson
2026-03-02 23:36               ` Yosry Ahmed
2026-03-02 23:47                 ` Sean Christopherson
2026-03-05 17:26 ` [PATCH 0/3] KVM: x86: Fix incorrect handling of triple faults Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaHf9Lxx8ap_3DRI@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yosry@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.