public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 stable@vger.kernel.org
Subject: Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
Date: Fri, 20 Feb 2026 09:07:16 -0800	[thread overview]
Message-ID: <aZiUxBRPovFd4nDd@google.com> (raw)
In-Reply-To: <wwa2h5gcb7gfxgmsh3jdwa4d4xurkmgd26dnkwupgzcln3khfu@v3w2w6nf4tq7>

On Thu, Feb 19, 2026, Yosry Ahmed wrote:
> > E.g. after patch 2, completely untested...
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index aec17c80ed73..6fc1b2e212d2 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> 
> Above the context lines we have:
> 
>         /*
>          * next_rip is consumed on VMRUN as the return address pushed on the
>          * stack for injected soft exceptions/interrupts.  If nrips is exposed
>          * to L1, take it verbatim from vmcb12.  If nrips is supported in
>          * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
>          * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
>          * prior to injecting the event).
>          */
>         if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
>                 vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
>         else if (boot_cpu_has(X86_FEATURE_NRIPS))
>                 vmcb02->control.next_rip    = vmcb12_rip;
> 
> The same bug affects vmcb02->control.next_rip when the guest doesn't
> have NRIPS. I think we don't want to move part of the vmcb02
> initialization before VMRUN too. We can keep the initialization here and
> overwrite it before VMRUN if needed, but that's just also ugh..

Aha!  I knew I was missing something, but I couldn't quite get my brain to figure
out what.

I don't have a super strong preference as to copying the code or moving it
wholesale.  Though I would say if the pre-VMRUN logic is _identical_ (and I think
it is?), then we move it, and simply update the comment in
nested_vmcb02_prepare_control() to call that out.

> >         svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> >         if (is_evtinj_soft(vmcb02->control.event_inj)) {
> >                 svm->soft_int_injected = true;
> > -               svm->soft_int_csbase = vmcb12_csbase;
> > -               svm->soft_int_old_rip = vmcb12_rip;
> > +
> >                 if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> >                         svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> 
> Why not move this too?

For the same reason I think we should keep 

	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;

where it is.  When NRIPS is exposed to the guest, the incoming nested state is
the one and only source of truth.  By keeping the code different, we'd effectively
be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.

> > -               else
> > -                       svm->soft_int_next_rip = vmcb12_rip;
> >         }
> >  
> >         /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8f8bc863e214..358ec940ffc9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> >                 return EXIT_FASTPATH_EXIT_USERSPACE;
> >         }
> >  
> > +       if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> > +           svm->soft_int_injected) {
> > +               svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > +               svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > +               if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > +                       svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > +       }
> > +
> 
> I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
> maybe we can refactor them later to pre-run and post-run nested
> callbacks? Anyway, not a big deal, definitely an improvement over the
> current patch assuming we can figure out how to fix next_rip.

I don't love it either, but I want to (a) avoid unnecessarily overwriting the
fields, e.g. if KVM never actually does VMRUN and (b) minimize the probability
of consuming a bad RIP.

In practice, I would expect the nested_run_pending check to be correctly predicted
the overwhelming majority of the time, i.e. I don't anticipate performance issues
due to putting the code in the hot path.

If we want to try and move the update out of svm_vcpu_run(), then we shouldn't
need generic pre/post callbacks for nested, svm_prepare_switch_to_guest() is the
right fit.


  reply	other threads:[~2026-02-20 17:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-12 23:07 [RFC PATCH 0/5] KVM: nSVM: Fix RIP usage in the control area after restore Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
2026-02-18 23:22   ` Sean Christopherson
2026-02-18 23:38     ` Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 2/5] KVM: nSVM: Use the correct RIP when restoring vmcb02's control area Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 3/5] KVM: nSVM: Move updating NextRIP and soft IRQ RIPs into a helper Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS Yosry Ahmed
2026-02-19  0:13   ` Sean Christopherson
2026-02-19  0:26     ` Yosry Ahmed
2026-02-20 17:07       ` Sean Christopherson [this message]
2026-02-20 20:27         ` Yosry Ahmed
2026-02-20 22:50           ` Sean Christopherson
2026-02-12 23:07 ` [RFC PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproduce nested RIP restore bug Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZiUxBRPovFd4nDd@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox