* [PATCH v1 1/4] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN
[not found] <20260223154636.116671-1-yosry@kernel.org>
@ 2026-02-23 15:46 ` Yosry Ahmed
2026-02-23 15:46 ` [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run Yosry Ahmed
2026-02-23 15:46 ` [PATCH v1 3/4] KVM: nSVM: Delay setting soft IRQ RIP tracking fields " Yosry Ahmed
2 siblings, 0 replies; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-23 15:46 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt; instead, it advances L2's RIP
before running it. KVM uses L2's current RIP as the NextRIP in vmcb02 to
emulate a CPU without NRIPS.
However, after L2 runs the first time, NextRIP will be updated by the
CPU and/or KVM, and the current RIP is no longer the correct value to
use in vmcb02. Hence, after save/restore, use the current RIP if and
only if a nested run is pending; otherwise, use NextRIP.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..a82e6f0472ca7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -844,17 +844,24 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
/*
- * next_rip is consumed on VMRUN as the return address pushed on the
+ * NextRIP is consumed on VMRUN as the return address pushed on the
* stack for injected soft exceptions/interrupts. If nrips is exposed
- * to L1, take it verbatim from vmcb12. If nrips is supported in
- * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
- * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
- * prior to injecting the event).
+ * to L1, take it verbatim from vmcb12.
+ *
+ * If nrips is supported in hardware but not exposed to L1, stuff the
+ * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+ * responsible for advancing RIP prior to injecting the event). This is
+ * only the case for the first L2 run after VMRUN. After that (e.g.
+ * during save/restore), NextRIP is updated by the CPU and/or KVM, and
+ * the value of the L2 RIP from vmcb12 should not be used.
*/
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
- vmcb02->control.next_rip = svm->nested.ctl.next_rip;
- else if (boot_cpu_has(X86_FEATURE_NRIPS))
- vmcb02->control.next_rip = vmcb12_rip;
+ if (boot_cpu_has(X86_FEATURE_NRIPS)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
+ !svm->nested.nested_run_pending)
+ vmcb02->control.next_rip = svm->nested.ctl.next_rip;
+ else
+ vmcb02->control.next_rip = vmcb12_rip;
+ }
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
if (is_evtinj_soft(vmcb02->control.event_inj)) {
--
2.53.0.345.g96ddfc5eaa-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
From: Yosry Ahmed @ 2026-02-23 15:46 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt; instead, it advances L2's RIP
before running it. KVM uses L2's current RIP as the NextRIP in vmcb02 to
emulate a CPU without NRIPS.
However, in svm_set_nested_state(), the value used for L2's current RIP
comes from vmcb02, which is just whatever the vCPU had in vmcb02 before
restoring nested state (zero on a freshly created vCPU). Passing the
cached RIP value instead (i.e. kvm_rip_read()) would only fix the issue
if registers are restored before nested state.
Instead, split the logic of setting NextRIP in vmcb02. Handle the
'normal' case of initializing vmcb02's NextRIP using NextRIP from vmcb12
(or KVM_GET_NESTED_STATE's payload) in nested_vmcb02_prepare_control().
Delay the special case of stuffing L2's current RIP into vmcb02's
NextRIP until shortly before the vCPU is run, to make sure the most
up-to-date value of RIP is used regardless of KVM_SET_REGS and
KVM_SET_NESTED_STATE's relative ordering.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 25 ++++++++-----------------
arch/x86/kvm/svm/svm.c | 18 ++++++++++++++++++
2 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a82e6f0472ca7..b7c80aeaebab3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -844,24 +844,15 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
/*
- * NextRIP is consumed on VMRUN as the return address pushed on the
- * stack for injected soft exceptions/interrupts. If nrips is exposed
- * to L1, take it verbatim from vmcb12.
- *
- * If nrips is supported in hardware but not exposed to L1, stuff the
- * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
- * responsible for advancing RIP prior to injecting the event). This is
- * only the case for the first L2 run after VMRUN. After that (e.g.
- * during save/restore), NextRIP is updated by the CPU and/or KVM, and
- * the value of the L2 RIP from vmcb12 should not be used.
+ * If nrips is exposed to L1, take NextRIP as-is. Otherwise, L1
+ * advances L2's RIP before VMRUN instead of using NextRIP. KVM will
+ * stuff the current RIP as vmcb02's NextRIP before L2 is run. After
+ * the first run of L2 (e.g. after save+restore), NextRIP is updated by
+ * the CPU and/or KVM and should be used regardless of L1's support.
*/
- if (boot_cpu_has(X86_FEATURE_NRIPS)) {
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
- !svm->nested.nested_run_pending)
- vmcb02->control.next_rip = svm->nested.ctl.next_rip;
- else
- vmcb02->control.next_rip = vmcb12_rip;
- }
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
+ !svm->nested.nested_run_pending)
+ vmcb02->control.next_rip = svm->nested.ctl.next_rip;
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
if (is_evtinj_soft(vmcb02->control.event_inj)) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e2143..e084b9688f556 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1413,6 +1413,24 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
sd->bp_spec_reduce_set = true;
msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
}
+
+ /*
+ * If nrips is supported in hardware but not exposed to L1, stuff the
+ * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+ * responsible for advancing RIP prior to injecting the event). Once L2
+ * runs after L1 executes VMRUN, NextRIP is updated by the CPU and/or
+ * KVM, and this is no longer needed.
+ *
+ * This is done here (as opposed to when preparing vmcb02) to use the
+ * most up-to-date value of RIP regardless of the order of restoring
+ * registers and nested state in the vCPU save+restore path.
+ */
+ if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
+ if (boot_cpu_has(X86_FEATURE_NRIPS) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
+ }
+
svm->guest_state_loaded = true;
}
--
2.53.0.345.g96ddfc5eaa-goog
* [PATCH v1 3/4] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run
From: Yosry Ahmed @ 2026-02-23 15:46 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In the save+restore path, when restoring nested state, the values of RIP
and CS base passed into nested_vmcb02_prepare_control() are mostly
incorrect. They are both pulled from the vmcb02. For CS base, the value
is only correct if system regs are restored before nested state. The
value of RIP is whatever the vCPU had in vmcb02 before restoring nested
state (zero on a freshly created vCPU).
Instead, take a similar approach to NextRIP, and delay initializing the
RIP tracking fields until shortly before the vCPU is run, to make sure
the most up-to-date values of RIP and CS base are used regardless of
KVM_SET_SREGS, KVM_SET_REGS, and KVM_SET_NESTED_STATE's relative
ordering.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 17 ++++++++---------
arch/x86/kvm/svm/svm.c | 10 ++++++++++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b7c80aeaebab3..0547fd2810a3a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -741,9 +741,7 @@ static bool is_evtinj_nmi(u32 evtinj)
return type == SVM_EVTINJ_TYPE_NMI;
}
-static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
- unsigned long vmcb12_rip,
- unsigned long vmcb12_csbase)
+static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
{
u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK;
u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
@@ -855,14 +853,15 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
+
+ /*
+ * soft_int_csbase, soft_int_old_rip, and soft_int_next_rip (if L1
+ * doesn't have NRIPS) are initialized later, before the vCPU is run.
+ */
if (is_evtinj_soft(vmcb02->control.event_inj)) {
svm->soft_int_injected = true;
- svm->soft_int_csbase = vmcb12_csbase;
- svm->soft_int_old_rip = vmcb12_rip;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->soft_int_next_rip = svm->nested.ctl.next_rip;
- else
- svm->soft_int_next_rip = vmcb12_rip;
}
/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
@@ -960,7 +959,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
+ nested_vmcb02_prepare_control(svm);
nested_vmcb02_prepare_save(svm, vmcb12);
ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
@@ -1905,7 +1904,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
nested_copy_vmcb_control_to_cache(svm, ctl);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
+ nested_vmcb02_prepare_control(svm);
/*
* While the nested guest CR3 is already checked and set by
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e084b9688f556..37f3b031b3a76 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1424,11 +1424,21 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
* This is done here (as opposed to when preparing vmcb02) to use the
* most up-to-date value of RIP regardless of the order of restoring
* registers and nested state in the vCPU save+restore path.
+ *
+	 * Similarly, initialize svm->soft_int_* fields here to use the most
+ * up-to-date values of RIP and CS base, regardless of restore order.
*/
if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
if (boot_cpu_has(X86_FEATURE_NRIPS) &&
!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
+
+ if (svm->soft_int_injected) {
+ svm->soft_int_csbase = svm->vmcb->save.cs.base;
+ svm->soft_int_old_rip = kvm_rip_read(vcpu);
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->soft_int_next_rip = kvm_rip_read(vcpu);
+ }
}
svm->guest_state_loaded = true;
--
2.53.0.345.g96ddfc5eaa-goog
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
From: Yosry Ahmed @ 2026-02-25 0:07 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8f8bc863e2143..e084b9688f556 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1413,6 +1413,24 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> sd->bp_spec_reduce_set = true;
> msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> }
> +
> + /*
> + * If nrips is supported in hardware but not exposed to L1, stuff the
> + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> + * responsible for advancing RIP prior to injecting the event). Once L2
> + * runs after L1 executes VMRUN, NextRIP is updated by the CPU and/or
> + * KVM, and this is no longer needed.
> + *
> + * This is done here (as opposed to when preparing vmcb02) to use the
> + * most up-to-date value of RIP regardless of the order of restoring
> + * registers and nested state in the vCPU save+restore path.
> + */
> + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
> + if (boot_cpu_has(X86_FEATURE_NRIPS) &&
> + !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
> + }
> +
Doing this in svm_prepare_switch_to_guest() is wrong, or at least
after the svm->guest_state_loaded check. It's possible to emulate the
nested VMRUN without doing a vcpu_put(), which means
svm->guest_state_loaded will remain true and this code will be
skipped.
In fact, this breaks the svm_nested_soft_inject_test test. Funny
enough, I was only running it with my repro changes, which papered
over the bug because it forced an exit to userspace after VMRUN due to
single-stepping, so svm->guest_state_loaded got cleared and the code
was executed on the next KVM_RUN, before L2 runs.
I can move it above the svm->guest_state_loaded check, but I think I
will just put it in pre_svm_run() instead.
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
2026-02-25 0:07 ` Yosry Ahmed
@ 2026-02-25 0:56 ` Sean Christopherson
2026-02-25 1:00 ` Yosry Ahmed
0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2026-02-25 0:56 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8f8bc863e2143..e084b9688f556 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1413,6 +1413,24 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > sd->bp_spec_reduce_set = true;
> > msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> > }
> > +
> > + /*
> > + * If nrips is supported in hardware but not exposed to L1, stuff the
> > + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> > + * responsible for advancing RIP prior to injecting the event). Once L2
> > + * runs after L1 executes VMRUN, NextRIP is updated by the CPU and/or
> > + * KVM, and this is no longer needed.
> > + *
> > + * This is done here (as opposed to when preparing vmcb02) to use the
> > + * most up-to-date value of RIP regardless of the order of restoring
> > + * registers and nested state in the vCPU save+restore path.
> > + */
> > + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
> > + if (boot_cpu_has(X86_FEATURE_NRIPS) &&
> > + !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > + svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
> > + }
> > +
>
> Doing this in svm_prepare_switch_to_guest() is wrong, or at least
> after the svm->guest_state_loaded check. It's possible to emulate the
> nested VMRUN without doing a vcpu_put(), which means
> svm->guest_state_loaded will remain true and this code will be
> skipped.
>
> In fact, this breaks the svm_nested_soft_inject_test test. Funny
> enough, I was only running it with my repro changes, which papered
> over the bug because it forced an exit to userspace after VMRUN due to
> single-stepping, so svm->guest_state_loaded got cleared and the code
> was executed on the next KVM_RUN, before L2 runs.
>
> I can move it above the svm->guest_state_loaded check, but I think I
> will just put it in pre_svm_run() instead.
I would rather not expand pre_svm_run(), and instead just open code it in
svm_vcpu_run(). pre_svm_run() probably should never have been added, because
it's far from a generic "pre run" API. E.g. if we want to keep the helper around,
it should probably be named something something ASID.
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
2026-02-25 0:56 ` Sean Christopherson
@ 2026-02-25 1:00 ` Yosry Ahmed
2026-02-25 1:10 ` Sean Christopherson
0 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-25 1:00 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> > Doing this in svm_prepare_switch_to_guest() is wrong, or at least
> > after the svm->guest_state_loaded check. It's possible to emulate the
> > nested VMRUN without doing a vcpu_put(), which means
> > svm->guest_state_loaded will remain true and this code will be
> > skipped.
> >
> > In fact, this breaks the svm_nested_soft_inject_test test. Funny
> > enough, I was only running it with my repro changes, which papered
> > over the bug because it forced an exit to userspace after VMRUN due to
> > single-stepping, so svm->guest_state_loaded got cleared and the code
> > was executed on the next KVM_RUN, before L2 runs.
> >
> > I can move it above the svm->guest_state_loaded check, but I think I
> > will just put it in pre_svm_run() instead.
>
> I would rather not expand pre_svm_run(), and instead just open code it in
> svm_vcpu_run(). pre_svm_run() probably should never have been added, because
> it's far from a generic "pre run" API. E.g. if we want to keep the helper around,
> it should probably be named something something ASID.
I sent a new version before I saw your response.. sorry.
How strongly do you feel about this? :P
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
2026-02-25 1:00 ` Yosry Ahmed
@ 2026-02-25 1:10 ` Sean Christopherson
2026-02-25 1:15 ` Yosry Ahmed
0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2026-02-25 1:10 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> > > Doing this in svm_prepare_switch_to_guest() is wrong, or at least
> > > after the svm->guest_state_loaded check. It's possible to emulate the
> > > nested VMRUN without doing a vcpu_put(), which means
> > > svm->guest_state_loaded will remain true and this code will be
> > > skipped.
> > >
> > > In fact, this breaks the svm_nested_soft_inject_test test. Funny
> > > enough, I was only running it with my repro changes, which papered
> > > over the bug because it forced an exit to userspace after VMRUN due to
> > > single-stepping, so svm->guest_state_loaded got cleared and the code
> > > was executed on the next KVM_RUN, before L2 runs.
> > >
> > > I can move it above the svm->guest_state_loaded check, but I think I
> > > will just put it in pre_svm_run() instead.
> >
> > I would rather not expand pre_svm_run(), and instead just open code it in
> > svm_vcpu_run(). pre_svm_run() probably should never have been added, because
> > it's far from a generic "pre run" API. E.g. if we want to keep the helper around,
> > it should probably be named something something ASID.
>
> I sent a new version before I saw your response.. sorry.
>
> How strongly do you feel about this? :P
Strong enough that I'll fix it up when applying, unless it's a sticking point on
your end.
E.g. one thing that I don't love is that the code never runs for SEV guests.
Which is fine, because in practice I can't imagine KVM ever supporting nested
SVM for SEV, but it adds unnecessary cognitive load, because readers need to
reason through why the code only applies to !SEV guests.
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
2026-02-25 1:10 ` Sean Christopherson
@ 2026-02-25 1:15 ` Yosry Ahmed
2026-02-25 1:25 ` Sean Christopherson
0 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2026-02-25 1:15 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026 at 5:10 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> > > > Doing this in svm_prepare_switch_to_guest() is wrong, or at least
> > > > after the svm->guest_state_loaded check. It's possible to emulate the
> > > > nested VMRUN without doing a vcpu_put(), which means
> > > > svm->guest_state_loaded will remain true and this code will be
> > > > skipped.
> > > >
> > > > In fact, this breaks the svm_nested_soft_inject_test test. Funny
> > > > enough, I was only running it with my repro changes, which papered
> > > > over the bug because it forced an exit to userspace after VMRUN due to
> > > > single-stepping, so svm->guest_state_loaded got cleared and the code
> > > > was executed on the next KVM_RUN, before L2 runs.
> > > >
> > > > I can move it above the svm->guest_state_loaded check, but I think I
> > > > will just put it in pre_svm_run() instead.
> > >
> > > I would rather not expand pre_svm_run(), and instead just open code it in
> > > svm_vcpu_run(). pre_svm_run() probably should never have been added, because
> > > it's far from a generic "pre run" API. E.g. if we want to keep the helper around,
> > > it should probably be named something something ASID.
> >
> > I sent a new version before I saw your response.. sorry.
> >
> > How strongly do you feel about this? :P
>
> Strong enough that I'll fix it up when applying, unless it's a sticking point on
> your end.
It's just that 99% of the time someone is reading svm_vcpu_run(), they
won't care about this code, and it's also cognitive load to filter it
out. We can add a helper for this code (and the soft IRQ inject),
something like svm_fixup_nested_rips() or sth.
We discussed a helper before and you didn't like it, but that was in a
different context (a helper that combined normal and special cases).
WDYT?
>
> E.g. one thing that I don't love is that the code never runs for SEV guests.
> Which is fine, because in practice I can't imagine KVM ever supporting nested
> SVM for SEV, but it adds unnecessary cognitive load, because readers need to
> reason through why the code only applies to !SEV guests.
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
2026-02-25 1:15 ` Yosry Ahmed
@ 2026-02-25 1:25 ` Sean Christopherson
2026-02-25 1:42 ` Yosry Ahmed
0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2026-02-25 1:25 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> On Tue, Feb 24, 2026 at 5:10 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Feb 24, 2026, Yosry Ahmed wrote:
> > > > > Doing this in svm_prepare_switch_to_guest() is wrong, or at least
> > > > > after the svm->guest_state_loaded check. It's possible to emulate the
> > > > > nested VMRUN without doing a vcpu_put(), which means
> > > > > svm->guest_state_loaded will remain true and this code will be
> > > > > skipped.
> > > > >
> > > > > In fact, this breaks the svm_nested_soft_inject_test test. Funny
> > > > > enough, I was only running it with my repro changes, which papered
> > > > > over the bug because it forced an exit to userspace after VMRUN due to
> > > > > single-stepping, so svm->guest_state_loaded got cleared and the code
> > > > > was executed on the next KVM_RUN, before L2 runs.
> > > > >
> > > > > I can move it above the svm->guest_state_loaded check, but I think I
> > > > > will just put it in pre_svm_run() instead.
> > > >
> > > > I would rather not expand pre_svm_run(), and instead just open code it in
> > > > svm_vcpu_run(). pre_svm_run() probably should never have been added, because
> > > > it's far from a generic "pre run" API. E.g. if we want to keep the helper around,
> > > > it should probably be named something something ASID.
> > >
> > > I sent a new version before I saw your response.. sorry.
> > >
> > > How strongly do you feel about this? :P
> >
> > Strong enough that I'll fix it up when applying, unless it's a sticking point on
> > your end.
>
> It's just that 99% of the time someone is reading svm_vcpu_run(), they
> won't care about this code, and it's also cognitive load to filter it
> out. We can add a helper for this code (and the soft IRQ inject),
> something like svm_fixup_nested_rips() or sth.
I don't entirely disagree, but at the same time, why is someone reading svm_vcpu_run()
if they don't want to look at the gory details?
> We discussed a helper before and you didn't like it, but that was in a
> different context (a helper that combined normal and special cases).
> WDYT?
A helper would work. svm_fixup_nested_rips() is good, the only flaw is the CS.base
chunk, but I'm not sure I care enough about 32-bit to reject the name just because
of that :-)
That would make it easier to reduce indentation, e.g.
static void svm_fixup_nested_rips(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
/*
* If nrips is supported in hardware but not exposed to L1, stuff the
* actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
* responsible for advancing RIP prior to injecting the event). Once L2
* runs after L1 executes VMRUN, NextRIP is updated by the CPU and/or
* KVM, and this is no longer needed.
*
* This is done here (as opposed to when preparing vmcb02) to use the
* most up-to-date value of RIP regardless of the order of restoring
* registers and nested state in the vCPU save+restore path.
*
* Similarly, initialize svm->soft_int_* fields here to use the most
* up-to-date values of RIP and CS base, regardless of restore order.
*/
if (!is_guest_mode(vcpu) || !svm->nested.nested_run_pending)
return;
if (boot_cpu_has(X86_FEATURE_NRIPS) &&
!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
if (svm->soft_int_injected) {
svm->soft_int_csbase = svm->vmcb->save.cs.base;
svm->soft_int_old_rip = kvm_rip_read(vcpu);
if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->soft_int_next_rip = kvm_rip_read(vcpu);
}
}
* Re: [PATCH v1 2/4] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
From: Yosry Ahmed @ 2026-02-25 1:42 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> > We discussed a helper before and you didn't like it, but that was in a
> > different context (a helper that combined normal and special cases).
> > WDYT?
>
> A helper would work. svm_fixup_nested_rips() is good, the only flaw is the CS.base
> chunk, but I'm not sure I care enough about 32-bit to reject the name just because
> of that :-)
>
> That would make it easier to reduce indentation, e.g.
>
> static void svm_fixup_nested_rips(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> /*
> * If nrips is supported in hardware but not exposed to L1, stuff the
> * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> * responsible for advancing RIP prior to injecting the event). Once L2
> * runs after L1 executes VMRUN, NextRIP is updated by the CPU and/or
> * KVM, and this is no longer needed.
> *
> * This is done here (as opposed to when preparing vmcb02) to use the
> * most up-to-date value of RIP regardless of the order of restoring
> * registers and nested state in the vCPU save+restore path.
> *
> * Similarly, initialize svm->soft_int_* fields here to use the most
> * up-to-date values of RIP and CS base, regardless of restore order.
> */
> if (!is_guest_mode(vcpu) || !svm->nested.nested_run_pending)
> return;
>
> if (boot_cpu_has(X86_FEATURE_NRIPS) &&
> !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
>
> if (svm->soft_int_injected) {
> svm->soft_int_csbase = svm->vmcb->save.cs.base;
> svm->soft_int_old_rip = kvm_rip_read(vcpu);
> if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->soft_int_next_rip = kvm_rip_read(vcpu);
> }
> }
Looks good, thanks Sean!