* [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN
[not found] <20260212230751.1871720-1-yosry.ahmed@linux.dev>
@ 2026-02-12 23:07 ` Yosry Ahmed
2026-02-18 23:22 ` Sean Christopherson
2026-02-12 23:07 ` [RFC PATCH 2/5] KVM: nSVM: Use the correct RIP when restoring vmcb02's control area Yosry Ahmed
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-12 23:07 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt, instead it advances L2's RIP
before running it. KVM uses L2's RIP as the NextRIP in vmcb02 to emulate
a CPU without NRIPS.
However, after L2 runs the first time, NextRIP will be updated by the
CPU and/or KVM, and L2's RIP is no longer the correct value to use in
vmcb02. Hence, after save/restore, do not use L2's RIP if a nested run
is not pending (i.e. L2 has run at least once), use the NextRIP value.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd..eebbe00714e3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -844,14 +844,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
/*
- * next_rip is consumed on VMRUN as the return address pushed on the
+ * NextRIP is consumed on VMRUN as the return address pushed on the
* stack for injected soft exceptions/interrupts. If nrips is exposed
- * to L1, take it verbatim from vmcb12. If nrips is supported in
- * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
- * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
- * prior to injecting the event).
+ * to L1, take it verbatim from vmcb12.
+ *
+ * If nrips is supported in hardware but not exposed to L1, stuff the
+ * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+ * responsible for advancing RIP prior to injecting the event). This is
+ * only the case for the first L2 run after VMRUN. After that (e.g.
+ * during save/restore), NextRIP is updated by the CPU and/or KVM, and
+ * the value of the L2 RIP from vmcb12 should not be used.
*/
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
else if (boot_cpu_has(X86_FEATURE_NRIPS))
vmcb02->control.next_rip = vmcb12_rip;
--
2.53.0.273.g2a3d683680-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 2/5] KVM: nSVM: Use the correct RIP when restoring vmcb02's control area
[not found] <20260212230751.1871720-1-yosry.ahmed@linux.dev>
2026-02-12 23:07 ` [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
@ 2026-02-12 23:07 ` Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 3/5] KVM: nSVM: Move updating NextRIP and soft IRQ RIPs into a helper Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS Yosry Ahmed
3 siblings, 0 replies; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-12 23:07 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In svm_set_nested_state(), the value of RIP from vmcb02 is passed into
nested_vmcb02_prepare_control(). However, even if RIP is restored with
KVM_SET_REGS prior to KVM_SET_NESTED_STATE, its value is not reflected
into the VMCB until the vCPU is run.
Use the value from KVM's cache instead, which is what KVM_SET_REGS
updates. Not that the passed RIP is still incorrect if KVM_SET_REGS is
not called prior to KVM_SET_NESTED_STATE, this will be fixed separately.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index eebbe00714e3..aec17c80ed73 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1911,7 +1911,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
nested_copy_vmcb_control_to_cache(svm, ctl);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
+ nested_vmcb02_prepare_control(svm, kvm_rip_read(vcpu), svm->vmcb->save.cs.base);
/*
* While the nested guest CR3 is already checked and set by
--
2.53.0.273.g2a3d683680-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 3/5] KVM: nSVM: Move updating NextRIP and soft IRQ RIPs into a helper
[not found] <20260212230751.1871720-1-yosry.ahmed@linux.dev>
2026-02-12 23:07 ` [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 2/5] KVM: nSVM: Use the correct RIP when restoring vmcb02's control area Yosry Ahmed
@ 2026-02-12 23:07 ` Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS Yosry Ahmed
3 siblings, 0 replies; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-12 23:07 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
Move the logic for updating NextRIP and soft interrupt tracking fields
out of nested_vmcb02_prepare_control() into a helper, in preparation for
re-using the same logic to fixup the RIPs during save/restore.
No functional change intended.
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 64 +++++++++++++++++++++++----------------
arch/x86/kvm/svm/svm.h | 2 ++
2 files changed, 40 insertions(+), 26 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aec17c80ed73..af7a0113f269 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -741,6 +741,43 @@ static bool is_evtinj_nmi(u32 evtinj)
return type == SVM_EVTINJ_TYPE_NMI;
}
+void nested_vmcb02_prepare_rips(struct kvm_vcpu *vcpu, unsigned long csbase,
+ unsigned long rip)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr))
+ return;
+
+ /*
+ * NextRIP is consumed on VMRUN as the return address pushed on the
+ * stack for injected soft exceptions/interrupts. If nrips is exposed
+ * to L1, take it verbatim.
+ *
+ * If nrips is supported in hardware but not exposed to L1, stuff the
+ * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+ * responsible for advancing RIP prior to injecting the event). This is
+ * only the case for the first L2 run after VMRUN. After that (e.g.
+ * during save/restore), NextRIP is updated by the CPU and/or KVM, and
+ * the value of the L2 RIP should not be used.
+ */
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
+ svm->vmcb->control.next_rip = svm->nested.ctl.next_rip;
+ else if (boot_cpu_has(X86_FEATURE_NRIPS))
+ svm->vmcb->control.next_rip = rip;
+
+ if (!is_evtinj_soft(svm->nested.ctl.event_inj))
+ return;
+
+ svm->soft_int_injected = true;
+ svm->soft_int_csbase = csbase;
+ svm->soft_int_old_rip = rip;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->soft_int_next_rip = svm->nested.ctl.next_rip;
+ else
+ svm->soft_int_next_rip = rip;
+}
+
static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
unsigned long vmcb12_rip,
unsigned long vmcb12_csbase)
@@ -843,33 +880,8 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.event_inj = svm->nested.ctl.event_inj;
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
- /*
- * NextRIP is consumed on VMRUN as the return address pushed on the
- * stack for injected soft exceptions/interrupts. If nrips is exposed
- * to L1, take it verbatim from vmcb12.
- *
- * If nrips is supported in hardware but not exposed to L1, stuff the
- * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
- * responsible for advancing RIP prior to injecting the event). This is
- * only the case for the first L2 run after VMRUN. After that (e.g.
- * during save/restore), NextRIP is updated by the CPU and/or KVM, and
- * the value of the L2 RIP from vmcb12 should not be used.
- */
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
- vmcb02->control.next_rip = svm->nested.ctl.next_rip;
- else if (boot_cpu_has(X86_FEATURE_NRIPS))
- vmcb02->control.next_rip = vmcb12_rip;
-
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
- if (is_evtinj_soft(vmcb02->control.event_inj)) {
- svm->soft_int_injected = true;
- svm->soft_int_csbase = vmcb12_csbase;
- svm->soft_int_old_rip = vmcb12_rip;
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
- svm->soft_int_next_rip = svm->nested.ctl.next_rip;
- else
- svm->soft_int_next_rip = vmcb12_rip;
- }
+ nested_vmcb02_prepare_rips(vcpu, vmcb12_csbase, vmcb12_rip);
/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb..057281dda487 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -809,6 +809,8 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
+void nested_vmcb02_prepare_rips(struct kvm_vcpu *vcpu, unsigned long csbase,
+ unsigned long rip);
extern struct kvm_x86_nested_ops svm_nested_ops;
--
2.53.0.273.g2a3d683680-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
[not found] <20260212230751.1871720-1-yosry.ahmed@linux.dev>
` (2 preceding siblings ...)
2026-02-12 23:07 ` [RFC PATCH 3/5] KVM: nSVM: Move updating NextRIP and soft IRQ RIPs into a helper Yosry Ahmed
@ 2026-02-12 23:07 ` Yosry Ahmed
2026-02-19 0:13 ` Sean Christopherson
3 siblings, 1 reply; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-12 23:07 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In the save/restore path, if KVM_SET_NESTED_STATE is performed before
restoring REGS and/or SREGS , the values of CS and RIP used to
initialize the vmcb02's NextRIP and soft interrupt tracking RIPs are
incorrect.
Recalculate them up after CS is set, or REGS are restored. This is only
needed when a nested run is pending during restore. After L2 runs for
the first time, any soft interrupts injected by L1 are already delivered
or tracked by KVM separately for re-injection, so the CS and RIP values
are no longer relevant.
If KVM_SET_NESTED_STATE is performed after both REGS and SREGS are
restored, it will just overwrite the fields.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/nested.c | 4 +++-
arch/x86/kvm/svm/svm.c | 21 +++++++++++++++++++++
arch/x86/kvm/x86.c | 2 ++
5 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index de709fb5bd76..7221517ea3e6 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -54,6 +54,7 @@ KVM_X86_OP(cache_reg)
KVM_X86_OP(get_rflags)
KVM_X86_OP(set_rflags)
KVM_X86_OP(get_if_flag)
+KVM_X86_OP_OPTIONAL(post_user_set_regs)
KVM_X86_OP(flush_tlb_all)
KVM_X86_OP(flush_tlb_current)
#if IS_ENABLED(CONFIG_HYPERV)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..feadd9579159 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1789,6 +1789,7 @@ struct kvm_x86_ops {
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
bool (*get_if_flag)(struct kvm_vcpu *vcpu);
+ void (*post_user_set_regs)(struct kvm_vcpu *vcpu);
void (*flush_tlb_all)(struct kvm_vcpu *vcpu);
void (*flush_tlb_current)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index af7a0113f269..22680aa31c28 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -766,7 +766,9 @@ void nested_vmcb02_prepare_rips(struct kvm_vcpu *vcpu, unsigned long csbase,
else if (boot_cpu_has(X86_FEATURE_NRIPS))
svm->vmcb->control.next_rip = rip;
- if (!is_evtinj_soft(svm->nested.ctl.event_inj))
+ /* L1's injected events should be cleared after the first run of L2 */
+ if (!is_evtinj_soft(svm->nested.ctl.event_inj) ||
+ WARN_ON_ONCE(!svm->nested.nested_run_pending))
return;
svm->soft_int_injected = true;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..5729da2b300d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1477,6 +1477,24 @@ static bool svm_get_if_flag(struct kvm_vcpu *vcpu)
: kvm_get_rflags(vcpu) & X86_EFLAGS_IF;
}
+static void svm_fixup_nested_rips(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ /*
+ * In the save/restore path, if nested state is restored before
+ * RIP or CS, then fixing up the vmcb02 (and soft IRQ tracking) is
+ * needed. This is only the case if a nested run is pending (i.e. L2
+ * is yet to run after L1's VMRUN). Otherwise, any soft IRQ injected by
+ * L1 should have been delivered to L2 or is being tracked separately by
+ * KVM for re-injection. Similarly, NextRIP would have already been
+ * updated by the CPU and/or KVM.
+ */
+ if (svm->nested.nested_run_pending)
+ nested_vmcb02_prepare_rips(vcpu, svm->vmcb->save.cs.base,
+ kvm_rip_read(vcpu));
+}
+
static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
{
kvm_register_mark_available(vcpu, reg);
@@ -1826,6 +1844,8 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
if (seg == VCPU_SREG_SS)
/* This is symmetric with svm_get_segment() */
svm->vmcb->save.cpl = (var->dpl & 3);
+ else if (seg == VCPU_SREG_CS)
+ svm_fixup_nested_rips(vcpu);
vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
}
@@ -5172,6 +5192,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
.get_rflags = svm_get_rflags,
.set_rflags = svm_set_rflags,
.get_if_flag = svm_get_if_flag,
+ .post_user_set_regs = svm_fixup_nested_rips,
.flush_tlb_all = svm_flush_tlb_all,
.flush_tlb_current = svm_flush_tlb_current,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..35fe1d337273 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
kvm_rip_write(vcpu, regs->rip);
kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
+ kvm_x86_call(post_user_set_regs)(vcpu);
+
vcpu->arch.exception.pending = false;
vcpu->arch.exception_vmexit.pending = false;
--
2.53.0.273.g2a3d683680-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN
2026-02-12 23:07 ` [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
@ 2026-02-18 23:22 ` Sean Christopherson
2026-02-18 23:38 ` Yosry Ahmed
0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-02-18 23:22 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Thu, Feb 12, 2026, Yosry Ahmed wrote:
> For guests with NRIPS disabled, L1 does not provide NextRIP when running
> an L2 with an injected soft interrupt, instead it advances L2's RIP
> before running it. KVM uses L2's RIP as the NextRIP in vmcb02 to emulate
Should "L2's RIP" be "vmcb12's RIP"? The "L2's RIP" terminology gets really
confusing in the next paragraph, as NextRIP _is_ L2's (Next)RIP. Hmm, or maybe
"current RIP"? I.e. "current RIP" vs. "NextRIP"?
> a CPU without NRIPS.
>
> However, after L2 runs the first time, NextRIP will be updated by the
> CPU and/or KVM, and L2's RIP is no longer the correct value to use in
> vmcb02. Hence, after save/restore, do not use L2's RIP if a nested run
> is not pending (i.e. L2 has run at least once), use the NextRIP value.
Too many negatives in this last sentence, it can just be (I think):
Hence, after save/restore, use the current RIP if and only if a nested
run is pending, otherwise use NextRIP.
> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> arch/x86/kvm/svm/nested.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd..eebbe00714e3 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -844,14 +844,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
>
> /*
> - * next_rip is consumed on VMRUN as the return address pushed on the
> + * NextRIP is consumed on VMRUN as the return address pushed on the
> * stack for injected soft exceptions/interrupts. If nrips is exposed
> - * to L1, take it verbatim from vmcb12. If nrips is supported in
> - * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> - * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> - * prior to injecting the event).
> + * to L1, take it verbatim from vmcb12.
> + *
> + * If nrips is supported in hardware but not exposed to L1, stuff the
> + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> + * responsible for advancing RIP prior to injecting the event). This is
> + * only the case for the first L2 run after VMRUN. After that (e.g.
> + * during save/restore), NextRIP is updated by the CPU and/or KVM, and
> + * the value of the L2 RIP from vmcb12 should not be used.
> */
> - if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
This is technically wrong since KVM doesn't require NRIPS. Maybe this?
if (boot_cpu_has(X86_FEATURE_NRIPS)) {
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending)
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
else
vmcb02->control.next_rip = vmcb12_rip;
}
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> else if (boot_cpu_has(X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = vmcb12_rip;
> --
> 2.53.0.273.g2a3d683680-goog
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN
2026-02-18 23:22 ` Sean Christopherson
@ 2026-02-18 23:38 ` Yosry Ahmed
0 siblings, 0 replies; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-18 23:38 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Wed, Feb 18, 2026 at 03:22:44PM -0800, Sean Christopherson wrote:
> On Thu, Feb 12, 2026, Yosry Ahmed wrote:
> > For guests with NRIPS disabled, L1 does not provide NextRIP when running
> > an L2 with an injected soft interrupt, instead it advances L2's RIP
> > before running it. KVM uses L2's RIP as the NextRIP in vmcb02 to emulate
>
> Should "L2's RIP" be "vmcb12's RIP"? The "L2's RIP" terminology gets really
> confusing in the next paragraph, as NextRIP _is_ L2's (Next)RIP. Hmm, or maybe
> "current RIP"? I.e. "current RIP" vs. "NextRIP"?
I intentionally avoided mentioning vmcb12, because after save/restore
the RIP value we pass into nested_vmcb02_prepare_control() is no longer
what's in vmcb12. I can go with "current RIP".
>
> > a CPU without NRIPS.
> >
> > However, after L2 runs the first time, NextRIP will be updated by the
> > CPU and/or KVM, and L2's RIP is no longer the correct value to use in
> > vmcb02. Hence, after save/restore, do not use L2's RIP if a nested run
> > is not pending (i.e. L2 has run at least once), use the NextRIP value.
>
> Too many negatives in this last sentence, it can just be (I think):
>
> Hence, after save/restore, use the current RIP if and only if a nested
> run is pending, otherwise use NextRIP.
Looks good.
>
> > Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> > CC: stable@vger.kernel.org
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> > arch/x86/kvm/svm/nested.c | 16 ++++++++++------
> > 1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index de90b104a0dd..eebbe00714e3 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -844,14 +844,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> > vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
> >
> > /*
> > - * next_rip is consumed on VMRUN as the return address pushed on the
> > + * NextRIP is consumed on VMRUN as the return address pushed on the
> > * stack for injected soft exceptions/interrupts. If nrips is exposed
> > - * to L1, take it verbatim from vmcb12. If nrips is supported in
> > - * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> > - * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> > - * prior to injecting the event).
> > + * to L1, take it verbatim from vmcb12.
> > + *
> > + * If nrips is supported in hardware but not exposed to L1, stuff the
> > + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> > + * responsible for advancing RIP prior to injecting the event). This is
> > + * only the case for the first L2 run after VMRUN. After that (e.g.
> > + * during save/restore), NextRIP is updated by the CPU and/or KVM, and
> > + * the value of the L2 RIP from vmcb12 should not be used.
> > */
> > - if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
>
> This is technically wrong since KVM doesn't require NRIPS. Maybe this?
I hallucinated that making nested depend on nrips was merge, i.e. this
patch: https://lore.kernel.org/kvm/f0302382cf45d7a9527b4aebbfe694bbcfa7aff5.1651440202.git.maciej.szmigiero@oracle.com/.
>
> if (boot_cpu_has(X86_FEATURE_NRIPS)) {
I wonder if that's necessary, but I cannot find anything in the APM
about whether the CPU ignores NextRIP if it's not supported. It doesn't
even mention how to use it when injecting soft IRQs, only that it is
needed to properly inject then. Sigh.
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> !svm->nested.nested_run_pending)
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> else
> vmcb02->control.next_rip = vmcb12_rip;
> }
>
>
> > vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> > else if (boot_cpu_has(X86_FEATURE_NRIPS))
> > vmcb02->control.next_rip = vmcb12_rip;
> > --
> > 2.53.0.273.g2a3d683680-goog
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
2026-02-12 23:07 ` [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS Yosry Ahmed
@ 2026-02-19 0:13 ` Sean Christopherson
2026-02-19 0:26 ` Yosry Ahmed
0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-02-19 0:13 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Thu, Feb 12, 2026, Yosry Ahmed wrote:
> In the save/restore path, if KVM_SET_NESTED_STATE is performed before
> restoring REGS and/or SREGS , the values of CS and RIP used to
> initialize the vmcb02's NextRIP and soft interrupt tracking RIPs are
> incorrect.
>
> Recalculate them up after CS is set, or REGS are restored. This is only
> needed when a nested run is pending during restore. After L2 runs for
> the first time, any soft interrupts injected by L1 are already delivered
> or tracked by KVM separately for re-injection, so the CS and RIP values
> are no longer relevant.
>
> If KVM_SET_NESTED_STATE is performed after both REGS and SREGS are
> restored, it will just overwrite the fields.
Apparently I suggested this general idea, but ugh. :-)
> static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
> {
> kvm_register_mark_available(vcpu, reg);
> @@ -1826,6 +1844,8 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
> if (seg == VCPU_SREG_SS)
> /* This is symmetric with svm_get_segment() */
> svm->vmcb->save.cpl = (var->dpl & 3);
> + else if (seg == VCPU_SREG_CS)
> + svm_fixup_nested_rips(vcpu);
>
> vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
> }
> @@ -5172,6 +5192,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
> .get_rflags = svm_get_rflags,
> .set_rflags = svm_set_rflags,
> .get_if_flag = svm_get_if_flag,
> + .post_user_set_regs = svm_fixup_nested_rips,
>
> .flush_tlb_all = svm_flush_tlb_all,
> .flush_tlb_current = svm_flush_tlb_current,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index db3f393192d9..35fe1d337273 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> kvm_rip_write(vcpu, regs->rip);
> kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
>
> + kvm_x86_call(post_user_set_regs)(vcpu);
I especially don't love this callback. Aside from adding a new kvm_x86_ops hook,
I don't like that _any_ CS change triggers a fixup, whereas only userspace writes
to RIP trigger a fixup. That _should_ be a moot point, because neither CS nor RIP
should change while nested_run_pending is true, but I dislike the asymmetry.
I was going to suggest we instead react to RIP being dirty, but what if we take
it a step further? Somewhat of a crazy idea, but what happens if we simply wait
until just before VMRUN to set soft_int_csbase, soft_int_old_rip, and
soft_int_next_rip (when the guest doesn't have NRIPS)?
E.g. after patch 2, completely untested...
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aec17c80ed73..6fc1b2e212d2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
if (is_evtinj_soft(vmcb02->control.event_inj)) {
svm->soft_int_injected = true;
- svm->soft_int_csbase = vmcb12_csbase;
- svm->soft_int_old_rip = vmcb12_rip;
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->soft_int_next_rip = svm->nested.ctl.next_rip;
- else
- svm->soft_int_next_rip = vmcb12_rip;
}
/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..358ec940ffc9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
return EXIT_FASTPATH_EXIT_USERSPACE;
}
+ if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
+ svm->soft_int_injected) {
+ svm->soft_int_csbase = svm->vmcb->save.cs.base;
+ svm->soft_int_old_rip = kvm_rip_read(vcpu);
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->soft_int_next_rip = kvm_rip_read(vcpu);
+ }
+
sync_lapic_to_cr8(vcpu);
if (unlikely(svm->asid != svm->vmcb->control.asid)) {
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
2026-02-19 0:13 ` Sean Christopherson
@ 2026-02-19 0:26 ` Yosry Ahmed
2026-02-20 17:07 ` Sean Christopherson
0 siblings, 1 reply; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-19 0:26 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index db3f393192d9..35fe1d337273 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> > kvm_rip_write(vcpu, regs->rip);
> > kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
> >
> > + kvm_x86_call(post_user_set_regs)(vcpu);
>
> I especially don't love this callback. Aside from adding a new kvm_x86_ops hook,
> I don't like that _any_ CS change triggers a fixup, whereas only userspace writes
> to RIP trigger a fixup. That _should_ be a moot point, because neither CS nor RIP
> should change while nested_run_pending is true, but I dislike the asymmetry.
>
> I was going to suggest we instead react to RIP being dirty, but what if we take
> it a step further? Somewhat of a crazy idea, but what happens if we simply wait
> until just before VMRUN to set soft_int_csbase, soft_int_old_rip, and
> soft_int_next_rip (when the guest doesn't have NRIPS)?
I generally like this idea. I thought about it for a moment but was
worried about how much of a behavioral change this introduces, but that
was probably before I convinced myself the problem only exists with
nested_run_pending.
That being said..
>
> E.g. after patch 2, completely untested...
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index aec17c80ed73..6fc1b2e212d2 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
Above the context lines we have:
/*
* next_rip is consumed on VMRUN as the return address pushed on the
* stack for injected soft exceptions/interrupts. If nrips is exposed
* to L1, take it verbatim from vmcb12. If nrips is supported in
* hardware but not exposed to L1, stuff the actual L2 RIP to emulate
* what a nrips=0 CPU would do (L1 is responsible for advancing RIP
* prior to injecting the event).
*/
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
else if (boot_cpu_has(X86_FEATURE_NRIPS))
vmcb02->control.next_rip = vmcb12_rip;
The same bug affects vmcb02->control.next_rip when the guest doesn't
have NRIPS. I think we don't want to move part of the vmcb02
initialization before VMRUN too. We can keep the initialization here and
overwrite it before VMRUN if needed, but that's just also ugh..
> svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> if (is_evtinj_soft(vmcb02->control.event_inj)) {
> svm->soft_int_injected = true;
> - svm->soft_int_csbase = vmcb12_csbase;
> - svm->soft_int_old_rip = vmcb12_rip;
> +
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->soft_int_next_rip = svm->nested.ctl.next_rip;
Why not move this too?
> - else
> - svm->soft_int_next_rip = vmcb12_rip;
> }
>
> /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8f8bc863e214..358ec940ffc9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> return EXIT_FASTPATH_EXIT_USERSPACE;
> }
>
> + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> + svm->soft_int_injected) {
> + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> + }
> +
I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
maybe we can refactor them later to pre-run and post-run nested
callbacks? Anyway, not a big deal, definitely an improvement over the
current patch assuming we can figure out how to fix next_rip.
> sync_lapic_to_cr8(vcpu);
>
> if (unlikely(svm->asid != svm->vmcb->control.asid)) {
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
2026-02-19 0:26 ` Yosry Ahmed
@ 2026-02-20 17:07 ` Sean Christopherson
2026-02-20 20:27 ` Yosry Ahmed
0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2026-02-20 17:07 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Thu, Feb 19, 2026, Yosry Ahmed wrote:
> > E.g. after patch 2, completely untested...
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index aec17c80ed73..6fc1b2e212d2 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>
> Above the context lines we have:
>
> /*
> * next_rip is consumed on VMRUN as the return address pushed on the
> * stack for injected soft exceptions/interrupts. If nrips is exposed
> * to L1, take it verbatim from vmcb12. If nrips is supported in
> * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> * prior to injecting the event).
> */
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> else if (boot_cpu_has(X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = vmcb12_rip;
>
> The same bug affects vmcb02->control.next_rip when the guest doesn't
> have NRIPS. I think we don't want to move part of the vmcb02
> initialization before VMRUN too. We can keep the initialization here and
> overwrite it before VMRUN if needed, but that's just also ugh..
Aha! I knew I was missing something, but I couldn't quite get my brain to figure
out what.
I don't have a super strong preference as to copying the code or moving it
wholesale. Though I would say if the pre-VMRUN logic is _identical_ (and I think
it is?), then we move it, and simply update the comment in
nested_vmcb02_prepare_control() to call that out.
> > svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> > if (is_evtinj_soft(vmcb02->control.event_inj)) {
> > svm->soft_int_injected = true;
> > - svm->soft_int_csbase = vmcb12_csbase;
> > - svm->soft_int_old_rip = vmcb12_rip;
> > +
> > if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > svm->soft_int_next_rip = svm->nested.ctl.next_rip;
>
> Why not move this too?
For the same reason I think we should keep
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
where it is. When NRIPS is exposed to the guest, the incoming nested state is
the one and only source of truth. By keeping the code different, we'd effectively
be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.
> > - else
> > - svm->soft_int_next_rip = vmcb12_rip;
> > }
> >
> > /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8f8bc863e214..358ec940ffc9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> > return EXIT_FASTPATH_EXIT_USERSPACE;
> > }
> >
> > + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> > + svm->soft_int_injected) {
> > + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > + }
> > +
>
> I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
> maybe we can refactor them later to pre-run and post-run nested
> callbacks? Anyway, not a big deal, definitely an improvement over the
> current patch assuming we can figure out how to fix next_rip.
I don't love it either, but I want to (a) avoid unnecessarily overwriting the
fields, e.g. if KVM never actually does VMRUN and (b) minimize the probability
of consuming a bad RIP.
In practice, I would expect the nested_run_pending check to be correctly predicted
the overwhelming majority of the time, i.e. I don't anticipate performance issues
due to putting the code in the hot path.
If we want to try and move the update out of svm_vcpu_run(), then we shouldn't
need generic pre/post callbacks for nested, svm_prepare_switch_to_guest() is the
right fit.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
2026-02-20 17:07 ` Sean Christopherson
@ 2026-02-20 20:27 ` Yosry Ahmed
2026-02-20 22:50 ` Sean Christopherson
0 siblings, 1 reply; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-20 20:27 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> > > svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> > > if (is_evtinj_soft(vmcb02->control.event_inj)) {
> > > svm->soft_int_injected = true;
> > > - svm->soft_int_csbase = vmcb12_csbase;
> > > - svm->soft_int_old_rip = vmcb12_rip;
> > > +
> > > if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > > svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> >
> > Why not move this too?
>
> For the same reason I think we should keep
>
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
>
> where it is. When NRIPS is exposed to the guest, the incoming nested state is
> the one and only source of truth. By keeping the code different, we'd effectively
> be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.
I see, makes sense. I like the fact that we should be able to completely
drop vmcb12_rip and vmcb12_csbase with this (unless we want to start
using it for the bus_lock_rip check), which will also remove the need
for patch 2.
I guess the only hiccup will be patch 1. We'll end up with this code in
nested_vmcb02_prepare_control():
if boot_cpu_has(X86_FEATURE_NRIPS)) {
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending)
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
}
, and this code pre-VMRUN:
if boot_cpu_has(X86_FEATURE_NRIPS)) {
if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) &&
svm->nested.nested_run_pending)
vmcb02->control.next_rip = kvm_rip_read(vcpu);
}
It seems a bit fragile that the 'if' is somewhere and the 'else' (or the
opposite condition) is somewhere else. They could get out of sync. Maybe
a helper will make this better:
/* Huge comment */
bool nested_svm_use_vmcb12_next_rip(struct kvm_vcpu *vcpu)
{
return guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending;
}
or maybe the name makes more sense as the negative:
nested_svm_use_rip_as_next_rip()?
I don't like both names..
Aha! Maybe it's actually better to have the helper set the NextRip
directly?
/* Huge comment */
u64 nested_vmcb02_update_next_rip(struct kvm_vcpu *vcpu)
{
u64 next_rip;
if (WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr))
return;
if (!boot_cpu_has(X86_FEATURE_NRIPS))
return;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending)
next_rip = svm->nested.ctl.next_rip;
else
next_rip = kvm_rip_read(vcpu);
svm->vmcb->control.next_rip = next_rip;
}
Then, we just call this from nested_vmcb02_prepare_control() and
svm_vcpu_run() (with the soft IRQ stuff). In some cases we'll put the
wrong thing in vmcb02 and then fix it up later, but I think that's fine
(that's what the patch is doing anyway).
However, we lose the fact that the whole thing is guarded by
nested_run_pending, so performance in svm_vcpu_run() could suffer. We
could leave it all guarded by nested_run_pending, as
nested_vmcb02_update_next_rip() should do nothing in svm_vcpu_run()
otherwise, but then the caller is depending on implementation details of
the helper.
Maybe just put it in svm_prepare_switch_to_guest() to begin with and not
guard it with nested_run_pending?
>
> > > - else
> > > - svm->soft_int_next_rip = vmcb12_rip;
> > > }
> > >
> > > /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 8f8bc863e214..358ec940ffc9 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> > > return EXIT_FASTPATH_EXIT_USERSPACE;
> > > }
> > >
> > > + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> > > + svm->soft_int_injected) {
> > > + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > > + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > > + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > > + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > > + }
> > > +
> >
> > I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
> > maybe we can refactor them later to pre-run and post-run nested
> > callbacks? Anyway, not a big deal, definitely an improvement over the
> > current patch assuming we can figure out how to fix next_rip.
>
> I don't love it either, but I want to (a) avoid unnecessarily overwriting the
> fields, e.g. if KVM never actually does VMRUN and (b) minimize the probability
> of consuming a bad RIP.
>
> In practice, I would expect the nested_run_pending check to be correctly predicted
> the overwhelming majority of the time, i.e. I don't anticipate performance issues
> due to putting the code in the hot path.
>
> If we want to try and move the update out of svm_vcpu_run(), then we shouldn't
> need generic pre/post callbacks for nested, svm_prepare_switch_to_guest() is the
> right fit.
I prefer putting it in svm_prepare_switch_to_guest() to begin with,
because it gives us more flexibility (see above).
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
2026-02-20 20:27 ` Yosry Ahmed
@ 2026-02-20 22:50 ` Sean Christopherson
0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2026-02-20 22:50 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Fri, Feb 20, 2026, Yosry Ahmed wrote:
> > > > svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> > > > if (is_evtinj_soft(vmcb02->control.event_inj)) {
> > > > svm->soft_int_injected = true;
> > > > - svm->soft_int_csbase = vmcb12_csbase;
> > > > - svm->soft_int_old_rip = vmcb12_rip;
> > > > +
> > > > if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > > > svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> > >
> > > Why not move this too?
> >
> > For the same reason I think we should keep
> >
> > if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> >
> > where it is. When NRIPS is exposed to the guest, the incoming nested state is
> > the one and only source of truth. By keeping the code different, we'd effectively
> > be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.
>
> I see, makes sense. I like the fact that we should be able to completely
> drop vmcb12_rip and vmcb12_csbase with this (unless we want to start
> using it for the bus_lock_rip check), which will also remove the need
> for patch 2.
...
> It seems a bit fragile that the 'if' is somewhere and the 'else' (or the
> opposite condition) is somewhere else. They could get out of sync.
> Maybe
> a helper will make this better:
>
> /* Huge comment */
> bool nested_svm_use_vmcb12_next_rip(struct kvm_vcpu *vcpu)
> {
> return guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> !svm->nested.nested_run_pending;
> }
>
> or maybe the name makes more sense as the negative:
> nested_svm_use_rip_as_next_rip()?
>
> I don't like both names..
>
> Aha! Maybe it's actually better to have the helper set the NextRip
> directly?
>
> /* Huge comment */
> u64 nested_vmcb02_update_next_rip(struct kvm_vcpu *vcpu)
> {
> u64 next_rip;
>
> if (WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr))
> return;
>
> if (!boot_cpu_has(X86_FEATURE_NRIPS))
> return;
>
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> !svm->nested.nested_run_pending)
> next_rip = svm->nested.ctl.next_rip;
> else
> next_rip = kvm_rip_read(vcpu);
>
> svm->vmcb->control.next_rip = next_rip;
> }
>
> Then, we just call this from nested_vmcb02_prepare_control() and
> svm_vcpu_run() (with the soft IRQ stuff). In some cases we'll put the
> wrong thing in vmcb02 and then fix it up later, but I think that's fine
> (that's what the patch is doing anyway).
>
> However, we lose the fact that the whole thing is guarded by
> nested_run_pending, so performance in svm_vcpu_run() could suffer. We
> could leave it all guarded by nested_run_pending, as
> nested_vmcb02_update_next_rip() should do nothing in svm_vcpu_run()
> otherwise, but then the caller is depending on implementation details of
> the helper.
>
> Maybe just put it in svm_prepare_switch_to_guest() to begin with and not
> guard it with nested_run_pending?
Of all the options, I still like open coding the two, even though it means the
"else" will be separated from the "if", followed by the
nested_svm_use_vmcb12_next_rip() helper option. I straight up dislike
nested_vmcb02_update_next_rip() because it buries simple concepts behind a
complex function (copy vmcb12->next_rip to vmcb02->next_rip is at its core a
*very* simple idea). Oh, and it unnecessarily rewrites vmcb02->next_rip in the
common case where the guest has NRIPs.
I'm a-ok with the separate "if" and "else", because it's not a pure "else", and
that's the entire reason we have a mess in the first place: we wrote an if-else,
but what is actually necessitate by KVM's uAPI is two separate (but tightly
coupled) if-statements.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-02-20 22:50 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260212230751.1871720-1-yosry.ahmed@linux.dev>
2026-02-12 23:07 ` [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
2026-02-18 23:22 ` Sean Christopherson
2026-02-18 23:38 ` Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 2/5] KVM: nSVM: Use the correct RIP when restoring vmcb02's control area Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 3/5] KVM: nSVM: Move updating NextRIP and soft IRQ RIPs into a helper Yosry Ahmed
2026-02-12 23:07 ` [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS Yosry Ahmed
2026-02-19 0:13 ` Sean Christopherson
2026-02-19 0:26 ` Yosry Ahmed
2026-02-20 17:07 ` Sean Christopherson
2026-02-20 20:27 ` Yosry Ahmed
2026-02-20 22:50 ` Sean Christopherson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox