* [PATCH v3 1/8] KVM: nSVM: Sync NextRIP to cached vmcb12 after VMRUN of L2
[not found] <20260225005950.3739782-1-yosry@kernel.org>
@ 2026-02-25 0:59 ` Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow " Yosry Ahmed
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2026-02-25 0:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
After VMRUN in guest mode, nested_sync_control_from_vmcb02() syncs
fields written by the CPU from vmcb02 to the cached vmcb12. This is
because the cached vmcb12 is used as the authoritative copy of some of
the controls, and is the payload when saving/restoring nested state.
NextRIP is also written by the CPU (in some cases) after VMRUN, but is
not sync'd to the cached vmcb12. As a result, it is corrupted after
save/restore (replaced by the original value written by L1 on nested
VMRUN). This could cause problems for both KVM (e.g. when injecting a
soft IRQ) and L1 (e.g. when using NextRIP to advance RIP after emulating
an instruction).
Fix this by sync'ing NextRIP to the cache after VMRUN of L2, but only
after completing interrupts (not in nested_sync_control_from_vmcb02()),
as KVM may update NextRIP (e.g. when re-injecting a soft IRQ).
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/svm.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e2143..07f096758f34f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4435,6 +4435,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
svm_complete_interrupts(vcpu);
+ /*
+ * Update the cache after completing interrupts to get an accurate
+ * NextRIP, e.g. when re-injecting a soft interrupt.
+ *
+ * FIXME: Rework svm_get_nested_state() to not pull data from the
+ * cache (except for maybe int_ctl).
+ */
+ if (is_guest_mode(vcpu))
+ svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;
+
return svm_exit_handlers_fastpath(vcpu);
}
--
2.53.0.414.gf7e9f6c205-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow to cached vmcb12 after VMRUN of L2
[not found] <20260225005950.3739782-1-yosry@kernel.org>
2026-02-25 0:59 ` [PATCH v3 1/8] KVM: nSVM: Sync NextRIP to cached vmcb12 after VMRUN of L2 Yosry Ahmed
@ 2026-02-25 0:59 ` Yosry Ahmed
2026-02-27 17:53 ` Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2026-02-25 0:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
After VMRUN in guest mode, nested_sync_control_from_vmcb02() syncs
fields written by the CPU from vmcb02 to the cached vmcb12. This is
because the cached vmcb12 is used as the authoritative copy of some of
the controls, and is the payload when saving/restoring nested state.
int_state is also written by the CPU, specifically bit 0 (i.e.
SVM_INTERRUPT_SHADOW_MASK) for nested VMs, but it is not sync'd to
the cached vmcb12. This does not cause a problem if KVM_SET_NESTED_STATE
precedes KVM_SET_VCPU_EVENTS in the restore path, as an interrupt shadow
would be correctly restored to vmcb02 (KVM_SET_VCPU_EVENTS overwrites
what KVM_SET_NESTED_STATE restored in int_state).
However, if KVM_SET_VCPU_EVENTS precedes KVM_SET_NESTED_STATE, an
interrupt shadow would be restored into vmcb01 instead of vmcb02. This
would mostly be benign for L1 (delays an interrupt), but not for L2. For
L2, the vCPU could hang (e.g. if a wakeup interrupt is delivered before
a HLT that should have been in an interrupt shadow).
Sync int_state to the cached vmcb12 in nested_sync_control_from_vmcb02()
to avoid this problem. With that, KVM_SET_NESTED_STATE restores the
correct interrupt shadow state, and if KVM_SET_VCPU_EVENTS follows it
would overwrite it with the same value.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..9909ff237e5ca 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -521,6 +521,7 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
u32 mask;
svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
+ svm->nested.ctl.int_state = svm->vmcb->control.int_state;
/* Only a few fields of int_ctl are written by the processor. */
mask = V_IRQ_MASK | V_TPR_MASK;
--
2.53.0.414.gf7e9f6c205-goog
* [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN
[not found] <20260225005950.3739782-1-yosry@kernel.org>
2026-02-25 0:59 ` [PATCH v3 1/8] KVM: nSVM: Sync NextRIP to cached vmcb12 after VMRUN of L2 Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow " Yosry Ahmed
@ 2026-02-25 0:59 ` Yosry Ahmed
2026-03-04 17:30 ` Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 6/8] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields " Yosry Ahmed
4 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2026-02-25 0:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt; instead, it advances the current RIP
before running it. KVM uses the current RIP as the NextRIP in vmcb02 to
emulate a CPU without NRIPS.
However, after L2 runs the first time, NextRIP will be updated by the
CPU and/or KVM, and the current RIP is no longer the correct value to
use in vmcb02. Hence, after save/restore, use the current RIP if and
only if a nested run is pending, otherwise use NextRIP.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9909ff237e5ca..f3ed1bdbe76c9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -845,17 +845,24 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
/*
- * next_rip is consumed on VMRUN as the return address pushed on the
+ * NextRIP is consumed on VMRUN as the return address pushed on the
* stack for injected soft exceptions/interrupts. If nrips is exposed
- * to L1, take it verbatim from vmcb12. If nrips is supported in
- * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
- * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
- * prior to injecting the event).
+ * to L1, take it verbatim from vmcb12.
+ *
+ * If nrips is supported in hardware but not exposed to L1, stuff the
+ * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+ * responsible for advancing RIP prior to injecting the event). This is
+ * only the case for the first L2 run after VMRUN. After that (e.g.
+ * during save/restore), NextRIP is updated by the CPU and/or KVM, and
+ * the value of the L2 RIP from vmcb12 should not be used.
*/
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
- vmcb02->control.next_rip = svm->nested.ctl.next_rip;
- else if (boot_cpu_has(X86_FEATURE_NRIPS))
- vmcb02->control.next_rip = vmcb12_rip;
+ if (boot_cpu_has(X86_FEATURE_NRIPS)) {
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
+ !svm->nested.nested_run_pending)
+ vmcb02->control.next_rip = svm->nested.ctl.next_rip;
+ else
+ vmcb02->control.next_rip = vmcb12_rip;
+ }
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
if (is_evtinj_soft(vmcb02->control.event_inj)) {
--
2.53.0.414.gf7e9f6c205-goog
* [PATCH v3 6/8] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run
[not found] <20260225005950.3739782-1-yosry@kernel.org>
` (2 preceding siblings ...)
2026-02-25 0:59 ` [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
@ 2026-02-25 0:59 ` Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields " Yosry Ahmed
4 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2026-02-25 0:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt; instead, it advances L2's RIP
before running it. KVM uses L2's current RIP as the NextRIP in vmcb02 to
emulate a CPU without NRIPS.
However, in svm_set_nested_state(), the value used for L2's current RIP
comes from vmcb02, which is just whatever the vCPU had in vmcb02 before
restoring nested state (zero on a freshly created vCPU). Passing the
cached RIP value instead (i.e. kvm_rip_read()) would only fix the issue
if registers are restored before nested state.
Instead, split the logic of setting NextRIP in vmcb02. Handle the
'normal' case of initializing vmcb02's NextRIP using NextRIP from vmcb12
(or KVM_GET_NESTED_STATE's payload) in nested_vmcb02_prepare_control().
Delay the special case of stuffing L2's current RIP into vmcb02's
NextRIP until shortly before the vCPU is run, to make sure the most
up-to-date value of RIP is used regardless of KVM_SET_REGS and
KVM_SET_NESTED_STATE's relative ordering.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 25 ++++++++-----------------
arch/x86/kvm/svm/svm.c | 17 +++++++++++++++++
2 files changed, 25 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f3ed1bdbe76c9..dcd4a8eb156f2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -845,24 +845,15 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
/*
- * NextRIP is consumed on VMRUN as the return address pushed on the
- * stack for injected soft exceptions/interrupts. If nrips is exposed
- * to L1, take it verbatim from vmcb12.
- *
- * If nrips is supported in hardware but not exposed to L1, stuff the
- * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
- * responsible for advancing RIP prior to injecting the event). This is
- * only the case for the first L2 run after VMRUN. After that (e.g.
- * during save/restore), NextRIP is updated by the CPU and/or KVM, and
- * the value of the L2 RIP from vmcb12 should not be used.
+ * If nrips is exposed to L1, take NextRIP as-is. Otherwise, L1
+ * advances L2's RIP before VMRUN instead of using NextRIP. KVM will
+ * stuff the current RIP as vmcb02's NextRIP before L2 is run. After
+ * the first run of L2 (e.g. after save+restore), NextRIP is updated by
+ * the CPU and/or KVM and should be used regardless of L1's support.
*/
- if (boot_cpu_has(X86_FEATURE_NRIPS)) {
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
- !svm->nested.nested_run_pending)
- vmcb02->control.next_rip = svm->nested.ctl.next_rip;
- else
- vmcb02->control.next_rip = vmcb12_rip;
- }
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
+ !svm->nested.nested_run_pending)
+ vmcb02->control.next_rip = svm->nested.ctl.next_rip;
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
if (is_evtinj_soft(vmcb02->control.event_inj)) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 07f096758f34f..ded4372f2d499 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3660,6 +3660,23 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
if (svm->current_vmcb->asid_generation != sd->asid_generation)
new_asid(svm, sd);
+ /*
+ * If nrips is supported in hardware but not exposed to L1, stuff the
+ * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+ * responsible for advancing RIP prior to injecting the event). Once L2
+ * runs after L1 executes VMRUN, NextRIP is updated by the CPU and/or
+ * KVM, and this is no longer needed.
+ *
+ * This is done here (as opposed to when preparing vmcb02) to use the
+ * most up-to-date value of RIP regardless of the order of restoring
+ * registers and nested state in the vCPU save+restore path.
+ */
+ if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
+ if (boot_cpu_has(X86_FEATURE_NRIPS) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
+ }
+
return 0;
}
--
2.53.0.414.gf7e9f6c205-goog
* [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run
[not found] <20260225005950.3739782-1-yosry@kernel.org>
` (3 preceding siblings ...)
2026-02-25 0:59 ` [PATCH v3 6/8] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run Yosry Ahmed
@ 2026-02-25 0:59 ` Yosry Ahmed
2026-03-04 17:50 ` Yosry Ahmed
4 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2026-02-25 0:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, stable
In the save+restore path, when restoring nested state, the values of RIP
and CS base passed into nested_vmcb02_prepare_control() are mostly
incorrect. They are both pulled from the vmcb02. For CS base, the value
is only correct if system regs are restored before nested state. The
value of RIP is whatever the vCPU had in vmcb02 before restoring nested
state (zero on a freshly created vCPU).
Instead, take a similar approach to NextRIP, and delay initializing the
RIP tracking fields until shortly before the vCPU is run, to make sure
the most up-to-date values of RIP and CS base are used regardless of
KVM_SET_SREGS, KVM_SET_REGS, and KVM_SET_NESTED_STATE's relative
ordering.
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 17 ++++++++---------
arch/x86/kvm/svm/svm.c | 10 ++++++++++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dcd4a8eb156f2..4499241b4e401 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -742,9 +742,7 @@ static bool is_evtinj_nmi(u32 evtinj)
return type == SVM_EVTINJ_TYPE_NMI;
}
-static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
- unsigned long vmcb12_rip,
- unsigned long vmcb12_csbase)
+static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
{
u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK;
u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
@@ -856,14 +854,15 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
+
+ /*
+ * soft_int_csbase, soft_int_old_rip, and soft_int_next_rip (if L1
+ * doesn't have NRIPS) are initialized later, before the vCPU is run.
+ */
if (is_evtinj_soft(vmcb02->control.event_inj)) {
svm->soft_int_injected = true;
- svm->soft_int_csbase = vmcb12_csbase;
- svm->soft_int_old_rip = vmcb12_rip;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->soft_int_next_rip = svm->nested.ctl.next_rip;
- else
- svm->soft_int_next_rip = vmcb12_rip;
}
/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
@@ -961,7 +960,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
+ nested_vmcb02_prepare_control(svm);
nested_vmcb02_prepare_save(svm, vmcb12);
ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
@@ -1906,7 +1905,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
nested_copy_vmcb_control_to_cache(svm, ctl);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
- nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
+ nested_vmcb02_prepare_control(svm);
/*
* While the nested guest CR3 is already checked and set by
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ded4372f2d499..7948e601ea784 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3670,11 +3670,21 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
* This is done here (as opposed to when preparing vmcb02) to use the
* most up-to-date value of RIP regardless of the order of restoring
* registers and nested state in the vCPU save+restore path.
+ *
+ * Similarly, initialize svm->soft_int_* fields here to use the most
+ * up-to-date values of RIP and CS base, regardless of restore order.
*/
if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
if (boot_cpu_has(X86_FEATURE_NRIPS) &&
!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
+
+ if (svm->soft_int_injected) {
+ svm->soft_int_csbase = svm->vmcb->save.cs.base;
+ svm->soft_int_old_rip = kvm_rip_read(vcpu);
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->soft_int_next_rip = kvm_rip_read(vcpu);
+ }
}
return 0;
--
2.53.0.414.gf7e9f6c205-goog
* Re: [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow to cached vmcb12 after VMRUN of L2
2026-02-25 0:59 ` [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow " Yosry Ahmed
@ 2026-02-27 17:53 ` Yosry Ahmed
2026-03-02 20:41 ` Sean Christopherson
0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2026-02-27 17:53 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd5..9909ff237e5ca 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -521,6 +521,7 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> u32 mask;
> svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
> svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
> + svm->nested.ctl.int_state = svm->vmcb->control.int_state;
FWIW, this is an incomplete fix. KVM might update the interrupt shadow
after this point through __svm_skip_emulated_instruction(), and that
won't be captured in svm->nested.ctl.int_state.
I think it's not worth fixing that case too, and any further effort
should go toward teaching KVM_GET_NESTED_STATE to pull state from the
correct place as discussed earlier.
* Re: [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow to cached vmcb12 after VMRUN of L2
2026-02-27 17:53 ` Yosry Ahmed
@ 2026-03-02 20:41 ` Sean Christopherson
0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2026-03-02 20:41 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index de90b104a0dd5..9909ff237e5ca 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -521,6 +521,7 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> > u32 mask;
> > svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
> > svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
> > + svm->nested.ctl.int_state = svm->vmcb->control.int_state;
>
> FWIW, this is an incomplete fix. KVM might update the interrupt shadow
> after this point through __svm_skip_emulated_instruction(), and that
> won't be captured in svm->nested.ctl.int_state.
>
> I think it's not worth fixing that case too, and any further effort
> should go toward teaching KVM_GET_NESTED_STATE to pull state from the
> correct place as discussed earlier.
+1. FWIW, AMD doesn't have a MOV/POP SS shadow, so practically speaking the only
impact is that an STI shadow could get extended for one extra instruction. Unless
the guest is doing e.g. "sti; hlt; cli", that's a non-issue.
* Re: [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN
2026-02-25 0:59 ` [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
@ 2026-03-04 17:30 ` Yosry Ahmed
2026-03-04 17:39 ` Sean Christopherson
0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2026-03-04 17:30 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026 at 5:00 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> For guests with NRIPS disabled, L1 does not provide NextRIP when running
> an L2 with an injected soft interrupt, instead it advances the current RIP
> before running it. KVM uses the current RIP as the NextRIP in vmcb02 to
> emulate a CPU without NRIPS.
>
> However, after L2 runs the first time, NextRIP will be updated by the
> CPU and/or KVM, and the current RIP is no longer the correct value to
> use in vmcb02. Hence, after save/restore, use the current RIP if and
> only if a nested run is pending, otherwise use NextRIP.
>
> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
> arch/x86/kvm/svm/nested.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 9909ff237e5ca..f3ed1bdbe76c9 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -845,17 +845,24 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
>
> /*
> - * next_rip is consumed on VMRUN as the return address pushed on the
> + * NextRIP is consumed on VMRUN as the return address pushed on the
> * stack for injected soft exceptions/interrupts. If nrips is exposed
> - * to L1, take it verbatim from vmcb12. If nrips is supported in
> - * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> - * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> - * prior to injecting the event).
> + * to L1, take it verbatim from vmcb12.
> + *
> + * If nrips is supported in hardware but not exposed to L1, stuff the
> + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> + * responsible for advancing RIP prior to injecting the event). This is
> + * only the case for the first L2 run after VMRUN. After that (e.g.
> + * during save/restore), NextRIP is updated by the CPU and/or KVM, and
> + * the value of the L2 RIP from vmcb12 should not be used.
> */
> - if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> - vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> - else if (boot_cpu_has(X86_FEATURE_NRIPS))
> - vmcb02->control.next_rip = vmcb12_rip;
> + if (boot_cpu_has(X86_FEATURE_NRIPS)) {
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> + !svm->nested.nested_run_pending)
> + vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> + else
> + vmcb02->control.next_rip = vmcb12_rip;
> + }
This should probably also apply to soft_int_next_rip below the context
lines. Otherwise after patch 7 we keep it uninitialized if the guest
doesn't have NRIPs and !nested_run_pending.
>
> svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> if (is_evtinj_soft(vmcb02->control.event_inj)) {
> --
> 2.53.0.414.gf7e9f6c205-goog
>
* Re: [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN
2026-03-04 17:30 ` Yosry Ahmed
@ 2026-03-04 17:39 ` Sean Christopherson
2026-03-04 17:41 ` Yosry Ahmed
0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-03-04 17:39 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Wed, Mar 04, 2026, Yosry Ahmed wrote:
> On Tue, Feb 24, 2026 at 5:00 PM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > For guests with NRIPS disabled, L1 does not provide NextRIP when running
> > an L2 with an injected soft interrupt, instead it advances the current RIP
> > before running it. KVM uses the current RIP as the NextRIP in vmcb02 to
> > emulate a CPU without NRIPS.
> >
> > However, after L2 runs the first time, NextRIP will be updated by the
> > CPU and/or KVM, and the current RIP is no longer the correct value to
> > use in vmcb02. Hence, after save/restore, use the current RIP if and
> > only if a nested run is pending, otherwise use NextRIP.
> >
> > Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> > CC: stable@vger.kernel.org
> > Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> > ---
> > arch/x86/kvm/svm/nested.c | 25 ++++++++++++++++---------
> > 1 file changed, 16 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 9909ff237e5ca..f3ed1bdbe76c9 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -845,17 +845,24 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> > vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
> >
> > /*
> > - * next_rip is consumed on VMRUN as the return address pushed on the
> > + * NextRIP is consumed on VMRUN as the return address pushed on the
> > * stack for injected soft exceptions/interrupts. If nrips is exposed
> > - * to L1, take it verbatim from vmcb12. If nrips is supported in
> > - * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> > - * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> > - * prior to injecting the event).
> > + * to L1, take it verbatim from vmcb12.
> > + *
> > + * If nrips is supported in hardware but not exposed to L1, stuff the
> > + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> > + * responsible for advancing RIP prior to injecting the event). This is
> > + * only the case for the first L2 run after VMRUN. After that (e.g.
> > + * during save/restore), NextRIP is updated by the CPU and/or KVM, and
> > + * the value of the L2 RIP from vmcb12 should not be used.
> > */
> > - if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > - vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> > - else if (boot_cpu_has(X86_FEATURE_NRIPS))
> > - vmcb02->control.next_rip = vmcb12_rip;
> > + if (boot_cpu_has(X86_FEATURE_NRIPS)) {
> > + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> > + !svm->nested.nested_run_pending)
> > + vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> > + else
> > + vmcb02->control.next_rip = vmcb12_rip;
> > + }
>
> This should probably also apply to soft_int_next_rip below the context
> lines. Otherwise after patch 7 we keep it uninitialized if the guest
> doesn't have NRIPs and !nested_run_pending.
That's fine though, isn't it? Because in that case, doesn't the soft int have to
comein through svm_update_soft_interrupt_rip()? Ugh, no because nSVM migrates
control.event_inj.
IIUC, we want to end up with this?
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 03b201fe9613..d12647080051 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -925,7 +925,8 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
*/
if (is_evtinj_soft(vmcb02->control.event_inj)) {
svm->soft_int_injected = true;
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
+ !svm->nested.nested_run_pending)
svm->soft_int_next_rip = vmcb12_ctrl->next_rip;
}
* Re: [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN
2026-03-04 17:39 ` Sean Christopherson
@ 2026-03-04 17:41 ` Yosry Ahmed
0 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2026-03-04 17:41 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
[..]
> > > @@ -845,17 +845,24 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> > > vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
> > >
> > > /*
> > > - * next_rip is consumed on VMRUN as the return address pushed on the
> > > + * NextRIP is consumed on VMRUN as the return address pushed on the
> > > * stack for injected soft exceptions/interrupts. If nrips is exposed
> > > - * to L1, take it verbatim from vmcb12. If nrips is supported in
> > > - * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> > > - * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> > > - * prior to injecting the event).
> > > + * to L1, take it verbatim from vmcb12.
> > > + *
> > > + * If nrips is supported in hardware but not exposed to L1, stuff the
> > > + * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
> > > + * responsible for advancing RIP prior to injecting the event). This is
> > > + * only the case for the first L2 run after VMRUN. After that (e.g.
> > > + * during save/restore), NextRIP is updated by the CPU and/or KVM, and
> > > + * the value of the L2 RIP from vmcb12 should not be used.
> > > */
> > > - if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > > - vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> > > - else if (boot_cpu_has(X86_FEATURE_NRIPS))
> > > - vmcb02->control.next_rip = vmcb12_rip;
> > > + if (boot_cpu_has(X86_FEATURE_NRIPS)) {
> > > + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> > > + !svm->nested.nested_run_pending)
> > > + vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> > > + else
> > > + vmcb02->control.next_rip = vmcb12_rip;
> > > + }
> >
> > This should probably also apply to soft_int_next_rip below the context
> > lines. Otherwise after patch 7 we keep it uninitialized if the guest
> > doesn't have NRIPs and !nested_run_pending.
>
> That's fine though, isn't it? Because in that case, doesn't the soft int have to
> comein through svm_update_soft_interrupt_rip()? Ugh, no because nSVM migrates
> control.event_inj.
>
> IIUC, we want to end up with this?
Yes.
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 03b201fe9613..d12647080051 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -925,7 +925,8 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
> */
> if (is_evtinj_soft(vmcb02->control.event_inj)) {
> svm->soft_int_injected = true;
> - if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
> + !svm->nested.nested_run_pending)
> svm->soft_int_next_rip = vmcb12_ctrl->next_rip;
> }
* Re: [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run
2026-02-25 0:59 ` [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields " Yosry Ahmed
@ 2026-03-04 17:50 ` Yosry Ahmed
2026-03-04 18:34 ` Sean Christopherson
0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2026-03-04 17:50 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Tue, Feb 24, 2026 at 5:00 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> In the save+restore path, when restoring nested state, the values of RIP
> and CS base passed into nested_vmcb02_prepare_control() are mostly
> incorrect. They are both pulled from the vmcb02. For CS base, the value
> is only correct if system regs are restored before nested state. The
> value of RIP is whatever the vCPU had in vmcb02 before restoring nested
> state (zero on a freshly created vCPU).
>
> Instead, take a similar approach to NextRIP, and delay initializing the
> RIP tracking fields until shortly before the vCPU is run, to make sure
> the most up-to-date values of RIP and CS base are used regardless of
> KVM_SET_SREGS, KVM_SET_REGS, and KVM_SET_NESTED_STATE's relative
> ordering.
>
> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@vger.kernel.org
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
> arch/x86/kvm/svm/nested.c | 17 ++++++++---------
> arch/x86/kvm/svm/svm.c | 10 ++++++++++
> 2 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index dcd4a8eb156f2..4499241b4e401 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -742,9 +742,7 @@ static bool is_evtinj_nmi(u32 evtinj)
> return type == SVM_EVTINJ_TYPE_NMI;
> }
>
> -static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> - unsigned long vmcb12_rip,
> - unsigned long vmcb12_csbase)
> +static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
> {
> u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK;
> u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
> @@ -856,14 +854,15 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
>
> svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> +
> + /*
> + * soft_int_csbase, soft_int_old_rip, and soft_int_next_rip (if L1
> + * doesn't have NRIPS) are initialized later, before the vCPU is run.
> + */
> if (is_evtinj_soft(vmcb02->control.event_inj)) {
> svm->soft_int_injected = true;
> - svm->soft_int_csbase = vmcb12_csbase;
> - svm->soft_int_old_rip = vmcb12_rip;
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> - else
> - svm->soft_int_next_rip = vmcb12_rip;
> }
>
> /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> @@ -961,7 +960,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
>
> svm_switch_vmcb(svm, &svm->nested.vmcb02);
> - nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> + nested_vmcb02_prepare_control(svm);
> nested_vmcb02_prepare_save(svm, vmcb12);
>
> ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
> @@ -1906,7 +1905,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
> nested_copy_vmcb_control_to_cache(svm, ctl);
>
> svm_switch_vmcb(svm, &svm->nested.vmcb02);
> - nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
> + nested_vmcb02_prepare_control(svm);
>
> /*
> * While the nested guest CR3 is already checked and set by
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ded4372f2d499..7948e601ea784 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3670,11 +3670,21 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
> * This is done here (as opposed to when preparing vmcb02) to use the
> * most up-to-date value of RIP regardless of the order of restoring
> * registers and nested state in the vCPU save+restore path.
> + *
> > + * Similarly, initialize svm->soft_int_* fields here to use the most
> + * up-to-date values of RIP and CS base, regardless of restore order.
> */
> if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
> if (boot_cpu_has(X86_FEATURE_NRIPS) &&
> !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
> +
> + if (svm->soft_int_injected) {
> + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> + }
AI found a bug here. These fields will be left uninitialized if we
cancel injection before pre_svm_run() (e.g. due to
kvm_vcpu_exit_request()). I was going to suggest moving this to
pre-run, but this leaves a larger gap where RIP can be updated from
under us. Sean has a better fixup in progress.
* Re: [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run
2026-03-04 17:50 ` Yosry Ahmed
@ 2026-03-04 18:34 ` Sean Christopherson
2026-03-04 18:39 ` Yosry Ahmed
0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2026-03-04 18:34 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
On Wed, Mar 04, 2026, Yosry Ahmed wrote:
> On Tue, Feb 24, 2026 at 5:00 PM Yosry Ahmed <yosry@kernel.org> wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index ded4372f2d499..7948e601ea784 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3670,11 +3670,21 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
> > * This is done here (as opposed to when preparing vmcb02) to use the
> > * most up-to-date value of RIP regardless of the order of restoring
> > * registers and nested state in the vCPU save+restore path.
> > + *
> > > + * Similarly, initialize svm->soft_int_* fields here to use the most
> > + * up-to-date values of RIP and CS base, regardless of restore order.
> > */
> > if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
> > if (boot_cpu_has(X86_FEATURE_NRIPS) &&
> > !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
> > +
> > + if (svm->soft_int_injected) {
> > + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > + }
>
> AI found a bug here. These fields will be left uninitialized if we
> cancel injection before pre_svm_run() (e.g. due to
> kvm_vcpu_exit_request()). I was going to suggest moving this to
> pre-run, but this leaves a larger gap where RIP can be updated from
> under us. Sean has a better fixup in progress.
With comments to explain the madness, this should work as fixup. It's gross and
brittle, but the only alternative I see is to add a flag to differentiate the
save/restore case from the VMRUN case. Which isn't terrible, but IMO most of
the brittleness comes from the disaster that is the architecture.
Given that the soft int reinjection code will be inherently brittle, and that
the save/restore scenario will be _extremely_ rare, I think it's worth the extra
bit of nastiness so that _if_ there's a bug, it's at least slightly more likely
we'll find it via the normal VMRUN path.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 258aa3bfb84b..2bfbaf92d3e5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3755,6 +3755,16 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return svm_invoke_exit_handler(vcpu, svm->vmcb->control.exit_code);
}
+static void svm_set_nested_run_soft_int_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->soft_int_csbase = svm->vmcb->save.cs.base;
+ svm->soft_int_old_rip = kvm_rip_read(vcpu);
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->soft_int_next_rip = kvm_rip_read(vcpu);
+}
+
static int pre_svm_run(struct kvm_vcpu *vcpu)
{
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
@@ -3797,12 +3807,8 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
- if (svm->soft_int_injected) {
- svm->soft_int_csbase = svm->vmcb->save.cs.base;
- svm->soft_int_old_rip = kvm_rip_read(vcpu);
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
- svm->soft_int_next_rip = kvm_rip_read(vcpu);
- }
+ if (svm->soft_int_injected)
+ svm_set_nested_run_soft_int_state(vcpu);
}
return 0;
@@ -4250,6 +4256,9 @@ static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector,
bool is_soft = (type == SVM_EXITINTINFO_TYPE_SOFT);
struct vcpu_svm *svm = to_svm(vcpu);
+ if (is_guest_mode(vcpu) && svm->nested.nested_run_pending)
+ svm_set_nested_run_soft_int_state(vcpu);
+
/*
* If NRIPS is enabled, KVM must snapshot the pre-VMRUN next_rip that's
* associated with the original soft exception/interrupt. next_rip is
* Re: [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run
2026-03-04 18:34 ` Sean Christopherson
@ 2026-03-04 18:39 ` Yosry Ahmed
0 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2026-03-04 18:39 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, stable
> > AI found a bug here. These fields will be left uninitialized if we
> > cancel injection before pre_svm_run() (e.g. due to
> > kvm_vcpu_exit_request()). I was going to suggest moving this to
> > pre-run, but this leaves a larger gap where RIP can be updated from
> > under us. Sean has a better fixup in progress.
>
> With comments to explain the madness, this should work as fixup. It's gross and
> brittle, but the only alternative I see is to add a flag to differentiate the
> save/restore case from the VMRUN case. Which isn't terrible, but IMO most of
> the brittleness comes from the disaster that is the architecture.
>
> Given that the soft int reinjection code will be inherently brittle, and that
> the save/restore scenario will be _extremely_ rare, I think it's worth the extra
> bit of nastiness so that _if_ there's a bug, it's at least slightly more likely
> we'll find it via the normal VMRUN path.
Agreed, the fixup looks good to me as long as we add comments to
explain WTF is going on here lol.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 258aa3bfb84b..2bfbaf92d3e5 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3755,6 +3755,16 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
> return svm_invoke_exit_handler(vcpu, svm->vmcb->control.exit_code);
> }
>
> +static void svm_set_nested_run_soft_int_state(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> +}
> +
> static int pre_svm_run(struct kvm_vcpu *vcpu)
> {
> struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
> @@ -3797,12 +3807,8 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
> !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
>
> - if (svm->soft_int_injected) {
> - svm->soft_int_csbase = svm->vmcb->save.cs.base;
> - svm->soft_int_old_rip = kvm_rip_read(vcpu);
> - if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> - svm->soft_int_next_rip = kvm_rip_read(vcpu);
> - }
> + if (svm->soft_int_injected)
> + svm_set_nested_run_soft_int_state(vcpu);
> }
>
> return 0;
> @@ -4250,6 +4256,9 @@ static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector,
> bool is_soft = (type == SVM_EXITINTINFO_TYPE_SOFT);
> struct vcpu_svm *svm = to_svm(vcpu);
>
> + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending)
> + svm_set_nested_run_soft_int_state(vcpu);
> +
> /*
> * If NRIPS is enabled, KVM must snapshot the pre-VMRUN next_rip that's
> * associated with the original soft exception/interrupt. next_rip is
Thread overview: 13+ messages
[not found] <20260225005950.3739782-1-yosry@kernel.org>
2026-02-25 0:59 ` [PATCH v3 1/8] KVM: nSVM: Sync NextRIP to cached vmcb12 after VMRUN of L2 Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 2/8] KVM: nSVM: Sync interrupt shadow " Yosry Ahmed
2026-02-27 17:53 ` Yosry Ahmed
2026-03-02 20:41 ` Sean Christopherson
2026-02-25 0:59 ` [PATCH v3 5/8] KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN Yosry Ahmed
2026-03-04 17:30 ` Yosry Ahmed
2026-03-04 17:39 ` Sean Christopherson
2026-03-04 17:41 ` Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 6/8] KVM: nSVM: Delay stuffing L2's current RIP into NextRIP until vCPU run Yosry Ahmed
2026-02-25 0:59 ` [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields " Yosry Ahmed
2026-03-04 17:50 ` Yosry Ahmed
2026-03-04 18:34 ` Sean Christopherson
2026-03-04 18:39 ` Yosry Ahmed