From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: nSVM: Sync next_rip to cached vmcb12 after VMRUN of L2
Date: Mon, 9 Feb 2026 17:49:06 -0800 [thread overview]
Message-ID: <aYqOkvHs3L-AX-CG@google.com> (raw)
In-Reply-To: <20260210005449.3125133-2-yosry.ahmed@linux.dev>
On Tue, Feb 10, 2026, Yosry Ahmed wrote:
> After VMRUN in guest mode, nested_sync_control_from_vmcb02() syncs
> fields written by the CPU from vmcb02 to the cached vmcb12. This is
> because the cached vmcb12 is used as the authoritative copy of some of
> the controls, and is the payload when saving/restoring nested state.
>
> next_rip is also written by the CPU (in some cases) after VMRUN, but is
> not sync'd to cached vmcb12. As a result, it is corrupted after
> save/restore (replaced by the original value written by L1 on nested
> VMRUN). This could cause problems for both KVM (e.g. when injecting a
> soft IRQ) or L1 (e.g. when using next_rip to advance RIP after emulating
> an instruction).
>
> Fix this by sync'ing next_rip in nested_sync_control_from_vmcb02(). Move
> the call to nested_sync_control_from_vmcb02() (and the entire
> is_guest_mode() block) after svm_complete_interrupts(), as it may update
> next_rip in vmcb02.
I'll give you one guess as to what I would say about bundling changes. AFAICT,
there is _zero_ reason to move the call nested_sync_control_from_vmcb02() in a
patch tagged for stable@.
> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> arch/x86/kvm/svm/nested.c | 6 ++++--
> arch/x86/kvm/svm/svm.c | 26 +++++++++++++++-----------
> 2 files changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd..70086ba6497f 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -519,8 +519,10 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> {
> u32 mask;
> - svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
> - svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
> +
> + svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
> + svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
> + svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;
This is all a mess (the existing code). nested_svm_vmexit() does this:
vmcb12->control.int_state = vmcb02->control.int_state;
vmcb12->control.exit_code = vmcb02->control.exit_code;
vmcb12->control.exit_info_1 = vmcb02->control.exit_info_1;
vmcb12->control.exit_info_2 = vmcb02->control.exit_info_2;
if (!svm_is_vmrun_failure(vmcb12->control.exit_code))
nested_save_pending_event_to_vmcb12(svm, vmcb12);
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb12->control.next_rip = vmcb02->control.next_rip;
vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
vmcb12->control.event_inj = svm->nested.ctl.event_inj;
vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
but then svm_get_nested_state(), by way of nested_copy_vmcb_cache_to_control(),
pulls everything from the cached fields. Which probably only works because the
only fields that are pulled from vmcb02 nested_svm_vmexit() are never modified
by the CPU.
Actually, I take that back, I have no idea how this code works. How does e.g.
exit_info_1 not get clobbered on save/restore?
In other words, AFAICT, nested.ctl.int_ctl is special in that KVM needs it to be
up-to-date at all times, *and* it needs to copied back to vmcb12 (or userspace).
Part of me wants to remove these two fields entirely:
/* cache for control fields of the guest */
struct vmcb_ctrl_area_cached ctl;
/*
* Note: this struct is not kept up-to-date while L2 runs; it is only
* valid within nested_svm_vmrun.
*/
struct vmcb_save_area_cached save;
and instead use "full" caches only for the duration of nested_svm_vmrun(). Or
hell, just copy the entire vmcb12 and throw the cached structures in the garbage.
But that'll probably end in a game of whack-a-mole as things get moved back in.
So rather than do something totally drastic, I think we should kill
nested_copy_vmcb_cache_to_control() and replace it with a "save control" flow.
And then have it share code as much code as possible with nested_svm_vmexit(),
and fixup nested_svm_vmexit() to not pull from svm->nested.ctl unnecessarily.
Which, again AFICT, is pretty much limited to int_ctl: either vmcb02 is
authoritative, or KVM shouldn't be updating vmcb12, and so only the "save control"
for KVM_GET_NESTED_STATE needs to copy from the cache to the migrated vmcb12.
That'll probably end up a bit fat for a stable@ patch, so we could do a gross
one-off fix for this issue, and then do cleanups on top.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f0136dbdde6..cd5664c65a00 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4435,6 +4435,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
svm_complete_interrupts(vcpu);
+ /*
+ * Update the cache after completing interrupts to get an accurate
+ * NextRIP, e.g. when re-injecting a soft interrupt.
+ *
+ * FIXME: Rework svm_get_nested_state() to not pull data from the
+ * cache (except for maybe int_ctl).
+ */
+ if (is_guest_mode(vcpu))
+ svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;
+
return svm_exit_handlers_fastpath(vcpu);
}
> /* Only a few fields of int_ctl are written by the processor. */
> mask = V_IRQ_MASK | V_TPR_MASK;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..6d8d4d19455e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4399,17 +4399,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> sync_cr8_to_lapic(vcpu);
>
> svm->next_rip = 0;
> - if (is_guest_mode(vcpu)) {
> - nested_sync_control_from_vmcb02(svm);
> -
> - /* Track VMRUNs that have made past consistency checking */
> - if (svm->nested.nested_run_pending &&
> - !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
> - ++vcpu->stat.nested_run;
> -
> - svm->nested.nested_run_pending = 0;
> - }
> -
> svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
>
> /*
> @@ -4435,6 +4424,21 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>
> svm_complete_interrupts(vcpu);
>
> + /*
> + * svm_complete_interrupts() may update svm->vmcb->control.next_rip,
> + * which is sync'd by nested_sync_control_from_vmcb02() below.
Please try to avoid referencing functions and fields in comments. History has
shown that they almost always become stale.
> + */
> + if (is_guest_mode(vcpu)) {
> + nested_sync_control_from_vmcb02(svm);
> +
> + /* Track VMRUNs that have made past consistency checking */
> + if (svm->nested.nested_run_pending &&
> + !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
> + ++vcpu->stat.nested_run;
> +
> + svm->nested.nested_run_pending = 0;
> + }
> +
> return svm_exit_handlers_fastpath(vcpu);
> }
>
>
> base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
> --
> 2.53.0.rc2.204.g2597b5adb4-goog
>
next prev parent reply other threads:[~2026-02-10 1:49 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-10 0:54 [PATCH 0/4] KVM: nSVM: Fix save/restore of next_rip & int_state Yosry Ahmed
2026-02-10 0:54 ` [PATCH 1/4] KVM: nSVM: Sync next_rip to cached vmcb12 after VMRUN of L2 Yosry Ahmed
2026-02-10 1:49 ` Sean Christopherson [this message]
2026-02-10 16:25 ` Yosry Ahmed
2026-02-10 19:20 ` Sean Christopherson
2026-02-10 22:19 ` Yosry Ahmed
2026-02-11 0:09 ` Sean Christopherson
2026-02-11 0:27 ` Yosry Ahmed
2026-02-11 0:39 ` Sean Christopherson
2026-02-11 1:02 ` Yosry Ahmed
2026-02-21 0:03 ` Sean Christopherson
2026-02-21 9:11 ` Yosry Ahmed
2026-02-23 16:59 ` Sean Christopherson
2026-02-23 17:21 ` Yosry Ahmed
2026-02-23 20:23 ` Sean Christopherson
2026-02-10 0:54 ` [PATCH 2/4] KVM: nSVM: Sync int_state " Yosry Ahmed
2026-02-10 0:54 ` [PATCH 3/4] KVM: selftests: Extend state_test to check vGIF Yosry Ahmed
2026-02-10 0:54 ` [PATCH 4/4] KVM: selftests: Extend state_test to check next_rip Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYqOkvHs3L-AX-CG@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=stable@vger.kernel.org \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox