public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 stable@vger.kernel.org
Subject: Re: [PATCH 1/4] KVM: nSVM: Sync next_rip to cached vmcb12 after VMRUN of L2
Date: Mon, 9 Feb 2026 17:49:06 -0800	[thread overview]
Message-ID: <aYqOkvHs3L-AX-CG@google.com> (raw)
In-Reply-To: <20260210005449.3125133-2-yosry.ahmed@linux.dev>

On Tue, Feb 10, 2026, Yosry Ahmed wrote:
> After VMRUN in guest mode, nested_sync_control_from_vmcb02() syncs
> fields written by the CPU from vmcb02 to the cached vmcb12. This is
> because the cached vmcb12 is used as the authoritative copy of some of
> the controls, and is the payload when saving/restoring nested state.
> 
> next_rip is also written by the CPU (in some cases) after VMRUN, but is
> not sync'd to cached vmcb12. As a result, it is corrupted after
> save/restore (replaced by the original value written by L1 on nested
> VMRUN). This could cause problems for both KVM (e.g. when injecting a
> soft IRQ) or L1 (e.g. when using next_rip to advance RIP after emulating
> an instruction).
> 
> Fix this by sync'ing next_rip in nested_sync_control_from_vmcb02(). Move
> the call to nested_sync_control_from_vmcb02() (and the entire
> is_guest_mode() block) after svm_complete_interrupts(), as it may update
> next_rip in vmcb02.

I'll give you one guess as to what I would say about bundling changes.  AFAICT,
there is _zero_ reason to move the call nested_sync_control_from_vmcb02() in a
patch tagged for stable@.

> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c |  6 ++++--
>  arch/x86/kvm/svm/svm.c    | 26 +++++++++++++++-----------
>  2 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd..70086ba6497f 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -519,8 +519,10 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
>  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  {
>  	u32 mask;
> -	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> -	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> +
> +	svm->nested.ctl.event_inj	= svm->vmcb->control.event_inj;
> +	svm->nested.ctl.event_inj_err	= svm->vmcb->control.event_inj_err;
> +	svm->nested.ctl.next_rip	= svm->vmcb->control.next_rip;

This is all a mess (the existing code).  nested_svm_vmexit() does this:

	vmcb12->control.int_state         = vmcb02->control.int_state;
	vmcb12->control.exit_code         = vmcb02->control.exit_code;
	vmcb12->control.exit_info_1       = vmcb02->control.exit_info_1;
	vmcb12->control.exit_info_2       = vmcb02->control.exit_info_2;

	if (!svm_is_vmrun_failure(vmcb12->control.exit_code))
		nested_save_pending_event_to_vmcb12(svm, vmcb12);

	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
		vmcb12->control.next_rip  = vmcb02->control.next_rip;

	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;

but then svm_get_nested_state(), by way of nested_copy_vmcb_cache_to_control(),
pulls everything from the cached fields.  Which probably only works because the
only fields that are pulled from vmcb02 nested_svm_vmexit() are never modified
by the CPU.

Actually, I take that back, I have no idea how this code works.  How does e.g.
exit_info_1 not get clobbered on save/restore?

In other words, AFAICT, nested.ctl.int_ctl is special in that KVM needs it to be
up-to-date at all times, *and* it needs to copied back to vmcb12 (or userspace).

Part of me wants to remove these two fields entirely:

	/* cache for control fields of the guest */
	struct vmcb_ctrl_area_cached ctl;

	/*
	 * Note: this struct is not kept up-to-date while L2 runs; it is only
	 * valid within nested_svm_vmrun.
	 */
	struct vmcb_save_area_cached save;

and instead use "full" caches only for the duration of nested_svm_vmrun().  Or
hell, just copy the entire vmcb12 and throw the cached structures in the garbage.
But that'll probably end in a game of whack-a-mole as things get moved back in.

So rather than do something totally drastic, I think we should kill
nested_copy_vmcb_cache_to_control() and replace it with a "save control" flow.
And then have it share code as much code as possible with nested_svm_vmexit(),
and fixup nested_svm_vmexit() to not pull from svm->nested.ctl unnecessarily.
Which, again AFICT, is pretty much limited to int_ctl: either vmcb02 is
authoritative, or KVM shouldn't be updating vmcb12, and so only the "save control"
for KVM_GET_NESTED_STATE needs to copy from the cache to the migrated vmcb12.

That'll probably end up a bit fat for a stable@ patch, so we could do a gross
one-off fix for this issue, and then do cleanups on top.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f0136dbdde6..cd5664c65a00 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4435,6 +4435,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
        svm_complete_interrupts(vcpu);
 
+       /*
+        * Update the cache after completing interrupts to get an accurate
+        * NextRIP, e.g. when re-injecting a soft interrupt.
+        *
+        * FIXME: Rework svm_get_nested_state() to not pull data from the
+        *        cache (except for maybe int_ctl).
+        */
+       if (is_guest_mode(vcpu))
+               svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;
+
        return svm_exit_handlers_fastpath(vcpu);
 }
 
>  	/* Only a few fields of int_ctl are written by the processor.  */
>  	mask = V_IRQ_MASK | V_TPR_MASK;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..6d8d4d19455e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4399,17 +4399,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>  	sync_cr8_to_lapic(vcpu);
>  
>  	svm->next_rip = 0;
> -	if (is_guest_mode(vcpu)) {
> -		nested_sync_control_from_vmcb02(svm);
> -
> -		/* Track VMRUNs that have made past consistency checking */
> -		if (svm->nested.nested_run_pending &&
> -		    !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
> -                        ++vcpu->stat.nested_run;
> -
> -		svm->nested.nested_run_pending = 0;
> -	}
> -
>  	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
>  
>  	/*
> @@ -4435,6 +4424,21 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>  
>  	svm_complete_interrupts(vcpu);
>  
> +	/*
> +	 * svm_complete_interrupts() may update svm->vmcb->control.next_rip,
> +	 * which is sync'd by nested_sync_control_from_vmcb02() below.

Please try to avoid referencing functions and fields in comments.  History has
shown that they almost always become stale.

> +	 */
> +	if (is_guest_mode(vcpu)) {
> +		nested_sync_control_from_vmcb02(svm);
> +
> +		/* Track VMRUNs that have made past consistency checking */
> +		if (svm->nested.nested_run_pending &&
> +		    !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
> +			++vcpu->stat.nested_run;
> +
> +		svm->nested.nested_run_pending = 0;
> +	}
> +
>  	return svm_exit_handlers_fastpath(vcpu);
>  }
>  
> 
> base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
> -- 
> 2.53.0.rc2.204.g2597b5adb4-goog
> 

  reply	other threads:[~2026-02-10  1:49 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260210005449.3125133-1-yosry.ahmed@linux.dev>
2026-02-10  0:54 ` [PATCH 1/4] KVM: nSVM: Sync next_rip to cached vmcb12 after VMRUN of L2 Yosry Ahmed
2026-02-10  1:49   ` Sean Christopherson [this message]
2026-02-10 16:25     ` Yosry Ahmed
2026-02-10 19:20       ` Sean Christopherson
2026-02-10 22:19         ` Yosry Ahmed
2026-02-11  0:09           ` Sean Christopherson
2026-02-11  0:27             ` Yosry Ahmed
2026-02-11  0:39               ` Sean Christopherson
2026-02-11  1:02                 ` Yosry Ahmed
2026-02-21  0:03                   ` Sean Christopherson
2026-02-21  9:11                     ` Yosry Ahmed
2026-02-23 16:59                       ` Sean Christopherson
2026-02-23 17:21                         ` Yosry Ahmed
2026-02-23 20:23                           ` Sean Christopherson
2026-02-10  0:54 ` [PATCH 2/4] KVM: nSVM: Sync int_state " Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYqOkvHs3L-AX-CG@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox