From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 9 Feb 2026 17:49:06 -0800
In-Reply-To: <20260210005449.3125133-2-yosry.ahmed@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260210005449.3125133-1-yosry.ahmed@linux.dev> <20260210005449.3125133-2-yosry.ahmed@linux.dev>
Subject: Re: [PATCH 1/4] KVM: nSVM: Sync next_rip to cached vmcb12 after VMRUN of L2
From: Sean Christopherson 
To: Yosry Ahmed 
Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Tue, Feb 10, 2026, Yosry Ahmed wrote:
> After VMRUN in guest mode, nested_sync_control_from_vmcb02() syncs
> fields written by the CPU from vmcb02 to the cached vmcb12. This is
> because the cached vmcb12 is used as the authoritative copy of some of
> the controls, and is the payload when saving/restoring nested state.
>
> next_rip is also written by the CPU (in some cases) after VMRUN, but is
> not sync'd to the cached vmcb12. As a result, it is corrupted after
> save/restore (replaced by the original value written by L1 on nested
> VMRUN). This could cause problems for both KVM (e.g. when injecting a
> soft IRQ) and L1 (e.g. when using next_rip to advance RIP after
> emulating an instruction).
>
> Fix this by sync'ing next_rip in nested_sync_control_from_vmcb02(). Move
> the call to nested_sync_control_from_vmcb02() (and the entire
> is_guest_mode() block) after svm_complete_interrupts(), as it may update
> next_rip in vmcb02.

I'll give you one guess as to what I would say about bundling changes.

AFAICT, there is _zero_ reason to move the call to
nested_sync_control_from_vmcb02() in a patch tagged for stable@.
> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed 
> ---
>  arch/x86/kvm/svm/nested.c |  6 ++++--
>  arch/x86/kvm/svm/svm.c    | 26 +++++++++++++++-----------
>  2 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd..70086ba6497f 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -519,8 +519,10 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
>  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  {
>  	u32 mask;
> -	svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
> -	svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
> +
> +	svm->nested.ctl.event_inj = svm->vmcb->control.event_inj;
> +	svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err;
> +	svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;

This is all a mess (the existing code).  nested_svm_vmexit() does this:

	vmcb12->control.int_state = vmcb02->control.int_state;
	vmcb12->control.exit_code = vmcb02->control.exit_code;
	vmcb12->control.exit_info_1 = vmcb02->control.exit_info_1;
	vmcb12->control.exit_info_2 = vmcb02->control.exit_info_2;

	if (!svm_is_vmrun_failure(vmcb12->control.exit_code))
		nested_save_pending_event_to_vmcb12(svm, vmcb12);

	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
		vmcb12->control.next_rip = vmcb02->control.next_rip;

	vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
	vmcb12->control.event_inj = svm->nested.ctl.event_inj;
	vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;

but then svm_get_nested_state(), by way of nested_copy_vmcb_cache_to_control(),
pulls everything from the cached fields.  Which probably only works because the
only fields that are pulled from vmcb02 in nested_svm_vmexit() are never
modified by the CPU.

Actually, I take that back, I have no idea how this code works.  How does e.g.
exit_info_1 not get clobbered on save/restore?

In other words, AFAICT, nested.ctl.int_ctl is special in that KVM needs it to
be up-to-date at all times, *and* it needs to be copied back to vmcb12 (or
userspace).

Part of me wants to remove these two fields entirely:

	/* cache for control fields of the guest */
	struct vmcb_ctrl_area_cached ctl;

	/*
	 * Note: this struct is not kept up-to-date while L2 runs; it is only
	 * valid within nested_svm_vmrun.
	 */
	struct vmcb_save_area_cached save;

and instead use "full" caches only for the duration of nested_svm_vmrun().  Or
hell, just copy the entire vmcb12 and throw the cached structures in the
garbage.  But that'll probably end in a game of whack-a-mole as things get
moved back in.

So rather than do something totally drastic, I think we should kill
nested_copy_vmcb_cache_to_control() and replace it with a "save control" flow.
And then have it share as much code as possible with nested_svm_vmexit(), and
fix up nested_svm_vmexit() to not pull from svm->nested.ctl unnecessarily.
Which, again AFAICT, is pretty much limited to int_ctl: either vmcb02 is
authoritative, or KVM shouldn't be updating vmcb12, and so only the "save
control" flow for KVM_GET_NESTED_STATE needs to copy from the cache to the
migrated vmcb12.

That'll probably end up a bit fat for a stable@ patch, so we could do a gross
one-off fix for this issue, and then do cleanups on top.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f0136dbdde6..cd5664c65a00 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4435,6 +4435,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	svm_complete_interrupts(vcpu);
 
+	/*
+	 * Update the cache after completing interrupts to get an accurate
+	 * NextRIP, e.g. when re-injecting a soft interrupt.
+	 *
+	 * FIXME: Rework svm_get_nested_state() to not pull data from the
+	 * cache (except for maybe int_ctl).
+	 */
+	if (is_guest_mode(vcpu))
+		svm->nested.ctl.next_rip = svm->vmcb->control.next_rip;
+
 	return svm_exit_handlers_fastpath(vcpu);
 }

>  	/* Only a few fields of int_ctl are written by the processor. */
>  	mask = V_IRQ_MASK | V_TPR_MASK;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f0136dbdde6..6d8d4d19455e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4399,17 +4399,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>  	sync_cr8_to_lapic(vcpu);
>  
>  	svm->next_rip = 0;
> -	if (is_guest_mode(vcpu)) {
> -		nested_sync_control_from_vmcb02(svm);
> -
> -		/* Track VMRUNs that have made past consistency checking */
> -		if (svm->nested.nested_run_pending &&
> -		    !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
> -			++vcpu->stat.nested_run;
> -
> -		svm->nested.nested_run_pending = 0;
> -	}
> -
>  	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
>  
>  	/*
> @@ -4435,6 +4424,21 @@
>  
>  	svm_complete_interrupts(vcpu);
>  
> +	/*
> +	 * svm_complete_interrupts() may update svm->vmcb->control.next_rip,
> +	 * which is sync'd by nested_sync_control_from_vmcb02() below.

Please try to avoid referencing functions and fields in comments.  History has
shown that they almost always become stale.

> +	 */
> +	if (is_guest_mode(vcpu)) {
> +		nested_sync_control_from_vmcb02(svm);
> +
> +		/* Track VMRUNs that have made past consistency checking */
> +		if (svm->nested.nested_run_pending &&
> +		    !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
> +			++vcpu->stat.nested_run;
> +
> +		svm->nested.nested_run_pending = 0;
> +	}
> +
>  	return svm_exit_handlers_fastpath(vcpu);
>  }
> 
> 
> base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
> -- 
> 2.53.0.rc2.204.g2597b5adb4-goog
> 