From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 20 Feb 2026 09:07:16 -0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References:
 <20260212230751.1871720-1-yosry.ahmed@linux.dev> <20260212230751.1871720-5-yosry.ahmed@linux.dev>
Subject: Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
From: Sean Christopherson
To: Yosry Ahmed
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Thu, Feb 19, 2026, Yosry Ahmed wrote:
> > E.g. after patch 2, completely untested...
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index aec17c80ed73..6fc1b2e212d2 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>
> Above the context lines we have:
>
> 	/*
> 	 * next_rip is consumed on VMRUN as the return address pushed on the
> 	 * stack for injected soft exceptions/interrupts.  If nrips is exposed
> 	 * to L1, take it verbatim from vmcb12.  If nrips is supported in
> 	 * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> 	 * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> 	 * prior to injecting the event).
> 	 */
> 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> 		vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
> 		vmcb02->control.next_rip = vmcb12_rip;
>
> The same bug affects vmcb02->control.next_rip when the guest doesn't
> have NRIPS. I think we don't want to move part of the vmcb02
> initialization before VMRUN too. We can keep the initialization here and
> overwrite it before VMRUN if needed, but that's just also ugh..

Aha!  I knew I was missing something, but I couldn't quite get my brain to
figure out what.

I don't have a super strong preference as to copying the code or moving it
wholesale.
Though I would say if the pre-VMRUN logic is _identical_ (and I think it is?),
then we move it, and simply update the comment in nested_vmcb02_prepare_control()
to call that out.

> >  	svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> >  	if (is_evtinj_soft(vmcb02->control.event_inj)) {
> >  		svm->soft_int_injected = true;
> > -		svm->soft_int_csbase = vmcb12_csbase;
> > -		svm->soft_int_old_rip = vmcb12_rip;
> > +
> >  		if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> >  			svm->soft_int_next_rip = svm->nested.ctl.next_rip;
>
> Why not move this too?

For the same reason I think we should keep

	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
		vmcb02->control.next_rip = svm->nested.ctl.next_rip;

where it is.  When NRIPS is exposed to the guest, the incoming nested state is
the one and only source of truth.  By keeping the code different, we'd
effectively be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.

> > -		else
> > -			svm->soft_int_next_rip = vmcb12_rip;
> >  	}
> >
> >  	/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8f8bc863e214..358ec940ffc9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> >  		return EXIT_FASTPATH_EXIT_USERSPACE;
> >  	}
> >
> > +	if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> > +	    svm->soft_int_injected) {
> > +		svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > +		svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > +			svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > +	}
> > +
>
> I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
> maybe we can refactor them later to pre-run and post-run nested
> callbacks? Anyway, not a big deal, definitely an improvement over the
> current patch assuming we can figure out how to fix next_rip.
I don't love it either, but I want to (a) avoid unnecessarily overwriting the
fields, e.g. if KVM never actually does VMRUN, and (b) minimize the probability
of consuming a bad RIP.

In practice, I would expect the nested_run_pending check to be correctly
predicted the overwhelming majority of the time, i.e. I don't anticipate
performance issues due to putting the code in the hot path.

If we want to try and move the update out of svm_vcpu_run(), then we shouldn't
need generic pre/post callbacks for nested; svm_prepare_switch_to_guest() is
the right fit.