From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 18 Feb 2026 16:13:29 -0800
In-Reply-To: <20260212230751.1871720-5-yosry.ahmed@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
 <20260212230751.1871720-5-yosry.ahmed@linux.dev>
Message-ID:
Subject: Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
From: Sean Christopherson
To: Yosry Ahmed
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Thu, Feb 12, 2026, Yosry Ahmed wrote:
> In the save/restore path, if KVM_SET_NESTED_STATE is performed before
> restoring REGS and/or SREGS, the values of CS and RIP used to
> initialize the vmcb02's NextRIP and soft interrupt tracking RIPs are
> incorrect.
>
> Recalculate them up after CS is set, or REGS are restored. This is only
> needed when a nested run is pending during restore. After L2 runs for
> the first time, any soft interrupts injected by L1 are already delivered
> or tracked by KVM separately for re-injection, so the CS and RIP values
> are no longer relevant.
>
> If KVM_SET_NESTED_STATE is performed after both REGS and SREGS are
> restored, it will just overwrite the fields.

Apparently I suggested this general idea, but ugh. :-)

>  static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>  {
>  	kvm_register_mark_available(vcpu, reg);
> @@ -1826,6 +1844,8 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
>  	if (seg == VCPU_SREG_SS)
>  		/* This is symmetric with svm_get_segment() */
>  		svm->vmcb->save.cpl = (var->dpl & 3);
> +	else if (seg == VCPU_SREG_CS)
> +		svm_fixup_nested_rips(vcpu);
>
>  	vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
>  }
> @@ -5172,6 +5192,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.get_rflags = svm_get_rflags,
>  	.set_rflags = svm_set_rflags,
>  	.get_if_flag = svm_get_if_flag,
> +	.post_user_set_regs = svm_fixup_nested_rips,
>
>  	.flush_tlb_all = svm_flush_tlb_all,
>  	.flush_tlb_current = svm_flush_tlb_current,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index db3f393192d9..35fe1d337273 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  	kvm_rip_write(vcpu, regs->rip);
>  	kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
>
> +	kvm_x86_call(post_user_set_regs)(vcpu);

I especially don't love this callback.  Aside from adding a new kvm_x86_ops
hook, I don't like that _any_ CS change triggers a fixup, whereas only
userspace writes to RIP trigger a fixup.  That _should_ be a moot point,
because neither CS nor RIP should change while nested_run_pending is true,
but I dislike the asymmetry.

I was going to suggest we instead react to RIP being dirty, but what if we
take it a step further?  Somewhat of a crazy idea, but what happens if we
simply wait until just before VMRUN to set soft_int_csbase, soft_int_old_rip,
and soft_int_next_rip (when the guest doesn't have NRIPS)?

E.g. after patch 2, completely untested...

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aec17c80ed73..6fc1b2e212d2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
 	if (is_evtinj_soft(vmcb02->control.event_inj)) {
 		svm->soft_int_injected = true;
-		svm->soft_int_csbase = vmcb12_csbase;
-		svm->soft_int_old_rip = vmcb12_rip;
+
 		if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
 			svm->soft_int_next_rip = svm->nested.ctl.next_rip;
-		else
-			svm->soft_int_next_rip = vmcb12_rip;
 	}
 
 	/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..358ec940ffc9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 		return EXIT_FASTPATH_EXIT_USERSPACE;
 	}
 
+	if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
+	    svm->soft_int_injected) {
+		svm->soft_int_csbase = svm->vmcb->save.cs.base;
+		svm->soft_int_old_rip = kvm_rip_read(vcpu);
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+			svm->soft_int_next_rip = kvm_rip_read(vcpu);
+	}
+
 	sync_lapic_to_cr8(vcpu);
 
 	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
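
For anyone who wants to poke at the ordering argument outside the kernel, a
toy userspace model might look like the sketch below.  It is not KVM code:
struct toy_vcpu and every toy_* function are hypothetical stand-ins, and the
model only assumes what is described above, i.e. that snapshotting CS/RIP
when nested state is set goes stale if REGS/SREGS are restored afterwards,
whereas snapshotting just before the (modeled) VMRUN does not care about
restore order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the vCPU state discussed above; not a kernel structure. */
struct toy_vcpu {
	uint64_t rip;			/* current guest RIP */
	uint64_t cs_base;		/* current CS base */
	bool nested_run_pending;
	bool soft_int_injected;
	uint64_t soft_int_csbase;	/* values tracked for re-injection */
	uint64_t soft_int_old_rip;
};

/* Models KVM_SET_NESTED_STATE; 'eager' snapshots CS/RIP immediately. */
static void toy_set_nested_state(struct toy_vcpu *v, bool eager)
{
	v->nested_run_pending = true;
	v->soft_int_injected = true;
	if (eager) {
		v->soft_int_csbase = v->cs_base;
		v->soft_int_old_rip = v->rip;
	}
}

/* Models userspace restoring REGS/SREGS after nested state. */
static void toy_set_regs(struct toy_vcpu *v, uint64_t cs_base, uint64_t rip)
{
	v->cs_base = cs_base;
	v->rip = rip;
}

/* Models the point just before VMRUN; 'late' snapshots CS/RIP here. */
static void toy_vcpu_run(struct toy_vcpu *v, bool late)
{
	if (late && v->nested_run_pending && v->soft_int_injected) {
		v->soft_int_csbase = v->cs_base;
		v->soft_int_old_rip = v->rip;
	}
	v->nested_run_pending = false;
}

/* Restores nested state first, then REGS, and returns the tracked RIP. */
uint64_t toy_restore_nested_first(bool late)
{
	struct toy_vcpu v = { 0 };

	toy_set_nested_state(&v, !late);
	toy_set_regs(&v, 0x10000, 0x1234);
	toy_vcpu_run(&v, late);
	return v.soft_int_old_rip;
}

/* Usage: the eager snapshot is stale, the late one sees the restored RIP. */
void toy_demo(void)
{
	assert(toy_restore_nested_first(false) == 0);
	assert(toy_restore_nested_first(true) == 0x1234);
}
```

The model deliberately ignores NRIPS and soft_int_next_rip; it only shows why
deferring the snapshot removes the dependency on restore order.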