From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 059A337AA6D for ; Wed, 4 Mar 2026 18:34:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772649296; cv=none; b=pY2ZLCAZ8HbwrSpo5DcsbNZuuh/udwqZBtRLBSwp7RZuMEibRgvdTiXPas/Sy9XruFdPFSBpl8qHeOuXkGXQoCjy5xgWLuF/o/bupCXXFjOW6yejmcy73cATKImkJ/JebMrYgfvLaGWdEW/QBkks6/9ajUqxSz9oVJ1vQhDbYXs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772649296; c=relaxed/simple; bh=pddQ1sHqXRKwEx6DgskNl9V+i6uNVeLK7geszSsncxg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=WbMb5PfJe9vAA6gZsykXDrMDufjBxxFLcYKodAjIN/TZ6KmXzne1vP15hp1lFkmU1jDeKDc+v3z/216cbw9Lm8GRdLmq0Omz3nFyh19tYWkFR7xZDvtEMqWR1tsrsl3cgHn4ewTVMfe3ncwcC2Z8bO8JVEA1CTxvduu/1p9Oirc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=AbzDtfPC; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="AbzDtfPC" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-35641c14663so7451789a91.2 for ; Wed, 04 Mar 2026 10:34:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1772649294; x=1773254094; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=4RNBMpp2q2mifTqjvXaoCxPMqrYTj+LYVfCr+UpK9yU=; b=AbzDtfPCDSfpGYXgCao1bPfZ4GJ9SFawOlfbHhsxD6RbNLeiA8RbYJ2yR2JGntyDep TMSWixG3/WHQ0QMn6iukWWCQ4YVQomoSnXVMZIXlwYBS5q6f9Nmg5vMLwgIqpKg2aEvf VjkVCrPQWVpydSUfjbjfUiYpR4R2OO/9hjFp8PWyDaTo0+VpS4fwCLWr3ORF2qVbNeut x3+EY4jyNe58n4Se0+rIglfwT78R/EmkP9WDm92nrVUo0Srqrf4H7b6P0MJL+2yafOSk HHywBTlh4/66aixCP8UgUJGIfxx9YAIzE7VpLoupr6CJRtNhwpqZfUBY2hVpGXEX7IRy MvdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772649294; x=1773254094; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=4RNBMpp2q2mifTqjvXaoCxPMqrYTj+LYVfCr+UpK9yU=; b=d6XzSVmFh4KC2MmwZdEfgz0Z9sul1Aay2uL/2Lf9xpZGDK/AgkMSBUtjPx0Dv4K5+9 cCg9C58FMx0FeBWOEfm0gZDtYl74X6DfSJ9EpCD8ZjRg5davePZf03tJz+plf/HUCFgL OKwPMjurrDmlVImAN45x43wh56shGB1PqTBKK014oFWr3QiXDed9IGg3VikRHCD4E/UI G0FPOMszN/YtRgaOZ0u/UapqPKGpVny7YFWJum0CHtkbo8iKieztca48DfbWug2Q+n8y a+8Ie5t0UeILhgiZVkMNu40j5G/zxSpgo5S6V4aHeGcmlp1LGhR2oh6uU9llRRpNc4Sp 1ukg== X-Forwarded-Encrypted: i=1; AJvYcCV/gBj+sRsTWAAGbGU30cUbqaUR952CX208in6tBavi/rZi2OZY/z2gRNbMujgboiEv3giNnERhAXRsyw8=@vger.kernel.org X-Gm-Message-State: AOJu0Yw5QvPUprAoVsRSRLynKY4QilSgVDrX1UeVBJXfWgeALBbCkp7J 53+aqnnMJkVtO7SfNHgqEz+zEp9EeQseb6lll/ntq3spRhdvUTBST9MW4w5Jf+o+QA+tSckGlyO HY7y4bw== X-Received: from pjboi15.prod.google.com ([2002:a17:90b:3a0f:b0:354:a065:ec58]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:d40d:b0:358:e7de:a717 with SMTP id 98e67ed59e1d1-359a6a69b23mr2328391a91.25.1772649294186; Wed, 04 Mar 2026 10:34:54 -0800 (PST) Date: Wed, 4 Mar 2026 10:34:52 -0800 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260225005950.3739782-1-yosry@kernel.org> <20260225005950.3739782-8-yosry@kernel.org> Message-ID: Subject: Re: [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run From: Sean Christopherson To: Yosry Ahmed Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Wed, Mar 04, 2026, Yosry Ahmed wrote: > On Tue, Feb 24, 2026 at 5:00=E2=80=AFPM Yosry Ahmed wr= ote: > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > index ded4372f2d499..7948e601ea784 100644 > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -3670,11 +3670,21 @@ static int pre_svm_run(struct kvm_vcpu *vcpu) > > * This is done here (as opposed to when preparing vmcb02) to u= se the > > * most up-to-date value of RIP regardless of the order of rest= oring > > * registers and nested state in the vCPU save+restore path. > > + * > > + * Simiarly, initialize svm->soft_int_* fields here to use the = most > > + * up-to-date values of RIP and CS base, regardless of restore = order. > > */ > > if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) { > > if (boot_cpu_has(X86_FEATURE_NRIPS) && > > !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS)) > > svm->vmcb->control.next_rip =3D kvm_rip_read(vc= pu); > > + > > + if (svm->soft_int_injected) { > > + svm->soft_int_csbase =3D svm->vmcb->save.cs.bas= e; > > + svm->soft_int_old_rip =3D kvm_rip_read(vcpu); > > + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS)= ) > > + svm->soft_int_next_rip =3D kvm_rip_read= (vcpu); > > + } >=20 > AI found a bug here. These fields will be left uninitialized if we > cancel injection before pre_svm_run() (e.g. due to > kvm_vcpu_exit_request()). I was going to suggest moving this to > pre-run, but this leaves a larger gap where RIP can be updated from > under us. Sean has a better fixup in progress. With comments to explain the madness, this should work as fixup. It's gros= s and brittle, but the only alternative I see is to add a flag to differentiate t= he save/restore case from the VMRUN case. Which isn't terrible, but IMO most = of the brittleness comes from the disaster that is the architecture. Given that the soft int reinjection code will be inherently brittle, and th= at the save/restore scenario will be _extremely_ rare, I think it's worth the = extra bit of nastiness so that _if_ there's a bug, it's at least slightly more li= kely we'll find it via the normal VMRUN path. diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 258aa3bfb84b..2bfbaf92d3e5 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3755,6 +3755,16 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fa= stpath_t exit_fastpath) return svm_invoke_exit_handler(vcpu, svm->vmcb->control.exit_code); } =20 +static void svm_set_nested_run_soft_int_state(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm =3D to_svm(vcpu); + + svm->soft_int_csbase =3D svm->vmcb->save.cs.base; + svm->soft_int_old_rip =3D kvm_rip_read(vcpu); + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS)) + svm->soft_int_next_rip =3D kvm_rip_read(vcpu); +} + static int pre_svm_run(struct kvm_vcpu *vcpu) { struct svm_cpu_data *sd =3D per_cpu_ptr(&svm_data, vcpu->cpu); @@ -3797,12 +3807,8 @@ static int pre_svm_run(struct kvm_vcpu *vcpu) !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS)) svm->vmcb->control.next_rip =3D kvm_rip_read(vcpu); =20 - if (svm->soft_int_injected) { - svm->soft_int_csbase =3D svm->vmcb->save.cs.base; - svm->soft_int_old_rip =3D kvm_rip_read(vcpu); - if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS)) - svm->soft_int_next_rip =3D kvm_rip_read(vcp= u); - } + if (svm->soft_int_injected) + svm_set_nested_run_soft_int_state(vcpu); } =20 return 0; @@ -4250,6 +4256,9 @@ static void svm_complete_soft_interrupt(struct kvm_vc= pu *vcpu, u8 vector, bool is_soft =3D (type =3D=3D SVM_EXITINTINFO_TYPE_SOFT); struct vcpu_svm *svm =3D to_svm(vcpu); =20 + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) + svm_set_nested_run_soft_int_state(vcpu); + /* * If NRIPS is enabled, KVM must snapshot the pre-VMRUN next_rip th= at's * associated with the original soft exception/interrupt. next_rip= is