From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 23 Feb 2026 16:35:54 -0800
In-Reply-To: <20260206190851.860662-7-yosry.ahmed@linux.dev>
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org
List-Id: <stable.vger.kernel.org>
List-Subscribe: <mailto:stable+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:stable+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20260206190851.860662-1-yosry.ahmed@linux.dev> <20260206190851.860662-7-yosry.ahmed@linux.dev>
Message-ID: <aZzyanOAcoAnh01A@google.com>
Subject: Re: [PATCH v5 06/26] KVM: nSVM: Triple fault if mapping VMCB12 fails
 on nested #VMEXIT
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, 
	stable@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Fri, Feb 06, 2026, Yosry Ahmed wrote:
> KVM currently injects a #GP and hopes for the best if mapping VMCB12
> fails on nested #VMEXIT, and only if the failure mode is -EINVAL.
> Mapping the VMCB12 could also fail if creating host mappings fails.
> 
> After the #GP is injected, nested_svm_vmexit() bails early, without
> cleaning up (e.g. KVM_REQ_GET_NESTED_STATE_PAGES is set, is_guest_mode()
> is true, etc). Move mapping VMCB12 a bit later, after leaving guest mode
> and clearing KVM_REQ_GET_NESTED_STATE_PAGES, right before the VMCB12 is
> actually used.
> 
> Instead of optionally injecting a #GP, triple fault the guest if mapping
> VMCB12 fails since KVM cannot make a sane recovery. The APM states that
> a #VMEXIT will triple fault if host state is illegal or an exception
> occurs while loading host state, so the behavior is not entirely made
> up.
> 
> Also update the WARN_ON() in svm_get_nested_state_pages() to
> WARN_ON_ONCE() to avoid future user-triggerable bugs spamming kernel logs
> and potentially causing issues.
> 
> Fixes: cf74a78b229d ("KVM: SVM: Add VMEXIT handler and intercepts")
> CC: stable@vger.kernel.org
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index fab0d3d5baa2..830341b0e1f8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1121,24 +1121,14 @@ void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
>  int nested_svm_vmexit(struct vcpu_svm *svm)
>  {
>  	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	gpa_t vmcb12_gpa = svm->nested.vmcb12_gpa;
>  	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>  	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
>  	struct vmcb *vmcb12;
>  	struct kvm_host_map map;
> -	int rc;
> -
> -	rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map);
> -	if (rc) {
> -		if (rc == -EINVAL)
> -			kvm_inject_gp(vcpu, 0);
> -		return 1;
> -	}
> -
> -	vmcb12 = map.hva;
>  
>  	/* Exit Guest-Mode */
>  	leave_guest_mode(vcpu);
> -	svm->nested.vmcb12_gpa = 0;
>  	WARN_ON_ONCE(svm->nested.nested_run_pending);
>  
>  	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> @@ -1146,8 +1136,16 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  	/* in case we halted in L2 */
>  	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
>  
> +	svm->nested.vmcb12_gpa = 0;
> +
> +	if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map)) {
> +		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +		return 1;

Returning early isn't entirely correct.  In fact, I think it's worse than the
current behavior in many respects.

By doing leave_guest_mode() without switching back to vmcb01 and without pointing
vcpu->arch.mmu back at root_mmu, the vCPU will be in L1 but with vmcb02 and L2's
MMU active.
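
To make the mismatch concrete, here is a throwaway userspace model of the
early-return flow.  This is not KVM code: every toy_* name is made up for
illustration, and the only state modeled is guest mode, the active VMCB, and
the active MMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the vCPU state discussed above; none of this is KVM code. */
enum toy_vmcb { TOY_VMCB01, TOY_VMCB02 };
enum toy_mmu  { TOY_ROOT_MMU, TOY_L2_MMU };

struct toy_vcpu {
	bool guest_mode;            /* models is_guest_mode() */
	enum toy_vmcb current_vmcb; /* VMCB the vCPU is running on */
	enum toy_mmu mmu;           /* models vcpu->arch.mmu */
	bool triple_fault_pending;  /* models KVM_REQ_TRIPLE_FAULT */
};

/* State of a vCPU that is currently running L2. */
static struct toy_vcpu toy_vcpu_in_l2(void)
{
	struct toy_vcpu v = {
		.guest_mode = true,
		.current_vmcb = TOY_VMCB02,
		.mmu = TOY_L2_MMU,
		.triple_fault_pending = false,
	};

	return v;
}

/* L1 must run on vmcb01 with root_mmu; L2 on vmcb02 with L2's MMU. */
static bool toy_state_is_consistent(const struct toy_vcpu *v)
{
	if (v->guest_mode)
		return v->current_vmcb == TOY_VMCB02 && v->mmu == TOY_L2_MMU;

	return v->current_vmcb == TOY_VMCB01 && v->mmu == TOY_ROOT_MMU;
}

/* Models the quoted hunk: leave guest mode first, then bail if the map fails. */
static int toy_vmexit_early_return(struct toy_vcpu *v, bool map_ok)
{
	v->guest_mode = false;          /* leave_guest_mode() */

	if (!map_ok) {
		v->triple_fault_pending = true;
		return 1;               /* vmcb02 and L2's MMU are still active */
	}

	v->current_vmcb = TOY_VMCB01;   /* switch back to vmcb01 */
	v->mmu = TOY_ROOT_MMU;          /* put the MMU back to root_mmu */
	return 0;
}

/* After a failed map the vCPU claims to be in L1 but still has L2 state. */
static void toy_demo_early_return(void)
{
	struct toy_vcpu v = toy_vcpu_in_l2();

	toy_vmexit_early_return(&v, false);
	assert(!v.guest_mode && !toy_state_is_consistent(&v));
}
```

The failure path is the only one that trips the consistency check; the
successful path lands in L1 with vmcb01 and root_mmu as expected.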

The idea I can come up with is to isolate the vmcb12 writes (which is surprisingly
straightforward), and then simply skip the vmcb12 updates if mapping vmcb12
fails.  E.g.
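
In miniature, and purely as a userspace model of the shape rather than KVM code
(every toy_* name is invented for illustration, and a NULL vmcb12 stands in for
a failed mapping), something like the following, with the actual diff after it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMCB; a single field is enough to show the writeback. */
struct toy_vmcb {
	unsigned long rip;
};

struct toy_vcpu {
	bool guest_mode;               /* models is_guest_mode() */
	bool triple_fault_pending;     /* models KVM_REQ_TRIPLE_FAULT */
	struct toy_vmcb vmcb01;
	struct toy_vmcb vmcb02;
	struct toy_vmcb *current_vmcb; /* VMCB the vCPU is running on */
};

/* All vmcb12 writes live in one helper, so a failed map skips them in one place. */
static void toy_update_vmcb12(struct toy_vmcb *vmcb12,
			      const struct toy_vmcb *vmcb02)
{
	if (!vmcb12)
		return;

	vmcb12->rip = vmcb02->rip;
}

/* Pass vmcb12 == NULL to model a failed mapping. */
static void toy_vmexit(struct toy_vcpu *v, struct toy_vmcb *vmcb12)
{
	if (!vmcb12)
		v->triple_fault_pending = true;

	v->guest_mode = false;                 /* leave guest mode */
	toy_update_vmcb12(vmcb12, &v->vmcb02); /* no-op if the map failed */
	v->current_vmcb = &v->vmcb01;          /* the switch to L1 always happens */
}

/* Even with a failed map, the vCPU ends up fully back in L1. */
static void toy_demo_failed_map(void)
{
	struct toy_vcpu v = { .guest_mode = true, .vmcb02 = { .rip = 0x1000 } };

	v.current_vmcb = &v.vmcb02;
	toy_vmexit(&v, NULL);
	assert(!v.guest_mode && v.current_vmcb == &v.vmcb01);
	assert(v.triple_fault_pending);
}
```

The point of the split is that nothing after the helper depends on vmcb12, so
the failure case needs no special handling in the rest of the exit path.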

---
 arch/x86/kvm/svm/nested.c | 95 ++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fab0d3d5baa2..e8c163d95364 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -639,6 +639,12 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
 	svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
 }
 
+static bool nested_vmcb12_has_lbrv(struct kvm_vcpu *vcpu)
+{
+	return guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
+	       (to_svm(vcpu)->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
+}
+
 static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
 {
 	bool new_vmcb12 = false;
@@ -703,8 +709,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		vmcb_mark_dirty(vmcb02, VMCB_DR);
 	}
 
-	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
-		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+	if (nested_vmcb12_has_lbrv(vcpu)) {
 		/*
 		 * Reserved bits of DEBUGCTL are ignored.  Be consistent with
 		 * svm_set_msr's definition of reserved bits.
@@ -1118,35 +1123,14 @@ void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
 	to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
 }
 
-int nested_svm_vmexit(struct vcpu_svm *svm)
+static void nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu,
+					    struct vmcb *vmcb12,
+					    struct vmcb *vmcb02)
 {
-	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct vmcb *vmcb01 = svm->vmcb01.ptr;
-	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
-	struct vmcb *vmcb12;
-	struct kvm_host_map map;
-	int rc;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
-	rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map);
-	if (rc) {
-		if (rc == -EINVAL)
-			kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
-
-	vmcb12 = map.hva;
-
-	/* Exit Guest-Mode */
-	leave_guest_mode(vcpu);
-	svm->nested.vmcb12_gpa = 0;
-	WARN_ON_ONCE(svm->nested.nested_run_pending);
-
-	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-
-	/* in case we halted in L2 */
-	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
-	/* Give the current vmcb to the guest */
+	if (!vmcb12)
+		return;
 
 	vmcb12->save.es     = vmcb02->save.es;
 	vmcb12->save.cs     = vmcb02->save.cs;
@@ -1184,14 +1168,53 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
+	if (nested_vmcb12_has_lbrv(vcpu))
+		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
+
 	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
 	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
+	trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
+				       vmcb12->control.exit_info_1,
+				       vmcb12->control.exit_info_2,
+				       vmcb12->control.exit_int_info,
+				       vmcb12->control.exit_int_info_err,
+				       KVM_ISA_SVM);
+}
+
+int nested_svm_vmexit(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
+	struct vmcb *vmcb12;
+	struct kvm_host_map map;
+	int rc;
+
+	if (!kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map)) {
+		vmcb12 = map.hva;
+	} else {
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+		vmcb12 = NULL;
+	}
+
+	/* Exit Guest-Mode */
+	leave_guest_mode(vcpu);
+	svm->nested.vmcb12_gpa = 0;
+	WARN_ON_ONCE(svm->nested.nested_run_pending);
+
+	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+
+	/* in case we halted in L2 */
+	kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
+
+	/* Give the current vmcb to the guest */
+	nested_svm_vmexit_update_vmcb12(vcpu, vmcb12, vmcb02);
+
 	if (!kvm_pause_in_guest(vcpu->kvm)) {
 		vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
 		vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
-
 	}
 
 	/*
@@ -1232,10 +1255,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (!nested_exit_on_intr(svm))
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 
-	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
-		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
-		svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
-	} else {
+	if (!nested_vmcb12_has_lbrv(vcpu)) {
 		svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
 		vmcb_mark_dirty(vmcb01, VMCB_LBR);
 	}
@@ -1291,13 +1311,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->vcpu.arch.dr7 = DR7_FIXED_1;
 	kvm_update_dr7(&svm->vcpu);
 
-	trace_kvm_nested_vmexit_inject(vmcb12->control.exit_code,
-				       vmcb12->control.exit_info_1,
-				       vmcb12->control.exit_info_2,
-				       vmcb12->control.exit_int_info,
-				       vmcb12->control.exit_int_info_err,
-				       KVM_ISA_SVM);
-
 	kvm_vcpu_unmap(vcpu, &map);
 
 	nested_svm_transition_tlb_flush(vcpu);

base-commit: 2125912d022f4740238a950469da505783945be6
--