From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 2 Mar 2026 15:22:47 -0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260227011306.3111731-1-yosry@kernel.org>
 <20260227011306.3111731-4-yosry@kernel.org>
Subject: Re: [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception
From: Sean Christopherson
To: Yosry Ahmed
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> > > That being said, I hate nested_run_in_progress. It's too close to
> > > nested_run_pending and I am pretty sure they will be mixed up.
> >
> > Agreed, though the fact that name is _too_ close means that, aside from the
> > potential for disaster (minor detail), it's accurate.
> >
> > One thought is to hide nested_run_in_progress behind a Kconfig, so that attempts
> > to use it for anything but the sanity check(s) would fail the build. I don't
> > really want to create yet another KVM_PROVE_xxx though, but unlike KVM_PROVE_MMU,
> > I think we want this enabled in production.
> >
> > I'll chew on this a bit...
>
> Maybe (if we go this direction) name it very explicitly
> warn_on_nested_exception if it's only intended to be used for the
> sanity checks?

It's not just about exceptions though. That's the case that has caused a rash of
recent problems, but the rule isn't specific to exceptions, it's very broadly
Thou Shalt Not Cancel VMRUN.

I think that's where there's some disconnect. We can't make the nested_run_pending
warnings go away by adding more sanity checks, and I am dead set against removing
those warnings.

Aha! Idea. What if we turn nested_run_pending into a u8, and use a magic value
of '2' to indicate that userspace gained control of the CPU since nested_run_pending
was set, and then only WARN on nested_run_pending==1? That way we don't have to
come up with a new name, and there's zero chance of nested_run_pending and something
like nested_run_in_progress getting out of sync.
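
For anyone who wants to poke at the semantics outside the kernel, here is a
throwaway userspace model of the three states. It is only a sketch and not KVM
code: the model_* names, the enum, and the warn counter are made up for
illustration. Only the nested_run_pending field and the 0/1/2 values mirror the
idea described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the proposed values; KVM itself uses bare 1 and 2. */
enum {
	MODEL_RUN_NOT_PENDING       = 0,
	MODEL_RUN_PENDING           = 1, /* set by KVM, vCPU state is trusted */
	MODEL_RUN_PENDING_UNTRUSTED = 2, /* userspace had control since it was set */
};

/* Hypothetical stand-in for the relevant slice of a vCPU. */
struct model_vcpu {
	uint8_t nested_run_pending;
	unsigned int warn_count; /* stands in for WARN_ON_ONCE() firing */
};

/* Models emulation of a nested VM-Enter marking the run as pending. */
static void model_nested_vmenter(struct model_vcpu *v)
{
	v->nested_run_pending = MODEL_RUN_PENDING;
}

/* Models re-entering the run loop after userspace may have stuffed state. */
static void model_enter_from_userspace(struct model_vcpu *v)
{
	if (v->nested_run_pending == MODEL_RUN_PENDING)
		v->nested_run_pending = MODEL_RUN_PENDING_UNTRUSTED;
}

/* Models the sanity check on a nested VM-Exit: complain only if trusted. */
static void model_warn_on_nested_run_pending(struct model_vcpu *v)
{
	if (v->nested_run_pending == MODEL_RUN_PENDING)
		v->warn_count++;
}

/*
 * Runs one scenario: a nested VM-Enter is pending, userspace optionally gets
 * control, then a nested VM-Exit is attempted. Returns how many times the
 * modeled warning fired.
 */
static unsigned int model_warns(bool userspace_ran)
{
	struct model_vcpu v = { 0 };

	model_nested_vmenter(&v);
	if (userspace_ran)
		model_enter_from_userspace(&v);
	model_warn_on_nested_run_pending(&v);
	return v.warn_count;
}
```

With this model, model_warns(false) reports one warning (KVM itself cancelled
the pending run), while model_warns(true) reports none, because the pending
state was downgraded to "untrusted" before the check.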
---
 arch/x86/include/asm/kvm_host.h |  6 +++++-
 arch/x86/kvm/svm/nested.c       |  3 ++-
 arch/x86/kvm/vmx/nested.c       |  4 ++--
 arch/x86/kvm/x86.c              |  7 +++++++
 arch/x86/kvm/x86.h              | 10 ++++++++++
 5 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 19b3790e5e99..a8d39b3aff6a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1104,8 +1104,12 @@ struct kvm_vcpu_arch {
 	 * can only occur at instruction boundaries. The only exception is
 	 * VMX's "notify" exits, which exist in large part to break the CPU out
 	 * of infinite ucode loops, but can corrupt vCPU state in the process!
+	 *
+	 * For all intents and purposes, this is a boolean, but it's tracked as
+	 * a u8 so that KVM can detect when userspace may have stuffed vCPU
+	 * state and generated an architecturally-impossible VM-Exit.
 	 */
-	bool nested_run_pending;
+	u8 nested_run_pending;

 #if IS_ENABLED(CONFIG_HYPERV)
 	hpa_t hv_root_tdp;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c2d4c9c63146..77ff9ead957c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1138,7 +1138,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	/* Exit Guest-Mode */
 	leave_guest_mode(vcpu);
 	svm->nested.vmcb12_gpa = 0;
-	WARN_ON_ONCE(vcpu->arch.nested_run_pending);
+
+	kvm_warn_on_nested_run_pending(vcpu);

 	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 031075467a6d..5659545360dc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5042,7 +5042,7 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	vmx->nested.mtf_pending = false;

 	/* trying to cancel vmlaunch/vmresume is a bug */
-	WARN_ON_ONCE(vcpu->arch.nested_run_pending);
+	kvm_warn_on_nested_run_pending(vcpu);

 #ifdef CONFIG_KVM_HYPERV
 	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
@@ -6665,7 +6665,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 	unsigned long exit_qual;
 	u32 exit_intr_info;

-	WARN_ON_ONCE(vcpu->arch.nested_run_pending);
+	kvm_warn_on_nested_run_pending(vcpu);

 	/*
 	 * Late nested VM-Fail shares the same flow as nested VM-Exit since KVM
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..30ff5a755572 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12023,6 +12023,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	if (r <= 0)
 		goto out;

+	/*
+	 * If userspace may have modified vCPU state, mark nested_run_pending
+	 * as "untrusted" to avoid triggering false-positive WARNs.
+	 */
+	if (vcpu->arch.nested_run_pending == 1)
+		vcpu->arch.nested_run_pending = 2;
+
 	r = vcpu_run(vcpu);

 out:
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 94d4f07aaaa0..d3003c8be961 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -188,6 +188,16 @@ static inline bool kvm_can_set_cpuid_and_feature_msrs(struct kvm_vcpu *vcpu)
 	return vcpu->arch.last_vmentry_cpu == -1 && !is_guest_mode(vcpu);
 }

+/*
+ * WARN if a nested VM-Enter is pending completion, and userspace hasn't gained
+ * control since the nested VM-Enter was initiated (in which case, userspace
+ * may have modified vCPU state to induce an architecturally invalid VM-Exit).
+ */
+static inline void kvm_warn_on_nested_run_pending(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(vcpu->arch.nested_run_pending == 1);
+}
+
 static inline void kvm_set_mp_state(struct kvm_vcpu *vcpu, int mp_state)
 {
 	vcpu->arch.mp_state = mp_state;

base-commit: a68a4bbc5b9ce5b722473399f05cb05217abaee8
--