From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chao Gao
To: kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: binbin.wu@linux.intel.com, dave.hansen@linux.intel.com, djbw@kernel.org,
	ira.weiny@intel.com, kai.huang@intel.com, kas@kernel.org,
	nik.borisov@suse.com, paulmck@kernel.org, pbonzini@redhat.com,
	reinette.chatre@intel.com, rick.p.edgecombe@intel.com, sagis@google.com,
	seanjc@google.com, tony.lindgren@linux.intel.com, vannapurve@google.com,
	vishal.l.verma@intel.com, yilun.xu@linux.intel.com, xiaoyao.li@intel.com,
	yan.y.zhao@intel.com, Chao Gao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86@kernel.org, "H.
Peter Anvin"
Subject: [PATCH v10 11/25] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
Date: Wed, 20 May 2026 06:38:14 -0700
Message-ID: <20260520133909.409394-12-chao.gao@intel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260520133909.409394-1-chao.gao@intel.com>
References: <20260520133909.409394-1-chao.gao@intel.com>
Precedence: bulk
X-Mailing-List: linux-coco@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

TDX-capable CPUs clobber the current VMCS on return from P-SEAMLDR, as
documented in Intel® Trust Domain CPU Architectural Extensions:

  SEAMRET from the P-SEAMLDR clears the current VMCS structure pointed
  to by the current-VMCS pointer. A VMM that invokes the P-SEAMLDR using
  SEAMCALL must reload the current-VMCS, if required, using the VMPTRLD
  instruction.

Clearing the current VMCS behind KVM's back will break KVM.

Future CPUs will fix this by preserving the current VMCS across
P-SEAMLDR calls. A future specification update will describe the SEAMRET
VMCS-clearing behavior as an erratum and state that it does not occur
when IA32_VMX_BASIC[60] is set.

Add a CPU bug bit for this erratum and refuse to expose P-SEAMLDR
information on affected CPUs, because even reading the P-SEAMLDR sysfs
knobs would enter and exit P-SEAMLDR.

Use a CPU bug bit to stay consistent with X86_BUG_TDX_PW_MCE. As a
bonus, the bug bit is visible to userspace, which allows userspace to
determine why these sysfs files are not exposed, and it can also be
checked by other kernel components in the future if needed.

== Alternatives ==

Two workarounds were considered but both were rejected:

1. Save/restore the current VMCS around P-SEAMLDR calls. This produces
   ugly assembly code [1] and doesn't play well with #MCE or #NMI if
   they need to use the current VMCS.

2. Move KVM's VMCS tracking logic to the TDX core code, which would
   break the boundary between KVM and the TDX core code [2].

Signed-off-by: Chao Gao
Reviewed-by: Kai Huang
Reviewed-by: Kiryl Shutsemau (Meta)
Reviewed-by: Rick Edgecombe
Reviewed-by: Dave Hansen
Link: https://lore.kernel.org/kvm/fedb3192-e68c-423c-93b2-a4dc2f964148@intel.com/ # [1]
Link: https://lore.kernel.org/kvm/aYIXFmT-676oN6j0@google.com/ # [2]
---
This is split into a separate patch rather than folded into the previous
one, because the erratum handling warrants a longer changelog and
discussion of the alternatives.

v10:
 - Make it clear that clearing VMCS is the current architecture behavior,
   but will be fixed by a later doc update.
---
 arch/x86/include/asm/cpufeatures.h    |  1 +
 arch/x86/include/asm/vmx.h            |  1 +
 arch/x86/virt/vmx/tdx/tdx.c           | 11 +++++++++++
 drivers/virt/coco/tdx-host/tdx-host.c |  8 ++++++++
 4 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1d506e5d6f46..7b572bc24265 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -573,4 +573,5 @@
 #define X86_BUG_ITS_NATIVE_ONLY X86_BUG( 1*32+ 8) /* "its_native_only" CPU is affected by ITS, VMX is not affected */
 #define X86_BUG_TSA X86_BUG( 1*32+ 9) /* "tsa" CPU is affected by Transient Scheduler Attacks */
 #define X86_BUG_VMSCAPE X86_BUG( 1*32+10) /* "vmscape" CPU is affected by VMSCAPE attacks from guests */
+#define X86_BUG_SEAMRET_INVD_VMCS X86_BUG( 1*32+11) /* "seamret_invd_vmcs" SEAMRET from P-SEAMLDR clears the current VMCS */
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 37080382df54..49d8551d285d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -147,6 +147,7 @@ struct vmcs {
 #define VMX_BASIC_INOUT BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
 #define VMX_BASIC_NO_HW_ERROR_CODE_CC BIT_ULL(56)
+#define VMX_BASIC_NO_SEAMRET_INVD_VMCS BIT_ULL(60)
 
 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
 {
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 5fb0441a9ac6..53cf99c41dbb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include "seamcall_internal.h"
 #include "tdx.h"
 
@@ -1450,6 +1451,8 @@ static struct notifier_block tdx_memory_nb = {
 
 static void __init check_tdx_erratum(void)
 {
+	u64 basic_msr;
+
 	/*
 	 * These CPUs have an erratum. A partial write from non-TD
 	 * software (e.g. via MOVNTI variants or UC/WC mapping) to TDX
@@ -1461,6 +1464,14 @@ static void __init check_tdx_erratum(void)
 	case INTEL_EMERALDRAPIDS_X:
 		setup_force_cpu_bug(X86_BUG_TDX_PW_MCE);
 	}
+
+	/*
+	 * Some TDX-capable CPUs have an erratum where the current VMCS is
+	 * cleared after calling into P-SEAMLDR.
+	 */
+	rdmsrq(MSR_IA32_VMX_BASIC, basic_msr);
+	if (!(basic_msr & VMX_BASIC_NO_SEAMRET_INVD_VMCS))
+		setup_force_cpu_bug(X86_BUG_SEAMRET_INVD_VMCS);
 }
 
 void __init tdx_init(void)
diff --git a/drivers/virt/coco/tdx-host/tdx-host.c b/drivers/virt/coco/tdx-host/tdx-host.c
index 2997311f72fa..2cd7be7bb404 100644
--- a/drivers/virt/coco/tdx-host/tdx-host.c
+++ b/drivers/virt/coco/tdx-host/tdx-host.c
@@ -98,6 +98,14 @@ static umode_t seamldr_group_visible(struct kobject *kobj, struct attribute *att
 	if (!tdx_supports_runtime_update(sysinfo))
 		return 0;
 
+	/*
+	 * Calling P-SEAMLDR on CPUs with the seamret_invd_vmcs bug clears
+	 * the current VMCS, which breaks KVM. Verify the erratum is not
+	 * present before exposing P-SEAMLDR features.
+	 */
+	if (boot_cpu_has_bug(X86_BUG_SEAMRET_INVD_VMCS))
+		return 0;
+
 	return attr->mode;
 }
 
-- 
2.52.0
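
The changelog notes that the bug bit lets userspace determine why the P-SEAMLDR sysfs files are missing. A minimal userspace sketch of such a check follows. It is not part of this patch. It assumes the flag is reported as "seamret_invd_vmcs" on the "bugs" line of /proc/cpuinfo, as the quoted name in cpufeatures.h suggests, and the helper names bugs_line_has() and cpu_has_bug() are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if @line is a /proc/cpuinfo "bugs" line that lists @name
 * as a whole, whitespace-delimited token. (Hypothetical helper.)
 */
static bool bugs_line_has(const char *line, const char *name)
{
	size_t len = strlen(name);
	const char *p;

	if (len == 0 || strncmp(line, "bugs", 4) != 0)
		return false;

	p = strchr(line, ':');
	if (!p)
		return false;
	p++;

	while (*p) {
		size_t tok;

		p += strspn(p, " \t\n");	/* skip separators */
		tok = strcspn(p, " \t\n");	/* length of the next token */
		if (tok == len && strncmp(p, name, len) == 0)
			return true;
		p += tok;
	}
	return false;
}

/*
 * Scan /proc/cpuinfo and report whether any CPU lists @name under
 * "bugs". Returns false if the file cannot be opened. (Hypothetical
 * helper; assumes the bugs line fits in the buffer.)
 */
static bool cpu_has_bug(const char *name)
{
	char buf[4096];
	bool found = false;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return false;
	while (!found && fgets(buf, sizeof(buf), f))
		found = bugs_line_has(buf, name);
	fclose(f);
	return found;
}
```

A tool could call cpu_has_bug("seamret_invd_vmcs") and, if it returns true, report that the P-SEAMLDR attributes are hidden because of the erratum rather than because runtime update is unsupported. Matching whole tokens avoids false positives from flags that merely share a prefix.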