Linux Confidential Computing Development
* Re: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
From: Baolu Lu @ 2026-04-09  5:48 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kernel test robot, linux-coco, linux-pci, dan.j.williams, x86,
	oe-kbuild-all, chao.gao, dave.jiang, yilun.xu, zhenzhong.duan,
	kvm, rick.p.edgecombe, dave.hansen, kas, xiaoyao.li,
	vishal.l.verma, linux-kernel
In-Reply-To: <adZFCF01fxt4gBh8@yilunxu-OptiPlex-7050>

On 4/8/26 20:07, Xu Yilun wrote:
> On Tue, Mar 31, 2026 at 03:20:44PM +0800, Baolu Lu wrote:
>> On 3/29/26 00:57, kernel test robot wrote:
>>> kernel test robot noticed the following build warnings:
>>>
>>> [auto build test WARNING on 11439c4635edd669ae435eec308f4ab8a0804808]
>>>
>>> url: https://github.com/intel-lab-lkp/linux/commits/Xu-Yilun/x86-tdx-Move-all-TDX-error-defines-into-asm-shared-tdx_errno-h/20260328-151524
>>> base:   11439c4635edd669ae435eec308f4ab8a0804808
>>> patch link: https://lore.kernel.org/r/20260327160132.2946114-20-yilun.xu%40linux.intel.com
>>> patch subject: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
>>> config: i386-randconfig-141-20260328 (https://download.01.org/0day-ci/archive/20260329/202603290006.za7iiDgF-lkp@intel.com/config)
>>> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
>>> smatch: v0.5.0-9004-gb810ac53
>>> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260329/202603290006.za7iiDgF-lkp@intel.com/reproduce)
>>>
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot <lkp@intel.com>
>>> | Closes: https://lore.kernel.org/oe-kbuild-all/202603290006.za7iiDgF-lkp@intel.com/
>>>
>>> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>>>
>>>>> WARNING: modpost: vmlinux: section mismatch in reference: iommu_max_domain_id+0x55 (section: .text.iommu_max_domain_id) -> acpi_table_parse_keyp (section: .init.text)
>>
>>
>> acpi_table_parse_keyp() is marked as __init. But this patch causes the
>> intel iommu driver to call it from a runtime function.
>>
>> int __init_or_acpilib
>> acpi_table_parse_keyp(enum acpi_keyp_type id,
>>                        acpi_tbl_entry_handler_arg handler_arg, void *arg)
>> {
>>          return __acpi_table_parse_entries(ACPI_SIG_KEYP,
>>                                            sizeof(struct acpi_table_keyp), id,
>>                                            NULL, handler_arg, arg, 0);
>> }
> 
> Would it be better to configure the ACPI table code as a library, so that
> drivers can use it freely at runtime? tdx-host also uses this function.
> 
> --------8<--------
> 
> diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
> index 5471f814e073..55188d6d38bb 100644
> --- a/drivers/iommu/intel/Kconfig
> +++ b/drivers/iommu/intel/Kconfig
> @@ -1,6 +1,7 @@
>   # SPDX-License-Identifier: GPL-2.0-only
>   # Intel IOMMU support
>   config DMAR_TABLE
> +       select ACPI_TABLE_LIB
>          bool
> 
>   config DMAR_PERF
> 

This looks better.

Thanks,
baolu

* RE: [PATCH v2 18/31] iommu/vt-d: Cache max domain ID to avoid redundant calculation
From: Tian, Kevin @ 2026-04-09  7:02 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-19-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> The cap_ndoms() helper calculates the maximum available domain ID from
> the value of capability register, which can be inefficient if called
> repeatedly. Cache the maximum supported domain ID in max_domain_id field
> during initialization to avoid redundant calls to cap_ndoms() throughout
> the IOMMU driver.
> 
> No functionality change.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

* RE: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
From: Tian, Kevin @ 2026-04-09  7:16 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-20-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 
> +
> +static bool platform_is_tdxc_enhanced(void)

platform_support_tdxc()

> +{
> +	static int tvm_usable = -1;
> +	int ret;
> +
> +	/* only need to parse once */
> +	if (tvm_usable != -1)
> +		return !!tvm_usable;
> +
> +	tvm_usable = 0;
> +	ret = acpi_table_parse_keyp(ACPI_KEYP_TYPE_CONFIG_UNIT,
> +				    keyp_config_unit_tvm_usable, &tvm_usable);
> +	if (ret < 0)
> +		tvm_usable = 0;

this is useless. tvm_usable is already set to '0' before the function call.

> +
> +	return !!tvm_usable;
> +}
> +
> +static unsigned long iommu_max_domain_id(struct intel_iommu *iommu)
> +{
> +	unsigned long ndoms = cap_ndoms(iommu->cap);
> +
> +	/*
> +	 * Intel TDX Connect Architecture Specification, Section 2.2 Trusted DMA
> +	 *
> +	 * When IOMMU is enabled to support TDX Connect, the IOMMU restricts
> +	 * the VMM’s DID setting, reserving the MSB bit for the TDX module. The
> +	 * TDX module always sets this reserved bit on the trusted DMA table.
> +	 */
> +	if (ecap_tdxc(iommu->ecap) && platform_is_tdxc_enhanced()) {
> +		pr_info_once("Most Significant Bit of domain ID reserved.\n");

'... reserved for TDX Connect'

> +		return ndoms >> 1;
> +	}
> +

We need more words here to explain the strategy.

The comment says "When IOMMU is *enabled*...", but the code here
just checks the static capability. It's probably a design choice that you
don't want to add complexity on recycling DIDs when TDX connect
is actually enabled, but it's worth a note here.

btw in patch23 commit msg:

"
There is no dedicated way to enumerate which IOMMU devices support
trusted operations. The host has to call TDH.IOMMU.SETUP on all IOMMU
devices and tell their trusted capability by the return value.
"

which implies that ecap_tdxc() alone doesn't really report the capability?

anyway all of those need a better explanation here...
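
For illustration only, the effect of reserving the MSB can be sketched as a userspace model. It assumes the VT-d ND field in the low three bits of the capability register encodes 2^(4 + 2*ND) domain IDs, and that the encoding 111b is reserved. The `model_` names are hypothetical and are not part of the patch; the `msb_reserved` flag stands in for the `ecap_tdxc() && platform_is_tdxc_enhanced()` check, so it models the static capability test the comment above questions, not a runtime "enabled" state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the capability decoding (assumption: the ND field in
 * bits 2:0 encodes 2^(4 + 2*ND) domain IDs, and 111b is reserved).
 */
static unsigned long model_cap_ndoms(uint64_t cap)
{
	unsigned int nd = cap & 0x7;

	assert(nd != 0x7);	/* reserved encoding, not modelled here */
	return 1UL << (4 + 2 * nd);
}

/* Hypothetical helper: how many domain IDs remain usable by the VMM. */
static unsigned long model_max_domain_id(uint64_t cap, bool msb_reserved)
{
	unsigned long ndoms = model_cap_ndoms(cap);

	/* Giving up the most significant DID bit halves the usable space. */
	return msb_reserved ? ndoms >> 1 : ndoms;
}
```

With ND = 2 (256 domain IDs), the model leaves 128 IDs to the VMM once the MSB is reserved.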

* RE: [PATCH v2 20/31] x86/virt/tdx: Add a helper to loop on TDX_INTERRUPTED_RESUMABLE
From: Tian, Kevin @ 2026-04-09  7:21 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-21-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 
> +static u64 __maybe_unused __seamcall_ir_resched(sc_func_t sc_func, u64 fn,
> +						struct tdx_module_args *args)
> +{

'ir' sounds redundant with the trailing 'resched'?

not a big deal, just a bit confusing when seeing it on the IOMMU side, where
'ir' also refers to 'interrupt remapping' and is frequently used in
irq_remapping.c... :)

* RE: [PATCH v2 21/31] x86/virt/tdx: Add SEAMCALL wrappers for trusted IOMMU setup and clear
From: Tian, Kevin @ 2026-04-09  7:30 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-22-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 
> From: Zhenzhong Duan <zhenzhong.duan@intel.com>
> 
> Add SEAMCALLs to setup/clear trusted IOMMU for TDX Connect.

what is 'trusted IOMMU'? a new hardware, or some sensitive resource in
the IOMMU which is only visible to TDX module?

If the latter it's clearer to say "trusted configuration in IOMMU".

> 
> Enable TEE I/O support for a target device requires to setup trusted IOMMU
> for the related IOMMU device first, even only for enabling physical secure
> links like SPDM/IDE.

this series is just about SPDM/IDE. then the first part about TEE I/O is not
really relevant.

> 
> TDH.IOMMU.SETUP takes the register base address (VTBAR) to position an
> IOMMU device, and outputs an IOMMU_ID as the trusted IOMMU identifier.
> TDH.IOMMU.CLEAR takes the IOMMU_ID to reverse the setup.

Intel IOMMU is called VT-d. It has a register block but not a PCI device so
there is no BAR resource related.

let's just call it 'reg_base'

intel-iommu driver already has its own 'id' definition for each iommu device.
It's clearer to add a prefix to this new id, e.g. tdx_iommu_id?

* Re: [PATCH v2 10/19] x86, swiotlb: Teach swiotlb to skip "accepted" devices
From: Aneesh Kumar K.V @ 2026-04-09  7:33 UTC (permalink / raw)
  To: Dan Williams, linux-coco, linux-pci
  Cc: gregkh, aik, yilun.xu, bhelgaas, alistair23, lukas, jgg,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Marek Szyprowski, Robin Murphy
In-Reply-To: <20260303000207.1836586-11-dan.j.williams@intel.com>

Dan Williams <dan.j.williams@intel.com> writes:

> There are two mechanisms to force SWIOTLB operation, the kernel command
> line option and the internal SWIOTLB_FORCE flag. With the arrival of
> "accepted" devices, devices that have been enabled to DMA to private
> encrypted memory, the SWIOTLB_FORCE flag is an awkward fit. It may be the
> case that SWIOTLB operation wants to be forced regardless of the device
> acceptance state.
>
> Introduce a new SWIOTLB_UNACCEPTED flag that allows for both augmenting the
> result of is_swiotlb_force_bounce() dynamically and allowing for an "always
> SWIOTLB" override.
>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/swiotlb.h   | 15 ++++++++++++---
>  arch/x86/kernel/pci-dma.c |  2 +-
>  kernel/dma/swiotlb.c      |  1 +
>  3 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 3dae0f592063..0efb9b8e5dd0 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -17,6 +17,7 @@ struct scatterlist;
>  #define SWIOTLB_VERBOSE	(1 << 0) /* verbose initialization */
>  #define SWIOTLB_FORCE	(1 << 1) /* force bounce buffering */
>  #define SWIOTLB_ANY	(1 << 2) /* allow any memory for the buffer */
> +#define SWIOTLB_UNACCEPTED (1 << 3) /* swiotlb for unaccepted devices */
>  
>  /*
>   * Maximum allowable number of contiguous slabs to map,
> @@ -91,6 +92,7 @@ struct io_tlb_pool {
>   * @nslabs:	Total number of IO TLB slabs in all pools.
>   * @debugfs:	The dentry to debugfs.
>   * @force_bounce: %true if swiotlb bouncing is forced
> + * @bounce_unaccepted: %true if unaccepted devices must bounce
>   * @for_alloc:  %true if the pool is used for memory allocation
>   * @can_grow:	%true if more pools can be allocated dynamically.
>   * @phys_limit:	Maximum allowed physical address.
> @@ -109,8 +111,9 @@ struct io_tlb_mem {
>  	struct io_tlb_pool defpool;
>  	unsigned long nslabs;
>  	struct dentry *debugfs;
> -	bool force_bounce;
> -	bool for_alloc;
> +	u8 force_bounce:1;
> +	u8 bounce_unaccepted:1;
> +	u8 for_alloc:1;
>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>  	bool can_grow;
>  	u64 phys_limit;
> @@ -173,7 +176,13 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
>  {
>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>  
> -	return mem && mem->force_bounce;
> +	if (!mem)
> +		return false;
> +	if (mem->force_bounce)
> +		return true;
> +	if (mem->bounce_unaccepted && !device_cc_accepted(dev))
> +		return true;
> +	return false;
>  }
>  
>  void swiotlb_init(bool addressing_limited, unsigned int flags);
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 6267363e0189..8a737f501ae5 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -61,7 +61,7 @@ static void __init pci_swiotlb_detect(void)
>  	 */
>  	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
>  		x86_swiotlb_enable = true;
> -		x86_swiotlb_flags |= SWIOTLB_FORCE;
> +		x86_swiotlb_flags |= SWIOTLB_UNACCEPTED;
>  	}
>  }
>  #else
>

I guess we can also include the arm64 change here:
modified   arch/arm64/mm/init.c
@@ -335,7 +335,7 @@ void __init arch_mm_preinit(void)
 
 	if (is_realm_world()) {
 		swiotlb = true;
-		flags |= SWIOTLB_FORCE;
+		flags |= SWIOTLB_UNACCEPTED;
 	}
 
 	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) && !swiotlb) {

-aneesh
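
The bounce decision in the quoted is_swiotlb_force_bounce() hunk can be exercised outside the kernel with a small model. This is a sketch under assumptions: the `model_` types are hypothetical stand-ins for struct io_tlb_mem and struct device, and a plain boolean replaces the device_cc_accepted() call from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two io_tlb_mem bits used by the check. */
struct model_io_tlb_mem {
	unsigned int force_bounce:1;
	unsigned int bounce_unaccepted:1;
};

/* Hypothetical stand-in for struct device. */
struct model_device {
	struct model_io_tlb_mem *mem;
	bool cc_accepted;	/* models device_cc_accepted(dev) */
};

/* Same ordering of checks as the quoted hunk. */
static bool model_force_bounce(const struct model_device *dev)
{
	const struct model_io_tlb_mem *mem;

	assert(dev != NULL);
	mem = dev->mem;
	if (!mem)
		return false;
	if (mem->force_bounce)
		return true;	/* "always SWIOTLB" override */
	return mem->bounce_unaccepted && !dev->cc_accepted;
}
```

In this model an accepted device stops bouncing only when force_bounce is clear, which is the "override" behaviour the commit message describes.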

* Re: [PATCH 2/2] x86/virt/tdx: Use PFN directly for unmapping guest private memory
From: Yan Zhao @ 2026-04-09  6:54 UTC (permalink / raw)
  To: Paolo Bonzini, Xiaoyao Li, seanjc, dave.hansen, tglx, mingo, bp,
	kas, x86, linux-kernel, kvm, linux-coco, kai.huang,
	rick.p.edgecombe, yilun.xu, vannapurve, ackerleytng, sagis,
	binbin.wu, isaku.yamahata
In-Reply-To: <adRTWttGqVfIHaNf@yzhao56-desk.sh.intel.com>

On Tue, Apr 07, 2026 at 08:44:10AM +0800, Yan Zhao wrote:
> On Sat, Apr 04, 2026 at 08:39:00AM +0200, Paolo Bonzini wrote:
> > On 3/19/26 09:56, Yan Zhao wrote:
> > > On Thu, Mar 19, 2026 at 04:56:10PM +0800, Xiaoyao Li wrote:
> > > > So why not considering option 2?
> > > > 
> > > >    2. keep tdx_quirk_reset_page() as-is for the cases of
> > > >       tdx_reclaim_page() and tdx_reclaim_td_control_pages() that have the
> > > >       struct page. But only change tdx_sept_remove_private_spte() to use
> > > >       tdx_quirk_reset_paddr() directly.
> > > > 
> > > > It will need export tdx_quirk_reset_paddr() for KVM. I think it will be OK?
> > > I don't think it's necessary. But if we have to export an extra API, IMHO,
> > > tdx_quirk_reset_pfn() is better than tdx_quirk_reset_paddr(). Otherwise,
> > > why not only expose tdx_quirk_reset_paddr()?
> > 
> > That works for me, it seems the cleanest.
> Hi Paolo,
> To avoid misunderstanding: you think only exporting tdx_quirk_reset_paddr() is
> the cleanest, right? :)
Could I rename tdx_quirk_reset_page() to tdx_quirk_phymem_page_reset() and only
export tdx_quirk_phymem_page_reset()?

The "phymem_page" is similar to that in tdh_phymem_page_wbinvd_hkid(), indicating
it's operating on physical memory of page size, so it does not confuse people
even though it takes PFN as input. Another benefit is that callers have no need
to specify size, which is always PAGE_SIZE.
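
A minimal sketch of the proposed shape, as a userspace model: a PFN-taking wrapper that always passes one page to an address-and-size helper. The `model_` names, the recorded-call struct and the 4 KiB page size are assumptions made for illustration; the real helpers live in the TDX code and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Records the last reset request so the wrapper can be checked. */
struct model_reset_record {
	uint64_t base;
	unsigned long size;
};

static struct model_reset_record model_last_reset;

/* Stand-in for the paddr-based helper that stays internal. */
static void model_quirk_reset_paddr(uint64_t base, unsigned long size)
{
	model_last_reset.base = base;
	model_last_reset.size = size;
}

/* The only exported entry point: callers pass a PFN and never a size. */
static void model_quirk_phymem_page_reset(uint64_t pfn)
{
	assert(pfn <= (UINT64_MAX >> MODEL_PAGE_SHIFT));
	model_quirk_reset_paddr(pfn << MODEL_PAGE_SHIFT, MODEL_PAGE_SIZE);
}
```

Keeping the size out of the exported signature means a caller cannot request a reset that is not exactly one page.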

* Re: [PATCH v2 09/19] PCI/TSM: Support creating encrypted MMIO descriptors via TDISP Report
From: Aneesh Kumar K.V @ 2026-04-09  7:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Xu Yilun
  Cc: Dan Williams, linux-coco, linux-pci, gregkh, aik, bhelgaas,
	alistair23, lukas, Arnd Bergmann
In-Reply-To: <20260313133658.GD1586734@nvidia.com>

Jason Gunthorpe <jgg@nvidia.com> writes:

> On Fri, Mar 13, 2026 at 06:23:51PM +0800, Xu Yilun wrote:
>
>> My understanding is, it is the obfuscated host start pfn of this range,
>> if this range has offset to the BAR start, this field should also be
>> offsetted.
>
> The OS must get an idea of the bar layout out of the report, so there
> have to be restrictions on how it is formed otherwise it is
> unparsible. IMHO the PCI spec created this very general mechanism but
> the CPU CC specs need to constrain it to be usable by an OS.
>

The ARM CCA spec mentions these restrictions in section

A9.6.2 Realm validation of device memory mappings

-aneesh

* RE: [PATCH v2 22/31] iommu/vt-d: Export a helper to do function for each dmar_drhd_unit
From: Tian, Kevin @ 2026-04-09  7:49 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-23-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 
> @@ -86,6 +86,8 @@ extern struct list_head dmar_drhd_units;
>  				dmar_rcu_check())			\
>  		if (i=drhd->iommu, 0) {} else
> 
> +int do_for_each_drhd_unit(int (*fn)(struct dmar_drhd_unit *));
> +
>  static inline bool dmar_rcu_check(void)

It's a bit weird to insert it here. Move it to follow for_each_iommu().

> +
> +int do_for_each_drhd_unit(int (*fn)(struct dmar_drhd_unit *))
> +{
> +	struct dmar_drhd_unit *drhd;
> +	int ret;
> +
> +	guard(rwsem_read)(&dmar_global_lock);
> +
> +	for_each_drhd_unit(drhd) {
> +		ret = fn(drhd);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}

use for_each_active_drhd_unit(). or is there a need to set up the trusted
configuration even on an ignored iommu?
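
To make the difference concrete, here is a userspace sketch of a callback iterator that skips ignored units and stops at the first non-zero return. It is a model only: the `model_` types and the two sample callbacks are hypothetical, an array replaces the dmar_drhd_units list, and the dmar_global_lock read guard from the patch is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dmar_drhd_unit. */
struct model_drhd_unit {
	int id;
	bool ignored;
};

/* Call fn on every non-ignored unit; stop at the first error. */
static int model_do_for_each_active_unit(struct model_drhd_unit *units,
					 size_t n,
					 int (*fn)(struct model_drhd_unit *))
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++) {
		if (units[i].ignored)
			continue;	/* the "active" variant skips these */
		int ret = fn(&units[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Sample callbacks used to observe which units were visited. */
static int model_visited;

static int model_count_unit(struct model_drhd_unit *unit)
{
	(void)unit;
	model_visited++;
	return 0;
}

static int model_fail_on_unit_2(struct model_drhd_unit *unit)
{
	model_visited++;
	return unit->id == 2 ? -5 : 0;
}
```

Note that an early return leaves the units visited so far already processed, so a caller doing setup would need its own unwind path.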

* RE: [PATCH v2 23/31] coco/tdx-host: Setup all trusted IOMMUs on TDX Connect init
From: Tian, Kevin @ 2026-04-09  7:51 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-24-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 
> Setup all trusted IOMMUs on TDX Connect initialization and clear all on
> TDX Connect removal.
> 
> Trusted IOMMU setup is the pre-condition for all following TDX Connect
> operations such as SPDM/IDE setup. It is more of a platform
> configuration than a standalone IOMMU configuration, so put the
> implementation in tdx-host driver.
> 

not sure what the above is trying to say. why is it a platform configuration
when you have seamcalls on each IOMMU?

* RE: [PATCH v2 24/31] coco/tdx-host: Add a helper to exchange SPDM messages through DOE
From: Tian, Kevin @ 2026-04-09  7:56 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-25-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> +
> +static int __maybe_unused tdx_spdm_msg_exchange(struct tdx_tsm_link *tlink,
> +						void *request, size_t request_sz,
> +						void *response, size_t response_sz)
> +{
> +	struct pci_dev *pdev = tlink->pci.base_tsm.pdev;

call it pci_spdm_msg_exchange() and pass in struct pci_dev directly.

there is no other use of tlink in this function. could add a note that
this should be moved to pci core when a 2nd user of raw frame comes.

* RE: [PATCH v2 25/31] x86/virt/tdx: Add SEAMCALL wrappers for SPDM management
From: Tian, Kevin @ 2026-04-09  7:59 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-26-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:01 AM
> 

here ...

> - TDH.SPDM.MNG supports three SPDM runtime operations: HEARTBEAT,
>   KEY_UPDATE and DEV_INFO_RECOLLECTION.

... but the actual helper just passes whatever ops to the TDX module

> +u64 tdh_exec_spdm_mng(u64 spdm_id, u64 spdm_op, struct page *spdm_param,
> +		      struct page *spdm_rsp, struct page *spdm_req,
> +		      struct tdx_page_array *spdm_out,
> +		      u64 *spdm_req_or_out_len)
> +{
> +	struct tdx_module_args args = {
> +		.rcx = spdm_id,
> +		.rdx = spdm_op,
> +		.r8 = spdm_param ? page_to_phys(spdm_param) : -1,
> +		.r9 = page_to_phys(spdm_rsp),
> +		.r10 = page_to_phys(spdm_req),
> +		.r11 = spdm_out ? hpa_array_t_assign_raw(spdm_out) : -1,
> +	};
> +	u64 r;
> +
> +	r = seamcall_ret_ir_exec(TDH_SPDM_MNG, &args);
> +
> +	*spdm_req_or_out_len = args.rcx;
> +
> +	return r;
> +}
> +EXPORT_SYMBOL_FOR_MODULES(tdh_exec_spdm_mng, "tdx-host");

* RE: [PATCH v2 30/31] coco/tdx-host: Implement IDE stream setup/teardown
From: Tian, Kevin @ 2026-04-09  8:02 UTC (permalink / raw)
  To: Xu Yilun, linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
	Williams, Dan J, x86@kernel.org
  Cc: Gao, Chao, Jiang, Dave, baolu.lu@linux.intel.com, Xu, Yilun,
	Duan, Zhenzhong, kvm@vger.kernel.org, Edgecombe, Rick P,
	dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
	Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <20260327160132.2946114-31-yilun.xu@linux.intel.com>

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Saturday, March 28, 2026 12:02 AM
> 
> Implementation for a most straightforward Selective IDE stream setup.
> Hard code all parameters for Stream Control Register. And no IDE Key
> Refresh support.
> 

'most straightforward', compared to what?

* Re: [PATCH 2/2] x86/virt/tdx: Use PFN directly for unmapping guest private memory
From: Yan Zhao @ 2026-04-09  7:42 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: seanjc, pbonzini, dave.hansen, tglx, mingo, bp, x86, linux-kernel,
	kvm, linux-coco, kai.huang, rick.p.edgecombe, yilun.xu,
	vannapurve, ackerleytng, sagis, binbin.wu, xiaoyao.li,
	isaku.yamahata
In-Reply-To: <abvTJq0Ks22WnLSA@thinkstation>

On Thu, Mar 19, 2026 at 10:48:08AM +0000, Kiryl Shutsemau wrote:
> On Thu, Mar 19, 2026 at 08:58:08AM +0800, Yan Zhao wrote:
> > @@ -1817,11 +1817,11 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> >  	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
> >  		return;
> >  
> > -	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, page);
> > +	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn);
> >  	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
> >  		return;
> >  
> > -	tdx_quirk_reset_page(page);
> > +	tdx_quirk_reset_page(pfn);
> >  }
> >  
> >  void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
> 
> The same problem. @level is ignored.
There's a "KVM_BUG_ON(level != PG_LEVEL_4K, kvm)" in
tdx_sept_remove_private_spte() before invoking
tdh_phymem_page_wbinvd_hkid() and tdx_quirk_reset_page().

So it should be fine.

* Re: [PATCH 2/2] x86/tdx: Accept hotplugged memory before online
From: Marc-André Lureau @ 2026-04-09 15:19 UTC (permalink / raw)
  To: Duan, Zhenzhong, David Hildenbrand
  Cc: Edgecombe, Rick P, Reshetova, Elena, pbonzini@redhat.com,
	prsampat@amd.com, x86@kernel.org, kas@kernel.org,
	dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, bp@alien8.de, Qiang, Chenyi, tglx@kernel.org,
	hpa@zytor.com, kvm@vger.kernel.org, linux-coco@lists.linux.dev
In-Reply-To: <IA3PR11MB91365041D42CB7F53DA5A98092582@IA3PR11MB9136.namprd11.prod.outlook.com>

Hi

On Thu, Apr 9, 2026 at 5:36 AM Duan, Zhenzhong <zhenzhong.duan@intel.com> wrote:
>
>
>
> >-----Original Message-----
> >From: Edgecombe, Rick P <rick.p.edgecombe@intel.com>
> >Subject: Re: [PATCH 2/2] x86/tdx: Accept hotplugged memory before online
> >
> >On Fri, 2026-04-03 at 10:37 +0000, Reshetova, Elena wrote:
> >> > > > So the part about whether a triggered accept succeeds or returns an
> >> > > > already accepted error is already under the control of the host.
> >> > > > I.e., if we don't have the zeroing behavior, the host can already
> >> > > > cause the page to get zeroed. So I don't think anything is
> >> > > > regressed. Both come down to how careful the guest is about what it
> >> > > > accepts.
> >> >
> >> > Yes, and my point is that we should not allow guest to freely double
> >> > accepting ever.
> >> > For any use case that requires releasing memory and accepting it back, it
> >> > should be explicit action by the guest to track that memory has been
> >> > "released" (under correct and safe conditions) and then it is ok to accept
> >> > it back (even if it doesnt mean physically accepting it) and in this case
> >> > it is ok (and even strongly desired) to zero the page to simulate the
> >> > normal accept behaviour.
> >
> >Hmm, it doesn't seem like you engaged with my point. Or at least I'm not
> >following what is exposed?
> >
> >So I'm going to assume you agree that this procedure would not open up any
> >specific new capabilities for the host that don't exist today. And instead you
> >are just saying that the guest should have infrastructure to not double accept
> >memory in the first place.
> >
> >But the problem here is not that the guest losing track of the accept state
> >actually. It is that the guest relies on the host to actually zap the S-EPT
> >before re-plugging memory at the same physical address space. So the guest is
> >tracking that the memory is released correctly. Better tracking will not help.
> >It relies on host behavior to not hit a double accept.
> >
> >TDX connect will use this "unaccept" seamcall, so I asked Zhenzhong (Cced) how
> >much of what we need for that solution will just get added for TDX connect
> >anyway. It seems like we should make sure the same solution will work for both
> >SNP and TDX and keep the options open at this stage.
>
> For that solution, analog to hotplug, TDX Connect needs a hot-unplug handler to
> use "release" seamcall to unaccept private memory before unplug, that's it. But
> if the zapping S-EPT will not happen in host, I think this "release" seamcall is also
> unnecessary for TDX Connect.
>
> I also have a silly question which I looked over this thread and didn't find answer.
> Do we have to support private memory hotplug, what benefit we get to support it?
> If we only allow shared memory plug/unplug to TD, then we don't need this series.
> Guest decides to convert shared memory to private after plug and do the opposite before unplug.
> This works for both TDX connect and memory unplug as memory release is implicitly triggered
> in memory convert.

I did some successful experiments with modified QEMU & kernel, this
seems to work.

On virtio-mem plug, set_memory_encrypted() makes the memory private +
accepted. On unplug, make it return to shared with
set_memory_decrypted(). QEMU handles REQ_UNPLUG and can punch both
shared & guest_memfd planes (which will TDH.MEM.PAGE.REMOVE).
Re-plugging also works fine.

The virtio spec should probably be updated to explicitly define the
shared state on unplug and the private state on plug, driven by the
guest/driver. Those are KVM memory attributes, I suppose this is
generic enough.
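
The guest-driven ordering described above can be written down as a tiny state model. This is only a sketch: the enum and function names are hypothetical and come from neither virtio-mem nor QEMU, and the conversions that set_memory_encrypted() and set_memory_decrypted() would perform are reduced to state changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block state, as seen by the guest driver. */
enum model_block_state {
	MODEL_UNPLUGGED_SHARED,	/* not plugged, attribute is shared */
	MODEL_PLUGGED_PRIVATE,	/* converted to private and accepted */
};

/* Plug: shared -> private, which is where the accept happens. */
static int model_plug(enum model_block_state *state)
{
	assert(state != NULL);
	if (*state != MODEL_UNPLUGGED_SHARED)
		return -1;	/* never accept an already-private block */
	*state = MODEL_PLUGGED_PRIVATE;
	return 0;
}

/* Unplug: private -> shared, done before the host punches the memory. */
static int model_unplug(enum model_block_state *state)
{
	assert(state != NULL);
	if (*state != MODEL_PLUGGED_PRIVATE)
		return -1;
	*state = MODEL_UNPLUGGED_SHARED;
	return 0;
}
```

Because every plug starts from the shared state, a re-plug goes through a fresh conversion instead of a second accept on memory the guest still considers private.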


* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Moger, Babu @ 2026-04-09 17:19 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <43880b7b-b390-4e7f-8c2a-46cde9e3b051@intel.com>

Hi Reinette,

On 4/8/2026 6:41 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/8/26 4:07 PM, Moger, Babu wrote:
>> On 4/8/2026 4:24 PM, Reinette Chatre wrote:
>>> On 4/8/26 1:45 PM, Babu Moger wrote:
> ...
> 
>>>> The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
>>>>
>>>> Both of these modes introduce new files kernel_mode_cpus/ and kernel_mode_cpus_list in the resctrl group.
>>>
>>> Right. To be specific when the user changes the mode to either "global_assign_ctrl_inherit_mon_per_cpu" or
>>> "global_assign_ctrl_assign_mon_per_cpu" the new files will be created in the default resource group with
>>> associated setting applied globally at that time.
>>
>> If, at that point, "info/kernel_mode_assignment" points to // (the default group), is that correct?
> 
> I see "info/kernel_mode_assignment" pointing to default group as the only
> option right after a mode switch away from "inherit_ctrl_and_mon".
> 
> To elaborate, the current idea is that the mode within info/kernel_mode determines
> which, if any, control files are presented to user space.
> Assuming that the system boots up with:
> 	# cat info/kernel_mode
> 	[inherit_ctrl_and_mon]
> 	global_assign_ctrl_inherit_mon_per_cpu
> 	global_assign_ctrl_assign_mon_per_cpu
> 
> In above scenario "info/kernel_mode_assignment" does not exist (is not visible to
> user space).
> 
> When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
> "global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
> (or made visible to user space) and is expected to point to default group.
> User can change the group using "info/kernel_mode_assignment" at this point.
> 
> If the current scenario is below ...
> 	# cat info/kernel_mode
> 	[global_assign_ctrl_inherit_mon_per_cpu]
> 	inherit_ctrl_and_mon
> 	global_assign_ctrl_assign_mon_per_cpu
> 
> ... then "info/kernel_mode_assignment" will exist but what it should contain if
> user switches mode at this point may be up for discussion.
> 
> option 1)
> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
> the resource group in "info/kernel_mode_assignment" is reset to the
> default group and all CPUs PLZA state reset to match. The kernel_mode_cpus
> and kernel_mode_cpuslist files become visible in default resource group
> and they contain "all online CPUs".
> 
> option 2)
> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
> the resource group in "info/kernel_mode_assignment" is kept and all
> CPUs PLZA state set to match it while also keeping the current
> values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
> files.
> 
> I am leaning towards "option 1" to keep it consistent with a switch from
> "inherit_ctrl_and_mon" and being deterministic about how a mode is started with

Yes. The "option 1" seems appropriate.

> a clean slate. What are your thoughts? What would be use case where a user would
> want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
> "global_assign_ctrl_assign_mon_per_cpu" to just switch rmid_en on and off?


This is a bit tricky.

Currently, our requirement is to have a CTRL_MON group for 
global_assign_ctrl_inherit_mon_per_cpu. In this scenario, we use the 
group’s CLOSID for PLZA configuration, and RMID is not used (rmid_en = 
0) when setting up PLZA.

Our requirement is also to have a CTRL_MON/MON group for
global_assign_ctrl_assign_mon_per_cpu. In this case as well, the group’s
CLOSID and RMID (rmid_en = 1) are both used to configure PLZA.

Actually, we should not allow these changes from
global_assign_ctrl_inherit_mon_per_cpu to
global_assign_ctrl_assign_mon_per_cpu or vice versa.

This seems restrictive.

> 
> 
>> And if "info/kernel_mode_assignment" points to a different group
>> (for example, test//), then the kernel_mode_cpus/ and
>> kernel_mode_cpus_list files will be created only under the test//
>> group. Is that correct?
> 
> I expect that if "info/kernel_mode_assignment" exists then the group
> listed within contains kernel_mode_cpus and kernel_mode_cpuslist.
> How the group ends up in "info/kernel_mode_assignment" could result
> from mode change or from write by user space.
> 
Ack.

Thanks
Babu


^ permalink raw reply

* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-09 17:26 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <bb9f62f1-0c79-4d29-9866-c39d08c3a774@amd.com>


Hi Babu,

On 4/9/26 10:19 AM, Moger, Babu wrote:
> On 4/8/2026 6:41 PM, Reinette Chatre wrote:

>> When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
>> 'global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
>> (or made visible to user space) and is expected to point to default group.
>> User can change the group using "info/kernel_mode_assignment" at this point.
>>
>> If the current scenario is below ...
>>     # cat info/kernel_mode
>>     [global_assign_ctrl_inherit_mon_per_cpu]
>>     inherit_ctrl_and_mon
>>     global_assign_ctrl_assign_mon_per_cpu
>>
>> ... then "info/kernel_mode_assignment" will exist but what it should contain if
>> user switches mode at this point may be up for discussion.
>>
>> option 1)
>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>> the resource group in "info/kernel_mode_assignment" is reset to the
>> default group and all CPUs PLZA state reset to match. The kernel_mode_cpus
>> and kernel_mode_cpuslist files become visible in default resource group
>> and they contain "all online CPUs".
>>
>> option 2)
>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>> the resource group in "info/kernel_mode_assignment" is kept and all
>> CPUs PLZA state set to match it while also keeping the current
>> values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
>> files.
>>
>> I am leaning towards "option 1" to keep it consistent with a switch from
>> "inherit_ctrl_and_mon" and being deterministic about how a mode is started with
> 
> Yes. The "option 1" seems appropriate.
> 
>> a clean slate. What are your thoughts? What would be use case where a user would
>> want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
>> "global_assign_ctrl_assign_mon_per_cpu" to just switch rmid_en on and off?
> 
> 
> This is a bit tricky.
> 
> Currently, our requirement is to have a CTRL_MON group for
> global_assign_ctrl_inherit_mon_per_cpu. In this scenario, we use the
> group’s CLOSID for PLZA configuration, and RMID is not used (rmid_en
> = 0) when setting up PLZA.
> 
> Our requirement is also to have a CTRL_MON/MON group for
> global_assign_ctrl_assign_mon_per_cpu. In this case as well, the
> group’s CLOSID and RMID (rmid_en = 1)  both are used configure PLZA.

ah, right. Good catch.

> 
> Actually, we should not allow these changes from
> global_assign_ctrl_inherit_mon_per_cpu  to
> global_assign_ctrl_assign_mon_per_cpu or visa versa.

resctrl could allow it but as part of the switch it resets the "kernel mode group" to
be the default group every time? This would be the "option 1" above.

Reinette


^ permalink raw reply

* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Moger, Babu @ 2026-04-09 18:05 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <5a273b0f-8225-4e9e-924e-884183734659@intel.com>

Hi Reinette,

On 4/9/2026 12:26 PM, Reinette Chatre wrote:
> 
> Hi Babu,
> 
> On 4/9/26 10:19 AM, Moger, Babu wrote:
>> On 4/8/2026 6:41 PM, Reinette Chatre wrote:
> 
>>> When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
>>> 'global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
>>> (or made visible to user space) and is expected to point to default group.
>>> User can change the group using "info/kernel_mode_assignment" at this point.
>>>
>>> If the current scenario is below ...
>>>      # cat info/kernel_mode
>>>      [global_assign_ctrl_inherit_mon_per_cpu]
>>>      inherit_ctrl_and_mon
>>>      global_assign_ctrl_assign_mon_per_cpu
>>>
>>> ... then "info/kernel_mode_assignment" will exist but what it should contain if
>>> user switches mode at this point may be up for discussion.
>>>
>>> option 1)
>>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>>> the resource group in "info/kernel_mode_assignment" is reset to the
>>> default group and all CPUs PLZA state reset to match. The kernel_mode_cpus
>>> and kernel_mode_cpuslist files become visible in default resource group
>>> and they contain "all online CPUs".
>>>
>>> option 2)
>>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>>> the resource group in "info/kernel_mode_assignment" is kept and all
>>> CPUs PLZA state set to match it while also keeping the current
>>> values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
>>> files.
>>>
>>> I am leaning towards "option 1" to keep it consistent with a switch from
>>> "inherit_ctrl_and_mon" and being deterministic about how a mode is started with
>>
>> Yes. The "option 1" seems appropriate.
>>
>>> a clean slate. What are your thoughts? What would be use case where a user would
>>> want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
>>> "global_assign_ctrl_assign_mon_per_cpu" to just switch rmid_en on and off?
>>
>>
>> This is a bit tricky.
>>
>> Currently, our requirement is to have a CTRL_MON group for
>> global_assign_ctrl_inherit_mon_per_cpu. In this scenario, we use the
>> group’s CLOSID for PLZA configuration, and RMID is not used (rmid_en
>> = 0) when setting up PLZA.
>>
>> Our requirement is also to have a CTRL_MON/MON group for
>> global_assign_ctrl_assign_mon_per_cpu. In this case as well, the
>> group’s CLOSID and RMID (rmid_en = 1)  both are used configure PLZA.
> 
> ah, right. Good catch.
> 
>>
>> Actually, we should not allow these changes from
>> global_assign_ctrl_inherit_mon_per_cpu  to
>> global_assign_ctrl_assign_mon_per_cpu or visa versa.
> 
> resctrl could allow it but as part of the switch it resets the "kernel mode group" to
> be the default group every time? This would be the "option 1" above.

Other options:

Allow global_assign_ctrl_inherit_mon_per_cpu ->
global_assign_ctrl_assign_mon_per_cpu. As part of the switch, reset the
"kernel mode group" to the default group.

Allow global_assign_ctrl_assign_mon_per_cpu ->
global_assign_ctrl_inherit_mon_per_cpu. In this case, switch from
CTRL_MON/MON to CTRL_MON.

Thanks
Babu



> 
> Reinette
> 
> 


^ permalink raw reply

* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-09 20:50 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
	akpm@linux-foundation.org, pmladek@suse.com,
	rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
	kees@kernel.org, elver@google.com, paulmck@kernel.org,
	lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
	seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
	xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
	Lendacky, Thomas, elena.reshetova@intel.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	eranian@google.com, peternewman@google.com
In-Reply-To: <73c46024-4cf2-4f03-9268-d4378825fa87@amd.com>

Hi Babu,

On 4/9/26 11:05 AM, Moger, Babu wrote:
> On 4/9/2026 12:26 PM, Reinette Chatre wrote:
>> On 4/9/26 10:19 AM, Moger, Babu wrote:
>>> On 4/8/2026 6:41 PM, Reinette Chatre wrote:
>>
>>>> When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
>>>> 'global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
>>>> (or made visible to user space) and is expected to point to default group.
>>>> User can change the group using "info/kernel_mode_assignment" at this point.
>>>>
>>>> If the current scenario is below ...
>>>>      # cat info/kernel_mode
>>>>      [global_assign_ctrl_inherit_mon_per_cpu]
>>>>      inherit_ctrl_and_mon
>>>>      global_assign_ctrl_assign_mon_per_cpu
>>>>
>>>> ... then "info/kernel_mode_assignment" will exist but what it should contain if
>>>> user switches mode at this point may be up for discussion.
>>>>
>>>> option 1)
>>>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>>>> the resource group in "info/kernel_mode_assignment" is reset to the
>>>> default group and all CPUs PLZA state reset to match. The kernel_mode_cpus
>>>> and kernel_mode_cpuslist files become visible in default resource group
>>>> and they contain "all online CPUs".
>>>>
>>>> option 2)
>>>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>>>> the resource group in "info/kernel_mode_assignment" is kept and all
>>>> CPUs PLZA state set to match it while also keeping the current
>>>> values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
>>>> files.
>>>>
>>>> I am leaning towards "option 1" to keep it consistent with a switch from
>>>> "inherit_ctrl_and_mon" and being deterministic about how a mode is started with
>>>
>>> Yes. The "option 1" seems appropriate.
>>>
>>>> a clean slate. What are your thoughts? What would be use case where a user would
>>>> want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
>>>> "global_assign_ctrl_assign_mon_per_cpu" to just switch rmid_en on and off?
>>>
>>>
>>> This is a bit tricky.
>>>
>>> Currently, our requirement is to have a CTRL_MON group for
>>> global_assign_ctrl_inherit_mon_per_cpu. In this scenario, we use the
>>> group’s CLOSID for PLZA configuration, and RMID is not used (rmid_en
>>> = 0) when setting up PLZA.
>>>
>>> Our requirement is also to have a CTRL_MON/MON group for
>>> global_assign_ctrl_assign_mon_per_cpu. In this case as well, the
>>> group’s CLOSID and RMID (rmid_en = 1)  both are used configure PLZA.
>>
>> ah, right. Good catch.
>>
>>>
>>> Actually, we should not allow these changes from
>>> global_assign_ctrl_inherit_mon_per_cpu  to
>>> global_assign_ctrl_assign_mon_per_cpu or visa versa.
>>
>> resctrl could allow it but as part of the switch it resets the "kernel mode group" to
>> be the default group every time? This would be the "option 1" above.
> 
> Other options.
> 
> Allow global_assign_ctrl_inherit_mon_per_cpu -> global_assign_ctrl_assign_mon_per_cpu. As part of the switch, reset the "kernel mode group" to the default group.
> 
> Allow global_assign_ctrl_assign_mon_per_cpu -> global_assign_ctrl_inherit_mon_per_cpu. In this case switch
> to CTRL_MON/MON -> CTRL_MON.
> 

ok. Could you please return the courtesy of providing feedback on the
suggestion you are responding to and also include the motivation why your
suggestion is the better option? 

Reinette

^ permalink raw reply
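
The "option 1" behaviour discussed in this thread can be sketched as a
small state model. This is only an illustration under stated assumptions:
the type and function names (kmode_state, kmode_switch, kmode_assign_group),
the numeric group IDs, and the choice to treat a write of the current mode
as a no-op are all hypothetical and do not come from the series. The
rmid_en values follow the description in the thread (0 for
global_assign_ctrl_inherit_mon_per_cpu, 1 for
global_assign_ctrl_assign_mon_per_cpu).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the modes listed in info/kernel_mode. */
enum kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

#define DEFAULT_GROUP	0	/* hypothetical ID of the default resource group */
#define NO_GROUP	(-1)	/* info/kernel_mode_assignment is not visible */

struct kmode_state {
	enum kernel_mode mode;
	int assigned_group;	/* models the content of info/kernel_mode_assignment */
	bool rmid_en;		/* whether the RMID is used when setting up PLZA */
};

/* "option 1": every switch into a global_assign_* mode starts from the default group. */
static void kmode_switch(struct kmode_state *s, enum kernel_mode new_mode)
{
	assert(s != NULL);

	/* Assumption: re-writing the current mode changes nothing. */
	if (new_mode == s->mode)
		return;

	s->mode = new_mode;

	if (new_mode == INHERIT_CTRL_AND_MON) {
		/* The assignment file does not exist in this mode. */
		s->assigned_group = NO_GROUP;
		s->rmid_en = false;
		return;
	}

	/* Clean slate: discard any previously assigned group. */
	s->assigned_group = DEFAULT_GROUP;
	s->rmid_en = (new_mode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU);
}

/* Models a user write to info/kernel_mode_assignment; fails if the file is hidden. */
static bool kmode_assign_group(struct kmode_state *s, int group)
{
	assert(s != NULL);

	if (s->mode == INHERIT_CTRL_AND_MON)
		return false;

	s->assigned_group = group;
	return true;
}
```

With this model, a user who assigns group 3 while in
global_assign_ctrl_inherit_mon_per_cpu and then switches to
global_assign_ctrl_assign_mon_per_cpu ends up back at the default group
with rmid_en set, which is the deterministic starting point that
"option 1" aims for.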

* [PATCH v2 0/6] KVM: x86: Reg cleanups / prep work for APX
From: Sean Christopherson @ 2026-04-09 22:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Kiryl Shutsemau
  Cc: kvm, x86, linux-coco, linux-kernel, Chang S . Bae

Clean up KVM's register tracking and storage, primarily to prepare for landing
APX, which expands the maximum number of GPRs from 16 to 32.

v2:
 - Call out that RIP is effectively an "EX" reg too (in patch 2). [Paolo]
 - Rework the available/dirty APIs to have an explicit "clear" operation
   for available, and only a full "reset" for dirty. [Yosry, Paolo]

v1: https://lore.kernel.org/all/20260311003346.2626238-1-seanjc@google.com

Sean Christopherson (6):
  KVM: x86: Add dedicated storage for guest RIP
  KVM: x86: Drop the "EX" part of "EXREG" to avoid collision with APX
  KVM: nVMX: Do a bitwise-AND of regs_avail when switching active VMCS
  KVM: x86: Add wrapper APIs to reset dirty/available register masks
  KVM: x86: Track available/dirty register masks as "unsigned long"
    values
  KVM: x86: Use a proper bitmap for tracking available/dirty registers

 arch/x86/include/asm/kvm_host.h | 32 +++++++++--------
 arch/x86/kvm/kvm_cache_regs.h   | 62 +++++++++++++++++++++++----------
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.c          | 16 ++++-----
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/vmx/nested.c       | 10 +++---
 arch/x86/kvm/vmx/tdx.c          | 36 +++++++++----------
 arch/x86/kvm/vmx/vmx.c          | 52 +++++++++++++--------------
 arch/x86/kvm/vmx/vmx.h          | 24 ++++++-------
 arch/x86/kvm/x86.c              | 20 +++++------
 10 files changed, 143 insertions(+), 113 deletions(-)


base-commit: b89df297a47e641581ee67793592e5c6ae0428f4
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply
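
To illustrate why the series moves the available/dirty masks toward a
proper bitmap, here is a minimal userspace sketch. It assumes 32 GPRs
(the APX maximum named in the cover letter) plus a hypothetical count of
nine further tracked registers, so the highest index no longer fits in a
32-bit mask. The names reg_mask, reg_mask_set, reg_mask_test,
reg_mask_clear and reg_mask_reset are made up for this example and are not
the KVM APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_GPRS		32	/* 16 today, 32 with APX */
#define NR_OTHER_REGS	9	/* hypothetical: RIP plus non-GPR state */
#define NR_REGS		(NR_GPRS + NR_OTHER_REGS)

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS	((NR_REGS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* A bitmap sized from the register count rather than a fixed-width integer. */
struct reg_mask {
	unsigned long w[NR_WORDS];
};

static void reg_mask_set(struct reg_mask *m, unsigned int reg)
{
	assert(reg < NR_REGS);
	m->w[reg / BITS_PER_WORD] |= 1UL << (reg % BITS_PER_WORD);
}

static bool reg_mask_test(const struct reg_mask *m, unsigned int reg)
{
	assert(reg < NR_REGS);
	return (m->w[reg / BITS_PER_WORD] >> (reg % BITS_PER_WORD)) & 1UL;
}

/* Explicit per-register "clear", in the spirit of the v2 change for "available". */
static void reg_mask_clear(struct reg_mask *m, unsigned int reg)
{
	assert(reg < NR_REGS);
	m->w[reg / BITS_PER_WORD] &= ~(1UL << (reg % BITS_PER_WORD));
}

/* Full "reset", in the spirit of the v2 change for "dirty". */
static void reg_mask_reset(struct reg_mask *m)
{
	for (unsigned int i = 0; i < NR_WORDS; i++)
		m->w[i] = 0;
}
```

Index 40 is valid here regardless of whether unsigned long is 32 or 64
bits wide, whereas "1 << 40" on a u32 mask would be undefined behaviour.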

* [PATCH v2 1/6] KVM: x86: Add dedicated storage for guest RIP
From: Sean Christopherson @ 2026-04-09 22:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Kiryl Shutsemau
  Cc: kvm, x86, linux-coco, linux-kernel, Chang S . Bae
In-Reply-To: <20260409224236.2021562-1-seanjc@google.com>

Add kvm_vcpu_arch.rip to track guest RIP instead of including it in the
generic regs[] array.  Decoupling RIP from regs[] will allow using a
*completely* arbitrary index for RIP, as opposed to the mostly-arbitrary
index that is currently used.  That in turn will allow using indices
16-31 to track R16-R31 that are coming with APX.

Note, although RIP can be used for addressing, it does NOT have an
architecturally defined index, and so can't be reached via flows like
get_vmx_mem_address() where KVM "blindly" reads a general purpose register
given the SIB information reported by hardware.  For RIP-relative
addressing, hardware reports the full "offset" in vmcs.EXIT_QUALIFICATION.

Note #2, keep the available/dirty tracking as RIP is context switched
through the VMCS, i.e. needs to be cached for VMX.

Opportunistically rename NR_VCPU_REGS to NR_VCPU_GENERAL_PURPOSE_REGS to
better capture what it tracks, and so that KVM can slot in R16-R31 without
running into weirdness where KVM's definition of "EXREG" doesn't line up
with APX's definition of "extended reg".

No functional change intended.

Cc: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 10 ++++++----
 arch/x86/kvm/kvm_cache_regs.h   | 12 ++++++++----
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.c          |  6 +++---
 arch/x86/kvm/vmx/vmx.c          |  8 ++++----
 arch/x86/kvm/vmx/vmx.h          |  2 +-
 6 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..68a11325e8bc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -191,10 +191,11 @@ enum kvm_reg {
 	VCPU_REGS_R14 = __VCPU_REGS_R14,
 	VCPU_REGS_R15 = __VCPU_REGS_R15,
 #endif
-	VCPU_REGS_RIP,
-	NR_VCPU_REGS,
+	NR_VCPU_GENERAL_PURPOSE_REGS,
 
-	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
+	VCPU_REG_RIP = NR_VCPU_GENERAL_PURPOSE_REGS,
+
+	VCPU_EXREG_PDPTR,
 	VCPU_EXREG_CR0,
 	/*
 	 * Alias AMD's ERAPS (not a real register) to CR3 so that common code
@@ -799,7 +800,8 @@ struct kvm_vcpu_arch {
 	 * rip and regs accesses must go through
 	 * kvm_{register,rip}_{read,write} functions.
 	 */
-	unsigned long regs[NR_VCPU_REGS];
+	unsigned long regs[NR_VCPU_GENERAL_PURPOSE_REGS];
+	unsigned long rip;
 	u32 regs_avail;
 	u32 regs_dirty;
 
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 8ddb01191d6f..9b7df9de0e87 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -112,7 +112,7 @@ static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu
  */
 static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int reg)
 {
-	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
+	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_GENERAL_PURPOSE_REGS))
 		return 0;
 
 	if (!kvm_register_is_available(vcpu, reg))
@@ -124,7 +124,7 @@ static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int reg
 static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
 					  unsigned long val)
 {
-	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
+	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_GENERAL_PURPOSE_REGS))
 		return;
 
 	vcpu->arch.regs[reg] = val;
@@ -133,12 +133,16 @@ static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
 
 static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
 {
-	return kvm_register_read_raw(vcpu, VCPU_REGS_RIP);
+	if (!kvm_register_is_available(vcpu, VCPU_REG_RIP))
+		kvm_x86_call(cache_reg)(vcpu, VCPU_REG_RIP);
+
+	return vcpu->arch.rip;
 }
 
 static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
-	kvm_register_write_raw(vcpu, VCPU_REGS_RIP, val);
+	vcpu->arch.rip = val;
+	kvm_register_mark_dirty(vcpu, VCPU_REG_RIP);
 }
 
 static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 75d0c03d69bc..2010b157e288 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -967,7 +967,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14];
 	save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15];
 #endif
-	save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP];
+	save->rip = svm->vcpu.arch.rip;
 
 	/* Sync some non-GPR registers before encrypting */
 	save->xcr0 = svm->vcpu.arch.xcr0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..85edaee27b03 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4420,7 +4420,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
-	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+	svm->vmcb->save.rip = vcpu->arch.rip;
 
 	/*
 	 * Disable singlestep if we're injecting an interrupt/exception.
@@ -4506,7 +4506,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 		vcpu->arch.cr2 = svm->vmcb->save.cr2;
 		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
 		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
-		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+		vcpu->arch.rip = svm->vmcb->save.rip;
 	}
 	vcpu->arch.regs_dirty = 0;
 
@@ -4946,7 +4946,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
 
 	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
-	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+	svm->vmcb->save.rip = vcpu->arch.rip;
 
 	nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..577b0c6286ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2604,8 +2604,8 @@ void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	case VCPU_REGS_RSP:
 		vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP);
 		break;
-	case VCPU_REGS_RIP:
-		vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP);
+	case VCPU_REG_RIP:
+		vcpu->arch.rip = vmcs_readl(GUEST_RIP);
 		break;
 	case VCPU_EXREG_PDPTR:
 		if (enable_ept)
@@ -7536,8 +7536,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP))
-		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
+	if (kvm_register_is_dirty(vcpu, VCPU_REG_RIP))
+		vmcs_writel(GUEST_RIP, vcpu->arch.rip);
 	vcpu->arch.regs_dirty = 0;
 
 	if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index db84e8001da5..d0cc5f6c6879 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -620,7 +620,7 @@ BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
  * cache on demand.  Other registers not listed here are synced to
  * the cache immediately after VM-Exit.
  */
-#define VMX_REGS_LAZY_LOAD_SET	((1 << VCPU_REGS_RIP) |         \
+#define VMX_REGS_LAZY_LOAD_SET	((1 << VCPU_REG_RIP) |         \
 				(1 << VCPU_REGS_RSP) |          \
 				(1 << VCPU_EXREG_RFLAGS) |      \
 				(1 << VCPU_EXREG_PDPTR) |       \
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related
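
The pattern in this patch, where RIP lives in its own field but still
takes part in available/dirty tracking, can be modelled in plain C. This
is a simplified sketch, not KVM code: struct vcpu, hw_rip (standing in for
the VMCS GUEST_RIP field), hw_reads, and the function names are all
hypothetical. The sketch assumes that marking a register dirty also marks
it available.

```c
#include <assert.h>
#include <stdint.h>

enum {
	NR_GPRS = 16,
	REG_RIP = NR_GPRS,	/* tracking index only, not an index into regs[] */
};

#define REG_BIT(reg)	(UINT32_C(1) << (reg))

struct vcpu {
	unsigned long regs[NR_GPRS];	/* general purpose registers only */
	unsigned long rip;		/* dedicated storage for RIP */
	uint32_t avail;
	uint32_t dirty;
	unsigned long hw_rip;		/* stand-in for the hardware-held RIP */
	unsigned int hw_reads;		/* counts reads of the hardware copy */
};

/* Read RIP, pulling it from "hardware" only if the cached copy is stale. */
static unsigned long rip_read(struct vcpu *v)
{
	assert(v != 0);

	if (!(v->avail & REG_BIT(REG_RIP))) {
		v->rip = v->hw_rip;
		v->hw_reads++;
		v->avail |= REG_BIT(REG_RIP);
	}
	return v->rip;
}

/* Write RIP to the cache and remember that hardware must be updated. */
static void rip_write(struct vcpu *v, unsigned long val)
{
	assert(v != 0);

	v->rip = val;
	v->avail |= REG_BIT(REG_RIP);
	v->dirty |= REG_BIT(REG_RIP);
}

/* Before entering the guest: flush a dirty RIP, then reset the dirty mask. */
static void vcpu_enter(struct vcpu *v)
{
	assert(v != 0);

	if (v->dirty & REG_BIT(REG_RIP))
		v->hw_rip = v->rip;
	v->dirty = 0;
}

/* After exiting the guest: hardware RIP changed, so the cache is stale. */
static void vcpu_exit(struct vcpu *v, unsigned long new_hw_rip)
{
	assert(v != 0);

	v->hw_rip = new_hw_rip;
	v->avail &= ~REG_BIT(REG_RIP);
}
```

Because REG_RIP is used only as a bit position and never to index regs[],
its value can be moved freely, which is the property the commit message
relies on to free up indices 16-31.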

* [PATCH v2 2/6] KVM: x86: Drop the "EX" part of "EXREG" to avoid collision with APX
From: Sean Christopherson @ 2026-04-09 22:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Kiryl Shutsemau
  Cc: kvm, x86, linux-coco, linux-kernel, Chang S . Bae
In-Reply-To: <20260409224236.2021562-1-seanjc@google.com>

Now that NR_VCPU_REGS is no longer a thing, and now that RIP is
effectively an EXREG, drop the "EX" (for "extended", or maybe "extra"?)
prefix from non-GPR registers to avoid a collision with APX (Advanced
Performance Extensions), which adds:

  16 additional general-purpose registers (GPRs) R16–R31, also referred
  to as Extended GPRs (EGPRs) in this document;

I.e. KVM's version of "extended" won't match with APX's definition.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 18 +++++++--------
 arch/x86/kvm/kvm_cache_regs.h   | 16 ++++++-------
 arch/x86/kvm/svm/svm.c          |  6 ++---
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/vmx/nested.c       |  6 ++---
 arch/x86/kvm/vmx/tdx.c          |  4 ++--
 arch/x86/kvm/vmx/vmx.c          | 40 ++++++++++++++++-----------------
 arch/x86/kvm/vmx/vmx.h          | 20 ++++++++---------
 arch/x86/kvm/x86.c              | 16 ++++++-------
 9 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 68a11325e8bc..b1eae1e7b04f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -195,8 +195,8 @@ enum kvm_reg {
 
 	VCPU_REG_RIP = NR_VCPU_GENERAL_PURPOSE_REGS,
 
-	VCPU_EXREG_PDPTR,
-	VCPU_EXREG_CR0,
+	VCPU_REG_PDPTR,
+	VCPU_REG_CR0,
 	/*
 	 * Alias AMD's ERAPS (not a real register) to CR3 so that common code
 	 * can trigger emulation of the RAP (Return Address Predictor) with
@@ -204,13 +204,13 @@ enum kvm_reg {
 	 * is cleared on writes to CR3, i.e. marking CR3 dirty will naturally
 	 * mark ERAPS dirty as well.
 	 */
-	VCPU_EXREG_CR3,
-	VCPU_EXREG_ERAPS = VCPU_EXREG_CR3,
-	VCPU_EXREG_CR4,
-	VCPU_EXREG_RFLAGS,
-	VCPU_EXREG_SEGMENTS,
-	VCPU_EXREG_EXIT_INFO_1,
-	VCPU_EXREG_EXIT_INFO_2,
+	VCPU_REG_CR3,
+	VCPU_REG_ERAPS = VCPU_REG_CR3,
+	VCPU_REG_CR4,
+	VCPU_REG_RFLAGS,
+	VCPU_REG_SEGMENTS,
+	VCPU_REG_EXIT_INFO_1,
+	VCPU_REG_EXIT_INFO_2,
 };
 
 enum {
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 9b7df9de0e87..ac1f9867a234 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -159,8 +159,8 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 {
 	might_sleep();  /* on svm */
 
-	if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR))
-		kvm_x86_call(cache_reg)(vcpu, VCPU_EXREG_PDPTR);
+	if (!kvm_register_is_available(vcpu, VCPU_REG_PDPTR))
+		kvm_x86_call(cache_reg)(vcpu, VCPU_REG_PDPTR);
 
 	return vcpu->arch.walk_mmu->pdptrs[index];
 }
@@ -174,8 +174,8 @@ static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
 {
 	ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS;
 	if ((tmask & vcpu->arch.cr0_guest_owned_bits) &&
-	    !kvm_register_is_available(vcpu, VCPU_EXREG_CR0))
-		kvm_x86_call(cache_reg)(vcpu, VCPU_EXREG_CR0);
+	    !kvm_register_is_available(vcpu, VCPU_REG_CR0))
+		kvm_x86_call(cache_reg)(vcpu, VCPU_REG_CR0);
 	return vcpu->arch.cr0 & mask;
 }
 
@@ -196,8 +196,8 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
 {
 	ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS;
 	if ((tmask & vcpu->arch.cr4_guest_owned_bits) &&
-	    !kvm_register_is_available(vcpu, VCPU_EXREG_CR4))
-		kvm_x86_call(cache_reg)(vcpu, VCPU_EXREG_CR4);
+	    !kvm_register_is_available(vcpu, VCPU_REG_CR4))
+		kvm_x86_call(cache_reg)(vcpu, VCPU_REG_CR4);
 	return vcpu->arch.cr4 & mask;
 }
 
@@ -211,8 +211,8 @@ static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
 
 static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
 {
-	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
-		kvm_x86_call(cache_reg)(vcpu, VCPU_EXREG_CR3);
+	if (!kvm_register_is_available(vcpu, VCPU_REG_CR3))
+		kvm_x86_call(cache_reg)(vcpu, VCPU_REG_CR3);
 	return vcpu->arch.cr3;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 85edaee27b03..ee5749d8b3e8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1517,7 +1517,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	kvm_register_mark_available(vcpu, reg);
 
 	switch (reg) {
-	case VCPU_EXREG_PDPTR:
+	case VCPU_REG_PDPTR:
 		/*
 		 * When !npt_enabled, mmu->pdptrs[] is already available since
 		 * it is always updated per SDM when moving to CRs.
@@ -4179,7 +4179,7 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 
 static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
 {
-	kvm_register_mark_dirty(vcpu, VCPU_EXREG_ERAPS);
+	kvm_register_mark_dirty(vcpu, VCPU_REG_ERAPS);
 
 	svm_flush_tlb_asid(vcpu);
 }
@@ -4457,7 +4457,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS) &&
-	    kvm_register_is_dirty(vcpu, VCPU_EXREG_ERAPS))
+	    kvm_register_is_dirty(vcpu, VCPU_REG_ERAPS))
 		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_CLEAR_RAP;
 
 	svm_fixup_nested_rips(vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index fd0652b32c81..677d268ae9c7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -474,7 +474,7 @@ static inline bool svm_is_vmrun_failure(u64 exit_code)
  * KVM_REQ_LOAD_MMU_PGD is always requested when the cached vcpu->arch.cr3
  * is changed.  svm_load_mmu_pgd() then syncs the new CR3 value into the VMCB.
  */
-#define SVM_REGS_LAZY_LOAD_SET	(1 << VCPU_EXREG_PDPTR)
+#define SVM_REGS_LAZY_LOAD_SET	(1 << VCPU_REG_PDPTR)
 
 static inline void __vmcb_set_intercept(unsigned long *intercepts, u32 bit)
 {
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3fe88f29be7a..22b1f06a9d40 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1189,7 +1189,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	}
 
 	vcpu->arch.cr3 = cr3;
-	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
 
 	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
@@ -4972,7 +4972,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 
 	nested_ept_uninit_mmu_context(vcpu);
 	vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
+	kvm_register_mark_available(vcpu, VCPU_REG_CR3);
 
 	/*
 	 * Use ept_save_pdptrs(vcpu) to load the MMU's cached PDPTRs
@@ -5074,7 +5074,7 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	kvm_service_local_tlb_flush_requests(vcpu);
 
 	/*
-	 * VCPU_EXREG_PDPTR will be clobbered in arch/x86/kvm/vmx/vmx.h between
+	 * VCPU_REG_PDPTR will be clobbered in arch/x86/kvm/vmx/vmx.h between
 	 * now and the new vmentry.  Ensure that the VMCS02 PDPTR fields are
 	 * up-to-date before switching to L1.
 	 */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1e47c194af53..c23ec4ac8bc8 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1013,8 +1013,8 @@ static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 	return EXIT_FASTPATH_NONE;
 }
 
-#define TDX_REGS_AVAIL_SET	(BIT_ULL(VCPU_EXREG_EXIT_INFO_1) | \
-				 BIT_ULL(VCPU_EXREG_EXIT_INFO_2) | \
+#define TDX_REGS_AVAIL_SET	(BIT_ULL(VCPU_REG_EXIT_INFO_1) | \
+				 BIT_ULL(VCPU_REG_EXIT_INFO_2) | \
 				 BIT_ULL(VCPU_REGS_RAX) | \
 				 BIT_ULL(VCPU_REGS_RBX) | \
 				 BIT_ULL(VCPU_REGS_RCX) | \
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 577b0c6286ad..aa1c26018439 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -843,8 +843,8 @@ static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
 	bool ret;
 	u32 mask = 1 << (seg * SEG_FIELD_NR + field);
 
-	if (!kvm_register_is_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS)) {
-		kvm_register_mark_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS);
+	if (!kvm_register_is_available(&vmx->vcpu, VCPU_REG_SEGMENTS)) {
+		kvm_register_mark_available(&vmx->vcpu, VCPU_REG_SEGMENTS);
 		vmx->segment_cache.bitmask = 0;
 	}
 	ret = vmx->segment_cache.bitmask & mask;
@@ -1609,8 +1609,8 @@ unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long rflags, save_rflags;
 
-	if (!kvm_register_is_available(vcpu, VCPU_EXREG_RFLAGS)) {
-		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
+	if (!kvm_register_is_available(vcpu, VCPU_REG_RFLAGS)) {
+		kvm_register_mark_available(vcpu, VCPU_REG_RFLAGS);
 		rflags = vmcs_readl(GUEST_RFLAGS);
 		if (vmx->rmode.vm86_active) {
 			rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
@@ -1633,7 +1633,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	 * if L1 runs L2 as a restricted guest.
 	 */
 	if (is_unrestricted_guest(vcpu)) {
-		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
+		kvm_register_mark_available(vcpu, VCPU_REG_RFLAGS);
 		vmx->rflags = rflags;
 		vmcs_writel(GUEST_RFLAGS, rflags);
 		return;
@@ -2607,17 +2607,17 @@ void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	case VCPU_REG_RIP:
 		vcpu->arch.rip = vmcs_readl(GUEST_RIP);
 		break;
-	case VCPU_EXREG_PDPTR:
+	case VCPU_REG_PDPTR:
 		if (enable_ept)
 			ept_save_pdptrs(vcpu);
 		break;
-	case VCPU_EXREG_CR0:
+	case VCPU_REG_CR0:
 		guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;
 
 		vcpu->arch.cr0 &= ~guest_owned_bits;
 		vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits;
 		break;
-	case VCPU_EXREG_CR3:
+	case VCPU_REG_CR3:
 		/*
 		 * When intercepting CR3 loads, e.g. for shadowing paging, KVM's
 		 * CR3 is loaded into hardware, not the guest's CR3.
@@ -2625,7 +2625,7 @@ void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 		if (!(exec_controls_get(to_vmx(vcpu)) & CPU_BASED_CR3_LOAD_EXITING))
 			vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
 		break;
-	case VCPU_EXREG_CR4:
+	case VCPU_REG_CR4:
 		guest_owned_bits = vcpu->arch.cr4_guest_owned_bits;
 
 		vcpu->arch.cr4 &= ~guest_owned_bits;
@@ -3350,7 +3350,7 @@ void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
 
-	if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR))
+	if (!kvm_register_is_dirty(vcpu, VCPU_REG_PDPTR))
 		return;
 
 	if (is_pae_paging(vcpu)) {
@@ -3373,7 +3373,7 @@ void ept_save_pdptrs(struct kvm_vcpu *vcpu)
 	mmu->pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
 	mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
 
-	kvm_register_mark_available(vcpu, VCPU_EXREG_PDPTR);
+	kvm_register_mark_available(vcpu, VCPU_REG_PDPTR);
 }
 
 #define CR3_EXITING_BITS (CPU_BASED_CR3_LOAD_EXITING | \
@@ -3416,7 +3416,7 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	vmcs_writel(CR0_READ_SHADOW, cr0);
 	vmcs_writel(GUEST_CR0, hw_cr0);
 	vcpu->arch.cr0 = cr0;
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR0);
+	kvm_register_mark_available(vcpu, VCPU_REG_CR0);
 
 #ifdef CONFIG_X86_64
 	if (vcpu->arch.efer & EFER_LME) {
@@ -3434,8 +3434,8 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 		 * (correctly) stop reading vmcs.GUEST_CR3 because it thinks
 		 * KVM's CR3 is installed.
 		 */
-		if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
-			vmx_cache_reg(vcpu, VCPU_EXREG_CR3);
+		if (!kvm_register_is_available(vcpu, VCPU_REG_CR3))
+			vmx_cache_reg(vcpu, VCPU_REG_CR3);
 
 		/*
 		 * When running with EPT but not unrestricted guest, KVM must
@@ -3472,7 +3472,7 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 		 * GUEST_CR3 is still vmx->ept_identity_map_addr if EPT + !URG.
 		 */
 		if (!(old_cr0_pg & X86_CR0_PG) && (cr0 & X86_CR0_PG))
-			kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+			kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
 	}
 
 	/* depends on vcpu->arch.cr0 to be set to a new value */
@@ -3501,7 +3501,7 @@ void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
 
 		if (!enable_unrestricted_guest && !is_paging(vcpu))
 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
-		else if (kvm_register_is_dirty(vcpu, VCPU_EXREG_CR3))
+		else if (kvm_register_is_dirty(vcpu, VCPU_REG_CR3))
 			guest_cr3 = vcpu->arch.cr3;
 		else /* vmcs.GUEST_CR3 is already up-to-date. */
 			update_guest_cr3 = false;
@@ -3561,7 +3561,7 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	}
 
 	vcpu->arch.cr4 = cr4;
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR4);
+	kvm_register_mark_available(vcpu, VCPU_REG_CR4);
 
 	if (!enable_unrestricted_guest) {
 		if (enable_ept) {
@@ -5021,7 +5021,7 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);
 
 	vmx_segment_cache_clear(vmx);
-	kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
+	kvm_register_mark_available(vcpu, VCPU_REG_SEGMENTS);
 
 	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 	vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
@@ -7514,9 +7514,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 		vmx->vt.exit_reason.full = EXIT_REASON_INVALID_STATE;
 		vmx->vt.exit_reason.failed_vmentry = 1;
-		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1);
+		kvm_register_mark_available(vcpu, VCPU_REG_EXIT_INFO_1);
 		vmx->vt.exit_qualification = ENTRY_FAIL_DEFAULT;
-		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2);
+		kvm_register_mark_available(vcpu, VCPU_REG_EXIT_INFO_2);
 		vmx->vt.exit_intr_info = 0;
 		return EXIT_FASTPATH_NONE;
 	}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d0cc5f6c6879..9fb76ea48caf 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -317,7 +317,7 @@ static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vt *vt = to_vt(vcpu);
 
-	if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1) &&
+	if (!kvm_register_test_and_mark_available(vcpu, VCPU_REG_EXIT_INFO_1) &&
 	    !WARN_ON_ONCE(is_td_vcpu(vcpu)))
 		vt->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
@@ -328,7 +328,7 @@ static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vt *vt = to_vt(vcpu);
 
-	if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2) &&
+	if (!kvm_register_test_and_mark_available(vcpu, VCPU_REG_EXIT_INFO_2) &&
 	    !WARN_ON_ONCE(is_td_vcpu(vcpu)))
 		vt->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 
@@ -622,14 +622,14 @@ BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
  */
 #define VMX_REGS_LAZY_LOAD_SET	((1 << VCPU_REG_RIP) |         \
 				(1 << VCPU_REGS_RSP) |          \
-				(1 << VCPU_EXREG_RFLAGS) |      \
-				(1 << VCPU_EXREG_PDPTR) |       \
-				(1 << VCPU_EXREG_SEGMENTS) |    \
-				(1 << VCPU_EXREG_CR0) |         \
-				(1 << VCPU_EXREG_CR3) |         \
-				(1 << VCPU_EXREG_CR4) |         \
-				(1 << VCPU_EXREG_EXIT_INFO_1) | \
-				(1 << VCPU_EXREG_EXIT_INFO_2))
+				(1 << VCPU_REG_RFLAGS) |      \
+				(1 << VCPU_REG_PDPTR) |       \
+				(1 << VCPU_REG_SEGMENTS) |    \
+				(1 << VCPU_REG_CR0) |         \
+				(1 << VCPU_REG_CR3) |         \
+				(1 << VCPU_REG_CR4) |         \
+				(1 << VCPU_REG_EXIT_INFO_1) | \
+				(1 << VCPU_REG_EXIT_INFO_2))
 
 static inline unsigned long vmx_l1_guest_owned_cr0_bits(void)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..ac05cc289b56 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1090,14 +1090,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 	}
 
 	/*
-	 * Marking VCPU_EXREG_PDPTR dirty doesn't work for !tdp_enabled.
+	 * Marking VCPU_REG_PDPTR dirty doesn't work for !tdp_enabled.
 	 * Shadow page roots need to be reconstructed instead.
 	 */
 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
 		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
 	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
-	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
+	kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
 	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
 	vcpu->arch.pdptrs_from_userspace = false;
 
@@ -1478,7 +1478,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		kvm_mmu_new_pgd(vcpu, cr3);
 
 	vcpu->arch.cr3 = cr3;
-	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
 	/* Do not call post_set_cr3, we do not get here for confidential guests.  */
 
 handle_tlb_flush:
@@ -12473,7 +12473,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
 	vcpu->arch.cr2 = sregs->cr2;
 	*mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
 	vcpu->arch.cr3 = sregs->cr3;
-	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
 	kvm_x86_call(post_set_cr3)(vcpu, sregs->cr3);
 
 	kvm_set_cr8(vcpu, sregs->cr8);
@@ -12566,7 +12566,7 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 		for (i = 0; i < 4 ; i++)
 			kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
 
-		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
+		kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
 		mmu_reset_needed = 1;
 		vcpu->arch.pdptrs_from_userspace = true;
 	}
@@ -13111,7 +13111,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_rip_write(vcpu, 0xfff0);
 
 	vcpu->arch.cr3 = 0;
-	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
 
 	/*
 	 * CR0.CD/NW are set on RESET, preserved on INIT.  Note, some versions
@@ -14323,7 +14323,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 		 * the RAP (Return Address Predicator).
 		 */
 		if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
-			kvm_register_is_dirty(vcpu, VCPU_EXREG_ERAPS);
+			kvm_register_is_dirty(vcpu, VCPU_REG_ERAPS);
 
 		kvm_invalidate_pcid(vcpu, operand.pcid);
 		return kvm_skip_emulated_instruction(vcpu);
@@ -14339,7 +14339,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 		fallthrough;
 	case INVPCID_TYPE_ALL_INCL_GLOBAL:
 		/*
-		 * Don't bother marking VCPU_EXREG_ERAPS dirty, SVM will take
+		 * Don't bother marking VCPU_REG_ERAPS dirty, SVM will take
 		 * care of doing so when emulating the full guest TLB flush
 		 * (the RAP is cleared on all implicit TLB flushes).
 		 */
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v2 3/6] KVM: nVMX: Do a bitwise-AND of regs_avail when switching active VMCS
From: Sean Christopherson @ 2026-04-09 22:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Kiryl Shutsemau
  Cc: kvm, x86, linux-coco, linux-kernel, Chang S . Bae
In-Reply-To: <20260409224236.2021562-1-seanjc@google.com>

When switching between vmcs01 and vmcs02, do a bitwise-AND of regs_avail
to effectively reset the mask for the new VMCS, purely to be consistent
with all other "full" writes of regs_avail.  In practice, a straight write
versus a bitwise-AND will yield the same result, as kvm_arch_vcpu_create()
marks *all* registers available (and dirty), and KVM never marks registers
unavailable unless they're lazily loaded.

This will allow adding wrapper APIs to set regs_{avail,dirty} without
having to add special handling for a nVMX use case that doesn't exist in
practice.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 22b1f06a9d40..63c4ca8c97d5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -310,7 +310,7 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
 	vmx_sync_vmcs_host_state(vmx, prev);
 	put_cpu();
 
-	vcpu->arch.regs_avail = ~VMX_REGS_LAZY_LOAD_SET;
+	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
 
 	/*
 	 * All lazily updated registers will be reloaded from VMCS12 on both
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v2 4/6] KVM: x86: Add wrapper APIs to reset dirty/available register masks
From: Sean Christopherson @ 2026-04-09 22:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Kiryl Shutsemau
  Cc: kvm, x86, linux-coco, linux-kernel, Chang S . Bae
In-Reply-To: <20260409224236.2021562-1-seanjc@google.com>

Add wrappers for setting regs_{avail,dirty} in anticipation of turning the
fields into proper bitmaps, at which point direct writes won't work so
well.

Deliberately leave the initialization in kvm_arch_vcpu_create() as-is,
because the regs_avail logic in particular is special in that it's the one
and only place where KVM marks eagerly synchronized registers as available.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/kvm_cache_regs.h | 18 ++++++++++++++++++
 arch/x86/kvm/svm/svm.c        |  4 ++--
 arch/x86/kvm/vmx/nested.c     |  4 ++--
 arch/x86/kvm/vmx/tdx.c        |  2 +-
 arch/x86/kvm/vmx/vmx.c        |  4 ++--
 5 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index ac1f9867a234..7f71d468178c 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -105,6 +105,24 @@ static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu
 	return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
 }
 
+static __always_inline void kvm_clear_available_registers(struct kvm_vcpu *vcpu,
+							  u32 clear_mask)
+{
+	/*
+	 * Note the bitwise-AND!  In practice, a straight write would also work
+	 * as KVM initializes the mask to all ones and never clears registers
+	 * that are eagerly synchronized.  Using a bitwise-AND adds a bit of
+	 * sanity checking as incorrectly marking an eagerly sync'd register
+	 * unavailable will generate a WARN due to an unexpected cache request.
+	 */
+	vcpu->arch.regs_avail &= ~clear_mask;
+}
+
+static __always_inline void kvm_reset_dirty_registers(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.regs_dirty = 0;
+}
+
 /*
  * The "raw" register helpers are only for cases where the full 64 bits of a
  * register are read/written irrespective of current vCPU mode.  In other words,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ee5749d8b3e8..2b73d2650155 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4508,7 +4508,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
 		vcpu->arch.rip = svm->vmcb->save.rip;
 	}
-	vcpu->arch.regs_dirty = 0;
+	kvm_reset_dirty_registers(vcpu);
 
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
@@ -4554,7 +4554,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 		vcpu->arch.apf.host_apf_flags =
 			kvm_read_and_reset_apf_flags();
 
-	vcpu->arch.regs_avail &= ~SVM_REGS_LAZY_LOAD_SET;
+	kvm_clear_available_registers(vcpu, SVM_REGS_LAZY_LOAD_SET);
 
 	if (!msr_write_intercepted(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL))
 		rdmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, vcpu_to_pmu(vcpu)->global_ctrl);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 63c4ca8c97d5..c4d2bc080add 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -310,13 +310,13 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
 	vmx_sync_vmcs_host_state(vmx, prev);
 	put_cpu();
 
-	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+	kvm_clear_available_registers(vcpu, VMX_REGS_LAZY_LOAD_SET);
 
 	/*
 	 * All lazily updated registers will be reloaded from VMCS12 on both
 	 * vmentry and vmexit.
 	 */
-	vcpu->arch.regs_dirty = 0;
+	kvm_reset_dirty_registers(vcpu);
 }
 
 static void nested_put_vmcs12_pages(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c23ec4ac8bc8..c9ab7902151f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1098,7 +1098,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	tdx_load_host_xsave_state(vcpu);
 
-	vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
+	kvm_clear_available_registers(vcpu, ~(u32)TDX_REGS_AVAIL_SET);
 
 	if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG))
 		return EXIT_FASTPATH_NONE;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index aa1c26018439..61eeafcd70f1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7472,7 +7472,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 				   flags);
 
 	vcpu->arch.cr2 = native_read_cr2();
-	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+	kvm_clear_available_registers(vcpu, VMX_REGS_LAZY_LOAD_SET);
 
 	vmx->idt_vectoring_info = 0;
 
@@ -7538,7 +7538,7 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
 	if (kvm_register_is_dirty(vcpu, VCPU_REG_RIP))
 		vmcs_writel(GUEST_RIP, vcpu->arch.rip);
-	vcpu->arch.regs_dirty = 0;
+	kvm_reset_dirty_registers(vcpu);
 
 	if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
 		set_debugreg(vcpu->arch.dr6, 6);
-- 
2.53.0.1213.gd9a14994de-goog



* [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values
From: Sean Christopherson @ 2026-04-09 22:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Kiryl Shutsemau
  Cc: kvm, x86, linux-coco, linux-kernel, Chang S . Bae
In-Reply-To: <20260409224236.2021562-1-seanjc@google.com>

Convert regs_{avail,dirty} and all related masks to "unsigned long" values
as an intermediate step towards declaring the fields as actual bitmaps, and
as a step toward supporting APX, which will push the total number of
registers beyond 32 on 64-bit kernels.

Opportunistically convert TDX's ULL bitmask to a UL to match everything
else (TDX is 64-bit only, so it's a nop in the end).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/kvm_cache_regs.h   |  2 +-
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/vmx/tdx.c          | 36 ++++++++++++++++-----------------
 arch/x86/kvm/vmx/vmx.h          | 20 +++++++++---------
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b1eae1e7b04f..c47eb294c066 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -802,8 +802,8 @@ struct kvm_vcpu_arch {
 	 */
 	unsigned long regs[NR_VCPU_GENERAL_PURPOSE_REGS];
 	unsigned long rip;
-	u32 regs_avail;
-	u32 regs_dirty;
+	unsigned long regs_avail;
+	unsigned long regs_dirty;
 
 	unsigned long cr0;
 	unsigned long cr0_guest_owned_bits;
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 7f71d468178c..171e6bc2e169 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -106,7 +106,7 @@ static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu
 }
 
 static __always_inline void kvm_clear_available_registers(struct kvm_vcpu *vcpu,
-							  u32 clear_mask)
+							  unsigned long clear_mask)
 {
 	/*
 	 * Note the bitwise-AND!  In practice, a straight write would also work
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 677d268ae9c7..7b46a3f13de1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -474,7 +474,7 @@ static inline bool svm_is_vmrun_failure(u64 exit_code)
  * KVM_REQ_LOAD_MMU_PGD is always requested when the cached vcpu->arch.cr3
  * is changed.  svm_load_mmu_pgd() then syncs the new CR3 value into the VMCB.
  */
-#define SVM_REGS_LAZY_LOAD_SET	(1 << VCPU_REG_PDPTR)
+#define SVM_REGS_LAZY_LOAD_SET	(BIT(VCPU_REG_PDPTR))
 
 static inline void __vmcb_set_intercept(unsigned long *intercepts, u32 bit)
 {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c9ab7902151f..85f28363e4cc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1013,23 +1013,23 @@ static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 	return EXIT_FASTPATH_NONE;
 }
 
-#define TDX_REGS_AVAIL_SET	(BIT_ULL(VCPU_REG_EXIT_INFO_1) | \
-				 BIT_ULL(VCPU_REG_EXIT_INFO_2) | \
-				 BIT_ULL(VCPU_REGS_RAX) | \
-				 BIT_ULL(VCPU_REGS_RBX) | \
-				 BIT_ULL(VCPU_REGS_RCX) | \
-				 BIT_ULL(VCPU_REGS_RDX) | \
-				 BIT_ULL(VCPU_REGS_RBP) | \
-				 BIT_ULL(VCPU_REGS_RSI) | \
-				 BIT_ULL(VCPU_REGS_RDI) | \
-				 BIT_ULL(VCPU_REGS_R8) | \
-				 BIT_ULL(VCPU_REGS_R9) | \
-				 BIT_ULL(VCPU_REGS_R10) | \
-				 BIT_ULL(VCPU_REGS_R11) | \
-				 BIT_ULL(VCPU_REGS_R12) | \
-				 BIT_ULL(VCPU_REGS_R13) | \
-				 BIT_ULL(VCPU_REGS_R14) | \
-				 BIT_ULL(VCPU_REGS_R15))
+#define TDX_REGS_AVAIL_SET	(BIT(VCPU_REG_EXIT_INFO_1) | \
+				 BIT(VCPU_REG_EXIT_INFO_2) | \
+				 BIT(VCPU_REGS_RAX) | \
+				 BIT(VCPU_REGS_RBX) | \
+				 BIT(VCPU_REGS_RCX) | \
+				 BIT(VCPU_REGS_RDX) | \
+				 BIT(VCPU_REGS_RBP) | \
+				 BIT(VCPU_REGS_RSI) | \
+				 BIT(VCPU_REGS_RDI) | \
+				 BIT(VCPU_REGS_R8) | \
+				 BIT(VCPU_REGS_R9) | \
+				 BIT(VCPU_REGS_R10) | \
+				 BIT(VCPU_REGS_R11) | \
+				 BIT(VCPU_REGS_R12) | \
+				 BIT(VCPU_REGS_R13) | \
+				 BIT(VCPU_REGS_R14) | \
+				 BIT(VCPU_REGS_R15))
 
 static void tdx_load_host_xsave_state(struct kvm_vcpu *vcpu)
 {
@@ -1098,7 +1098,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	tdx_load_host_xsave_state(vcpu);
 
-	kvm_clear_available_registers(vcpu, ~(u32)TDX_REGS_AVAIL_SET);
+	kvm_clear_available_registers(vcpu, ~TDX_REGS_AVAIL_SET);
 
 	if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG))
 		return EXIT_FASTPATH_NONE;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9fb76ea48caf..48447fa983f4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -620,16 +620,16 @@ BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
  * cache on demand.  Other registers not listed here are synced to
  * the cache immediately after VM-Exit.
  */
-#define VMX_REGS_LAZY_LOAD_SET	((1 << VCPU_REG_RIP) |         \
-				(1 << VCPU_REGS_RSP) |          \
-				(1 << VCPU_REG_RFLAGS) |      \
-				(1 << VCPU_REG_PDPTR) |       \
-				(1 << VCPU_REG_SEGMENTS) |    \
-				(1 << VCPU_REG_CR0) |         \
-				(1 << VCPU_REG_CR3) |         \
-				(1 << VCPU_REG_CR4) |         \
-				(1 << VCPU_REG_EXIT_INFO_1) | \
-				(1 << VCPU_REG_EXIT_INFO_2))
+#define VMX_REGS_LAZY_LOAD_SET	(BIT(VCPU_REGS_RSP) |		\
+				 BIT(VCPU_REG_RIP) |		\
+				 BIT(VCPU_REG_RFLAGS) |		\
+				 BIT(VCPU_REG_PDPTR) |		\
+				 BIT(VCPU_REG_SEGMENTS) |	\
+				 BIT(VCPU_REG_CR0) |		\
+				 BIT(VCPU_REG_CR3) |		\
+				 BIT(VCPU_REG_CR4) |		\
+				 BIT(VCPU_REG_EXIT_INFO_1) |	\
+				 BIT(VCPU_REG_EXIT_INFO_2))
 
 static inline unsigned long vmx_l1_guest_owned_cr0_bits(void)
 {
-- 
2.53.0.1213.gd9a14994de-goog



