Linux Confidential Computing Development
 help / color / mirror / Atom feed
* Re: [PATCH v9 08/23] coco/tdx-host: Expose P-SEAMLDR information via sysfs
From: Dave Hansen @ 2026-05-15 17:28 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-9-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> +static umode_t seamldr_group_visible(struct kobject *kobj, struct attribute *attr, int idx)
> +{
> +	const struct tdx_sys_info *sysinfo = tdx_get_sysinfo();
> +
> +	if (!sysinfo)
> +		return 0;
> +
> +	return tdx_supports_runtime_update(sysinfo) ? attr->mode : 0;
> +}

Axe the tertiary form.

	if (!tdx_supports_runtime_update(sysinfo))
		return 0;

	return attr->mode;

Please.

^ permalink raw reply

* Re: [PATCH v9 09/23] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
From: Dave Hansen @ 2026-05-15 17:26 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-10-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> Some TDX-capable CPUs have an erratum, as documented in Intel® Trust
> Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3:

2021, eh?

>   SEAMRET from the P-SEAMLDR clears the current VMCS structure pointed
>   to by the current-VMCS pointer. A VMM that invokes the P-SEAMLDR using
>   SEAMCALL must reload the current-VMCS, if required, using the VMPTRLD
>   instruction.
> 
> Clearing the current VMCS behind KVM's back will break KVM.
> 
> This erratum is not present when IA32_VMX_BASIC[60] is set. Add a CPU
> bug bit for this erratum and refuse to expose P-SEAMLDR information
> on affected CPUs, because even reading the P-SEAMLDR sysfs knobs would
> enter and exit P-SEAMLDR.
> 
> Use a CPU bug bit to stay consistent with X86_BUG_TDX_PW_MCE. As a bonus,
> the bug bit is visible to userspace, which allows userspace to determine
> why these sysfs files are not exposed, and it can also be checked by other
> kernel components in the future if needed.
> 
> == Alternatives ==
> Two workarounds were considered but both were rejected:
> 
> 1. Save/restore the current VMCS around P-SEAMLDR calls. This produces ugly
>    assembly code [1] and doesn't play well with #MCE or #NMI if they
>    need to use the current VMCS.
> 
> 2. Move KVM's VMCS tracking logic to the TDX core code, which would break
>    the boundary between KVM and the TDX core code [2].

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>


^ permalink raw reply

* Re: [PATCH v9 08/23] coco/tdx-host: Expose P-SEAMLDR information via sysfs
From: Dave Hansen @ 2026-05-15 17:23 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-9-chao.gao@intel.com>

> +What:		/sys/devices/faux/tdx_host/seamldr_version
> +Contact:	linux-coco@lists.linux.dev
> +Description:	(RO) Report the version of the loaded P-SEAMLDR. The P-SEAMLDR
> +		version is formatted as x.y.z, where "x" is the major version,
> +		"y" is the minor version and "z" is the update version. Versions
> +		are used for bug reporting and compatibility checks.

Rather than a copy/paste, I'd just say the format is the same as the
main module version.

...
> +static ssize_t num_remaining_updates_show(struct device *dev,
> +					  struct device_attribute *attr,
> +					  char *buf)
> +{
> +	struct seamldr_info info;
> +	int ret;
> +
> +	ret = seamldr_get_info(&info);
> +	if (ret)
> +		return ret;
> +
> +	return sysfs_emit(buf, "%u\n", info.num_remaining_updates);
> +}
> +
> +static DEVICE_ATTR_ADMIN_RO(seamldr_version);
> +static DEVICE_ATTR_ADMIN_RO(num_remaining_updates);
> +
> +static struct attribute *seamldr_attrs[] = {
> +	&dev_attr_seamldr_version.attr,
> +	&dev_attr_num_remaining_updates.attr,
> +	NULL,
> +};
> +
> +static umode_t seamldr_group_visible(struct kobject *kobj, struct attribute *attr, int idx)
> +{
> +	const struct tdx_sys_info *sysinfo = tdx_get_sysinfo();
> +
> +	if (!sysinfo)
> +		return 0;
> +
> +	return tdx_supports_runtime_update(sysinfo) ? attr->mode : 0;
> +}
> +
> +static const struct attribute_group seamldr_group = {
> +	.attrs = seamldr_attrs,
> +	.is_visible = seamldr_group_visible,
> +};

I feel like we need to mention *somewhere* that these are kinda nasty.
tdx_get_sysinfo() is slow and single-threaded. These very much are and
need to stay 0400 for good reason.

Talk about the DEVICE_ATTR_ADMIN_RO() choice _somewhere_, please.

With those fixed:

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply

* Re: [PATCH v9 07/23] x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information
From: Dave Hansen @ 2026-05-15 17:18 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-8-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> P-SEAMLDR reports its state via SEAMLDR.INFO, including its version and
> the number of remaining runtime updates.
> 
> This information is useful for userspace. For example, the admin can use
> the P-SEAMLDR version to determine whether a candidate TDX module is
> compatible with the running loader, and can use the remaining update count
> to determine whether another runtime update is still possible.
> 
> Add a helper to retrieve P-SEAMLDR information in preparation for
> exposing P-SEAMLDR version and other necessary information to userspace.
> Export the new kAPI for use by tdx-host.ko.

What is "tdx-host.ko"? My kernel doesn't have that. Remember a few
patches ago, the "tristate" in Kconfig? ;)

You also called in the "tdx_host" device. Please be consistent with the
naming.

> Note that there are two distinct P-SEAMLDR APIs with similar names:
> 
>   SEAMLDR.INFO: Returns a SEAMLDR_INFO structure containing SEAMLDR
>                 information such as version and remaining updates.
> 
>   SEAMLDR.SEAMINFO: Returns a SEAMLDR_SEAMINFO structure containing SEAM
>                     and system information such as Convertible Memory
> 		    Regions (CMRs) and number of CPUs and sockets.
> 
> The former is used here.
This doesn't help.

"SEAMLDR.INFO" is metadata about the loader. It's metadata for the
update process.

"SEAMLDR.SEAMINFO" is metadata about the module. It is for the module
init process, not for the update process.

Right? Isn't that a billion times more useful and actually helps
differentiate them?

Also, more nits: I hate former/latter too. It makes me stop reading and
have to go back *EVER* *TIME*. I hate that particular english construct.
It's horrid.

Just say:

	Use SEAMLDR.INFO here.

It's even shorter than the passive "is used" and doesn't require me
going back and re-read the text.

> +++ b/arch/x86/include/asm/seamldr.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_SEAMLDR_H
> +#define _ASM_X86_SEAMLDR_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * This is called the "SEAMLDR_INFO" data structure and is defined
> + * in "SEAM Loader (SEAMLDR) Interface Specification".

More succinct:

    * This is the "SEAMLDR_INFO" data structure defined in the
      "SEAM Loader (SEAMLDR) Interface Specification".

> + * The SEAMLDR.INFO documentation requires this to be aligned to a
> + * 256-byte boundary.
> + */

Just say:

 * Must be aligned to a 256-byte boundary.

With those fixed:

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply

* Re: [PATCH v9 06/23] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs
From: Dave Hansen @ 2026-05-15 17:07 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel, linux-rt-devel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt
In-Reply-To: <20260513151045.1420990-7-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> The TDX architecture uses the "SEAMCALL" instruction to communicate with
> SEAM mode software. Right now, the only SEAM mode software that the kernel
> communicates with is the TDX module. But, there is actually another
> component that runs in SEAM mode but it is separate from the TDX module:
> the persistent SEAM loader or "P-SEAMLDR". Right now, the only component
> that communicates with it is the BIOS which loads the TDX module itself at
> boot. But, to support updating the TDX module, the kernel now needs to be
> able to talk to it.
> 
> P-SEAMLDR SEAMCALLs differ from TDX module SEAMCALLs in areas such as
> concurrency requirements. Add a P-SEAMLDR wrapper to handle these
> differences and prepare for implementing concrete functions.
> 
> Use seamcall_prerr() (not '_ret') because current P-SEAMLDR calls do not
> use any output registers other than RAX.
> 
> Note that unlike P-SEAMLDR, there is also a non-persistent SEAM loader
> ("NP-SEAMLDR"). This is an authenticated code module (ACM) that is not
> callable at runtime. Only BIOS launches it to load P-SEAMLDR at boot;
> the kernel does not need to interact with it for runtime update.

Nit: the "main act" solution statement for this changelog is:

	Add a P-SEAMLDR wrapper to handle these differences and prepare
	for implementing concrete functions.

What's more important, that ^?

Or:

	Use seamcall_prerr() (not '_ret') because current P-SEAMLDR
	calls do not use any output registers other than RAX.

?

Now, what is in a more prominent place in the changelog?

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply

* Re: [PATCH v9 06/23] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs
From: Dave Hansen @ 2026-05-15 17:02 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel, linux-rt-devel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt
In-Reply-To: <20260513151045.1420990-7-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> Note that unlike P-SEAMLDR, there is also a non-persistent SEAM loader
> ("NP-SEAMLDR"). This is an authenticated code module (ACM) that is not
> callable at runtime. Only BIOS launches it to load P-SEAMLDR at boot;
> the kernel does not need to interact with it for runtime update.

The "unlike" is the wrong word to use here.

Here's a rewrite, with some Claude help:

Note: Despite the similar name, the NP-SEAMLDR ("Non-Persistent")
differs sharply from the P-SEAMLDR. It is an authenticated code module
(ACM) invoked exclusively by the BIOS at boot rather than a component
running in SEAM mode. The kernel cannot call it at runtime. It exposes
no SEAMCALL interface.

^ permalink raw reply

* Re: [PATCH v9 05/23] coco/tdx-host: Expose TDX module version
From: Dave Hansen @ 2026-05-15 16:53 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-6-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> For TDX module updates, userspace needs to select compatible update
> versions based on the current module version. This design delegates
> module selection complexity to userspace because TDX module update
> policies are complex and version series are platform-specific.

I'm not sure exactly what a "version series" is. Do you need to say more
than that the policy is complex?

> For example, the 1.5.x series is for certain platform generations, while
> the 2.0.x series is intended for others. And TDX module 1.5.x may be
> updated to 1.5.y but not to 1.5.y+1.

That's not much of an example, IMNHO. How about:

	For example, the 1.5.x series runs on Sapphire Rapids but not
	Granite Rapids, which needs 2.0.x. Updates are also constrained
	by version distance, so a 1.5.6 module might permit updates to
	1.5.7 but not to 1.5.20.

> Expose the TDX module version to userspace via sysfs to aid module
> selection. Since the TDX faux device will drive module updates, expose
> the version as its attribute.
> 
> One bonus of exposing TDX module version via sysfs is: TDX module
> version information remains available even after dmesg logs are cleared.

I honestly wouldn't even mention this bit. You don't need a bonus.

> +++ b/Documentation/ABI/testing/sysfs-devices-faux-tdx-host
> @@ -0,0 +1,6 @@
> +What:		/sys/devices/faux/tdx_host/version
> +Contact:	linux-coco@lists.linux.dev
> +Description:	(RO) Report the version of the loaded TDX module. The TDX module
> +		version is formatted as x.y.z, where "x" is the major version,
> +		"y" is the minor version and "z" is the update version. Versions
> +		are used for bug reporting, TDX module updates etc.

The "etc." is silly. Just zap it.

Description:	(RO) Report the version of the loaded TDX module.
		Formatted as "major.minor.update". Used by TDX module
		update tooling. Example: "1.2.03"

That's at least a wee bit of warning to folks about the leading 0 so if
they are parsing it they are a wee bit careful with it.

> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 8b739ac01479..b7f4396b5cc5 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -41,6 +41,12 @@
>  #include <asm/tdx_global_metadata.h>
>  #include <linux/pgtable.h>
>  
> +/*
> + * TDX module and P-SEAMLDR version convention: "major.minor.update"
> + * (e.g., "1.5.08") with zero-padded two-digit update field.
> + */
> +#define TDX_VERSION_FMT "%u.%u.%02u"

I hate "e.g.". I'm not sure why, but maybe it it often misused, or that
it isn't allowed by the style guide I had to use in high school. Either
way, please stop using it.

You don't have to modify this patch for it, but please stop.

Second, this was an opportunity to peel out the creation of
"TDX_VERSION_FMT". It would have shrunk your series by ~10 lines and
made this patch more obvious.

Again, you don't have to go do it, but it was a missed opportunity.


With those updates:

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>


^ permalink raw reply

* Re: [PATCH v9 04/23] coco/tdx-host: Introduce a "tdx_host" device
From: Dave Hansen @ 2026-05-15 16:21 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Dan Williams, Jonathan Cameron,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
In-Reply-To: <20260513151045.1420990-5-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>

This SoB chain at _least_ needs a note. It looks quite bizarre.

> +config TDX_HOST_SERVICES
> +	tristate "TDX Host Services Driver"
> +	depends on INTEL_TDX_HOST
> +	default m
> +	help
> +	  Enable access to TDX host services like module update and
> +	  extensions (e.g. TDX Connect).
> +
> +	  Say y or m if enabling support for confidential virtual machine
> +	  support (CONFIG_INTEL_TDX_HOST). The module is called tdx_host.ko.

In what world will anyone ever set INTEL_TDX_HOST=y, but turn this off?
Is this even worth a Kconfig prompt?

I guess we need it for the module or built in choice. But otherwise it
seems a bit silly.

^ permalink raw reply

* Re: [PATCH v9 02/23] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
From: Dave Hansen @ 2026-05-15 16:15 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-3-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
> This prepares for TDX module update [1] and Dynamic PAMT [2] support. Both
> add new TDX_FEATURES0 capability bits, and both need those capabilities to
> be queried from code outside arch/x86/virt. The corresponding feature-query
> helpers therefore need to live in the public asm/tdx.h header, so move the
> existing bit definitions there first.

Please don't add unnecessary changelog cruft. If you need this move for
this series, that's enough.

^ permalink raw reply

* Re: [PATCH v9 01/23] x86/virt/tdx: Consolidate TDX global initialization states
From: Dave Hansen @ 2026-05-15 16:14 UTC (permalink / raw)
  To: Chao Gao, kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-2-chao.gao@intel.com>

On 5/13/26 08:09, Chao Gao wrote:
...
> Group the states into a single structure so they can be reset together, for
> example with memset(), and so a newly added state won't be missed.
...
> +struct tdx_module_state {
> +	bool initialized;
> +	bool sysinit_done;
> +	int sysinit_ret;
> +};
...
> @@ -113,30 +119,28 @@ static int try_init_module_global(void)
>  {
>  	struct tdx_module_args args = {};
>  	static DEFINE_RAW_SPINLOCK(sysinit_lock);
> -	static bool sysinit_done;
> -	static int sysinit_ret;

This doesn't look right to me.

>  	raw_spin_lock(&sysinit_lock);
>  
> -	if (sysinit_done)
> +	if (tdx_module_state.sysinit_done)
>  		goto out;
>  
>  	/* RCX is module attributes and all bits are reserved */
>  	args.rcx = 0;
> -	sysinit_ret = seamcall_prerr(TDH_SYS_INIT, &args);
> +	tdx_module_state.sysinit_ret = seamcall_prerr(TDH_SYS_INIT, &args);
>  
>  	/*
>  	 * The first SEAMCALL also detects the TDX module, thus
>  	 * it can fail due to the TDX module is not loaded.
>  	 * Dump message to let the user know.
>  	 */
> -	if (sysinit_ret == -ENODEV)
> +	if (tdx_module_state.sysinit_ret == -ENODEV)
>  		pr_err("module not loaded\n");
>  
> -	sysinit_done = true;
> +	tdx_module_state.sysinit_done = true;
>  out:
>  	raw_spin_unlock(&sysinit_lock);
> -	return sysinit_ret;
> +	return tdx_module_state.sysinit_ret;
>  }

But I think it's because 'sysinit_ret' is really a funky thing. It's
just here so that the module only _tries_ to be initialized once. That
one-time init records its error code in sysinit_ret and then secondary
callers pick it up.

Here's how you do that in a non-confusing way (some context chopped out):

static int try_init_module_global(void)
{
        struct tdx_module_args args = {};
        static DEFINE_RAW_SPINLOCK(sysinit_lock);
        int ret;

        raw_spin_lock(&sysinit_lock);

	/* Return the "cached" return code: */
        if (tdx_module_state.sysinit_done) {
		ret = tdx_module_state.sysinit_ret;
                goto out;
	}

        ret = seamcall_prerr(TDH_SYS_INIT, &args);

	/* Save the return code for later callers: */
	tdx_module_state.sysinit_ret  = ret;
        tdx_module_state.sysinit_done = true;
out:
        raw_spin_unlock(&sysinit_lock);
        return ret;
}

See how it sets the module state in _one_ place? It also only touches
the module state under the lock so it's more obvious that it is correct
and there are no races or tearing or other nonsense.

*That* is a proper refactoring.

I'm also not sure we need to be saving the return code. It seems a bit
much, but we don't have to fix that now.

^ permalink raw reply

* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: Sean Christopherson @ 2026-05-15 12:55 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <27ba35fd-5563-4bbd-8f95-2285b50efa7a@linux.intel.com>

On Fri, May 15, 2026, Binbin Wu wrote:
> 
> 
> On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> > Don't truncate RAX when handling a Xen hypercall for a guest with protected
> > state, as KVM's ABI is to assume the guest is in 64-bit for such cases
> > (the guest leaving garbage in 63:32 after a transition to 32-bit mode is
> > far less likely than 63:32 being necessary to complete the hypercall).
> > 
> > Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> The patch looks good to me, but one question below.
> 
> > ---
> >  arch/x86/kvm/xen.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 6d9be74bb673..895095dc684e 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -1678,15 +1678,14 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
> >  	bool handled = false;
> >  	u8 cpl;
> >  
> > -	input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
> > -
> >  	/* Hyper-V hypercalls get bit 31 set in EAX */
> > -	if ((input & 0x80000000) &&
> > +	if ((kvm_rax_read(vcpu) & 0x80000000) &&
> >  	    kvm_hv_hypercall_enabled(vcpu))
> >  		return kvm_hv_hypercall(vcpu);
> >  
> >  	longmode = is_64_bit_hypercall(vcpu);
> 
> Is the variable name misleading?

It most definitely is.  However, @longmode is passed around quite a few locations
in xen.c, and so I don't want to opportunistically fix this one variable.  Though
I'm definitely not opposed to a separate patch to rename them all to is_64bit or
something.

> If the vcpu is in compatible mode (when guest state is not protected),
> it's in long mode, but the code goes to !longmode path.
> 
> >  	if (!longmode) {
> > +		input = (u32)kvm_rax_read(vcpu);
> >  		params[0] = (u32)kvm_rbx_read(vcpu);
> >  		params[1] = (u32)kvm_rcx_read(vcpu);
> >  		params[2] = (u32)kvm_rdx_read(vcpu);
> > @@ -1696,6 +1695,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
> >  	}
> >  	else {
> >  #ifdef CONFIG_X86_64
> > +		input = (u64)kvm_rax_read(vcpu);
> >  		params[0] = (u64)kvm_rdi_read(vcpu);
> >  		params[1] = (u64)kvm_rsi_read(vcpu);
> >  		params[2] = (u64)kvm_rdx_read(vcpu);
> 

^ permalink raw reply

* Re: [PATCH v2 12/15] KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels
From: Binbin Wu @ 2026-05-15  9:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-13-seanjc@google.com>


On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Unconditionally return %false for is_64_bit_hypercall() on 32-bit kernels
> to guard against incorrectly setting guest_state_protected, and because
> in a (very) hypothetical world where 32-bit KVM supports protected guests,
> assuming a hypercall was made in 64-bit mode is flat out wrong.
> 
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/regs.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
> index 52bed14f43e3..d4d2a47a4968 100644
> --- a/arch/x86/kvm/regs.h
> +++ b/arch/x86/kvm/regs.h
> @@ -39,12 +39,16 @@ static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
>  
>  static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
>  {
> +#ifdef CONFIG_X86_64
>  	/*
>  	 * If running with protected guest state, the CS register is not
>  	 * accessible. The hypercall register values will have had to been
>  	 * provided in 64-bit mode, so assume the guest is in 64-bit.
>  	 */
>  	return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
> +#else
> +	return false;
> +#endif
>  }
>  
>  static __always_inline unsigned long kvm_reg_mode_mask(struct kvm_vcpu *vcpu)


^ permalink raw reply

* Re: [PATCH v2 11/15] Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode"
From: Binbin Wu @ 2026-05-15  9:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-12-seanjc@google.com>


On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Now that kvm_<reg>_read() are mode aware, i.e. are functionally equivalent
> to kvm_register_read(), revert aback to the less verbose versions.
> 
> No functional change intended.
> 
> This reverts commit 60919eccf6764c71cef31a1afeaa1a36b8e5ab85.
> 
> Acked-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/vmx/sgx.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index 2f5a1c58f3c5..876dc2814108 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -225,8 +225,8 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
>  	struct x86_exception ex;
>  	int r;
>  
> -	if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 32, 32, &pageinfo_gva) ||
> -	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva))
> +	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
> +	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
>  		return 1;
>  
>  	/*
> @@ -302,9 +302,9 @@ static int handle_encls_einit(struct kvm_vcpu *vcpu)
>  	gpa_t sig_gpa, secs_gpa, token_gpa;
>  	int ret, trapnr;
>  
> -	if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 1808, 4096, &sig_gva) ||
> -	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva) ||
> -	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RDX), 304, 512, &token_gva))
> +	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
> +	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
> +	    sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
>  		return 1;
>  
>  	/*


^ permalink raw reply

* Re: [PATCH v2 09/15] KVM: x86: Drop non-raw kvm_<reg>_write() helpers
From: Binbin Wu @ 2026-05-15  9:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-10-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Drop the non-raw, mode-aware kvm_<reg>_write() helpers as there is no
> usage in KVM, and in all likelihood there will never be usage in KVM as
> use of hardcoded registers in instructions is uncommon, and *modifying*
> hardcoded registers is practically unheard of.  While there are a few
> instructions that modify registers in mode-aware ways, e.g. REP string
> and some ENCLS varieties, the odds of KVM needing to emulate such
> instructions (outside of the fully emulator) are vanishingly small.
> 
> Drop kvm_<reg>_write() to prevent incorrect usage; _if_ a new instruction
> comes along that needs to modify a hardcoded register, this can be
> reverted.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/regs.h | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
> index b28e71caed25..52bed14f43e3 100644
> --- a/arch/x86/kvm/regs.h
> +++ b/arch/x86/kvm/regs.h
> @@ -61,11 +61,6 @@ static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)
>  {											\
>  	return vcpu->arch.regs[VCPU_REGS_##uname] & kvm_reg_mode_mask(vcpu);		\
>  }											\
> -static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,			\
> -						unsigned long val)			\
> -{											\
> -	vcpu->arch.regs[VCPU_REGS_##uname] = val & kvm_reg_mode_mask(vcpu);		\
> -}											\
>  static __always_inline unsigned long kvm_##lname##_read_raw(struct kvm_vcpu *vcpu)	\
>  {											\
>  	return vcpu->arch.regs[VCPU_REGS_##uname];					\


^ permalink raw reply

* Re: [PATCH v2 08/15] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
From: Binbin Wu @ 2026-05-15  8:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-9-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Make kvm_<reg>_{read,write}() mode-aware (where the value is truncated to
> 32 bits if the vCPU isn't in 64-bit mode), and convert all the intentional
> "raw" accesses to kvm_<reg>_{read,write}_raw() versions.  To avoid
> confusion and bikeshedding over whether or not explicit 32-bit accesses
> should use the "raw" or mode-aware variants, add and use "e" versions, e.g.
> for things like RDMSR, WRMSR, and CPUID, where the instruction uses only
> only bits 31:0, regardless of mode.
   ^
  double "only"

> 
> No functional change intended (all use of "e" versions is for cases where
> the value is already truncated due to bouncing through a u32).
> 
> Cc: Binbin Wu <binbin.wu@linux.intel.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

^ permalink raw reply

* Re: [PATCH v2 07/15] KVM: x86: Move inlined CR and DR helpers from x86.h to regs.h
From: Binbin Wu @ 2026-05-15  8:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-8-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Move inlined Control Register and Debug Register helpers from x86.h to the
> aptly named regs.h, to help trim down x86.h (and x86.c in the future).
> 
> Move select EFER functionality, but leave behind all other MSR handling,
> There is more than enough MSR code to carve out msr.{c,h} in the future.
> Give EFER special treatment as it's an "MSR" in name only, e.g. it's has
                                                                    ^
                                                                   it

> far more in common with CR4 than it does with any MSR.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

Nit: The shortlog doesn't cover EFER.



^ permalink raw reply

* Re: [PATCH v2 06/15] KVM: x86: Rename kvm_cache_regs.h => regs.h
From: Binbin Wu @ 2026-05-15  7:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-7-seanjc@google.com>


On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Rename kvm_cache_regs.h to simply regs.h, as the "cache" nomenclature is
> already a lie (the file deals with state/registers that aren't cached per
> se), and so that more code/functionality can be landed in the header
> without making it a truly horrible misnomer.
> 
> Deliberately drop the kvm_ prefix/namespace to align with other "local"
> headers, and to further differentiate regs.h from the public/global
> arch/x86/include/asm/kvm_vcpu_regs.h, which sadly needs to stay in asm/
> so that the number of registers can be referenced by kvm_vcpu_arch.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>


^ permalink raw reply

* Re: [PATCH v2 05/15] KVM: x86: Trace hypercall register *after* truncating values for 32-bit
From: Binbin Wu @ 2026-05-15  7:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-6-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> When tracing hypercalls, invoke the tracepoint *after* truncating the
> register values for 32-bit guests so as not to record unused garbage (in
> the extremely unlikely scenario that the guest left garbage in a register
> after transitioning from 64-bit mode to 32-bit mode).
> 
> Fixes: 229456fc34b1 ("KVM: convert custom marker based tracing to event traces")
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/x86.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 209eae67ab18..23b3957b9ae0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10430,8 +10430,6 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
>  
>  	++vcpu->stat.hypercalls;
>  
> -	trace_kvm_hypercall(nr, a0, a1, a2, a3);
> -
>  	if (!op_64_bit) {
>  		nr &= 0xFFFFFFFF;
>  		a0 &= 0xFFFFFFFF;
> @@ -10440,6 +10438,8 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
>  		a3 &= 0xFFFFFFFF;
>  	}
>  
> +	trace_kvm_hypercall(nr, a0, a1, a2, a3);
> +
>  	if (cpl) {
>  		ret = -KVM_EPERM;
>  		goto out;


^ permalink raw reply

* Re: [PATCH v2 04/15] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
From: Binbin Wu @ 2026-05-15  7:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-5-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> When getting register values for ENCLS emulation, use kvm_register_read()
> instead of kvm_<reg>_read() so that bits 63:32 of the register are dropped
> if the guest is in 32-bit mode.
> 
> Note, the misleading/surprising behavior of kvm_<reg>_read() being "raw"
> variants under the hood will be addressed once all non-benign bugs are
> fixed.
> 
> Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
> Fixes: b6f084ca5538 ("KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)")
> Acked-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/vmx/sgx.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index df1d0cf76947..4c61fc33f764 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -225,8 +225,8 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
>  	struct x86_exception ex;
>  	int r;
>  
> -	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
> -	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
> +	if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 32, 32, &pageinfo_gva) ||
> +	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva))
>  		return 1;
>  
>  	/*
> @@ -302,9 +302,9 @@ static int handle_encls_einit(struct kvm_vcpu *vcpu)
>  	gpa_t sig_gpa, secs_gpa, token_gpa;
>  	int ret, trapnr;
>  
> -	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
> -	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
> -	    sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
> +	if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 1808, 4096, &sig_gva) ||
> +	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva) ||
> +	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RDX), 304, 512, &token_gva))
>  		return 1;
>  
>  	/*


^ permalink raw reply

* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: Binbin Wu @ 2026-05-15  7:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-4-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Don't truncate RAX when handling a Xen hypercall for a guest with protected
> state, as KVM's ABI is to assume the guest is in 64-bit for such cases
> (the guest leaving garbage in 63:32 after a transition to 32-bit mode is
> far less likely than 63:32 being necessary to complete the hypercall).
> 
> Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

The patch looks good to me, but one question below.

> ---
>  arch/x86/kvm/xen.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 6d9be74bb673..895095dc684e 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -1678,15 +1678,14 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>  	bool handled = false;
>  	u8 cpl;
>  
> -	input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
> -
>  	/* Hyper-V hypercalls get bit 31 set in EAX */
> -	if ((input & 0x80000000) &&
> +	if ((kvm_rax_read(vcpu) & 0x80000000) &&
>  	    kvm_hv_hypercall_enabled(vcpu))
>  		return kvm_hv_hypercall(vcpu);
>  
>  	longmode = is_64_bit_hypercall(vcpu);

Is the variable name misleading?
If the vcpu is in compatible mode (when guest state is not protected),
it's in long mode, but the code goes to !longmode path.

>  	if (!longmode) {
> +		input = (u32)kvm_rax_read(vcpu);
>  		params[0] = (u32)kvm_rbx_read(vcpu);
>  		params[1] = (u32)kvm_rcx_read(vcpu);
>  		params[2] = (u32)kvm_rdx_read(vcpu);
> @@ -1696,6 +1695,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>  	}
>  	else {
>  #ifdef CONFIG_X86_64
> +		input = (u64)kvm_rax_read(vcpu);
>  		params[0] = (u64)kvm_rdi_read(vcpu);
>  		params[1] = (u64)kvm_rsi_read(vcpu);
>  		params[2] = (u64)kvm_rdx_read(vcpu);


^ permalink raw reply

* Re: [PATCH v2 02/15] KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall
From: Binbin Wu @ 2026-05-15  6:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-3-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Bug the VM if 32-bit KVM attempts to handle a 64-bit hypercall, primarily
> so that a future change to set "input" in mode-specific code doesn't
> trigger a false positive warn=>error:
> 
>   arch/x86/kvm/xen.c:1687:6: error: variable 'input' is used uninitialized
>                                     whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
>    1687 |         if (!longmode) {
>         |             ^~~~~~~~~
>   arch/x86/kvm/xen.c:1708:31: note: uninitialized use occurs here
>    1708 |         trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
>         |                                      ^~~~~
>   x86/kvm/xen.c:1687:2: note: remove the 'if' if its condition is always true
>    1687 |         if (!longmode) {
>         |         ^~~~~~~~~~~~~~
>   arch/x86/kvm/xen.c:1677:11: note: initialize the variable 'input' to silence this warning
>    1677 |         u64 input, params[6], r = -ENOSYS;
>         |                  ^
>   1 error generated.
> 
> Note, params[] also has the same flaw, but -Wsometimes-uninitialized
> doesn't seem to be enforced for arrays, presumably because it's difficult
> to avoid false positives on specific entries.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/xen.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 91fd3673c09a..6d9be74bb673 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -1694,16 +1694,19 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>  		params[4] = (u32)kvm_rdi_read(vcpu);
>  		params[5] = (u32)kvm_rbp_read(vcpu);
>  	}
> -#ifdef CONFIG_X86_64
>  	else {
> +#ifdef CONFIG_X86_64
>  		params[0] = (u64)kvm_rdi_read(vcpu);
>  		params[1] = (u64)kvm_rsi_read(vcpu);
>  		params[2] = (u64)kvm_rdx_read(vcpu);
>  		params[3] = (u64)kvm_r10_read(vcpu);
>  		params[4] = (u64)kvm_r8_read(vcpu);
>  		params[5] = (u64)kvm_r9_read(vcpu);
> -	}
> +#else
> +		KVM_BUG_ON(1, vcpu->kvm);
> +		return -EIO;
>  #endif
> +	}
>  	cpl = kvm_x86_call(get_cpl)(vcpu);
>  	trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
>  				params[3], params[4], params[5]);


^ permalink raw reply

* Re: [PATCH v2 01/15] KVM: SVM: Truncate INVLPGA address in compatibility mode
From: Binbin Wu @ 2026-05-15  6:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <20260514215355.1648463-2-seanjc@google.com>



On 5/15/2026 5:53 AM, Sean Christopherson wrote:
> Check for full 64-bit mode, not just long mode, when truncating the
> virtual address as part of INVLPGA emulation.  Compatibility mode doesn't
> support 64-bit addressing.
> 
> Note, the FIXME still applies, e.g. if the guest deliberately targeted
> EAX while in 64-bit via an address size override.  That flaw isn't worth
> fixing as it would require decoding the code stream, which would open a
                                                                        ^
an extra 'a'

> an entirely different can of worms, and in practice no sane guest would
> shove garbage into RAX[63:32] and execute INVLPGA.
> 
> Note #2, VMSAVE, VMLOAD, and VMRUN all suffer from the same architectural
> flaw of not providing the full linear address in a VMCB exit information
> field, because, quoting the APM verbatim:
> 
>   the linear address is available directly from the guest rAX register
> 
> (VMSAVE, VMLOAD, and VMRUN take a physical address, but they're behavior
                                                              ^
                                                          their      > with respect to rAX is otherwise identical).
> 
> Fixes: bc9eff67fc35 ("KVM: SVM: Use default rAX size for INVLPGA emulation")
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> ---
>  arch/x86/kvm/svm/svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e74fcde6155e..4ad87f8df392 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2415,7 +2415,7 @@ static int invlpga_interception(struct kvm_vcpu *vcpu)
>  		return 1;
>  
>  	/* FIXME: Handle an address size prefix. */
> -	if (!is_long_mode(vcpu))
> +	if (!is_64_bit_mode(vcpu))
>  		gva = (u32)gva;
>  
>  	trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);


^ permalink raw reply

* Re: [PATCH v9 20/23] x86/virt/tdx: Reject updates during compatibility-sensitive operations
From: Chao Gao @ 2026-05-15  6:12 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-21-chao.gao@intel.com>

> /* Bit definitions of TDX_FEATURES0 metadata field */
> #define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
>+#define TDX_FEATURES0_UPDATE_COMPAT	BIT_ULL(47)

Sashiko pointed out that this definition is currently unused.

We do not pre-check this TDX_FEATURES0 bit. So, I will drop this definition
in the next revision.

^ permalink raw reply

* Re: [PATCH v9 11/23] x86/virt/seamldr: Allocate and populate a module update request
From: Chao Gao @ 2026-05-15  6:05 UTC (permalink / raw)
  To: kvm, linux-coco, linux-kernel
  Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
	nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
	sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
	yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-12-chao.gao@intel.com>

>+static int init_seamldr_params(struct seamldr_params *params, const u8 *data, u32 size)
>+{
>+	const struct tdx_image *image		= (const void *)data;
>+	const struct tdx_image_header *header	= &image->header;
>+
>+	u32 sigstruct_len	= header->sigstruct_nr_pages * PAGE_SIZE;
>+	u32 module_len		= header->module_nr_pages * PAGE_SIZE;

I looked at Sashiko's two reports here.

 (1) The header is dereferenced before validating that the input is large
     enough to contain a full header.

 (2) The page-count to byte-count multiplication could in principle
     overflow.

For (1), I agree the validation order should be fixed. Even if the input
buffer is page-backed in practice, the parser should still verify that
size is at least sizeof(struct tdx_image_header) before dereferencing the
header.

For (2), I think using u64 for the derived byte lengths is sufficient in
this case. That avoids overflow in the multiplication itself, and the later
size consistency check:

	HEADER_SIZE + sigstruct_len + module_len != size

will reject malformed inputs.

Below is the fix I plan to fold into this patch in the next revision:

diff --git a/arch/x86/virt/vmx/tdx/seamldr.c
b/arch/x86/virt/vmx/tdx/seamldr.c
index 58ce39315b60..9f4350079477 100644
--- a/arch/x86/virt/vmx/tdx/seamldr.c
+++ b/arch/x86/virt/vmx/tdx/seamldr.c
@@ -148,8 +148,8 @@ static int init_seamldr_params(struct seamldr_params
*params, const u8 *data, u3
	const struct tdx_image *image		= (const void *)data;
	const struct tdx_image_header *header	= &image->header;
 
-	u32 sigstruct_len	= header->sigstruct_nr_pages * PAGE_SIZE;
-	u32 module_len		= header->module_nr_pages * PAGE_SIZE;
+	u64 sigstruct_len	= header->sigstruct_nr_pages * PAGE_SIZE;
+	u64 module_len		= header->module_nr_pages * PAGE_SIZE;
 
	u8 *header_start	= (u8 *)header;
	u8 *header_end		= header_start + HEADER_SIZE;
@@ -299,6 +299,9 @@ int seamldr_install_module(const u8 *data, u32 size)
	struct seamldr_params *params;
	int ret;
 
+	if (size <= HEADER_SIZE)
+		return -EINVAL;
+
	params = kzalloc_obj(*params);
	if (!params)
		return -ENOMEM;

^ permalink raw reply related

* Re: [PATCH v2 00/15] KVM: x86: Clean up kvm_<reg>_{read,write}() mess
From: Yosry Ahmed @ 2026-05-14 22:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, David Woodhouse,
	Paul Durrant, Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco,
	linux-kernel, Kai Huang, Binbin Wu
In-Reply-To: <20260514215355.1648463-1-seanjc@google.com>

On Thu, May 14, 2026 at 2:54 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Add proper, explicit "raw" versions of kvm_<reg>_{read,write}(), along
> with "e" versions (for hardcoded 32-bit accesses), and convert the
> existing kvm_<reg>_{read,write}() APIs into mode-aware variants.
>
> This was prompted by commit 435741a4e766 ("KVM: SVM: Properly check RAX
> on #GP intercept of SVM instructions"), where using kvm_rax_read() to
> get EAX/RAX would have (*very* surprisingly) been wrong as it's actually
> a "raw" variant that doesn't truncate accesses when the guest is in 32-bit
> mode.
>
> Aside from my dislike of inconsistent APIs, I really want to avoid carrying
> code that's subtly relying on using kvm_register_read(...) when accessing a
> hardcoded register.
>
> Fix a handful of minor warts along the way.
>
> Oh, and introduce regs.{c,h}, which just a "minor" addendum.  Yosry pointed
> out that moving _more_ code into x86.h was rather gross (especially since the
> code split was super arbitrary), and it turns out that create regs.{c,h} isn't
> all that hard.  In the future, I think we can also add msr.{c,h}, so I very
> deliberately didn't include that functionality in regs.{c,h}.
>
> v2:
>  - Collect tags. [Yosry, Kai
>  - Fix some truly egregious goofs. [Binbin]
>  - Rename kvm_cache_regs.h => regs.h, add regs.c. [Yosry, though he'll
>    probably yell at me for saying this was his suggestion :-) ]

This is kinda sorta the opposite of what I suggested, but sure :P

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox