All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] x86/tdx: Port I/O emulation fixes
@ 2026-05-27 12:05 Kiryl Shutsemau (Meta)
  2026-05-27 12:05 ` [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling Kiryl Shutsemau (Meta)
  2026-05-27 12:05 ` [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O Kiryl Shutsemau (Meta)
  0 siblings, 2 replies; 12+ messages in thread
From: Kiryl Shutsemau (Meta) @ 2026-05-27 12:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable, Kiryl Shutsemau (Meta)

This series addresses two technical inaccuracies in the TDX guest port
I/O emulation code reported by Borys Tsyrulnikov.

The first patch fixes an off-by-one error in the GENMASK() macro usage
where the mask was being calculated as one bit too wide (e.g. 9 bits for
an 8-bit operation).

The second patch ensures that 32-bit port I/O operations (INL) correctly
zero-extend the result to the full 64-bit RAX register, as required by
the x86 architecture. Currently, the emulation preserves the upper 32
bits of RAX during such operations.

Both issues were introduced in the initial implementation of the runtime
hypercalls for port I/O.

v1: https://lore.kernel.org/all/20260331112430.71425-1-kas@kernel.org/
v2: https://lore.kernel.org/all/20260428125632.129770-1-kas@kernel.org/

Changes in v3:
  - Expand the comment in patch 2 with a table describing which RAX
    bits each IN form writes vs preserves, clarifying why the 32-bit
    case needs to clear RAX[63:32] (Dave Hansen).
  - Rebase onto v7.1-rc5.

Changes in v2:
  - Rephrase the size check in handle_in() as "if (size == 4)" for
    readability (Kuppuswamy)
  - Add Link: to the bug report on both patches (Kuppuswamy)
  - Collect Reviewed-by tags (Kai Huang, Kuppuswamy Sathyanarayanan)
  - Rebase onto v7.1-rc1

Kiryl Shutsemau (Meta) (2):
  x86/tdx: Fix off-by-one in port I/O handling
  x86/tdx: Fix zero-extension for 32-bit port I/O

 arch/x86/coco/tdx/tdx.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)


base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
-- 
2.54.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling
  2026-05-27 12:05 [PATCH v3 0/2] x86/tdx: Port I/O emulation fixes Kiryl Shutsemau (Meta)
@ 2026-05-27 12:05 ` Kiryl Shutsemau (Meta)
  2026-05-27 12:47   ` sashiko-bot
  2026-05-27 15:38   ` Edgecombe, Rick P
  2026-05-27 12:05 ` [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O Kiryl Shutsemau (Meta)
  1 sibling, 2 replies; 12+ messages in thread
From: Kiryl Shutsemau (Meta) @ 2026-05-27 12:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable, Kiryl Shutsemau (Meta)

handle_in() and handle_out() in arch/x86/coco/tdx/tdx.c use:

    u64 mask = GENMASK(BITS_PER_BYTE * size, 0);

GENMASK(h, l) includes bit h. For size=1 (INB), this produces
GENMASK(8, 0) = 0x1FF (9 bits) instead of GENMASK(7, 0) = 0xFF (8
bits). The mask is one bit too wide for all I/O sizes.

Fix the mask calculation.

Fixes: 03149948832a ("x86/tdx: Port I/O: Add runtime hypercalls")
Reported-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>
Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/coco/tdx/tdx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 186915a17c50..65119362f9a2 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -693,7 +693,7 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 		.r13 = PORT_READ,
 		.r14 = port,
 	};
-	u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
+	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
 	bool success;
 
 	/*
@@ -713,7 +713,7 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 
 static bool handle_out(struct pt_regs *regs, int size, int port)
 {
-	u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
+	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
 
 	/*
 	 * Emulate the I/O write via hypercall. More info about ABI can be found
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-27 12:05 [PATCH v3 0/2] x86/tdx: Port I/O emulation fixes Kiryl Shutsemau (Meta)
  2026-05-27 12:05 ` [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling Kiryl Shutsemau (Meta)
@ 2026-05-27 12:05 ` Kiryl Shutsemau (Meta)
  2026-05-27 13:12   ` sashiko-bot
                     ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Kiryl Shutsemau (Meta) @ 2026-05-27 12:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable, Kiryl Shutsemau (Meta)

According to x86 architecture rules, 32-bit operations zero-extend the
result to 64 bits. The current implementation of handle_in() only masks
the lower 32 bits, which preserves the upper 32 bits of RAX when a
32-bit port IN instruction is emulated.

Update handle_in() to zero out the entire RAX register when the I/O size
is 4 bytes to ensure correct zero-extension. For smaller sizes (1 or 2
bytes), continue to preserve the unaffected upper bits.

Fixes: 03149948832a ("x86/tdx: Port I/O: Add runtime hypercalls")
Reported-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>
Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/coco/tdx/tdx.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 65119362f9a2..58feca419326 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -703,8 +703,25 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 	 */
 	success = !__tdx_hypercall(&args);
 
-	/* Update part of the register affected by the emulated instruction */
-	regs->ax &= ~mask;
+	/*
+	 * IN writes the result into a sub-register of RAX. Only the
+	 * 32-bit form zero-extends; the smaller forms leave the upper
+	 * bits untouched:
+	 *
+	 *   insn  dest  size  bits written     bits preserved
+	 *   inb   AL    1     RAX[ 7: 0]       RAX[63: 8]
+	 *   inw   AX    2     RAX[15: 0]       RAX[63:16]
+	 *   inl   EAX   4     RAX[63: 0]       (none, zero-extended)
+	 *
+	 * 'mask' only covers the low 'size' bytes, which is exactly the
+	 * range affected for size 1 and 2. For size 4 the write also
+	 * clears RAX[63:32], so widen the clear-mask.
+	 */
+	if (size == 4)
+		regs->ax = 0;
+	else
+		regs->ax &= ~mask;
+
 	if (success)
 		regs->ax |= args.r11 & mask;
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling
  2026-05-27 12:05 ` [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling Kiryl Shutsemau (Meta)
@ 2026-05-27 12:47   ` sashiko-bot
  2026-05-27 15:38   ` Edgecombe, Rick P
  1 sibling, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-27 12:47 UTC (permalink / raw)
  Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] VMM-controlled Register Poisoning in CPUID emulation.
--

commit 8bf39531f06bffdf9286b3ddc6968088a81a8929
Author: Kiryl Shutsemau (Meta) <kas@kernel.org>

x86/tdx: Fix off-by-one in port I/O handling

This commit corrects an off-by-one error in the mask calculation for port io
handling. The previous implementation used a mask that was one bit too wide
for all io sizes, which is now corrected.

Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/

> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 186915a17c50f..65119362f9a25 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -693,7 +693,7 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
>  		.r13 = PORT_READ,
>  		.r14 = port,
>  	};
> -	u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
> +	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
>  	bool success;
>  
>  	/*

[ ... ]

[Severity: Critical]
This is a pre-existing issue, but can a malicious hypervisor poison the upper
32 bits of the registers during cpuid emulation?

Architecturally, the cpuid instruction returns 32-bit values in eax, ebx,
ecx, and edx. In 64-bit mode, writes to these 32-bit registers must
zero-extend to their 64-bit counterparts.

Looking at handle_cpuid() in arch/x86/coco/tdx/tdx.c, the 64-bit results
fetched from the untrusted hypervisor are copied directly into the guest's
pt_regs:

arch/x86/coco/tdx/tdx.c:handle_cpuid() {
    ...
	/*
	 * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of
	 * EAX, EBX, ECX, EDX registers after the CPUID instruction execution.
	 * So copy the register contents back to pt_regs.
	 */
	regs->ax = args.r12;
	regs->bx = args.r13;
	regs->cx = args.r14;
	regs->dx = args.r15;

	return ve_instr_len(ve);
}

Since these are not cast to u32 or zero-extended, does this allow the
hypervisor to supply values with non-zero upper 32 bits and potentially lead
to out-of-bounds memory accesses or control flow hijacking?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260527120544.2903923-1-kas@kernel.org?part=1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-27 12:05 ` [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O Kiryl Shutsemau (Meta)
@ 2026-05-27 13:12   ` sashiko-bot
  2026-05-27 15:45   ` Edgecombe, Rick P
  2026-05-27 17:45   ` Dave Hansen
  2 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-05-27 13:12 UTC (permalink / raw)
  Cc: kvm

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [Critical] In `handle_mmio()`, the `val` variable is not fully initialized before being sent to the VMM during a memory-mapped I/O write emulation, leading to a kernel stack memory leak.
- [High] The CPUID instruction emulation in `handle_cpuid()` fails to zero-extend the 32-bit results into their respective 64-bit registers.
- [High] The MMIO emulation for MOVZX and MOVSX instructions in `handle_mmio()` fails to clear the upper 32 bits when the destination is a 32-bit register.
- [High] In `write_msr()`, `regs->ax` is not masked to 32 bits before being ORed to form the MSR value, which can corrupt the upper 32 bits (representing EDX) sent to the VMM.
--

commit f91083b7c0a39ded55e57262547e0ea4e29968a4
Author: Kiryl Shutsemau (Meta) <kas@kernel.org>

x86/tdx: Fix zero-extension for 32-bit port I/O

According to x86 architecture rules, 32-bit operations zero-extend the
result to 64 bits. Update handle_in() to zero out the entire RAX register
when the I/O size is 4 bytes to ensure correct zero-extension.

Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/

> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 65119362f9a25..58feca4193269 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c

[ ... ]

> static int write_msr(struct pt_regs *regs, struct ve_info *ve)
> {
> 	struct tdx_module_args args = {
> 		.r10 = TDX_HYPERCALL_STANDARD,
> 		.r11 = hcall_func(EXIT_REASON_MSR_WRITE),
> 		.r12 = regs->cx,
> 		.r13 = (u64)regs->dx << 32 | regs->ax,
        ^^^^^^^^

[Severity: High]
This isn't a bug introduced by this patch, but does this correctly build the
MSR value?

Since regs->ax is a 64-bit value, it looks like any non-zero bits in the
upper 32 bits of regs->ax will not be cleared.

Will these upper bits then be logically ORed into the upper 32 bits of the
result, corrupting the EDX portion of the WRMSR value sent to the VMM?

Could a guest program with legitimately non-zero data in the upper 32 bits of
RAX prior to the #VE inadvertently write wrong values to MSRs?

[ ... ]

> static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
> {
> 	struct tdx_module_args args = {
> 		.r10 = TDX_HYPERCALL_STANDARD,
> 		.r11 = hcall_func(EXIT_REASON_CPUID),
> 		.r12 = regs->ax,
> 		.r13 = regs->cx,
> 	};

[ ... ]

> 	/*
> 	 * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of
> 	 * EAX, EBX, ECX, EDX registers after the CPUID instruction execution.
> 	 * So copy the register contents back to pt_regs.
> 	 */
> 	regs->ax = args.r12;
> 	regs->bx = args.r13;
> 	regs->cx = args.r14;
> 	regs->dx = args.r15;
        ^^^^^^^^

[Severity: High]
This is a pre-existing issue, but does this correctly handle zero-extension
for 32-bit results into 64-bit registers?

According to x86-64 architecture rules, writes to 32-bit registers must clear
the upper 32 bits of the corresponding 64-bit registers.

Because the values are copied directly from the untrusted VMM (args.r12 to
args.r15) into the registers without masking to 32 bits, could a malicious
VMM inject non-zero values into the upper 32 bits?

Can this corrupt guest user-space or kernel state if the code relies on these
bits being zeroed?

[ ... ]

> static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> {
> 	unsigned long *reg, val, vaddr;
                            ^^^

[Severity: Critical]
This is a pre-existing issue, but is val fully initialized before being sent
to the VMM?

When handling an MMIO write of size smaller than 8 bytes, does memcpy only
write to the lowest size bytes, leaving the upper bytes of val uninitialized
with kernel stack data?

[ ... ]

> 	/* Handle writes first */
> 	switch (mmio) {
> 	case INSN_MMIO_WRITE:
> 		memcpy(&val, reg, size);
> 		if (!mmio_write(size, ve->gpa, val))

Because this 8-byte variable is passed to mmio_write() and forwarded to the
untrusted VMM, does this leak uninitialized kernel stack memory to the guest
hypervisor?

[ ... ]

> 	switch (mmio) {
> 	case INSN_MMIO_READ:
> 		/* Zero-extend for 32-bit operation */
> 		extend_size = size == 4 ? sizeof(*reg) : 0;
> 		break;
> 	case INSN_MMIO_READ_ZERO_EXTEND:
> 		/* Zero extend based on operand size */
> 		extend_size = insn.opnd_bytes;
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Severity: High]
This isn't a bug introduced by this patch, but does this properly clear the
upper 32 bits when the destination is a 32-bit register?

For MOVZX and MOVSX instructions, extend_size is set to insn.opnd_bytes.
When the destination is a 32-bit register, insn.opnd_bytes is 4.

Does memset then only clear or set 4 bytes of the 64-bit register?

Because x86-64 architecture dictates that writing to a 32-bit register always
zero-extends to the full 64-bit register, could compiler-generated code
relying on this zero-extension use garbage values left over in the upper
half, leading to functional regressions or out-of-bounds memory accesses?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260527120544.2903923-1-kas@kernel.org?part=2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling
  2026-05-27 12:05 ` [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling Kiryl Shutsemau (Meta)
  2026-05-27 12:47   ` sashiko-bot
@ 2026-05-27 15:38   ` Edgecombe, Rick P
  1 sibling, 0 replies; 12+ messages in thread
From: Edgecombe, Rick P @ 2026-05-27 15:38 UTC (permalink / raw)
  To: x86@kernel.org, mingo@redhat.com, kas@kernel.org, tglx@kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com
  Cc: seanjc@google.com, Huang, Kai, hpa@zytor.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-kernel@vger.kernel.org, tsyrulnikov.borys@gmail.com,
	kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	stable@vger.kernel.org

On Wed, 2026-05-27 at 13:05 +0100, Kiryl Shutsemau (Meta) wrote:
> handle_in() and handle_out() in arch/x86/coco/tdx/tdx.c use:
> 
>     u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
> 
> GENMASK(h, l) includes bit h. For size=1 (INB), this produces
> GENMASK(8, 0) = 0x1FF (9 bits) instead of GENMASK(7, 0) = 0xFF (8
> bits). The mask is one bit too wide for all I/O sizes.
> 
> Fix the mask calculation.
> 
> Fixes: 03149948832a ("x86/tdx: Port I/O: Add runtime hypercalls")
> Reported-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>
> Link:
> https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-27 12:05 ` [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O Kiryl Shutsemau (Meta)
  2026-05-27 13:12   ` sashiko-bot
@ 2026-05-27 15:45   ` Edgecombe, Rick P
  2026-05-27 17:45   ` Dave Hansen
  2 siblings, 0 replies; 12+ messages in thread
From: Edgecombe, Rick P @ 2026-05-27 15:45 UTC (permalink / raw)
  To: x86@kernel.org, mingo@redhat.com, kas@kernel.org, tglx@kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com
  Cc: seanjc@google.com, Huang, Kai, hpa@zytor.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-kernel@vger.kernel.org, tsyrulnikov.borys@gmail.com,
	kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	stable@vger.kernel.org

On Wed, 2026-05-27 at 13:05 +0100, Kiryl Shutsemau (Meta) wrote:
> +	/*
> +	 * IN writes the result into a sub-register of RAX. Only the
> +	 * 32-bit form zero-extends; the smaller forms leave the upper
> +	 * bits untouched:
> +	 *
> +	 *   insn  dest  size  bits written     bits preserved
> +	 *   inb   AL    1     RAX[ 7: 0]       RAX[63: 8]
> +	 *   inw   AX    2     RAX[15: 0]       RAX[63:16]
> +	 *   inl   EAX   4     RAX[63: 0]       (none, zero-extended)

We are working on getting the GHCI spec amended to clarify who is supposed to do
this zero-extending and masking, host or guest. For this and the similar
tdvmcalls. The process involves getting all VMMs in agreement.

Today I think the spec doesn't say to *not* do it, so I think it is reasonable
to merge this, but there is some small risk of complications depending on how
that discussion goes.

> +	 *
> +	 * 'mask' only covers the low 'size' bytes, which is exactly the
> +	 * range affected for size 1 and 2. For size 4 the write also
> +	 * clears RAX[63:32], so widen the clear-mask.
> +	 */


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-27 12:05 ` [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O Kiryl Shutsemau (Meta)
  2026-05-27 13:12   ` sashiko-bot
  2026-05-27 15:45   ` Edgecombe, Rick P
@ 2026-05-27 17:45   ` Dave Hansen
  2026-05-28 10:14     ` Kiryl Shutsemau
  2 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2026-05-27 17:45 UTC (permalink / raw)
  To: Kiryl Shutsemau (Meta), Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86
  Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable

[-- Attachment #1: Type: text/plain, Size: 1959 bytes --]

On 5/27/26 05:05, Kiryl Shutsemau (Meta) wrote:
...
> -	/* Update part of the register affected by the emulated instruction */
> -	regs->ax &= ~mask;
> +	/*
> +	 * IN writes the result into a sub-register of RAX. Only the
> +	 * 32-bit form zero-extends; the smaller forms leave the upper
> +	 * bits untouched:
> +	 *
> +	 *   insn  dest  size  bits written     bits preserved
> +	 *   inb   AL    1     RAX[ 7: 0]       RAX[63: 8]
> +	 *   inw   AX    2     RAX[15: 0]       RAX[63:16]
> +	 *   inl   EAX   4     RAX[63: 0]       (none, zero-extended)
> +	 *
> +	 * 'mask' only covers the low 'size' bytes, which is exactly the
> +	 * range affected for size 1 and 2. For size 4 the write also
> +	 * clears RAX[63:32], so widen the clear-mask.
> +	 */
> +	if (size == 4)
> +		regs->ax = 0;
> +	else
> +		regs->ax &= ~mask;
> +

Is there any way we could do this with fewer comments and more code?

I mean, there's only three cases. Why have;

	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);

When there are only 3 possible cases:

	1 => 0xf
	2 => 0xff
	4 => 0xffff

and one of those cases needs a special case on top of it.

Maybe something like this?

	/* Clear out part of RAX so part of args.r11 can be OR'd in: */
	switch (size) {
	case 1:
		/* inb consumes lower 8 bits of r11: */
		regs->ax &= ~GENMASK_ULL(7, 0);
		args.r11 &=  GENMASK_ULL(7, 0);
		break;
	case 2:
		/* inw consumes lower 16 bits of r11: */
		regs->ax &= ~GENMASK_ULL(15, 0);
		args.r11 &=  GENMASK_ULL(15, 0);
		break;
	case 4:
		/* inl is weird and zeros the whole register: */
		regs->ax &= ~GENMASK_ULL(63, 0);
		/* But only consumes 32-bits from r11: */
		args.r11 &=  GENMASK_ULL(31, 0);
		break;
	default:
		/* Probable TDX module bug. Illegal in[bwl] size: */
		WARN_ON_ONCE(1);
		success = 0;
	}

	if (success)
		regs->ax |= args.r11;

It might need a temporary variable for args.r11, but you get the point.
That's basically the data from the comment but written as code.

[-- Attachment #2: tdxinX.patch --]
[-- Type: text/x-patch, Size: 93 bytes --]

 tdx.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-27 17:45   ` Dave Hansen
@ 2026-05-28 10:14     ` Kiryl Shutsemau
  2026-05-28 16:43       ` Dave Hansen
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Kiryl Shutsemau @ 2026-05-28 10:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable

On Wed, May 27, 2026 at 10:45:28AM -0700, Dave Hansen wrote:
> On 5/27/26 05:05, Kiryl Shutsemau (Meta) wrote:
> ...
> > -	/* Update part of the register affected by the emulated instruction */
> > -	regs->ax &= ~mask;
> > +	/*
> > +	 * IN writes the result into a sub-register of RAX. Only the
> > +	 * 32-bit form zero-extends; the smaller forms leave the upper
> > +	 * bits untouched:
> > +	 *
> > +	 *   insn  dest  size  bits written     bits preserved
> > +	 *   inb   AL    1     RAX[ 7: 0]       RAX[63: 8]
> > +	 *   inw   AX    2     RAX[15: 0]       RAX[63:16]
> > +	 *   inl   EAX   4     RAX[63: 0]       (none, zero-extended)
> > +	 *
> > +	 * 'mask' only covers the low 'size' bytes, which is exactly the
> > +	 * range affected for size 1 and 2. For size 4 the write also
> > +	 * clears RAX[63:32], so widen the clear-mask.
> > +	 */
> > +	if (size == 4)
> > +		regs->ax = 0;
> > +	else
> > +		regs->ax &= ~mask;
> > +
> 
> Is there any way we could do this with fewer comments and more code?
> 
> I mean, there's only three cases. Why have;
> 
> 	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
> 
> When there are only 3 possible cases:
> 
> 	1 => 0xf
> 	2 => 0xff
> 	4 => 0xffff
> 
> and one of those cases needs a special case on top of it.
> 
> Maybe something like this?
> 
> 	/* Clear out part of RAX so part of args.r11 can be OR'd in: */
> 	switch (size) {
> 	case 1:
> 		/* inb consumes lower 8 bits of r11: */
> 		regs->ax &= ~GENMASK_ULL(7, 0);
> 		args.r11 &=  GENMASK_ULL(7, 0);
> 		break;
> 	case 2:
> 		/* inw consumes lower 16 bits of r11: */
> 		regs->ax &= ~GENMASK_ULL(15, 0);
> 		args.r11 &=  GENMASK_ULL(15, 0);
> 		break;
> 	case 4:
> 		/* inl is weird and zeros the whole register: */
> 		regs->ax &= ~GENMASK_ULL(63, 0);
> 		/* But only consumes 32-bits from r11: */
> 		args.r11 &=  GENMASK_ULL(31, 0);
> 		break;
> 	default:
> 		/* Probable TDX module bug. Illegal in[bwl] size: */
> 		WARN_ON_ONCE(1);
> 		success = 0;
> 	}
> 
> 	if (success)
> 		regs->ax |= args.r11;
> 
> It might need a temporary variable for args.r11, but you get the point.
> That's basically the data from the comment but written as code.

I hate how verbose it is. All these GENMASK_ULL() make it hard to
follow.

What about the patch below. Inspired by kvm's assign_register().

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 65119362f9a2..460b9fbabf14 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -693,8 +693,8 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 		.r13 = PORT_READ,
 		.r14 = port,
 	};
-	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
 	bool success;
+	u32 val;
 
 	/*
 	 * Emulate the I/O read via hypercall. More info about ABI can be found
@@ -703,10 +703,33 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 	 */
 	success = !__tdx_hypercall(&args);
 
-	/* Update part of the register affected by the emulated instruction */
-	regs->ax &= ~mask;
 	if (success)
-		regs->ax |= args.r11 & mask;
+		val = args.r11;
+	else
+		val = 0;
+
+	/*
+	 * IN writes the result into a sub-register of RAX.
+	 *
+	 * Only the 32-bit form zero-extends; the smaller forms leave
+	 * the upper bits untouched.
+	 */
+	switch (size) {
+	case 1:
+		*(u8 *)&regs->ax = (u8)val;
+		break;
+	case 2:
+		*(u16 *)&regs->ax = (u16)val;
+		break;
+	case 4:
+		/* zero-extended */
+		regs->ax = val;
+		break;
+	default:
+		/* Probable TDX module bug. Illegal in[bwl] size. */
+		WARN_ON_ONCE(1);
+		break;
+	}
 
 	return success;
 }
-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-28 10:14     ` Kiryl Shutsemau
@ 2026-05-28 16:43       ` Dave Hansen
  2026-05-28 17:25       ` Dave Hansen
  2026-05-28 19:58       ` David Laight
  2 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2026-05-28 16:43 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable

On 5/28/26 03:14, Kiryl Shutsemau wrote:
> +	switch (size) {
> +	case 1:
> +		*(u8 *)&regs->ax = (u8)val;
> +		break;
> +	case 2:
> +		*(u16 *)&regs->ax = (u16)val;
> +		break;

Is this intentionally clever to only work on little endian, or
accidentally clever? This seems like a great IOCCC thing to do, but it's
far too clever for my taste.

I mean, it's making a pointer to a 64-bit value with an 8-bit name and
casting that to an 8-bit pointer and then assigning that to a 32-bit
value cast to a u8.

Is it just my tiny brain that thinks this will be unintelligible on Monday?

How about we just make the CPU do the thinking for us? IN[BWL] and
MOV[BWL] have the same semantics here, right? So even if 'rax' and 'val'
are 64-bit values here, the following should have all the right
behaviors, I think.

I generally loathe inline assembly. But we have a CPU that kinda knows
the rules already. No need for us to laboriously reimplement it. Right?

Thanks to the friendly LLM that knows inline assembly better than I do.
The resulting compiled assembly looks right to me.

/*
 * Use MOV[BWL] to/from registers to match the IN[BWL] behavior
 * including the fact that INL zeros the upper 64-bits while
 * IN[BW] don't zero anything.
 */

    switch (size) {
    case 1:
	// Just write 1 byte of RAX:
        __asm__ volatile ("movb %b1, %b0" : "+q"(rax)
                                          : "q"(val));
        break;
    case 2:
	// Write 2 bytes of RAX:
        __asm__ volatile ("movw %w1, %w0" : "+r"(rax)
                                          : "r"(val));
        break;
    case 4:
	// Write 'val' into lower 32 bits. Zero the upper 32 bits:
        __asm__ volatile ("movl %k1, %k0" : "=r"(rax)
                                          : "r"(val));
        break;
    default:
	// WARN
    }

Thoughts?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-28 10:14     ` Kiryl Shutsemau
  2026-05-28 16:43       ` Dave Hansen
@ 2026-05-28 17:25       ` Dave Hansen
  2026-05-28 19:58       ` David Laight
  2 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2026-05-28 17:25 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable

On 5/28/26 03:14, Kiryl Shutsemau wrote:
> What about the patch below. Inspired by kvm's assign_register().

I think I could stand this if it consolidated this site with kvm's
assign_register(). The copy/paste is too much to bear.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
  2026-05-28 10:14     ` Kiryl Shutsemau
  2026-05-28 16:43       ` Dave Hansen
  2026-05-28 17:25       ` Dave Hansen
@ 2026-05-28 19:58       ` David Laight
  2 siblings, 0 replies; 12+ messages in thread
From: David Laight @ 2026-05-28 19:58 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Rick Edgecombe,
	Kuppuswamy Sathyanarayanan, Kai Huang, Sean Christopherson,
	Borys Tsyrulnikov, linux-kernel, linux-coco, kvm, stable

On Thu, 28 May 2026 11:14:38 +0100
Kiryl Shutsemau <kas@kernel.org> wrote:

> On Wed, May 27, 2026 at 10:45:28AM -0700, Dave Hansen wrote:
> > On 5/27/26 05:05, Kiryl Shutsemau (Meta) wrote:
> > ...  
> > > -	/* Update part of the register affected by the emulated instruction */
> > > -	regs->ax &= ~mask;
> > > +	/*
> > > +	 * IN writes the result into a sub-register of RAX. Only the
> > > +	 * 32-bit form zero-extends; the smaller forms leave the upper
> > > +	 * bits untouched:
> > > +	 *
> > > +	 *   insn  dest  size  bits written     bits preserved
> > > +	 *   inb   AL    1     RAX[ 7: 0]       RAX[63: 8]
> > > +	 *   inw   AX    2     RAX[15: 0]       RAX[63:16]
> > > +	 *   inl   EAX   4     RAX[63: 0]       (none, zero-extended)
> > > +	 *
> > > +	 * 'mask' only covers the low 'size' bytes, which is exactly the
> > > +	 * range affected for size 1 and 2. For size 4 the write also
> > > +	 * clears RAX[63:32], so widen the clear-mask.
> > > +	 */
> > > +	if (size == 4)
> > > +		regs->ax = 0;
> > > +	else
> > > +		regs->ax &= ~mask;
> > > +  
> > 
> > Is there any way we could do this with fewer comments and more code?
> > 
> > I mean, there's only three cases. Why have;
> > 
> > 	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
> > 
> > When there are only 3 possible cases:
> > 
> > 	1 => 0xf
> > 	2 => 0xff
> > 	4 => 0xffff
> > 
> > and one of those cases needs a special case on top of it.
> > 
> > Maybe something like this?
> > 
> > 	/* Clear out part of RAX so part of args.r11 can be OR'd in: */
> > 	switch (size) {
> > 	case 1:
> > 		/* inb consumes lower 8 bits of r11: */
> > 		regs->ax &= ~GENMASK_ULL(7, 0);
> > 		args.r11 &=  GENMASK_ULL(7, 0);
> > 		break;
> > 	case 2:
> > 		/* inw consumes lower 16 bits of r11: */
> > 		regs->ax &= ~GENMASK_ULL(15, 0);
> > 		args.r11 &=  GENMASK_ULL(15, 0);
> > 		break;
> > 	case 4:
> > 		/* inl is weird and zeros the whole register: */
> > 		regs->ax &= ~GENMASK_ULL(63, 0);
> > 		/* But only consumes 32-bits from r11: */
> > 		args.r11 &=  GENMASK_ULL(31, 0);
> > 		break;
> > 	default:
> > 		/* Probable TDX module bug. Illegal in[bwl] size: */
> > 		WARN_ON_ONCE(1);
> > 		success = 0;
> > 	}
> > 
> > 	if (success)
> > 		regs->ax |= args.r11;
> > 
> > It might need a temporary variable for args.r11, but you get the point.
> > That's basically the data from the comment but written as code.  
> 
> I hate how verbose it is. All these GENMASK_ULL() make it hard to
> follow.
> 
> What about the patch below. Inspired by kvm's assign_register().
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 65119362f9a2..460b9fbabf14 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -693,8 +693,8 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
>  		.r13 = PORT_READ,
>  		.r14 = port,
>  	};
> -	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
>  	bool success;
> +	u32 val;
>  
>  	/*
>  	 * Emulate the I/O read via hypercall. More info about ABI can be found
> @@ -703,10 +703,33 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
>  	 */
>  	success = !__tdx_hypercall(&args);
>  
> -	/* Update part of the register affected by the emulated instruction */
> -	regs->ax &= ~mask;
>  	if (success)
> -		regs->ax |= args.r11 & mask;
> +		val = args.r11;
> +	else
> +		val = 0;
> +
> +	/*
> +	 * IN writes the result into a sub-register of RAX.
> +	 *
> +	 * Only the 32-bit form zero-extends; the smaller forms leave
> +	 * the upper bits untouched.
> +	 */
> +	switch (size) {
> +	case 1:
> +		*(u8 *)&regs->ax = (u8)val;
> +		break;
> +	case 2:
> +		*(u16 *)&regs->ax = (u16)val;
> +		break;
> +	case 4:
> +		/* zero-extended */
> +		regs->ax = val;
> +		break;
> +	default:
> +		/* Probable TDX module bug. Illegal in[bwl] size. */
> +		WARN_ON_ONCE(1);
> +		break;
> +	}

Just write it as normal arithmetic code:

	/* IN writes the result into a sub-register of RAX. */
	switch (size) {
	case 1:
		regs->ax = (regs->ax & ~0xfful) | (val & 0xff);
		break;
	case 2:
		regs->ax = (regs->ax & ~0xfffful) | (val & 0xffff);
		break;
	case 4:
	default:
		/* 32bit 'INB' will zero the high bits. */
		regs->ax = val
		break;
	}

Succinct, obvious and readable.

-- David


>  
>  	return success;
>  }


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-05-28 19:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-27 12:05 [PATCH v3 0/2] x86/tdx: Port I/O emulation fixes Kiryl Shutsemau (Meta)
2026-05-27 12:05 ` [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling Kiryl Shutsemau (Meta)
2026-05-27 12:47   ` sashiko-bot
2026-05-27 15:38   ` Edgecombe, Rick P
2026-05-27 12:05 ` [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O Kiryl Shutsemau (Meta)
2026-05-27 13:12   ` sashiko-bot
2026-05-27 15:45   ` Edgecombe, Rick P
2026-05-27 17:45   ` Dave Hansen
2026-05-28 10:14     ` Kiryl Shutsemau
2026-05-28 16:43       ` Dave Hansen
2026-05-28 17:25       ` Dave Hansen
2026-05-28 19:58       ` David Laight

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.