* Re: [PATCH v2 0/5] guest_memfd fixes for bind and populate
From: Sean Christopherson @ 2026-05-27 18:19 UTC
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Kiryl Shutsemau, Rick Edgecombe, Vishal Annapurve, Yan Zhao,
Michael Roth, Isaku Yamahata, Chao Peng, Xiaoyao Li, Zongyao Chen,
Ackerley Tng
Cc: kvm, linux-kernel, linux-coco, Yu Zhang, Fuad Tabba
In-Reply-To: <20260522-fix-sev-gmem-post-populate-v2-0-3f196bfad5a1@google.com>
On Fri, 22 May 2026 15:46:05 -0700, Ackerley Tng wrote:
> This series is a group of fixes for the bind and populate flows for
> guest_memfd, and fixes some issues reported by Sashiko after reviewing the
> guest_memfd in-place conversions series [1] and another fixup series Sean
> posted [3].
>
> Changes in v2:
>
> [...]
Applied 1, 4, and 5 to kvm-x86 sev, with massaged shortlogs+changelogs.
[1/5] KVM: SEV: Pin source page for write when adding CPUID data for SNP guest
https://github.com/kvm-x86/linux/commit/f13e90059908
[2/5] KVM: guest_memfd: Fix possible signed integer overflow
[SKIP]
[3/5] KVM: guest_memfd: Handle errors from xa_store_range() when binding
[SKIP]
[4/5] KVM: SEV: Unmap local kmaps in LIFO order, per highmem requirements
https://github.com/kvm-x86/linux/commit/138f5f9cbe37
[5/5] KVM: SEV: Mark source page dirty when writing back CPUID data on failure
https://github.com/kvm-x86/linux/commit/97cd21d57e9b
--
https://github.com/kvm-x86/linux/tree/next
* Re: [PATCH v2 0/4] struct page to PFN conversion for TDX guest private memory
From: Sean Christopherson @ 2026-05-27 18:10 UTC
To: Sean Christopherson, dave.hansen, pbonzini, Yan Zhao
Cc: tglx, mingo, bp, kas, x86, linux-kernel, kvm, linux-coco,
kai.huang, rick.p.edgecombe, yilun.xu, vannapurve, ackerleytng,
sagis, binbin.wu, xiaoyao.li, isaku.yamahata
In-Reply-To: <20260430014852.24183-1-yan.y.zhao@intel.com>
On Thu, 30 Apr 2026 09:48:52 +0800, Yan Zhao wrote:
> This is v2 of the struct page to PFN conversion series, which converts TDX
> guest private memory mapping/unmapping APIs from taking struct page to
> taking PFN as input.
>
> v2 is based on v7.1.0-rc1 + Sean's 4 cleanup patches (see details in
> section "Base" below). The purpose is to get Dave's Ack, so Sean can take
> it from the KVM x86 tree. The full stack of v2 is available at [14].
>
> [...]
Applied to kvm-x86 mmu, thanks!
[1/4] x86/tdx: Use PFN directly for mapping guest private memory
https://github.com/kvm-x86/linux/commit/6ad0badd765c
[2/4] x86/tdx: Use PFN directly for unmapping guest private memory
https://github.com/kvm-x86/linux/commit/4c7a1247646c
[3/4] x86/tdx: Drop exported function tdx_quirk_reset_page()
https://github.com/kvm-x86/linux/commit/4a72a6dc447d
[4/4] x86/virt/tdx: Move mk_keyed_paddr() to tdx.c due to no external users
https://github.com/kvm-x86/linux/commit/3f330fbb918f
--
https://github.com/kvm-x86/linux/tree/next
* Re: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Aneesh Kumar K.V @ 2026-05-27 17:49 UTC
To: Dan Williams (nvidia), Alexey Kardashevskiy, linux-coco, iommu,
linux-kernel, kvm
Cc: Bjorn Helgaas, Dan Williams, Jason Gunthorpe, Joerg Roedel,
Jonathan Cameron, Kevin Tian, Nicolin Chen, Samuel Ortiz,
Steven Price, Suzuki K Poulose, Will Deacon, Xu Yilun,
Shameer Kolothum, Paolo Bonzini, Tony Krowiak, Halil Pasic,
Jason Herne, Harald Freudenberger, Holger Dengler, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Alex Williamson, Matthew Rosato, Farhan Ali,
Eric Farman, linux-s390
In-Reply-To: <yq5apl2gsw6y.fsf@kernel.org>
Aneesh Kumar K.V <aneesh.kumar@kernel.org> writes:
> "Dan Williams (nvidia)" <djbw@kernel.org> writes:
>
>> Alexey Kardashevskiy wrote:
>>>
......
>>>
>>> I have 3 types of requests to fit here, all go via VM -> KVM -> QEMU -> IOMMUFD -> TSM.
>>>
>>> 1) bind/unbind TDI <- moves to CONFIG_LOCKED, this is "OP";
>>> 2) start/stop TDI <- moves to RUN, this is "GR"? Right now I route it via "OP";
>>> 3) enable/disable MMIO/DMA <- no TDI state change, this is "GR" but which scope is it here?
>>
>> The scope parameter was meant to enumerate a security model for classes
>> of commands that are otherwise opaque to the kernel. However, none of
>> the commands we are targeting are opaque (private specification with
>> unknown effect). It now turns out there is no role for @scope for
>> security.
>>
>> Now a command family that iommufd can validate seems useful. As it
>> stands this implementation aliases command codes across TSMs. Do we
>> proceed with creating an actual shared command uapi for the truly shared
>> commands:
>>
>> TSM_REQ_TYPE_DEFAULT: Commands every arch needs
>> TSM_REQ_READ_OBJECT
>> TSM_REQ_REGEN_OBJECT
>> TSM_REQ_OBJECT_INFO
>> TSM_REQ_VALIDATE_MMIO
>> TSM_REQ_SET_TDI_STATE
>>
>> TSM_REQ_TYPE_SEV: Commands only SEV needs
>> TSM_REQ_SEV_ENABLE_DMA
>> TSM_REQ_SEV_DISABLE_DMA
>>
>> ...or just observe that per CC arch commands are needed to setup the VM
>> so per CC arch commands are needed to marshal device assignment support
>> requests.
>>
>> In that case pci_tsm_req_scope becomes tsm_req_type and is just:
>>
>> TSM_REQ_TYPE_CCA
>> TSM_REQ_TYPE_SEV
>> TSM_REQ_TYPE_TDX
>>
>> I am leaning towards the latter at this point.
>
> But we already have struct pci_tsm_ops::guest_req, which is specific to
> the underlying CC architecture. From the above, pci_tsm_req_scope also
> appears to carry the same information. Is that useful?
>
I think there is value in having the VMM express the guest’s
confidential computing architecture, so that the TSM backend can
validate whether it should handle that guest request.
So it would not be the IOMMU validating the scope value, but rather
pci_tsm_ops::guest_req.
static ssize_t cca_tsm_guest_req(struct pci_tdi *tdi, enum pci_tsm_req_scope scope,
				 sockptr_t req, size_t req_len, sockptr_t resp,
				 size_t resp_len, u64 *tsm_code)
{
	struct pci_dev *pdev = tdi->pdev;

	/*
	 * Reject the guest request if the VMM is using this link TSM wrongly,
	 * i.e. the guest is using a CC architecture other than the one this
	 * link TSM implements.
	 */
	if (scope != TSM_REQ_TYPE_CCA)
		return -EINVAL;

	/* ... remainder of the CCA request handling not shown ... */
}
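A minimal userspace sketch of the backend-side check described above, where the TSM backend rather than iommufd rejects a request whose type does not match its CC architecture. The enum mirrors the TSM_REQ_TYPE_* values proposed in this thread; the `example_*` structure and functions are hypothetical and do not reflect the real pci_tsm_ops or tsm_guest_req() signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Request types as proposed in the thread; numeric values are illustrative. */
enum tsm_req_type {
	TSM_REQ_TYPE_CCA,
	TSM_REQ_TYPE_SEV,
	TSM_REQ_TYPE_TDX,
};

/* Hypothetical backend descriptor: the CC arch it serves and its handler. */
struct example_tsm_backend {
	enum tsm_req_type type;
	long (*handle)(unsigned int op, const void *req, size_t req_len);
};

/*
 * Hypothetical entry point: the backend, not iommufd, rejects a request
 * whose type does not match the CC architecture it implements.
 */
static long example_guest_req(const struct example_tsm_backend *be,
			      enum tsm_req_type type, unsigned int op,
			      const void *req, size_t req_len)
{
	if (type != be->type)
		return -EINVAL;
	return be->handle(op, req, req_len);
}

/* Placeholder handler; a real one would forward the request to the TSM. */
static long example_cca_handle(unsigned int op, const void *req, size_t req_len)
{
	(void)op;
	(void)req;
	(void)req_len;
	return 0;
}

static const struct example_tsm_backend example_cca_backend = {
	.type = TSM_REQ_TYPE_CCA,
	.handle = example_cca_handle,
};
```

With this shape, a SEV-typed request sent to the CCA backend fails with -EINVAL before any architecture-specific code runs.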
Jason Gunthorpe <jgg@ziepe.ca> writes:
> On Tue, May 26, 2026 at 11:17:50PM -0700, Dan Williams (nvidia) wrote:
>
>> In that case pci_tsm_req_scope becomes tsm_req_type and is just:
>>
>> TSM_REQ_TYPE_CCA
>> TSM_REQ_TYPE_SEV
>> TSM_REQ_TYPE_TDX
>>
>> I am leaning towards the latter at this point.
>
> Yeah, this sounds good. I would also include a common op field that
> can be decoded by the TSM driver based on the TYPE above, and the
> usual in/out message buffers.
We already have iommufd_vdevice_tsm_op_ioctl() to handle common
operations. Right now, it handles IOMMU_VDEVICE_TSM_BIND and
IOMMU_VDEVICE_TSM_UNBIND. I guess we should move TSM_REQ_SET_TDI_STATE
operations to that as well?
-aneesh
* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
From: Dave Hansen @ 2026-05-27 17:45 UTC
To: Kiryl Shutsemau (Meta), Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86
Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
linux-coco, kvm, stable
In-Reply-To: <20260527120544.2903923-3-kas@kernel.org>
On 5/27/26 05:05, Kiryl Shutsemau (Meta) wrote:
...
> - /* Update part of the register affected by the emulated instruction */
> - regs->ax &= ~mask;
> + /*
> + * IN writes the result into a sub-register of RAX. Only the
> + * 32-bit form zero-extends; the smaller forms leave the upper
> + * bits untouched:
> + *
> + *   insn  dest  size  bits written   bits preserved
> + *   inb   AL    1     RAX[ 7: 0]     RAX[63: 8]
> + *   inw   AX    2     RAX[15: 0]     RAX[63:16]
> + *   inl   EAX   4     RAX[63: 0]     (none, zero-extended)
> + *
> + * 'mask' only covers the low 'size' bytes, which is exactly the
> + * range affected for size 1 and 2. For size 4 the write also
> + * clears RAX[63:32], so widen the clear-mask.
> + */
> + if (size == 4)
> + regs->ax = 0;
> + else
> + regs->ax &= ~mask;
> +
Is there any way we could do this with fewer comments and more code?
I mean, there are only three cases. Why have:
	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
When there are only 3 possible cases:
	1 => 0xff
	2 => 0xffff
	4 => 0xffffffff
and one of those cases needs a special case on top of it.
Maybe something like this?
	/* Clear out part of RAX so part of args.r11 can be OR'd in: */
	switch (size) {
	case 1:
		/* inb consumes lower 8 bits of r11: */
		regs->ax &= ~GENMASK_ULL(7, 0);
		args.r11 &= GENMASK_ULL(7, 0);
		break;
	case 2:
		/* inw consumes lower 16 bits of r11: */
		regs->ax &= ~GENMASK_ULL(15, 0);
		args.r11 &= GENMASK_ULL(15, 0);
		break;
	case 4:
		/* inl is weird and zeros the whole register: */
		regs->ax &= ~GENMASK_ULL(63, 0);
		/* But only consumes 32-bits from r11: */
		args.r11 &= GENMASK_ULL(31, 0);
		break;
	default:
		/* Probable TDX module bug. Illegal in[bwl] size: */
		WARN_ON_ONCE(1);
		success = 0;
	}

	if (success)
		regs->ax |= args.r11;
It might need a temporary variable for args.r11, but you get the point.
That's basically the data from the comment but written as code.
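The suggested switch can be exercised outside the kernel. This is a userspace approximation, assuming a locally defined GENMASK_ULL() that mirrors the kernel macro's inclusive-bit semantics; `merge_in_result()` is a hypothetical helper, and passing r11 by value plays the role of the temporary variable mentioned above. It returns false instead of calling WARN_ON_ONCE() for an illegal size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits h..l inclusive. */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/*
 * Hypothetical helper: merge the VMM-provided value (r11) into RAX the way
 * in[bwl] would. r11 is a by-value copy, so masking it is harmless.
 */
static bool merge_in_result(uint64_t *rax, uint64_t r11, int size)
{
	switch (size) {
	case 1:
		/* inb writes AL and preserves RAX[63:8] */
		*rax &= ~GENMASK_ULL(7, 0);
		r11 &= GENMASK_ULL(7, 0);
		break;
	case 2:
		/* inw writes AX and preserves RAX[63:16] */
		*rax &= ~GENMASK_ULL(15, 0);
		r11 &= GENMASK_ULL(15, 0);
		break;
	case 4:
		/* inl writes EAX, which zero-extends into RAX[63:32] */
		*rax &= ~GENMASK_ULL(63, 0);
		r11 &= GENMASK_ULL(31, 0);
		break;
	default:
		/* Illegal size: leave RAX untouched */
		return false;
	}
	*rax |= r11;
	return true;
}
```

For example, with RAX = 0x1122334455667788 and r11 = 0xAABBCCDDEEFF, size 1 yields 0x11223344556677FF, size 2 yields 0x112233445566EEFF, and size 4 yields 0xCCDDEEFF.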
[-- Attachment #2: tdxinX.patch --]
[-- Type: text/x-patch, Size: 93 bytes --]
tdx.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Sohil Mehta @ 2026-05-27 17:17 UTC
To: Xu Yilun
Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <ahaZPjf+A7ms0Ba9@yilunxu-OptiPlex-7050>
On 5/27/2026 12:11 AM, Xu Yilun wrote:
>>> +struct tdx_sys_info_ext {
>>> + u16 memory_pool_required_pages;
>>> + u8 ext_required;
>>
>> The name ext_required seems like a boolean. It is also used like a
>> boolean later.
>> if (!tdx_sysinfo.ext.ext_required)
>> return 0;
>>
>> But, IIUC, is it actually a mask that lists any feature that needs
>
> No, it is just a bool indicating whether Extensions need to be initialized or not.
>
How does the kernel know which features need Extensions? Is there any
hardware enumeration or the kernel just keeps a static list?
* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Sohil Mehta @ 2026-05-27 17:09 UTC
To: Xu Yilun
Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <ahbJnCFlGFwxrEkw@yilunxu-OptiPlex-7050>
On 5/27/2026 3:38 AM, Xu Yilun wrote:
>
> Because, for security purposes, at least some of these add-on features
> are always needed, so Extensions will most likely be enabled.
>
A cover letter is a good place to explain such nuances, alternate
approaches, and tradeoffs.
> And even if someone switched them all off and saved the memory, compared
> to the memory of a typical TDX-capable system (let's say 1TB), the saving
> is still small (0.001%).
>
In this case percentages make it harder to understand. Does it need a
fixed amount of memory (~50MB) irrespective of the feature or the number
of features? If so, it would be good to mention that.
>> In addition, could you briefly describe the complexity we are trading off?
>
> If we delay the Extensions initialization to the first Extension
> SEAMCALL, we need to maintain an additional TDX state machine for the
> Extensions lifecycle, and we need mechanisms to synchronize parallel
> Extension enabling requests from multiple callers.
This would be good to include in the cover as well.
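To make the trade-off concrete, here is a minimal userspace sketch of what lazy initialization on the first Extension SEAMCALL would require: a lifecycle state plus a lock so that concurrent first callers initialize exactly once. Everything here is hypothetical; a pthread mutex stands in for whatever kernel lock would be used, and `example_ext_do_init()` stands in for the real initialization sequence.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical Extensions lifecycle states. */
enum example_ext_state { EXT_UNINIT, EXT_READY, EXT_FAILED };

static enum example_ext_state ext_state = EXT_UNINIT;
static pthread_mutex_t ext_lock = PTHREAD_MUTEX_INITIALIZER;
static int ext_init_calls; /* how many times initialization actually ran */

/* Stand-in for the real Extensions initialization sequence. */
static int example_ext_do_init(void)
{
	ext_init_calls++;
	return 0;
}

/*
 * Called before every Extension SEAMCALL. The mutex serializes concurrent
 * first callers so initialization runs once; the state remembers success
 * or failure for every later caller.
 */
static int example_ext_ensure_ready(void)
{
	int ret = 0;

	pthread_mutex_lock(&ext_lock);
	if (ext_state == EXT_UNINIT)
		ext_state = example_ext_do_init() ? EXT_FAILED : EXT_READY;
	if (ext_state == EXT_FAILED)
		ret = -1;
	pthread_mutex_unlock(&ext_lock);
	return ret;
}
```

Initializing eagerly during TDX module bring-up avoids both the state variable and the lock, at the cost of reserving the Extensions memory even when no feature ends up using it.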
* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
From: Edgecombe, Rick P @ 2026-05-27 15:45 UTC
To: x86@kernel.org, mingo@redhat.com, kas@kernel.org, tglx@kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, hpa@zytor.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-kernel@vger.kernel.org, tsyrulnikov.borys@gmail.com,
kvm@vger.kernel.org, linux-coco@lists.linux.dev,
stable@vger.kernel.org
In-Reply-To: <20260527120544.2903923-3-kas@kernel.org>
On Wed, 2026-05-27 at 13:05 +0100, Kiryl Shutsemau (Meta) wrote:
> + /*
> + * IN writes the result into a sub-register of RAX. Only the
> + * 32-bit form zero-extends; the smaller forms leave the upper
> + * bits untouched:
> + *
> + *   insn  dest  size  bits written   bits preserved
> + *   inb   AL    1     RAX[ 7: 0]     RAX[63: 8]
> + *   inw   AX    2     RAX[15: 0]     RAX[63:16]
> + *   inl   EAX   4     RAX[63: 0]     (none, zero-extended)
We are working on getting the GHCI spec amended to clarify who is supposed to do
this zero-extending and masking, host or guest, for this and the similar
TDVMCALLs. The process involves getting all VMMs in agreement.
Today I think the spec doesn't say to *not* do it, so I think it is reasonable
to merge this, but there is some small risk of complications depending on how
that discussion goes.
> + *
> + * 'mask' only covers the low 'size' bytes, which is exactly the
> + * range affected for size 1 and 2. For size 4 the write also
> + * clears RAX[63:32], so widen the clear-mask.
> + */
* Re: [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling
From: Edgecombe, Rick P @ 2026-05-27 15:38 UTC
To: x86@kernel.org, mingo@redhat.com, kas@kernel.org, tglx@kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com
Cc: seanjc@google.com, Huang, Kai, hpa@zytor.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-kernel@vger.kernel.org, tsyrulnikov.borys@gmail.com,
kvm@vger.kernel.org, linux-coco@lists.linux.dev,
stable@vger.kernel.org
In-Reply-To: <20260527120544.2903923-2-kas@kernel.org>
On Wed, 2026-05-27 at 13:05 +0100, Kiryl Shutsemau (Meta) wrote:
> handle_in() and handle_out() in arch/x86/coco/tdx/tdx.c use:
>
> u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
>
> GENMASK(h, l) includes bit h. For size=1 (INB), this produces
> GENMASK(8, 0) = 0x1FF (9 bits) instead of GENMASK(7, 0) = 0xFF (8
> bits). The mask is one bit too wide for all I/O sizes.
>
> Fix the mask calculation.
>
> Fixes: 03149948832a ("x86/tdx: Port I/O: Add runtime hypercalls")
> Reported-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>
> Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Cc: stable@vger.kernel.org
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
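The off-by-one is easy to see numerically. A minimal userspace sketch, assuming a local GENMASK_ULL() with the kernel's inclusive high-bit semantics (the real code uses GENMASK(), which behaves the same on x86-64 where unsigned long is 64 bits); the two helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8
/* Userspace stand-in for GENMASK_ULL(h, l): bits h..l inclusive. */
#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Buggy form: BITS_PER_BYTE * size is one bit past the top of the value. */
static uint64_t port_io_mask_buggy(int size)
{
	return GENMASK_ULL(BITS_PER_BYTE * size, 0);
}

/* Fixed form: the highest bit of a size-byte value is 8 * size - 1. */
static uint64_t port_io_mask_fixed(int size)
{
	return GENMASK_ULL(BITS_PER_BYTE * size - 1, 0);
}
```

With size 1 the buggy form yields 0x1FF and the fixed form 0xFF; with size 4 they yield 0x1FFFFFFFF and 0xFFFFFFFF respectively.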
* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Kiryl Shutsemau @ 2026-05-27 15:35 UTC
To: Xiaoyao Li
Cc: Xu Yilun, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
zhenzhong.duan
In-Reply-To: <956fa1e6-2920-4b2e-8037-d4b9d812ae53@intel.com>
On Mon, May 25, 2026 at 02:54:40PM +0800, Xiaoyao Li wrote:
> On 5/22/2026 11:41 AM, Xu Yilun wrote:
> ...
> > +static __init int get_tdx_sys_info_ext(struct tdx_sys_info_ext *sysinfo_ext)
> > +{
> > + int ret = 0;
> > + u64 val;
> > +
> > + if (!ret && !(ret = read_sys_metadata_field(0x3100000100000000, &val)))
> > + sysinfo_ext->memory_pool_required_pages = val;
> > + if (!ret && !(ret = read_sys_metadata_field(0x3100000000000001, &val)))
> > + sysinfo_ext->ext_required = val;
> > +
> > + return ret;
> > +}
> > +
> > static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> > {
> > int ret = 0;
> > @@ -116,5 +129,8 @@ static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> > ret = ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl);
> > ret = ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf);
> > + if (sysinfo->features.tdx_features0 & TDX_FEATURES0_EXT)
> > + ret = ret ?: get_tdx_sys_info_ext(&sysinfo->ext);
>
> Is it correct to read "memory_pool_required_pages" and "ext_required" so
> early in get_tdx_sys_info()? get_tdx_sys_info() is called before
> config_tdx_module() which calls TDH.SYS.CONFIG.
>
> If I read the TDX module base spec correctly, the amount of memory for
> extensions and EXT_REQUIRED field depends on the enabled features, which is
> determined by TDH.SYS.CONFIG/TDH.SYS.UPDATE ?
This is my read too. Looks like we need a separate step after
config_tdx_module() to read out config-dependent metadata.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Ackerley Tng @ 2026-05-27 15:35 UTC
To: Sean Christopherson, Fuad Tabba
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <CAEvNRgEZ9vCKkoMC11tVrueAonGWH2x6OeaYYxXGEj2gwHUaKw@mail.gmail.com>
Ackerley Tng <ackerleytng@google.com> writes:
>
> [...snip...]
>
>>
>> Hmm, I wonder if we can figure out a way to consolidate some documentation,
>> because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
>> deals with (see its big comment below).
>>
>
> This would be great, are you thinking an actual comment or something in
> Documentation/?
>
> Perhaps we could iterate on this a little with me providing the newbie
> perspective. Do you want me to take a stab at writing something up?
>
Please see https://lore.kernel.org/all/20260527-kvm-locking-docs-v1-0-4fe8b602ff47@google.com/T/!
>>
>> [...snip...]
>>
* Re: SVSM Development Call May 27th, 2026
From: Nicola Ramacciotti @ 2026-05-27 15:34 UTC
To: Stefano Garzarella, coconut-svsm, linux-coco; +Cc: Tanish Desai
In-Reply-To: <CAGxU2F4uJLH5OTv+y4712vmNBogfSspN-nJHzDJAz9N6HWeg2g@mail.gmail.com>
Hi all,
On 5/27/26 17:12, Stefano Garzarella wrote:
> On Tue, 26 May 2026 at 17:46, Stefano Garzarella <sgarzare@redhat.com> wrote:
>> Hi,
>>
>> Here is the call for agenda items for this week's SVSM development
>> call. Please send any agenda items you have in mind as a reply to this
>> email or raise them in the meeting.
> One topic could be the 2 GSoC projects we have this year.
> If Nicola and Tanish join, it would be nice to allow them some time to
> present their projects. They will work with the COCONUT community
> during the summer :-)
Great. I'll be there!
Regards,
Nicola
>
> Stefano
>
>> We will use the LF Zoom instance. Details of the meeting can be found
>> in our governance repository at:
>>
>> https://github.com/coconut-svsm/governance
>>
>> The link to the COCONUT-SVSM calendar is:
>>
>> https://zoom-lfx.platform.linuxfoundation.org/meetings/coconut-svsm?view=week
>>
>> The meeting will be recorded and the recording eventually published.
>>
>> Regards,
>> Stefano
* Re: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Aneesh Kumar K.V @ 2026-05-27 15:34 UTC
To: Dan Williams (nvidia), Alexey Kardashevskiy, linux-coco, iommu,
linux-kernel, kvm
Cc: Bjorn Helgaas, Dan Williams, Jason Gunthorpe, Joerg Roedel,
Jonathan Cameron, Kevin Tian, Nicolin Chen, Samuel Ortiz,
Steven Price, Suzuki K Poulose, Will Deacon, Xu Yilun,
Shameer Kolothum, Paolo Bonzini, Tony Krowiak, Halil Pasic,
Jason Herne, Harald Freudenberger, Holger Dengler, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Alex Williamson, Matthew Rosato, Farhan Ali,
Eric Farman, linux-s390
In-Reply-To: <6a168c8ea7d10_2129b2100e@djbw-dev.notmuch>
"Dan Williams (nvidia)" <djbw@kernel.org> writes:
> Alexey Kardashevskiy wrote:
>>
>>
>> On 26/5/26 01:48, Aneesh Kumar K.V (Arm) wrote:
>> > Add IOMMU_VDEVICE_TSM_REQUEST for issuing TSM guest request/response
>> > transactions against an iommufd vdevice.
>> >
>> > The ioctl takes a vdevice_id plus request/response user buffers and length
>> > fields, and forwards the request through tsm_guest_req() to the PCI TSM
>> > backend. This provides the host-side passthrough path used by CoCo guests
>> > for TSM device attestation and acceptance flows after the device has been
>> > bound to TSM.
>> >
>> > Also add the supporting tsm_guest_req() helper and associated TSM core
>> > interface definitions.
>> >
>> > Based on changes from: Alexey Kardashevskiy <aik@amd.com>
>> >
>> > Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> > ---
>> > drivers/iommu/iommufd/iommufd_private.h | 6 ++
>> > drivers/iommu/iommufd/main.c | 3 +
>> > drivers/iommu/iommufd/tsm.c | 68 +++++++++++++++++++++
>> > drivers/virt/coco/tsm-core.c | 39 ++++++++++++
>> > include/linux/pci-tsm.h | 9 +--
>> > include/linux/tsm.h | 25 ++++++++
>> > include/uapi/linux/iommufd.h | 80 +++++++++++++++++++++++++
>> > 7 files changed, 226 insertions(+), 4 deletions(-)
> [..]
>> > diff --git a/drivers/iommu/iommufd/tsm.c b/drivers/iommu/iommufd/tsm.c
>> > index 09ee668dbed9..342fbdb6a6b9 100644
>> > --- a/drivers/iommu/iommufd/tsm.c
>> > +++ b/drivers/iommu/iommufd/tsm.c
>> > @@ -60,3 +60,71 @@ int iommufd_vdevice_tsm_op_ioctl(struct iommufd_ucmd *ucmd)
>> > iommufd_put_object(ucmd->ictx, &vdev->obj);
>> > return rc;
>> > }
>> > +
>> > +static bool iommufd_vdevice_tsm_req_scope_valid(u32 scope)
>> > +{
>> > + if (scope > IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST)
>> > + return false;
>> > +
>> > + switch (scope) {
>> > + case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
>> > + case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
>> > + case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
>> > + case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:
>>
>> This scope thing still needs clarification.
>>
>> I have 3 types of requests to fit here, all go via VM -> KVM -> QEMU -> IOMMUFD -> TSM.
>>
>> 1) bind/unbind TDI <- moves to CONFIG_LOCKED, this is "OP";
>> 2) start/stop TDI <- moves to RUN, this is "GR"? Right now I route it via "OP";
>> 3) enable/disable MMIO/DMA <- no TDI state change, this is "GR" but which scope is it here?
>
> The scope parameter was meant to enumerate a security model for classes
> of commands that are otherwise opaque to the kernel. However, none of
> the commands we are targeting are opaque (private specification with
> unknown effect). It now turns out there is no role for @scope for
> security.
>
> Now a command family that iommufd can validate seems useful. As it
> stands this implementation aliases command codes across TSMs. Do we
> proceed with creating an actual shared command uapi for the truly shared
> commands:
>
> TSM_REQ_TYPE_DEFAULT: Commands every arch needs
> TSM_REQ_READ_OBJECT
> TSM_REQ_REGEN_OBJECT
> TSM_REQ_OBJECT_INFO
> TSM_REQ_VALIDATE_MMIO
> TSM_REQ_SET_TDI_STATE
>
> TSM_REQ_TYPE_SEV: Commands only SEV needs
> TSM_REQ_SEV_ENABLE_DMA
> TSM_REQ_SEV_DISABLE_DMA
>
> ...or just observe that per CC arch commands are needed to setup the VM
> so per CC arch commands are needed to marshal device assignment support
> requests.
>
> In that case pci_tsm_req_scope becomes tsm_req_type and is just:
>
> TSM_REQ_TYPE_CCA
> TSM_REQ_TYPE_SEV
> TSM_REQ_TYPE_TDX
>
> I am leaning towards the latter at this point.
But we already have struct pci_tsm_ops::guest_req, which is specific to
the underlying CC architecture. From the above, pci_tsm_req_scope also
appears to carry the same information. Is that useful?
-aneesh
* Re: [PATCH v14 13/44] arm64: RMI: Define the user ABI
From: Marc Zyngier @ 2026-05-27 15:21 UTC
To: Steven Price
Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-14-steven.price@arm.com>
On Wed, 13 May 2026 14:17:21 +0100,
Steven Price <steven.price@arm.com> wrote:
>
> There is one CAP which identifies the presence of CCA, and one ioctl.
> The ioctl is used to populate memory during creation of the realm as
> this requires the RMM to copy data from an unprotected address to the
> protected memory - CCA does not support memory conversion where the
> memory contents are preserved as this is incompatible with memory
> encryption.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
> * KVM_ARM_VCPU_RMI_PSCI_COMPLETE removed.
> * KVM_ARM_RMI_POPULATE documentation updated to reflect that the
> structure is written by the kernel.
> * CAP number bumped.
> Changes since v12:
> * Change KVM_ARM_RMI_POPULATE to update the structure with the amount
> that has been progressed rather than return the number of bytes
> populated.
> * Describe the flag KVM_ARM_RMI_POPULATE_FLAGS_MEASURE.
> * CAP number is bumped.
> * NOTE: The PSCI ioctl may be removed in a future spec release.
> Changes since v11:
> * Completely reworked to be more implicit. Rather than having explicit
> CAP operations to progress the realm construction these operations
> are done when needed (on populating and on first vCPU run).
> * Populate and PSCI complete are promoted to proper ioctls.
> Changes since v10:
> * Rename symbols from RME to RMI.
> Changes since v9:
> * Improvements to documentation.
> * Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
> Changes since v8:
> * Minor improvements to documentation following review.
> * Bump the magic numbers to avoid conflicts.
> Changes since v7:
> * Add documentation of new ioctls
> * Bump the magic numbers to avoid conflicts
> Changes since v6:
> * Rename some of the symbols to make their usage clearer and avoid
> repetition.
> Changes from v5:
> * Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
> KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
> ---
> Documentation/virt/kvm/api.rst | 40 ++++++++++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 13 +++++++++++
> 2 files changed, 53 insertions(+)
$SUBJECT looks wrong. This is a KVM change, not an RMI change.
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 52bbbb553ce1..ca68aae7faa2 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6553,6 +6553,37 @@ KVM_S390_KEYOP_SSKE
> Sets the storage key for the guest address ``guest_addr`` to the key
> specified in ``key``, returning the previous value in ``key``.
>
> +4.145 KVM_ARM_RMI_POPULATE
> +--------------------------
> +
> +:Capability: KVM_CAP_ARM_RMI
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: struct kvm_arm_rmi_populate (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +::
> +
> + struct kvm_arm_rmi_populate {
> + __u64 base;
> + __u64 size;
> + __u64 source_uaddr;
> + __u32 flags;
> + __u32 reserved;
> + };
> +
> +Populate a region of protected address space by copying the data from the
> +(non-protected) user space pointer provided into a protected region (backed by
> +guest_memfd). It implicitly sets the destination region to RIPAS RAM. This is
> +only valid before any VCPUs have been run. The ioctl might not populate the
> +entire region and in this case the kernel updates the fields `base`, `size` and
> +`source_uaddr`. User space may have to repeatedly call it until `size` is 0 to
> +populate the entire region.
> +
> +`flags` can be set to `KVM_ARM_RMI_POPULATE_FLAGS_MEASURE` to request that the
> +populated data is hashed and added to the guest's Realm Initial Measurement
> +(RIM).
Where is that measurement stored? And retrieved? At least a pointer to
that would help.
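The partial-progress contract in the quoted documentation (call again until `size` is 0) implies a retry loop in the VMM. A minimal sketch follows: the struct layout is copied from the proposed uapi in this patch, while `fake_populate()` is a hypothetical stand-in for `ioctl(vm_fd, KVM_ARM_RMI_POPULATE, &args)` that advances by at most 4 KiB per call. How a real VMM should treat errors or interrupted calls is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Layout copied from the proposed uapi in this patch (not yet upstream). */
struct kvm_arm_rmi_populate {
	uint64_t base;
	uint64_t size;
	uint64_t source_uaddr;
	uint32_t flags;
	uint32_t reserved;
};

/*
 * Hypothetical stand-in for the ioctl: handles at most one 4 KiB chunk per
 * call and advances base/source_uaddr while shrinking size, mimicking the
 * documented partial-progress behaviour.
 */
static int fake_populate(struct kvm_arm_rmi_populate *args)
{
	uint64_t chunk = args->size < 4096 ? args->size : 4096;

	args->base += chunk;
	args->source_uaddr += chunk;
	args->size -= chunk;
	return 0;
}

/* Keep calling until the kernel reports that nothing is left. */
static int populate_all(struct kvm_arm_rmi_populate *args, int *calls)
{
	*calls = 0;
	while (args->size) {
		int ret = fake_populate(args);

		(*calls)++;
		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Because the kernel writes the updated fields back, the loop needs no bookkeeping of its own beyond re-submitting the same structure.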
> +
> .. _kvm_run:
>
> 5. The kvm_run structure
> @@ -8904,6 +8935,15 @@ helpful if user space wants to emulate instructions which are not
> This capability can be enabled dynamically even if VCPUs were already
> created and are running.
>
> +7.47 KVM_CAP_ARM_RMI
> +--------------------
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: None
> +
> +This capability indicates that support for CCA realms is available.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6c8afa2047bf..b8cff0938041 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -996,6 +996,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_S390_USER_OPEREXEC 246
> #define KVM_CAP_S390_KEYOP 247
> #define KVM_CAP_S390_VSIE_ESAMODE 248
> +#define KVM_CAP_ARM_RMI 249
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> @@ -1669,4 +1670,16 @@ struct kvm_pre_fault_memory {
> __u64 padding[5];
> };
>
> +/* Available with KVM_CAP_ARM_RMI, only for VMs with KVM_VM_TYPE_ARM_REALM */
> +#define KVM_ARM_RMI_POPULATE _IOWR(KVMIO, 0xd7, struct kvm_arm_rmi_populate)
> +#define KVM_ARM_RMI_POPULATE_FLAGS_MEASURE (1 << 0)
> +
> +struct kvm_arm_rmi_populate {
> + __u64 base;
> + __u64 size;
> + __u64 source_uaddr;
> + __u32 flags;
> + __u32 reserved;
> +};
> +
> #endif /* __LINUX_KVM_H */
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: SVSM Development Call May 27th, 2026
From: Stefano Garzarella @ 2026-05-27 15:12 UTC
To: coconut-svsm, linux-coco; +Cc: Nicola Ramacciotti, Tanish Desai
In-Reply-To: <CAGxU2F5hP=7pA1rKRvq5sgr0t2y1YoUzYCmH8hzaCS58U4+Y3A@mail.gmail.com>
On Tue, 26 May 2026 at 17:46, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi,
>
> Here is the call for agenda items for this weeks SVSM development
> call. Please send any agenda items you have in mind as a reply to this
> email or raise them in the meeting.
One topic could be the 2 GSoC projects we have this year.
If Nicola and Tanish join, it would be nice to allow them some time to
present their projects. They will work with the COCONUT community
during the summer :-)
Stefano
>
> We will use the LF Zoom instance. Details of the meeting can be found
> in our governance repository at:
>
> https://github.com/coconut-svsm/governance
>
> The link to the COCONUT-SVSM calendar is:
>
> https://zoom-lfx.platform.linuxfoundation.org/meetings/coconut-svsm?view=week
>
> The meeting will be recorded and the recording eventually published.
>
> Regards,
> Stefano
* Re: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Jason Gunthorpe @ 2026-05-27 12:51 UTC
To: Dan Williams (nvidia)
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V (Arm), linux-coco, iommu,
linux-kernel, kvm, Bjorn Helgaas, Joerg Roedel, Jonathan Cameron,
Kevin Tian, Nicolin Chen, Samuel Ortiz, Steven Price,
Suzuki K Poulose, Will Deacon, Xu Yilun, Shameer Kolothum,
Paolo Bonzini, Tony Krowiak, Halil Pasic, Jason Herne,
Harald Freudenberger, Holger Dengler, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Alex Williamson, Matthew Rosato, Farhan Ali,
Eric Farman, linux-s390
In-Reply-To: <6a168c8ea7d10_2129b2100e@djbw-dev.notmuch>
On Tue, May 26, 2026 at 11:17:50PM -0700, Dan Williams (nvidia) wrote:
> In that case pci_tsm_req_scope becomes tsm_req_type and is just:
>
> TSM_REQ_TYPE_CCA
> TSM_REQ_TYPE_SEV
> TSM_REQ_TYPE_TDX
>
> I am leaning towards the latter at this point.
Yeah, this sounds good. I would also include a common op field that
can be decoded by the TSM driver based on the TYPE above, and the
usual in/out message buffers.
Jason
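A minimal sketch of the request shape discussed above, assuming a single type enum plus an opaque op code and in/out buffers. Every name here (`struct tsm_req`, `tsm_req_handler`, `tsm_req_dispatch()`, `echo_handler()`, `TSM_REQ_TYPE_NR`) is hypothetical and not an existing iommufd or TSM API. The point it illustrates is that the core only validates the type and routes the request, while the op and the buffers are decoded by the driver registered for that type.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request types, following the enum proposed in the thread. */
enum tsm_req_type {
	TSM_REQ_TYPE_CCA,
	TSM_REQ_TYPE_SEV,
	TSM_REQ_TYPE_TDX,
	TSM_REQ_TYPE_NR,	/* number of types, used for bounds checks */
};

/* Hypothetical request: 'op' is opaque to the core, decoded per type. */
struct tsm_req {
	uint32_t type;		/* enum tsm_req_type */
	uint32_t op;		/* driver-defined operation code */
	const void *in;		/* request message */
	size_t in_len;
	void *out;		/* response buffer */
	size_t out_len;
};

typedef int (*tsm_req_handler)(uint32_t op, const void *in, size_t in_len,
			       void *out, size_t out_len);

/* Core-side routing: validate the type, then hand everything to the driver. */
static int tsm_req_dispatch(const struct tsm_req *req,
			    tsm_req_handler const handlers[TSM_REQ_TYPE_NR])
{
	if (req->type >= TSM_REQ_TYPE_NR)
		return -EINVAL;
	if (!handlers[req->type])
		return -EOPNOTSUPP;
	return handlers[req->type](req->op, req->in, req->in_len,
				   req->out, req->out_len);
}

/* Example driver handler: copies the request into the response buffer. */
static int echo_handler(uint32_t op, const void *in, size_t in_len,
			void *out, size_t out_len)
{
	(void)op;
	if (in_len > out_len)
		return -ENOSPC;
	memcpy(out, in, in_len);
	return (int)in_len;
}
```

With this split, adding a new TSM only means adding an enum value and a handler; the uAPI struct itself does not change.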
^ permalink raw reply
* Re: [RFC PATCH 14/15] x86/virt/tdx: Embed version info in SEAMCALL leaf function definitions
From: Xu Yilun @ 2026-05-27 11:45 UTC (permalink / raw)
To: Xiaoyao Li
Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
zhenzhong.duan
In-Reply-To: <df4eb3b2-a11e-4d73-b298-af1beb92d53f@intel.com>
On Wed, May 27, 2026 at 03:44:45PM +0800, Xiaoyao Li wrote:
> On 5/27/2026 2:45 PM, Xu Yilun wrote:
> > > > /*
> > > > * TDX module SEAMCALL leaf functions
> > > > */
> > > > @@ -31,7 +44,7 @@
> > > > #define TDH_VP_CREATE 10
> > > > #define TDH_MNG_KEY_FREEID 20
> > > > #define TDH_MNG_INIT 21
> > > > -#define TDH_VP_INIT 22
> > > > +#define TDH_VP_INIT SEAMCALL_LEAF_VER(22, 1)
> > >
> > > how about
> > >
> > > #define TDH_VP_INIT 22
> > > #define TDH_VP_INIT_V1 SEAMCALL_LEAF_VER(TDH_VP_INIT, 1)
> > >
> > > and use TDH_VP_INIT_V1 below?
> >
> > I'm trying to avoid a _Vx postfix if unnecessary. Don't make callers
> > have to choose between versions. The main MACRO should always point to
> > the latest version since later versions are backward compatible.
>
> I don't agree.
>
> The later versions are backwards compatible, but the later versions might
> not be supported by the loaded TDX module.
>
> Usually the callers will have to choose between versions because the TDX
> module in use varies, just like the case in the next patch.
No, we don't choose SEAMCALL versions based on TDX module versions. The
next patch is an exception: if, by the time of merging, there are releases
that support TDX_SYS_CONFIG v1, I'd rather delete TDX_SYS_CONFIG_V0.
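The hunk above does not show the body of SEAMCALL_LEAF_VER(). A minimal sketch, assuming the leaf number occupies RAX bits 15:0 and the version bits 23:16 (the shift of 16 is an assumption modelled on the kernel's existing TDX_VERSION_SHIFT usage, not taken from this patch); `SEAMCALL_VER_SHIFT`, `seamcall_leaf()` and `seamcall_ver()` are hypothetical helpers. It also shows the style argued for in the reply: the unsuffixed name carries the latest version.

```c
#include <stdint.h>

/* Assumption: leaf number in bits 15:0, version in bits 23:16. */
#define SEAMCALL_VER_SHIFT	16
#define SEAMCALL_LEAF_VER(leaf, ver) \
	((uint64_t)(leaf) | ((uint64_t)(ver) << SEAMCALL_VER_SHIFT))

/* Hypothetical helpers to split the encoded value back apart, e.g. for tracing. */
static inline uint16_t seamcall_leaf(uint64_t fn)
{
	return fn & 0xffff;
}

static inline uint8_t seamcall_ver(uint64_t fn)
{
	return (fn >> SEAMCALL_VER_SHIFT) & 0xff;
}

/* The plain name points at the latest (backward-compatible) version. */
#define TDH_VP_INIT	SEAMCALL_LEAF_VER(22, 1)
```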
^ permalink raw reply
* [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
From: Kiryl Shutsemau (Meta) @ 2026-05-27 12:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
linux-coco, kvm, stable, Kiryl Shutsemau (Meta)
In-Reply-To: <20260527120544.2903923-1-kas@kernel.org>
According to x86 architecture rules, 32-bit operations zero-extend the
result to 64 bits. The current implementation of handle_in() only masks
the lower 32 bits, which preserves the upper 32 bits of RAX when a
32-bit port IN instruction is emulated.
Update handle_in() to zero out the entire RAX register when the I/O size
is 4 bytes to ensure correct zero-extension. For smaller sizes (1 or 2
bytes), continue to preserve the unaffected upper bits.
Fixes: 03149948832a ("x86/tdx: Port I/O: Add runtime hypercalls")
Reported-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>
Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 65119362f9a2..58feca419326 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -703,8 +703,25 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
*/
success = !__tdx_hypercall(&args);
- /* Update part of the register affected by the emulated instruction */
- regs->ax &= ~mask;
+ /*
+ * IN writes the result into a sub-register of RAX. Only the
+ * 32-bit form zero-extends; the smaller forms leave the upper
+ * bits untouched:
+ *
+ * insn dest size bits written bits preserved
+ * inb AL 1 RAX[ 7: 0] RAX[63: 8]
+ * inw AX 2 RAX[15: 0] RAX[63:16]
+ * inl EAX 4 RAX[63: 0] (none, zero-extended)
+ *
+ * 'mask' only covers the low 'size' bytes, which is exactly the
+ * range affected for size 1 and 2. For size 4 the write also
+ * clears RAX[63:32], so widen the clear-mask.
+ */
+ if (size == 4)
+ regs->ax = 0;
+ else
+ regs->ax &= ~mask;
+
if (success)
regs->ax |= args.r11 & mask;
--
2.54.0
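A userspace model of the RAX merge performed by handle_in() after this patch may help when checking the table in the new comment. `merge_in_result()` is a hypothetical stand-in for the tail of handle_in(); it takes the old RAX value, the I/O size, the value returned by the hypercall (args.r11 in the patch) and the success flag. The mask uses the corrected width from patch 1/2 and assumes size is 1, 2 or 4.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Low 'size' bytes; valid for size 1, 2 or 4 only. */
static uint64_t io_mask(int size)
{
	return ((uint64_t)1 << (BITS_PER_BYTE * size)) - 1;
}

/* Models the RAX update at the end of handle_in() with this patch applied. */
static uint64_t merge_in_result(uint64_t rax, int size, uint64_t val, bool success)
{
	uint64_t mask = io_mask(size);

	if (size == 4)
		rax = 0;		/* inl writes EAX, zero-extending into RAX[63:32] */
	else
		rax &= ~mask;		/* inb/inw leave the upper bits untouched */

	if (success)
		rax |= val & mask;
	return rax;
}
```

For example, starting from RAX = 0xAAAAAAAAAAAAAAAA, a successful inl of 0x12345678 yields 0x0000000012345678, while an inb of 0x12 yields 0xAAAAAAAAAAAAAA12.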
^ permalink raw reply related
* [PATCH v3 1/2] x86/tdx: Fix off-by-one in port I/O handling
From: Kiryl Shutsemau (Meta) @ 2026-05-27 12:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
linux-coco, kvm, stable, Kiryl Shutsemau (Meta)
In-Reply-To: <20260527120544.2903923-1-kas@kernel.org>
handle_in() and handle_out() in arch/x86/coco/tdx/tdx.c use:
u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
GENMASK(h, l) includes bit h. For size=1 (INB), this produces
GENMASK(8, 0) = 0x1FF (9 bits) instead of GENMASK(7, 0) = 0xFF (8
bits). The mask is one bit too wide for all I/O sizes.
Fix the mask calculation.
Fixes: 03149948832a ("x86/tdx: Port I/O: Add runtime hypercalls")
Reported-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>
Link: https://lore.kernel.org/all/CAKw_Dz96rfSQc6Rn+9QBcUFHhmkK+9zu+P=bxowfZwxrATCBRg@mail.gmail.com/
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 186915a17c50..65119362f9a2 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -693,7 +693,7 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
.r13 = PORT_READ,
.r14 = port,
};
- u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
+ u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
bool success;
/*
@@ -713,7 +713,7 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
static bool handle_out(struct pt_regs *regs, int size, int port)
{
- u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
+ u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
/*
* Emulate the I/O write via hypercall. More info about ABI can be found
--
2.54.0
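The off-by-one is easy to reproduce outside the kernel. The sketch below uses a userspace stand-in for GENMASK_ULL(h, l) (bits l through h, inclusive, as in the kernel macro) and compares the old and new mask expressions; `buggy_mask()` and `fixed_mask()` are illustrative names only.

```c
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Userspace stand-in for GENMASK_ULL(h, l): bits l..h inclusive, h <= 63. */
#define GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Expression before the fix: includes bit 8*size, one bit too many. */
static uint64_t buggy_mask(int size)
{
	return GENMASK_ULL(BITS_PER_BYTE * size, 0);
}

/* Expression after the fix: exactly the low 'size' bytes. */
static uint64_t fixed_mask(int size)
{
	return GENMASK_ULL(BITS_PER_BYTE * size - 1, 0);
}
```

For size 1 the old expression gives 0x1FF and the new one 0xFF; for size 4 it is 0x1FFFFFFFF versus 0xFFFFFFFF.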
^ permalink raw reply related
* [PATCH v3 0/2] x86/tdx: Port I/O emulation fixes
From: Kiryl Shutsemau (Meta) @ 2026-05-27 12:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
linux-coco, kvm, stable, Kiryl Shutsemau (Meta)
This series addresses two technical inaccuracies in the TDX guest port
I/O emulation code reported by Borys Tsyrulnikov.
The first patch fixes an off-by-one error in the GENMASK() macro usage
where the mask was being calculated as one bit too wide (e.g. 9 bits for
an 8-bit operation).
The second patch ensures that 32-bit port I/O operations (INL) correctly
zero-extend the result to the full 64-bit RAX register, as required by
the x86 architecture. Currently, the emulation preserves the upper 32
bits of RAX during such operations.
Both issues were introduced in the initial implementation of the runtime
hypercalls for port I/O.
v1: https://lore.kernel.org/all/20260331112430.71425-1-kas@kernel.org/
v2: https://lore.kernel.org/all/20260428125632.129770-1-kas@kernel.org/
Changes in v3:
- Expand the comment in patch 2 with a table describing which RAX
bits each IN form writes vs preserves, clarifying why the 32-bit
case needs to clear RAX[63:32] (Dave Hansen).
- Rebase onto v7.1-rc5.
Changes in v2:
- Rephrase the size check in handle_in() as "if (size == 4)" for
readability (Kuppuswamy)
- Add Link: to the bug report on both patches (Kuppuswamy)
- Collect Reviewed-by tags (Kai Huang, Kuppuswamy Sathyanarayanan)
- Rebase onto v7.1-rc1
Kiryl Shutsemau (Meta) (2):
x86/tdx: Fix off-by-one in port I/O handling
x86/tdx: Fix zero-extension for 32-bit port I/O
arch/x86/coco/tdx/tdx.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
--
2.54.0
^ permalink raw reply
* Re: [PATCH v10 21/25] x86/virt/tdx: Refresh TDX module version after update
From: Kiryl Shutsemau @ 2026-05-27 11:30 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <20260520133909.409394-22-chao.gao@intel.com>
On Wed, May 20, 2026 at 06:38:24AM -0700, Chao Gao wrote:
> The kernel exposes the TDX module version through sysfs so userspace can
> check update compatibility. That information needs to remain accurate
> across runtime updates.
>
> A runtime update may change the module's update_version, so refresh the
> cached version right after a successful update.
>
> Drop __ro_after_init from tdx_sysinfo because it is now updated at runtime.
__read_mostly?
> Do not refresh the rest of tdx_sysinfo, even if some values change across
> updates. TDX module updates are backward compatible, so existing
> tdx_sysinfo consumers, such as KVM, can continue to operate without seeing
> the new values.
>
> Refreshing the full structure would be risky. A tdx_sysinfo consumer may
> initialize its TDX support based on the features originally reported in
> tdx_sysinfo. If a runtime update adds new features and the full structure
> is refreshed, that consumer could observe and use the newly reported
> features without having performed the setup required to use them safely.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> v9:
> - don't print old and new version [Dave]
> - explain why it's OK to hide changes from the tdx_sysinfo users [Dave]
> - update versions in stop_machine context
> - don't mention major/minor versions are identical across updates. That fact is
> not relevant here.
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 6 +++++-
> arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 2 +-
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index e3f5aa272850..55670365a388 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -67,7 +67,7 @@ static struct tdmr_info_list tdx_tdmr_list;
> /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */
> static LIST_HEAD(tdx_memlist);
>
> -static struct tdx_sys_info tdx_sysinfo __ro_after_init;
> +static struct tdx_sys_info tdx_sysinfo;
>
> static DEFINE_RAW_SPINLOCK(sysinit_lock);
>
> @@ -1314,6 +1314,10 @@ int tdx_module_run_update(void)
> if (ret)
> return ret;
>
> + /* Shouldn't fail as the update has succeeded. */
> + ret = get_tdx_sys_info_version(&tdx_sysinfo.version);
> + WARN_ON_ONCE(ret);
> +
Warn, but pretend that everything is fine?
> tdx_module_state.initialized = true;
> return 0;
> }
> diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> index e793dec688ab..e49c300f23d4 100644
> --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> @@ -7,7 +7,7 @@
> * Include this file to other C file instead.
> */
>
> -static __init int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version)
> +static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version)
> {
> int ret = 0;
> u64 val;
> --
> 2.52.0
>
--
Kiryl Shutsemau / Kirill A. Shutemov
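To make the "warn, but pretend everything is fine" question concrete, here is a minimal userspace sketch of the alternative: refresh only the version after an update and propagate a read failure to the caller instead of only warning. The struct layout and `read_version_fn` are hypothetical stand-ins for tdx_sysinfo and get_tdx_sys_info_version(); whether returning an error is the right behaviour once the update itself has succeeded is exactly the open question, so treat this as an illustration, not a proposed fix.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's version/sysinfo structures. */
struct sys_info_version {
	uint16_t major, minor, update;
};

struct sys_info {
	struct sys_info_version version;
	uint64_t features0;	/* deliberately NOT refreshed after an update */
};

/* Hypothetical metadata reader; returns 0 on success, -errno on failure. */
typedef int (*read_version_fn)(struct sys_info_version *out);

static int refresh_after_update(struct sys_info *info, read_version_fn read_version)
{
	struct sys_info_version v;
	int ret = read_version(&v);

	if (ret)
		return ret;		/* surface the failure to the caller */
	info->version = v;		/* only the version field changes */
	return 0;
}

/* Fake readers for illustration. */
static int fake_read_ok(struct sys_info_version *out)
{
	out->major = 1;
	out->minor = 5;
	out->update = 7;
	return 0;
}

static int fake_read_fail(struct sys_info_version *out)
{
	(void)out;
	return -EIO;
}
```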
^ permalink raw reply
* Re: [PATCH v10 13/25] x86/virt/seamldr: Allocate and populate a module update request
From: Kiryl Shutsemau @ 2026-05-27 11:27 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <20260520133909.409394-14-chao.gao@intel.com>
On Wed, May 20, 2026 at 06:38:16AM -0700, Chao Gao wrote:
> +static void populate_pa_list(u64 *pa_list, const u8 *vmalloc_addr, u32 vmalloc_len_pages)
> +{
> + int i;
> +
> + for (i = 0; i < vmalloc_len_pages; i++) {
> + unsigned long offset = i * PAGE_SIZE;
> + unsigned long pfn = vmalloc_to_pfn(&vmalloc_addr[offset]);
I don't like that we need to assume how the image got allocated this
deep in the stack.
I can imagine a situation in the future where we might want to load the
TDX module into memory at boot, like initrd. It won't be vmalloc()ed in
that case.
Wouldn't it be better to use a neutral way to get the physical address
that doesn't make this assumption? Like slow_virt_to_phys().
> +
> + pa_list[i] = pfn << PAGE_SHIFT;
> + }
> +}
> +
> +static void populate_seamldr_params(struct seamldr_params *params,
> + const u8 *sig, u32 sig_nr_pages,
> + const u8 *mod, u32 mod_nr_pages)
> +{
> + params->version = 0;
> + params->scenario = SEAMLDR_SCENARIO_UPDATE;
> + params->module_nr_pages = mod_nr_pages;
> +
> + populate_pa_list(params->sigstruct_pages_pa_list, sig, sig_nr_pages);
> + populate_pa_list(params->module_pages_pa_list, mod, mod_nr_pages);
I am not sure what the value is in having this as a separate function.
Having it directly in init_seamldr_params() would be easier to follow.
> +}
> +
> +/*
> + * @image points to a vmalloc()'d 'struct tdx_image'. Transform
> + * it into @params which is the P-SEAMLDR ABI format.
> + */
> +static int init_seamldr_params(struct seamldr_params *params,
> + const struct tdx_image *image,
> + u32 image_len)
> +{
> + const struct tdx_image_header *header = &image->header;
> +
> + u32 sigstruct_len = header->sigstruct_nr_pages * PAGE_SIZE;
> + u32 module_len = header->module_nr_pages * PAGE_SIZE;
> +
> + u8 *header_start = (u8 *)header;
> + u8 *header_end = header_start + TDX_IMAGE_HEADER_SIZE;
> +
> + u8 *sigstruct_start = header_end;
> + u8 *sigstruct_end = sigstruct_start + sigstruct_len;
> +
> + u8 *module_start = sigstruct_end;
> +
> + /* Check the calculated payload size against the image size. */
> + if (TDX_IMAGE_HEADER_SIZE + sigstruct_len + module_len != image_len)
> + return -EINVAL;
> +
> + /* Reject unsupported tdx_image ABI versions. */
> + if (header->version != TDX_IMAGE_VERSION_2)
> + return -EINVAL;
> +
> + if (header->sigstruct_nr_pages > SEAMLDR_MAX_NR_SIG_PAGES ||
> + header->module_nr_pages > SEAMLDR_MAX_NR_MODULE_PAGES)
> + return -EINVAL;
> +
> + if (memcmp(header->signature, "TDX-BLOB", sizeof(header->signature)))
> + return -EINVAL;
> +
> + if (memchr_inv(header->reserved, 0, sizeof(header->reserved)))
> + return -EINVAL;
> +
> + populate_seamldr_params(params, sigstruct_start, header->sigstruct_nr_pages,
> + module_start, header->module_nr_pages);
> + return 0;
> +}
> +
--
Kiryl Shutsemau / Kirill A. Shutemov
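One way to express the suggestion above is to make the page-list builder independent of how the buffer was allocated by passing in the VA-to-PA translation. In the sketch below, `virt_to_phys_fn`, `fake_to_phys()`, `image_payload_offsets()` and the one-page header size are all hypothetical; in the kernel the callback could wrap vmalloc_to_pfn() or slow_virt_to_phys(). The offset helper mirrors the header | sigstruct | module layout from init_seamldr_params(), computed in 64 bits so a large page count cannot wrap.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define IMAGE_HEADER_SIZE PAGE_SIZE	/* hypothetical header size */

/* Hypothetical: any VA->PA translator, independent of the allocator. */
typedef uint64_t (*virt_to_phys_fn)(const void *va);

static void populate_pa_list(uint64_t *pa_list, const uint8_t *buf,
			     uint32_t nr_pages, virt_to_phys_fn to_phys)
{
	for (uint32_t i = 0; i < nr_pages; i++)
		pa_list[i] = to_phys(buf + (size_t)i * PAGE_SIZE);
}

/* Layout: header | sigstruct pages | module pages, checked against image_len. */
static int image_payload_offsets(uint32_t image_len, uint32_t sig_pages,
				 uint32_t mod_pages, size_t *sig_off,
				 size_t *mod_off)
{
	uint64_t total = IMAGE_HEADER_SIZE +
			 ((uint64_t)sig_pages + mod_pages) * PAGE_SIZE;

	if (total != image_len)
		return -EINVAL;
	*sig_off = IMAGE_HEADER_SIZE;
	*mod_off = IMAGE_HEADER_SIZE + (size_t)sig_pages * PAGE_SIZE;
	return 0;
}

/* Fake translator for illustration: a fixed offset from the VA. */
static uint64_t fake_to_phys(const void *va)
{
	return (uint64_t)(uintptr_t)va + 0x100000000ULL;
}
```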
^ permalink raw reply
* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Xu Yilun @ 2026-05-27 10:38 UTC (permalink / raw)
To: Sohil Mehta
Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <2542e761-57ed-4170-8ce9-1de3fc685dea@intel.com>
> > == Overview ==
> >
> > TDX Module introduces the "TDX Module Extensions" to support long
> > running / hard-irq preemptible flows inside. This makes TDX Module
> > capable of handling complex tasks through "Extension SEAMCALLs".
>
> Can we explain a bit more about why these extensions are needed or what
> would happen if the kernel didn't enable them? I ran the series through
> an LLM out of curiosity. I think something along the lines below might be
> a good addition to the cover letter itself.
>
> (Please verify)
>
> The TDX module's normal SEAMCALLs are designed to be short,
> non-preemptible operations. However, some newer features (like
> DICE-based TDX Quoting) require complex, potentially long-running
> computations that can't complete within the tight constraints of a
> single non-preemptible SEAMCALL.
>
> The "TDX Module Extensions" solve this by introducing "Extension
> SEAMCALLs" — a new class of SEAMCALLs that are:
>
> * Long-running — they may take significant time to complete (e.g.,
> cryptographic operations for attestation/quoting).
>
> * Hard-IRQ preemptible — they can be interrupted by hardware interrupts
> and later resumed, so they don't monopolize the CPU or cause
> unacceptable interrupt latency.
>
> Without this mechanism, complex operations like generating DICE
> attestation quotes would either block interrupts for too long
> (unacceptable for a host kernel) or wouldn't be possible inside the TDX
> module at all. The Extensions give the TDX module a way to handle these
> heavyweight tasks while remaining cooperative with the host's
> interrupt/scheduling model.
I'm good with this detailed description. I'll add it to the
cover letter.
>
> >
> > TDX Module allows some add-on features to use the Extension.
>
> s/Module/module throughout the series.
>
> The existing kernel code predominantly uses the lower case TDX "module".
OK.
>
>
> > The first feature to use Extensions is DICE-based TDX Quoting [1].
> > DICE is an industry-standard, certificate-backed attestation
> > framework that layers evidence through a chain of certificates.
> >
> > This series adds infrastructure to enable the Extensions and then
> > implements DICE-based TDX Quoting.
> >
> > The Extensions consumes relatively large amount of memory (~50MB). So it
> > is designed to be off by default. It must be enabled after basic TDX
> > Module initialization and when add-on features require it. To enable
> > the Extensions, host first adds extra memory to TDX Module via a
> > SEAMCALL (TDH.EXT.MEM.ADD), then uses another SEAMCALL (TDH.EXT.INIT) to
> > initialize Extensions, and then some add-on features, e.g. DICE, could
> > use Extension SEAMCALLs for work. Note that host can never get the added
> > memory back.
> >
> > Theoretically, the Extensions don't need to be enabled right after
> > basic TDX initialization. They could be enabled right before the first
> > Extension SEAMCALL is issued. That would save or postpone memory usage.
> > But it isn't worth the complexity, the needs for the Extensions are vast
> > but the savings are little for a typical TDX capable system (about
> > 0.001% of memory). So the Linux decision is to just enable it along with
> > the basic TDX.
> >
>
> I think enabling it by default on TDX platforms (with the module
> extension) might make sense. But the explanation here is slightly
> confusing.
>
> You said earlier that "The Extensions consumes relatively large amount
> of memory (~50MB)" so they must be off by default. Later you say that
Sorry, maybe I should say "the firmware design is: 1. Off by default.
2. Must be enabled after basic TDX module ...". I'll try to update the
wording.
> "..the saving are little .."
Because, for security purposes, these add-on features (or at least some
of them) are always needed, so the Extensions will most likely be
enabled anyway. And even if someone switched them all off to save the
memory, the saving would still be small compared to the memory of a
typical TDX-capable system (let's say 1TB): ~50MB out of 1TB is roughly
0.005%.
>
> Are you saying that the dynamic enabling of the extensions is not worth
The dynamic enabling of the Extensions is not worth it.
> it or the dynamic allocation of the memory needed to support them?
>
> In addition, could you briefly describe the complexity we are trading off?
If we delay the Extensions initialization to the first Extension
SEAMCALL, we need to maintain an additional TDX state machine for its
lifecycle, and we need a mechanism to synchronize parallel
Extension-enabling requests from multiple callers.
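To show the complexity being traded off, here is a minimal userspace sketch of the lazy alternative: a small state machine behind a lock that every Extension SEAMCALL caller would have to go through. `ext_mem_add()` and `ext_init()` are hypothetical stand-ins for TDH.EXT.MEM.ADD and TDH.EXT.INIT, and making the failed state sticky is an assumption motivated by the added memory never being returned to the host. The eager approach chosen in the series avoids all of this by running the two steps once during basic TDX initialization.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for TDH.EXT.MEM.ADD and TDH.EXT.INIT. */
static int ext_mem_add_calls, ext_init_calls;

static int ext_mem_add(void)
{
	ext_mem_add_calls++;
	return 0;
}

static int ext_init(void)
{
	ext_init_calls++;
	return 0;
}

enum ext_state { EXT_OFF, EXT_ON, EXT_FAILED };

static enum ext_state ext_state = EXT_OFF;
static pthread_mutex_t ext_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lazy enabling: serializes concurrent first users, runs the setup once. */
static int ext_ensure_enabled(void)
{
	int ret = 0;

	pthread_mutex_lock(&ext_lock);
	switch (ext_state) {
	case EXT_ON:
		break;
	case EXT_FAILED:
		ret = -EIO;	/* sticky: added memory cannot be reclaimed */
		break;
	case EXT_OFF:
		ret = ext_mem_add();
		if (!ret)
			ret = ext_init();
		ext_state = ret ? EXT_FAILED : EXT_ON;
		break;
	}
	pthread_mutex_unlock(&ext_lock);
	return ret;
}
```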
^ permalink raw reply
* Re: [PATCH v10 04/25] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
From: Kiryl Shutsemau @ 2026-05-27 10:56 UTC (permalink / raw)
To: Chao Gao, rick.p.edgecombe
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, nik.borisov, paulmck, pbonzini,
reinette.chatre, sagis, seanjc, tony.lindgren, vannapurve,
vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260520133909.409394-5-chao.gao@intel.com>
On Wed, May 20, 2026 at 06:38:07AM -0700, Chao Gao wrote:
> Future changes will add support for new TDX features exposed as
> TDX_FEATURES0 bits. The presence of these features will need to be checked
> outside of arch/x86/virt. So the feature query helpers, and the
> TDX_FEATURES0 defines they reference, will need to live in the widely
> accessible asm/tdx.h header. Move the existing TDX_FEATURES0 to asm/tdx.h
> so that they can all be kept together.
>
> Opportunistically switch to BIT_ULL() since TDX_FEATURES0 is 64-bit.
>
> No functional change intended.
I don't have a problem with the patch, but it seems to collide with the
DPAMT patchset, which also moves the define around.
Rick, I assume this patchset is going upstream first, right?
--
Kiryl Shutsemau / Kirill A. Shutemov
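Why BIT_ULL() matters for a 64-bit field: the kernel's BIT(nr) shifts an unsigned long, which is only 32 bits wide on 32-bit builds, so any TDX_FEATURES0 bit at position 32 or above needs BIT_ULL(). The sketch below uses a userspace stand-in for BIT_ULL() and a query helper of the kind the changelog says will live in asm/tdx.h; the bit positions and the helper name `tdx_features0_has()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL(). */
#define BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical TDX_FEATURES0 bit positions, for illustration only. */
#define TDX_FEATURES0_EXAMPLE_LOW	BIT_ULL(3)
#define TDX_FEATURES0_EXAMPLE_HIGH	BIT_ULL(40)

/* Hypothetical query helper: true if all bits in 'bit' are set. */
static inline bool tdx_features0_has(uint64_t features0, uint64_t bit)
{
	return (features0 & bit) == bit;
}
```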
^ permalink raw reply
* Re: [PATCH v10 03/25] x86/virt/tdx: Consolidate TDX global initialization states
From: Kiryl Shutsemau @ 2026-05-27 10:52 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <20260520133909.409394-4-chao.gao@intel.com>
On Wed, May 20, 2026 at 06:38:06AM -0700, Chao Gao wrote:
> The kernel uses several global flags to guard one-time TDX initialization
> flows and prevent them from being repeated.
>
> When the TDX module is updated, all of those states must be reset so that
> the module can be initialized again. Today those states are kept as
> separate global variables, which makes the reset path awkward and easy to
> miss when a new state is added.
>
> Group the states into a single structure so they can be reset together, for
> example with memset(), and so a newly added state won't be missed.
>
> Drop the __ro_after_init annotation from tdx_module_initialized because
> the other two states do not have it. And with TDX module update support,
> all the states need to be writable at runtime.
You can still use __read_mostly.
> Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
--
Kiryl Shutsemau / Kirill A. Shutemov
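A minimal sketch of the grouping described in the changelog, assuming the flags can simply be zeroed to return to the "not initialized" state. The `initialized` member matches the `tdx_module_state.initialized` usage elsewhere in this series; the other field names are hypothetical. In the kernel the variable could carry `__read_mostly`, as suggested above, which cannot be expressed in userspace.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical grouping of the one-time TDX init flags. */
struct tdx_module_state {
	bool sysinit_done;	/* hypothetical */
	int sysinit_ret;	/* hypothetical */
	bool initialized;
};

/* In the kernel this could be annotated __read_mostly. */
static struct tdx_module_state tdx_module_state;

/* After a module update, one memset() resets every flag, including new ones. */
static void tdx_module_state_reset(void)
{
	memset(&tdx_module_state, 0, sizeof(tdx_module_state));
}
```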
^ permalink raw reply
* Re: [PATCH v14 23/44] arm64: RMI: Handle RMI_EXIT_RIPAS_CHANGE
From: Wei-Lin Chang @ 2026-05-27 10:52 UTC (permalink / raw)
To: Steven Price, kvm, kvmarm
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-24-steven.price@arm.com>
Hi,
On Wed, May 13, 2026 at 02:17:31PM +0100, Steven Price wrote:
> The guest can request that a region of its protected address space is
> switched between RIPAS_RAM and RIPAS_EMPTY (and back) using
> RSI_IPA_STATE_SET. This causes a guest exit with the
> RMI_EXIT_RIPAS_CHANGE code. We treat this as a request to convert a
> protected region to unprotected (or back), exiting to the VMM to make
> the necessary changes to the guest_memfd and memslot mappings. On the
> next entry the RIPAS changes are committed by making RMI_RTT_SET_RIPAS
> calls.
>
> The VMM may wish to reject the RIPAS change requested by the guest. For
> now it can only do this by no longer scheduling the VCPU as we don't
> currently have a use case for returning that rejection to the guest, but
> by postponing the RMI_RTT_SET_RIPAS changes to entry we leave the door
> open for adding a new ioctl in the future for this purpose.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
> * Switch to the new RMI_RTT_UNPROT_UNMAP range-based API.
> * Drop ugly hack for RMM bug which errored when the RIPAS was already
> set to the desired value.
> Changes since v12:
> * Switch to the new RMM v2.0 RMI_RTT_DATA_UNMAP which can unmap an
> address range.
> Changes since v11:
> * Combine the "Allow VMM to set RIPAS" patch into this one to avoid
> adding functions before they are used.
> * Drop the CAP for setting RIPAS and adapt to changes from previous
> patches.
> Changes since v10:
> * Add comment explaining the assignment of rec->run->exit.ripas_base in
> kvm_complete_ripas_change().
> Changes since v8:
> * Make use of ripas_change() from a previous patch to implement
> realm_set_ipa_state().
> * Update exit.ripas_base after a RIPAS change so that, if instead of
> entering the guest we exit to user space, we don't attempt to repeat
> the RIPAS change (triggering an error from the RMM).
> Changes since v7:
> * Rework the loop in realm_set_ipa_state() to make it clear when the
> 'next' output value of rmi_rtt_set_ripas() is used.
> New patch for v7: The code was previously split awkwardly between two
> other patches.
> ---
> arch/arm64/include/asm/kvm_rmi.h | 6 +
> arch/arm64/kvm/mmu.c | 8 +-
> arch/arm64/kvm/rmi.c | 439 +++++++++++++++++++++++++++++++
> 3 files changed, 450 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
> index feb534a6678e..007249a13dbc 100644
> --- a/arch/arm64/include/asm/kvm_rmi.h
> +++ b/arch/arm64/include/asm/kvm_rmi.h
> @@ -88,6 +88,12 @@ int kvm_rec_enter(struct kvm_vcpu *vcpu);
> int kvm_rec_pre_enter(struct kvm_vcpu *vcpu);
> int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_status);
>
> +void kvm_realm_unmap_range(struct kvm *kvm,
> + unsigned long ipa,
> + unsigned long size,
> + bool unmap_private,
> + bool may_block);
> +
> static inline bool kvm_realm_is_private_address(struct realm *realm,
> unsigned long addr)
> {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index eb56d4e7f21a..10ca9dbe40a0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -319,6 +319,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
> * @start: The intermediate physical base address of the range to unmap
> * @size: The size of the area to unmap
> * @may_block: Whether or not we are permitted to block
> + * @only_shared: If true then protected mappings should not be unmapped
Do you think it's better if we use enum kvm_gfn_range_filter for this?
Pass KVM_FILTER_{PRIVATE, SHARED} to indicate what to unmap. This way we
don't have to think about booleans. kvm_realm_unmap_range() in patch 23
will have to change too, though.
> *
> * Clear a range of stage-2 mappings, lowering the various ref-counts. Must
> * be called while holding mmu_lock (unless for freeing the stage2 pgd before
> @@ -326,7 +327,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
> * with things behind our backs.
> */
> static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
> - bool may_block)
> + bool may_block, bool only_shared)
> {
> struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
> phys_addr_t end = start + size;
> @@ -343,7 +344,7 @@ void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
> if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
> return;
>
> - __unmap_stage2_range(mmu, start, size, may_block);
> + __unmap_stage2_range(mmu, start, size, may_block, false);
> }
>
> void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> @@ -2418,7 +2419,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>
> __unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
> (range->end - range->start) << PAGE_SHIFT,
> - range->may_block);
> + range->may_block,
> + !(range->attr_filter & KVM_FILTER_PRIVATE));
>
> kvm_nested_s2_unmap(kvm, range->may_block);
> return false;
[...]
Thanks,
Wei-Lin Chang
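A minimal sketch of the filter-based decision suggested above, replacing the `only_shared` boolean. The enum is a local copy whose values are assumed to mirror KVM's `enum kvm_gfn_range_filter`; `should_unmap()` is a hypothetical helper showing the per-mapping check __unmap_stage2_range() could make if it took a filter instead of a bool.

```c
#include <stdbool.h>

/* Local copy; values assumed to mirror enum kvm_gfn_range_filter. */
enum kvm_gfn_range_filter {
	KVM_FILTER_SHARED	= 1 << 0,
	KVM_FILTER_PRIVATE	= 1 << 1,
};

/* Hypothetical: should a mapping of the given kind be unmapped for 'filter'? */
static bool should_unmap(enum kvm_gfn_range_filter filter, bool mapping_is_private)
{
	return mapping_is_private ? (filter & KVM_FILTER_PRIVATE)
				  : (filter & KVM_FILTER_SHARED);
}
```

The existing call in kvm_unmap_gfn_range() would then pass `range->attr_filter` through unchanged instead of deriving `!(range->attr_filter & KVM_FILTER_PRIVATE)`.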
^ permalink raw reply