* [PATCH v4 0/3] Enforce host page-size alignment for shared buffers
From: Aneesh Kumar K.V (Arm) @ 2026-04-27 6:31 UTC (permalink / raw)
To: linux-kernel, iommu, linux-coco, linux-arm-kernel, kvmarm
Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Jason Gunthorpe,
Marc Zyngier, Marek Szyprowski, Robin Murphy, Steven Price,
Suzuki K Poulose, Thomas Gleixner, Will Deacon
Hi all,
This patch series addresses alignment requirements for buffers shared between
private-memory guests and the host.
When running private-memory guests, the guest kernel must apply additional
constraints when allocating buffers that are shared with the hypervisor. These
shared buffers are also accessed by the host kernel and therefore must be
aligned to the host’s page size.
Architectures such as Arm can tolerate realm physical address space PFNs being
mapped as shared memory, as incorrect accesses are detected and reported as GPC
faults. However, relying on this mechanism alone is unsafe and can still lead to
kernel crashes.
This is particularly likely when guest_memfd allocations are mmapped and
accessed from userspace. Once exposed to userspace, it is not possible to
guarantee that applications will only access the intended 4K shared region
rather than the full 64K page mapped into their address space. Such userspace
addresses may also be passed back into the kernel and accessed via the linear
map, potentially resulting in a GPC fault and a kernel crash.
To address this, the series introduces a new helpers,
mem_decrypt_granule_size() and mem_decrypt_align(), which allows callers to
enforce the required alignment for shared buffers.
Changes from v3:
https://lore.kernel.org/all/20260309102625.2315725-1-aneesh.kumar@kernel.org
* Fix build error reported by kernel test robot <lkp@intel.com>
Changes from v2:
https://lore.kernel.org/all/20251221160920.297689-1-aneesh.kumar@kernel.org
* Rebase to latest kernel
* Consider swiotlb always decrypted and don't align when allocating from swiotlb.
Changes from v1:
* Rename the helper to mem_encrypt_align
* Improve the commit message
* Handle DMA allocations from contiguous memory
* Handle DMA allocations from the pool
* swiotlb is still considered unencrypted. Support for an encrypted swiotlb pool
is left as TODO and is independent of this series.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Will Deacon <will@kernel.org>
Aneesh Kumar K.V (Arm) (3):
dma-direct: swiotlb: handle swiotlb alloc/free outside
__dma_direct_alloc_pages
swiotlb: dma: its: Enforce host page-size alignment for shared buffers
coco: guest: arm64: Query host IPA-change alignment via RHI
arch/arm64/include/asm/mem_encrypt.h | 3 ++
arch/arm64/include/asm/rhi.h | 24 ++++++++++++
arch/arm64/include/asm/rsi.h | 2 +
arch/arm64/include/asm/rsi_cmds.h | 10 +++++
arch/arm64/include/asm/rsi_smc.h | 7 ++++
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/rhi.c | 54 ++++++++++++++++++++++++++
arch/arm64/kernel/rsi.c | 13 +++++++
arch/arm64/mm/mem_encrypt.c | 27 +++++++++++--
drivers/irqchip/irq-gic-v3-its.c | 20 ++++++----
include/linux/mem_encrypt.h | 14 +++++++
kernel/dma/contiguous.c | 10 +++++
kernel/dma/direct.c | 58 ++++++++++++++++++++++++----
kernel/dma/pool.c | 4 +-
kernel/dma/swiotlb.c | 21 ++++++----
15 files changed, 240 insertions(+), 29 deletions(-)
create mode 100644 arch/arm64/include/asm/rhi.h
create mode 100644 arch/arm64/kernel/rhi.c
--
2.43.0
^ permalink raw reply
* Re: [PATCH v2 30/31] coco/tdx-host: Implement IDE stream setup/teardown
From: Xu Yilun @ 2026-04-27 3:54 UTC (permalink / raw)
To: Tian, Kevin
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <BN9PR11MB5276DBD859A8122620EB86CE8C2B2@BN9PR11MB5276.namprd11.prod.outlook.com>
On Fri, Apr 24, 2026 at 07:05:32AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Wednesday, April 22, 2026 5:58 PM
> >
> > On Thu, Apr 09, 2026 at 08:02:33AM +0000, Tian, Kevin wrote:
> > > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > > Sent: Saturday, March 28, 2026 12:02 AM
> > > >
> > > > Implementation for a most straightforward Selective IDE stream setup.
> > > > Hard code all parameters for Stream Control Register. And no IDE Key
> > > > Refresh support.
> > > >
> > >
> > > 'more straightforward', compared to what?
>
> a typo.
>
> >
> > Actually it is " *most* straightforward", I just mean "very".
>
> When you say "most straightforward", then I want to know what are
> other options to compare. If you think that the thought practice
OK, I think the use of "a most" is somewhat misleading. I don't want to
emphasize on comparison. I just want to give a summary about the
implementation: hard code all parameters, give no option for
configurations, no optional features supported e.g. KEY Refresh.
So is it better I just s/most/very:
Implementation for a very straightforward Selective IDE stream setup...
> leading to the 'most' definition is important, then please elaborate.
>
> otherwise I'd just remove that sentence.
>
^ permalink raw reply
* Re: [PATCH v2 24/31] coco/tdx-host: Add a helper to exchange SPDM messages through DOE
From: Xu Yilun @ 2026-04-27 3:34 UTC (permalink / raw)
To: Tian, Kevin
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <BN9PR11MB52767B17432AF4A2622091608C2B2@BN9PR11MB5276.namprd11.prod.outlook.com>
On Fri, Apr 24, 2026 at 07:01:32AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Wednesday, April 22, 2026 5:41 PM
> >
> > On Thu, Apr 09, 2026 at 07:56:06AM +0000, Tian, Kevin wrote:
> > > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > > Sent: Saturday, March 28, 2026 12:01 AM
> > > > +
> > > > +static int __maybe_unused tdx_spdm_msg_exchange(struct
> > tdx_tsm_link
> > > > *tlink,
> > > > + void *request, size_t
> > > > request_sz,
> > > > + void *response, size_t
> > > > response_sz)
> > > > +{
> > > > + struct pci_dev *pdev = tlink->pci.base_tsm.pdev;
> > >
> > > call it pci_spdm_msg_exchange() and pass in struct pci_dev directly.
> >
> > I don't think so. There is kernel managed spdm transfer support WIP,
> > which is another topic. We don't want to mix the namespace with that
> > one.
>
> pci_spdm_raw_msg_exchange() then, since you said currently only
> one user i.e. tdx?
>
> If the kernel managed spdm doesn't support the raw format, then there
> won't be conflict.
I think kernel managed spdm APIs should not support the raw format (with
DOE header). SPDM protocol could be run on various transit so make no
sense to have special care for DOE.
So this "raw_msg_exchange" is dedicated for PCI/TSM. Yes there won't be
conflict, but it's important "pci_spdm_" prefix only serve kernel
managed spdm. PCI_TSM managed SPDM has fundamental logical differences
with the kernel managed one.
>
> if it supports (i.e. the 2nd user), then this should be moved to pci core.
>
> >
> > And we also don't name it tsm_spdm_msg_exchange, cause TSM firmwares
> > output different blobs for vendor TSM drivers to transfer. E.g. TDX
> > Module outputs buffers with DOE header & SPDM header, other vendors
> > (AMD IIRC) outputs buffers with only SPDM header. So this function is
> > TDX specific.
> >
>
> TDX is just an user of that. All the logic here is about handling the
> raw format, nothing specific to tdx.
It is special, kernel managed spdm won't expect an all-in-one format,
other vendors either. It is TDX's decision to output this all-in-one
format for OS to transfer.
^ permalink raw reply
* Re: [PATCH v2 23/31] coco/tdx-host: Setup all trusted IOMMUs on TDX Connect init
From: Xu Yilun @ 2026-04-27 3:10 UTC (permalink / raw)
To: Tian, Kevin
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <BN9PR11MB5276916B241153121738EB208C2B2@BN9PR11MB5276.namprd11.prod.outlook.com>
On Fri, Apr 24, 2026 at 06:54:54AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Wednesday, April 22, 2026 5:27 PM
> >
> > On Thu, Apr 09, 2026 at 07:51:56AM +0000, Tian, Kevin wrote:
> > > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > > Sent: Saturday, March 28, 2026 12:01 AM
> > > >
> > > > Setup all trusted IOMMUs on TDX Connect initialization and clear all on
> > > > TDX Connect removal.
> > > >
> > > > Trusted IOMMU setup is the pre-condition for all following TDX Connect
> > > > operations such as SPDM/IDE setup. It is more of a platform
> > > > configuration than a standalone IOMMU configuration, so put the
> > > > implementation in tdx-host driver.
> > > >
> > >
> > > not sure what above tries to tell. why is it a platform configuration
> > > when you have seamcalls on each IOMMU?
> >
> > This is to say the TDH.IOMMU.SETUP relates to PCIe SPDM/IDE, it is not
> > just about IOMMU. By identifying the
> >
> > for_each_iommu(iommu)
> > tdh.iommu.setup(iommu)
> >
> > as a platform configuration, it justifies why we trigger this
> > configuration at tdx-host driver probe, rather than in some
> > IOMMU/IOMMUFD API.
>
> iommu drivers also involve PCI, e.g. call pci_enable_ats(), etc.
>
> so having relation to PCIe SPDM/IDE is not an argument of
> platform vs. IOMMU.
OK, I think I could delete the platform vs. IOMMU thing in commit log.
>
> Actually I'm OK to put that logic in tdx-host. Just the explanation
> here doesn't make much sense...
>
^ permalink raw reply
* Re: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
From: Xu Yilun @ 2026-04-27 2:50 UTC (permalink / raw)
To: Tian, Kevin
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <BN9PR11MB52763F59424C6E4AFD7F63988C2B2@BN9PR11MB5276.namprd11.prod.outlook.com>
> > > btw in patch23 commit msg:
> > >
> > > "
> > > There is no dedicated way to enumerate which IOMMU devices support
> > > trusted operations. The host has to call TDH.IOMMU.SETUP on all IOMMU
> > > devices and tell their trusted capability by the return value.
> > > "
> > >
> > > which implies that ecap_tdxc() alone doesn't really report the capability?
> >
> > Ah, good catch. Let me explain:
> >
> > ecap_tdxc does report the capability. This bit is special cause both
> > trusted part & untrusted part access it.
> >
> > For IOMMU driver (which now handles the untrusted part), it can directly
> > query to this bit and decide what to do.
> >
> > But for tdx-host driver which handles the trusted part, it shouldn't
> > speculate into the IOMMU for capability enumeration. TDX Module has more
> > concerns about trusted capability, including the related I/O stack
>
> I guess "more concerns" means that there are more conditions for
> TDX module to look at beyond ecap_tdxc(), so it's not appropriate
> for tdx-host driver to check ecap alone.
Exactly.
>
> > capabilities e.g. SPDM/IDE cap... So in patch23 I actually mean we
> > don't have an enumeration SEAMCALL for trusted capability, I will
> > refactor that message:
> >
> > There is no dedicated *SEAMCALL* to enumerate which IOMMU devices
> > support
> > trusted operations...
^ permalink raw reply
* Re: [GIT PULL] Trusted Security Manager (PCIe TSM) Update for 7.1
From: pr-tracker-bot @ 2026-04-26 19:35 UTC (permalink / raw)
To: Dan Williams; +Cc: Linus Torvalds, linux-coco, linux-kernel
In-Reply-To: <69ee3920ef32e_fe08310024@djbw-dev.notmuch>
The pull request you sent on Sun, 26 Apr 2026 09:11:12 -0700:
> git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm tags/tsm-for-7.1
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/20b64cf8705a0f6268bb9a320eb6b4c425f3ec6c
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
^ permalink raw reply
* Re: [GIT PULL] Trusted Security Manager (PCIe TSM) Update for 7.1
From: Linus Torvalds @ 2026-04-26 16:58 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-coco, linux-kernel
In-Reply-To: <69ee3920ef32e_fe08310024@djbw-dev.notmuch>
On Sun, 26 Apr 2026 at 09:11, Dan Williams <djbw@kernel.org> wrote:
>
> The other motivation for sending this one-patch pull request is to test
> that you have my key with updated email.
I didn't, but honestly, I don't match key issuer ID's or the email
address the pull is sent from anyway.
So I just want the signature to be valid, and from a key I know.
And it all verified with the old key, just with a
issuer "djbw@kernel.org" does not match any User ID
warning.
I wouldn't have cared - or noticed - had you not mentioned it.
But I did update the key so the issuer warning is gone too.
Linus
^ permalink raw reply
* [GIT PULL] Trusted Security Manager (PCIe TSM) Update for 7.1
From: Dan Williams @ 2026-04-26 16:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-coco, linux-kernel
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm tags/tsm-for-7.1
...to receive a small update for the TSM core. It is arguably a fix and
coming in late as I have been offline the past few weeks.
Recall that you asked for searchable help with the TLA disease last time
[1]. "PCIe TSM" turns up useful results.
The other motivation for sending this one-patch pull request is to test
that you have my key with updated email.
This has been in linux-next for a while with no reported issues.
[1]: http://lore.kernel.org/CAHk-=whjvmBiZ=oMnR-R9rqzEPnGCaU7dNLkY1RHXwjRCAR5YQ@mail.gmail.com
--
The following changes since commit f338e77383789c0cae23ca3d48adcc5e9e137e3c:
Linux 7.0-rc4 (2026-03-15 13:52:05 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm tags/tsm-for-7.1
for you to fetch changes up to 3177779ae17db4c66c851f799505fb95c7530c03:
virt: coco: change tsm_class to a const struct (2026-04-02 15:45:18 -0700)
----------------------------------------------------------------
tsm for 7.1
- Drop class_create() for the "tsm" class
----------------------------------------------------------------
Jori Koolstra (1):
virt: coco: change tsm_class to a const struct
drivers/virt/coco/tsm-core.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
^ permalink raw reply
* Re: [PATCH v5 1/2] dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
From: Jason Gunthorpe @ 2026-04-26 13:05 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Jiri Pirko, dri-devel, linaro-mm-sig, iommu, linux-media,
sumit.semwal, benjamin.gaignard, Brian.Starkey, jstultz,
tjmercier, christian.koenig, m.szyprowski, robin.murphy, leon,
sean.anderson, ptesarik, catalin.marinas, suzuki.poulose,
steven.price, thomas.lendacky, john.allen, ashish.kalra,
suravee.suthikulpanit, linux-coco
In-Reply-To: <20260424225514.GE804026@ziepe.ca>
> > static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> > phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > unsigned long attrs, bool flush)
> > {
> > dma_addr_t dma_addr;
> >
> > if (is_swiotlb_force_bounce(dev)) {
> > if (attrs & (DMA_ATTR_MMIO | DMA_ATTR_REQUIRE_COHERENT))
> > return DMA_MAPPING_ERROR;
> >
> > return swiotlb_map(dev, phys, size, dir, attrs);
> > }
> >
> > if (attrs & DMA_ATTR_MMIO) {
> > dma_addr = phys;
> > if (unlikely(!dma_capable(dev, dma_addr, size, false, attrs)))
> > goto err_overflow;
> > goto dma_mapped;
>
> I suspect P2P is probably broken on CC because this doesn't make
> sense..
Actually, I suppose it is fully broken because it will jump to swiotlb
and then should fail.
> This should flow into the
> phys_to_dma_unencrypted/phys_to_dma_encrypted block as well AFAICT, it
> shouldn't just assign phys. Assigning phys to dma on a CC system is
> always wrong, right?
>
> It is is more like
>
> /* To be updated, callers should specify MMIO | CC_SHARED instead of
> * implying it. */
> if (attrs & DMA_ATTR_MMIO)
> attrs |= DMA_ATTR_CC_SHARED;
So no need for this if, we can go directly to marking the MMIO callers
with DMA_ATTR_CC_SHARED once this is fixed for mmio:
> if (attrs & DMA_ATTR_CC_SHARED) {
> dma_addr = phys_to_dma_unencrypted(dev, phys);
> } else {
> dma_addr = phys_to_dma_encrypted(dev, phys);
> }
Jasn
^ permalink raw reply
* Re: [PATCH v5 1/2] dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
From: Jason Gunthorpe @ 2026-04-24 22:55 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Jiri Pirko, dri-devel, linaro-mm-sig, iommu, linux-media,
sumit.semwal, benjamin.gaignard, Brian.Starkey, jstultz,
tjmercier, christian.koenig, m.szyprowski, robin.murphy, leon,
sean.anderson, ptesarik, catalin.marinas, suzuki.poulose,
steven.price, thomas.lendacky, john.allen, ashish.kalra,
suravee.suthikulpanit, linux-coco
In-Reply-To: <yq5aik9jcpzm.fsf@kernel.org>
On Wed, Apr 22, 2026 at 02:48:37PM +0530, Aneesh Kumar K.V wrote:
> Jason Gunthorpe <jgg@ziepe.ca> writes:
>
> > On Tue, Apr 21, 2026 at 01:53:31PM +0200, Jiri Pirko wrote:
> >> >> You reach there when is_swiotlb_force_bounce(dev) is true and
> >> >> DMA_ATTR_CC_SHARED is set. What am I missing?
> >> >>
> >> >
> >> >So a swiotlb_force_bounce will not use swiotlb bouncing if
> >> >DMA_ATTR_CC_SHARED is set ?
> >>
> >> Correct. Bouncing does not make sense in this case, as shared memory is
> >> already being mapped.
> >
> > It is a little bit mangled, there are many reasons force_swiotlb can
> > be set, but we loose them as it flows through - swiotlb_init()
> > just has a simple SWIOTLB_FORCE
> >
> > Ideally DMA_ATTR_CC_SHARED would skip swiotlb only if it is being
> > selected for CC reasons. For instance if you have the swiotlb force
> > command line parameter I would still expect it bounce shared memory.
> >
> > Arguably I think this arch flow is misdesigned, the
> > is_swiotlb_force_bounce() should not be used for CC. dma_capable() is
> > the correct API to check if the device can DMA to the presented
> > address, and it will trigger swiotlb_map() just the same without
> > creating this gap.
> >
> > Jason
>
> Something like this?
Yeah that reads pretty sanely.
> static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> phys_addr_t phys, size_t size, enum dma_data_direction dir,
> unsigned long attrs, bool flush)
> {
> dma_addr_t dma_addr;
>
> if (is_swiotlb_force_bounce(dev)) {
> if (attrs & (DMA_ATTR_MMIO | DMA_ATTR_REQUIRE_COHERENT))
> return DMA_MAPPING_ERROR;
>
> return swiotlb_map(dev, phys, size, dir, attrs);
> }
>
> if (attrs & DMA_ATTR_MMIO) {
> dma_addr = phys;
> if (unlikely(!dma_capable(dev, dma_addr, size, false, attrs)))
> goto err_overflow;
> goto dma_mapped;
I suspect P2P is probably broken on CC because this doesn't make
sense..
This should flow into the
phys_to_dma_unencrypted/phys_to_dma_encrypted block as well AFAICT, it
shouldn't just assign phys. Assigning phys to dma on a CC system is
always wrong, right?
It is is more like
/* To be updated, callers should specify MMIO | CC_SHARED instead of
* implying it. */
if (attrs & DMA_ATTR_MMIO)
attrs |= DMA_ATTR_CC_SHARED;
if (attrs & DMA_ATTR_CC_SHARED) {
dma_addr = phys_to_dma_unencrypted(dev, phys);
} else {
dma_addr = phys_to_dma_encrypted(dev, phys);
}
if (!dma_capable()) {
if (attrs & (DMA_ATTR_MMIO | DMA_ATTR_REQUIRE_COHERENT)
fail
}
> and dma_capable() now does
> static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
> bool is_ram, unsigned long attrs)
> {
> ....
>
> /*
> * if phys addr attribute is encrypted but the
> * device is forcing an encrypted dma addr
> */
> if (!(attrs & DMA_ATTR_CC_SHARED) && force_dma_unencrypted(dev))
> return false;
Yeah
And with the above little edits it works for MMIO now too.
Jason
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-04-24 10:41 UTC (permalink / raw)
To: Huang, Kai
Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, Li, Xiaoyao,
dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
linux-kernel@vger.kernel.org, kas@kernel.org, Xu, Yilun,
Verma, Vishal L, Jiang, Dave, Duan, Zhenzhong, Gao, Chao,
Edgecombe, Rick P, linux-pci@vger.kernel.org, x86@kernel.org,
dan.j.williams@intel.com
In-Reply-To: <668a903ba2dccff2d641ac15b74deacc95ef19a6.camel@intel.com>
On Fri, Apr 24, 2026 at 08:09:16AM +0000, Huang, Kai wrote:
> On Fri, 2026-04-24 at 11:07 +0800, Xu Yilun wrote:
> > On Thu, Apr 23, 2026 at 10:29:31PM +0000, Huang, Kai wrote:
> > > On Thu, 2026-04-23 at 17:05 +0000, Edgecombe, Rick P wrote:
> > > > On Thu, 2026-04-23 at 00:59 +0000, Huang, Kai wrote:
> > > > > Ditto here. I don't think we should introduce any more cond_resched().
> > > > >
> > > > > Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> > > > > later, albeit in which a local '_args' is used as a copy of the original 'args',
> > > > > but that has no harm for the case where we can just use the 'args' to loop.
> > > > >
> > > > > I am wondering whether we can just use that here, or we just get rid of that
> > > > > helper (then open retry by the callers of these SEAMCALL wrappers), since there
> > > > > will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> > > > > input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
> > > >
> > > > I kind of like the latter option to open code more of this stuff. The stacks of
> > > > seamcall wrapper macros is already too much.
> > >
> > > Agreed.
> > >
> > > And SEAMCALL *users* can actually come up with their own version of wrapper(s)
> > > to do the retry. E.g., currently seamcall_ir_resched() is only used for IOMMU
> > > SEAMCALLs, and we can put this wrapper in the IOMMU code or coco/tdx-host.
> >
> > After we have introduced TDX Module Extension, irq preemptable
> > EXT-SEAMCALLs become a common concept.
> >
>
> It has been a concept since before the EXT-SEAMCALLs actually. For instance,
No, they look similar but different. EXT-SEAMCALLs are truly irq
preemptable and resumable to its context. Other SEAMCALLs just
periodically yield and don't have a generic way to save/resume their
context. Sometime you need to pass in resume flag on 2nd time, which
means the secure world forget where they were and can't really resume
all by itself.
What I mean is, EXT-SEAMCALLs should never need to play tricks on
input parameters. Just input what is originally inputted, the secure
world doesn't need hint to resume itself. So the int-retry process
should be common.
> TDX live migration using blocking export doesn't need any opt-in via module
> extension (only the non-blocking way needs), but the SEAMCALLs to export/import
> TD/vCPU/memory are all interruptible.
>
> In fact, they had the "latency requirement" behind the INTERRUPT_RESUMABLE in
> mind at the very beginning. It's just at that point all SEAMCALLs were not that
> heavy.
>
> > It is irq preemptable so that the
> > secure world remembers and resumes the context, no need for host to
> > remind via resume lag.
>
> The fact is the aforementioned live migration related export/import SEAMCALLs
> (there are 8 at least, but maybe more) all requires the explicit setting of
> 'resume=1' (plus using the SEAMCALL output as input for retry). I don't know
Yes, so they are not truly interrupt resumable and should be specially
treated.
> the story behind this, though. There might be some tricky thing here for the
> module to remember and manage (e.g., migration has a concept of "migration
> stream", and the resume is per-stream).
>
> >
> > Today there are 3 EXT-SEAMCALLs, TDH_SPDM_CONNECT/DISCONNECT/MNG,
> > irq preemption handling is a general requirement for them, and I think
> > it is still true for any further EXT-SEAMCALLs.
> >
> > So I think a general helper for EXT-SEAMCALLs makes sense.
>
> Yes conceptually I agree, but not need to distinguish EXT-SEAMCALLs or not IMHO.
>
> The problem is there isn't a common rule to follow.
>
> E.g., let's say "the module can remember thus no resume flag is needed", how
> about the SEAMCALL inputs? Can the "output" args be directly used as input for
> retry, or the original input should always be used?
Since EXT-SEAMCALLs don't depend on input tricks to resume, there could
be a common rule, now it is defined as "the original input should always
be used".
>
> Not to mention there's existing SEAMCALLs which require explicitly setting
> 'resume=1'.
>
> I believe we can use some smart hack to implement a common one to cover all
> cases above, but I am not sure whether it's worth to do (maybe we can have a try
> to see how does it look like, though, I think).
>
> Given the SEAMCALLs for TDX Connect seem to follow one rule to retry, and live
> migration SEAMCALLs follow another rule, it seems for now the simplest way is to
> introduce the needed retry helper in the layer of SEAMCALL *user* (TDX Connect
> and migration).
>
> > TDH.IOMMU.SETUP, however, is another case. It is not a EXT-SEAMCALL but
> > happened to follow the same irq-retry handling process. To avoid code
> > duplication we have:
> >
> > /*
> > * seamcall_ret_ir_exec() aliases seamcall_ret_ir_resched() for
> > * documentation purposes. It documents the TDX Module extension
> > * seamcalls that are long running / hard-irq preemptible flows that
> > * generate events. The calls using seamcall_ret_ir_resched() are long
> > * running flows, that periodically yield.
> > */
> > #define seamcall_ret_ir_exec seamcall_ret_ir_resched
> >
> > TDH.IOMMU.SETUP uses seamcall_ret_ir_resched(), and EXT-SEAMCALLs use
> > seamcall_ret_ir_exec().
> >
> > How do you think?
>
> Sorry I don't quite get. What does "exec" postfix mean?
It is 'execution', means EXT-SEAMCALLs can resume their execution. But
since you have concern, maybe some better name?
>
> From patch 25, they are all in TDX core, so I don't quite get why we need to
> distinguish EXT-SEAMCALLs vs normal ones. IMHO it's an additional layer which
EXT-SEAMCALLs have generic way to resume, while others don't. So we need
a helper for EXT-SEAMCALLs. For other SEAMCALLs that happens to process
the same way, we are avoiding code duplication, but should clearly
distinguish the purpose so make another name as documentation.
But if any concern, we could delete the int-retry support for normal
SEAMCALLs, they are not generic as you said.
> doesn't actually help address any problem.
>
> Btw, we should really get rid of the "resched()" postfix from the function name
> since cond_resched() is no longer needed and possibility of rescheduling is
> implied pretty much all places in the kernel code now (except some special code
> such as code in IRQ context).
Yes, thanks to remind me again.
^ permalink raw reply
* Re: [PATCH v2 08/19] PCI/TSM: Add "evidence" support
From: Aneesh Kumar K.V @ 2026-04-24 10:15 UTC (permalink / raw)
To: Dan Williams, Lukas Wunner, Dan Williams
Cc: linux-coco, linux-pci, gregkh, aik, yilun.xu, bhelgaas,
alistair23, jgg, Donald Hunter, Jakub Kicinski
In-Reply-To: <69ba57199b38b_7ee31005c@dwillia2-mobl4.notmuch>
Dan Williams <dan.j.williams@intel.com> writes:
> The ARM CCA spec does reference the EAT nonce which is 64-bytes. So it
> may be the case that PCI_TSM_MAX_NONCE_SIZE != SPDM_NONCE_SZ depending
> on what evidence can be collected over this interface, but I am not
> finding any spec references for 256, Aneesh?
That was probably me getting confused about 256 bits in an earlier
version of the code. Also, the RmiVdevMeasureParams structure layout
adds 256 bytes of padding between the different fields.
-aneesh
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Huang, Kai @ 2026-04-24 9:10 UTC (permalink / raw)
To: yilun.xu@linux.intel.com
Cc: kvm@vger.kernel.org, Li, Xiaoyao, linux-coco@lists.linux.dev,
dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
kas@kernel.org, linux-kernel@vger.kernel.org, Xu, Yilun,
Verma, Vishal L, Jiang, Dave, Duan, Zhenzhong, Gao, Chao,
Edgecombe, Rick P, linux-pci@vger.kernel.org,
dan.j.williams@intel.com, x86@kernel.org
In-Reply-To: <668a903ba2dccff2d641ac15b74deacc95ef19a6.camel@intel.com>
On Fri, 2026-04-24 at 08:09 +0000, Huang, Kai wrote:
> I believe we can use some smart hack to implement a common one to cover all
> cases above, but I am not sure whether it's worth to do (maybe we can have a try
> to see how does it look like, though, I think).
So the problem is use what as input when retry, and one way is to provide a
callback to allow the user to provide a specific function to update the 'args'
before retrying.
Something like below? I am not sure I like it though, because as Rick said
there's too much SEAMCALL wrapper/macros already.
typedef void (*args_update_func_t)(struct tdx_module_args *args,
struct tdx_module_args *ori);
static __always_inline u64 sc_retry_intr(sc_func_t func, u64 fn,
struct tdx_module_args *args,
args_update_func_t update_args)
{
struct tdx_module_args _args = *args;
u64 ret;
do {
ret = sc_retry(func, fn, &_args);
if (ret != TDX_INTERRUPT_RESUMABLE)
break;
update_args(&_args, args);
} while (1);
*args = _args;
return ret;
}
#define seamcall_ret_intr(_fn, _args, _args_update_f) \
sc_retry_intr(__seamcall_ret, (_fn), (_args), (_args_update_f))
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Huang, Kai @ 2026-04-24 8:09 UTC (permalink / raw)
To: yilun.xu@linux.intel.com
Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, Li, Xiaoyao,
dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
linux-kernel@vger.kernel.org, kas@kernel.org, Xu, Yilun,
Verma, Vishal L, Jiang, Dave, Duan, Zhenzhong, Gao, Chao,
Edgecombe, Rick P, linux-pci@vger.kernel.org, x86@kernel.org,
dan.j.williams@intel.com
In-Reply-To: <aereVy8zVb2be2hh@yilunxu-OptiPlex-7050>
On Fri, 2026-04-24 at 11:07 +0800, Xu Yilun wrote:
> On Thu, Apr 23, 2026 at 10:29:31PM +0000, Huang, Kai wrote:
> > On Thu, 2026-04-23 at 17:05 +0000, Edgecombe, Rick P wrote:
> > > On Thu, 2026-04-23 at 00:59 +0000, Huang, Kai wrote:
> > > > Ditto here. I don't think we should introduce any more cond_resched().
> > > >
> > > > Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> > > > later, albeit in which a local '_args' is used as a copy of the original 'args',
> > > > but that has no harm for the case where we can just use the 'args' to loop.
> > > >
> > > > I am wondering whether we can just use that here, or we just get rid of that
> > > > helper (then open retry by the callers of these SEAMCALL wrappers), since there
> > > > will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> > > > input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
> > >
> > > I kind of like the latter option to open code more of this stuff. The stacks of
> > > seamcall wrapper macros is already too much.
> >
> > Agreed.
> >
> > And SEAMCALL *users* can actually come up with their own version of wrapper(s)
> > to do the retry. E.g., currently seamcall_ir_resched() is only used for IOMMU
> > SEAMCALLs, and we can put this wrapper in the IOMMU code or coco/tdx-host.
>
> After we have introduced TDX Module Extension, irq preemptable
> EXT-SEAMCALLs become a common concept.
>
It has been a concept since before the EXT-SEAMCALLs actually. For instance,
TDX live migration using blocking export doesn't need any opt-in via module
extension (only the non-blocking way needs), but the SEAMCALLs to export/import
TD/vCPU/memory are all interruptible.
In fact, they had the "latency requirement" behind the INTERRUPT_RESUMABLE in
mind at the very beginning. It's just at that point all SEAMCALLs were not that
heavy.
> It is irq preemptable so that the
> secure world remembers and resumes the context, no need for host to
> remind via resume lag.
The fact is the aforementioned live migration related export/import SEAMCALLs
(there are 8 at least, but maybe more) all requires the explicit setting of
'resume=1' (plus using the SEAMCALL output as input for retry). I don't know
the story behind this, though. There might be some tricky thing here for the
module to remember and manage (e.g., migration has a concept of "migration
stream", and the resume is per-stream).
>
> Today there are 3 EXT-SEAMCALLs, TDH_SPDM_CONNECT/DISCONNECT/MNG,
> irq preemption handling is a general requirement for them, and I think
> it is still true for any further EXT-SEAMCALLs.
>
> So I think a general helper for EXT-SEAMCALLs makes sense.
Yes conceptually I agree, but not need to distinguish EXT-SEAMCALLs or not IMHO.
The problem is there isn't a common rule to follow.
E.g., let's say "the module can remember thus no resume flag is needed", how
about the SEAMCALL inputs? Can the "output" args be directly used as input for
retry, or the original input should always be used?
Not to mention there's existing SEAMCALLs which require explicitly setting
'resume=1'.
I believe we can use some smart hack to implement a common one to cover all
cases above, but I am not sure whether it's worth to do (maybe we can have a try
to see how does it look like, though, I think).
Given the SEAMCALLs for TDX Connect seem to follow one rule to retry, and live
migration SEAMCALLs follow another rule, it seems for now the simplest way is to
introduce the needed retry helper in the layer of SEAMCALL *user* (TDX Connect
and migration).
> TDH.IOMMU.SETUP, however, is another case. It is not a EXT-SEAMCALL but
> happened to follow the same irq-retry handling process. To avoid code
> duplication we have:
>
> /*
> * seamcall_ret_ir_exec() aliases seamcall_ret_ir_resched() for
> * documentation purposes. It documents the TDX Module extension
> * seamcalls that are long running / hard-irq preemptible flows that
> * generate events. The calls using seamcall_ret_ir_resched() are long
> * running flows, that periodically yield.
> */
> #define seamcall_ret_ir_exec seamcall_ret_ir_resched
>
> TDH.IOMMU.SETUP uses seamcall_ret_ir_resched(), and EXT-SEAMCALLs use
> seamcall_ret_ir_exec().
>
> How do you think?
Sorry I don't quite get. What does "exec" postfix mean?
From patch 25, they are all in TDX core, so I don't quite get why we need to
distinguish EXT-SEAMCALLs vs normal ones. IMHO it's an additional layer which
doesn't actually help address any problem.
Btw, we should really get rid of the "resched()" postfix from the function name
since cond_resched() is no longer needed and possibility of rescheduling is
implied pretty much all places in the kernel code now (except some special code
such as code in IRQ context).
^ permalink raw reply
* RE: [PATCH v2 30/31] coco/tdx-host: Implement IDE stream setup/teardown
From: Tian, Kevin @ 2026-04-24 7:05 UTC (permalink / raw)
To: Xu Yilun
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <aeibi7JzfZ2Kx0NU@yilunxu-OptiPlex-7050>
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, April 22, 2026 5:58 PM
>
> On Thu, Apr 09, 2026 at 08:02:33AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Saturday, March 28, 2026 12:02 AM
> > >
> > > Implementation for a most straightforward Selective IDE stream setup.
> > > Hard code all parameters for Stream Control Register. And no IDE Key
> > > Refresh support.
> > >
> >
> > 'more straightforward', compared to what?
a typo.
>
> Actually it is " *most* straightforward", I just mean "very".
When you say "most straightforward", then I want to know what are
other options to compare. If you think that the thought practice
leading to the 'most' definition is important, then please elaborate.
otherwise I'd just remove that sentence.
^ permalink raw reply
* RE: [PATCH v2 24/31] coco/tdx-host: Add a helper to exchange SPDM messages through DOE
From: Tian, Kevin @ 2026-04-24 7:01 UTC (permalink / raw)
To: Xu Yilun
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <aeiXuludvkZedOg5@yilunxu-OptiPlex-7050>
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, April 22, 2026 5:41 PM
>
> On Thu, Apr 09, 2026 at 07:56:06AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Saturday, March 28, 2026 12:01 AM
> > > +
> > > +static int __maybe_unused tdx_spdm_msg_exchange(struct
> tdx_tsm_link
> > > *tlink,
> > > + void *request, size_t
> > > request_sz,
> > > + void *response, size_t
> > > response_sz)
> > > +{
> > > + struct pci_dev *pdev = tlink->pci.base_tsm.pdev;
> >
> > call it pci_spdm_msg_exchange() and pass in struct pci_dev directly.
>
> I don't think so. There is kernel managed spdm transfer support WIP,
> which is another topic. We don't want to mix the namespace with that
> one.
pci_spdm_raw_msg_exchange() then, since you said currently only
one user i.e. tdx?
If the kernel managed spdm doesn't support the raw format, then there
won't be conflict.
if it supports (i.e. the 2nd user), then this should be moved to pci core.
>
> And we also don't name it tsm_spdm_msg_exchange, cause TSM firmwares
> output different blobs for vendor TSM drivers to transfer. E.g. TDX
> Module outputs buffers with DOE header & SPDM header, other vendors
> (AMD IIRC) outputs buffers with only SPDM header. So this function is
> TDX specific.
>
TDX is just an user of that. All the logic here is about handling the
raw format, nothing specific to tdx.
^ permalink raw reply
* RE: [PATCH v2 20/31] x86/virt/tdx: Add a helper to loop on TDX_INTERRUPTED_RESUMABLE
From: Tian, Kevin @ 2026-04-24 6:57 UTC (permalink / raw)
To: Xu Yilun
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <aehk/2DEizc1eG+e@yilunxu-OptiPlex-7050>
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, April 22, 2026 2:05 PM
>
> On Thu, Apr 09, 2026 at 07:21:48AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Saturday, March 28, 2026 12:01 AM
> > >
> > > +static u64 __maybe_unused __seamcall_ir_resched(sc_func_t sc_func,
> u64
> > > fn,
> > > + struct tdx_module_args *args)
> > > +{
> >
> > 'ir' sounds redundant with the trailing 'resched'?
>
> Mm.. I want to 'ir' to reflect the loop-retry is dedicated for
> INTERRUPTED_RESUMABLE in TDX context. When you say not big deal, I
> assume I can keep the naming?
>
that's ok
^ permalink raw reply
* RE: [PATCH v2 23/31] coco/tdx-host: Setup all trusted IOMMUs on TDX Connect init
From: Tian, Kevin @ 2026-04-24 6:54 UTC (permalink / raw)
To: Xu Yilun
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <aeiUartvxzsvq2Ye@yilunxu-OptiPlex-7050>
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, April 22, 2026 5:27 PM
>
> On Thu, Apr 09, 2026 at 07:51:56AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Saturday, March 28, 2026 12:01 AM
> > >
> > > Setup all trusted IOMMUs on TDX Connect initialization and clear all on
> > > TDX Connect removal.
> > >
> > > Trusted IOMMU setup is the pre-condition for all following TDX Connect
> > > operations such as SPDM/IDE setup. It is more of a platform
> > > configuration than a standalone IOMMU configuration, so put the
> > > implementation in tdx-host driver.
> > >
> >
> > not sure what above tries to tell. why is it a platform configuration
> > when you have seamcalls on each IOMMU?
>
> This is to say the TDH.IOMMU.SETUP relates to PCIe SPDM/IDE, it is not
> just about IOMMU. By identifying the
>
> for_each_iommu(iommu)
> tdh.iommu.setup(iommu)
>
> as a platform configuration, it justifies why we trigger this
> configuration at tdx-host driver probe, rather than in some
> IOMMU/IOMMUFD API.
iommu drivers also involve PCI, e.g. call pci_enable_ats(), etc.
so having relation to PCIe SPDM/IDE is not an argument of
platform vs. IOMMU.
Actually I'm OK to put that logic in tdx-host. Just the explanation
here doesn't make much sense...
^ permalink raw reply
* RE: [PATCH v2 22/31] iommu/vt-d: Export a helper to do function for each dmar_drhd_unit
From: Tian, Kevin @ 2026-04-24 6:50 UTC (permalink / raw)
To: Xu Yilun
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <aehrup2/IWdq62mm@yilunxu-OptiPlex-7050>
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, April 22, 2026 2:34 PM
>
> On Thu, Apr 09, 2026 at 07:49:46AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Saturday, March 28, 2026 12:01 AM
> > >
> > > @@ -86,6 +86,8 @@ extern struct list_head dmar_drhd_units;
> > > dmar_rcu_check()) \
> > > if (i=drhd->iommu, 0) {} else
> > >
> > > +int do_for_each_drhd_unit(int (*fn)(struct dmar_drhd_unit *));
> > > +
> > > static inline bool dmar_rcu_check(void)
> >
> > It's a bit weird to insert it here. Move it to follow for_each_iommu().
>
> Sorry, it is following for_each_iommu(), is it?
>
yeah, I misread it.
^ permalink raw reply
* RE: [PATCH v2 19/31] iommu/vt-d: Reserve the MSB domain ID bit for the TDX module
From: Tian, Kevin @ 2026-04-24 6:49 UTC (permalink / raw)
To: Xu Yilun
Cc: linux-coco@lists.linux.dev, linux-pci@vger.kernel.org,
Williams, Dan J, x86@kernel.org, Gao, Chao, Jiang, Dave,
baolu.lu@linux.intel.com, Xu, Yilun, Duan, Zhenzhong,
kvm@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
Verma, Vishal L, linux-kernel@vger.kernel.org
In-Reply-To: <aehkFicutMwa2LP+@yilunxu-OptiPlex-7050>
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, April 22, 2026 2:01 PM
>
> > Here we need more words to explain the strategy here.
> >
> > The comment says "When IOMMU is *enabled*...", but the code here
> > just checks the static capability. It's probably a design choice that you
> > don't want to add complexity on recycling DIDs when TDX connect
> > is actually enabled, but it's worth a note here.
>
> Yes, that's the rationale. I'll add it to comments.
btw halving the DID space permanently on any platforms supporting
TDX connect doesn't sound a good design. It may break usages which
already uses more than 50% of the DID space but have no business
to do with TDX connect.
It makes more sense to cut it down in-fly when tdx connect is initialized.
If the higher half DIDs have been used then fail TDX connect. otherwise
adjust the max domain id.
>
> >
> > btw in patch23 commit msg:
> >
> > "
> > There is no dedicated way to enumerate which IOMMU devices support
> > trusted operations. The host has to call TDH.IOMMU.SETUP on all IOMMU
> > devices and tell their trusted capability by the return value.
> > "
> >
> > which implies that ecap_tdxc() alone doesn't really report the capability?
>
> Ah, good catch. Let me explain:
>
> ecap_tdxc does report the capability. This bit is special cause both
> trusted part & untrusted part access it.
>
> For IOMMU driver (which now handles the untrusted part), it can directly
> query to this bit and decide what to do.
>
> But for tdx-host driver which handles the trusted part, it shouldn't
> speculate into the IOMMU for capability enumeration. TDX Module has more
> concerns about trusted capability, including the related I/O stack
I guess "more concerns" means that there are more conditions for
TDX module to look at beyond ecap_tdxc(), so it's not appropriate
for tdx-host driver to check ecap alone.
> capabilities e.g. SPDM/IDE cap... So in patch23 I actually mean we
> don't have an enumeration SEAMCALL for trusted capability, I will
> refactor that message:
>
> There is no dedicated *SEAMCALL* to enumerate which IOMMU devices
> support
> trusted operations...
>
> >
> > anyway all of those need a better explanation here...
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-04-24 3:07 UTC (permalink / raw)
To: Huang, Kai
Cc: dan.j.williams@intel.com, linux-pci@vger.kernel.org,
linux-coco@lists.linux.dev, Edgecombe, Rick P, x86@kernel.org,
Gao, Chao, Xu, Yilun, dave.hansen@linux.intel.com,
baolu.lu@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
linux-kernel@vger.kernel.org, Verma, Vishal L, Jiang, Dave,
kvm@vger.kernel.org, Duan, Zhenzhong
In-Reply-To: <48171f4ab772da546beccec3c7a95fe442524b42.camel@intel.com>
On Thu, Apr 23, 2026 at 10:29:31PM +0000, Huang, Kai wrote:
> On Thu, 2026-04-23 at 17:05 +0000, Edgecombe, Rick P wrote:
> > On Thu, 2026-04-23 at 00:59 +0000, Huang, Kai wrote:
> > > Ditto here. I don't think we should introduce any more cond_resched().
> > >
> > > Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> > > later, albeit in which a local '_args' is used as a copy of the original 'args',
> > > but that has no harm for the case where we can just use the 'args' to loop.
> > >
> > > I am wondering whether we can just use that here, or we just get rid of that
> > > helper (then open retry by the callers of these SEAMCALL wrappers), since there
> > > will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> > > input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
> >
> > I kind of like the latter option to open code more of this stuff. The stacks of
> > seamcall wrapper macros is already too much.
>
> Agreed.
>
> And SEAMCALL *users* can actually come up with their own version of wrapper(s)
> to do the retry. E.g., currently seamcall_ir_resched() is only used for IOMMU
> SEAMCALLs, and we can put this wrapper in the IOMMU code or coco/tdx-host.
After we have introduced TDX Module Extension, irq preemptable
EXT-SEAMCALLs become a common concept. It is irq preemptable so that the
secure world remembers and resumes the context, no need for host to
remind via resume lag.
Today there are 3 EXT-SEAMCALLs, TDH_SPDM_CONNECT/DISCONNECT/MNG,
irq preemption handling is a general requirement for them, and I think
it is still true for any further EXT-SEAMCALLs.
So I think a general helper for EXT-SEAMCALLs makes sense.
TDH.IOMMU.SETUP, however, is another case. It is not a EXT-SEAMCALL but
happened to follow the same irq-retry handling process. To avoid code
duplication we have:
/*
* seamcall_ret_ir_exec() aliases seamcall_ret_ir_resched() for
* documentation purposes. It documents the TDX Module extension
* seamcalls that are long running / hard-irq preemptible flows that
* generate events. The calls using seamcall_ret_ir_resched() are long
* running flows, that periodically yield.
*/
#define seamcall_ret_ir_exec seamcall_ret_ir_resched
TDH.IOMMU.SETUP uses seamcall_ret_ir_resched(), and EXT-SEAMCALLs use
seamcall_ret_ir_exec().
How do you think?
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Huang, Kai @ 2026-04-23 22:29 UTC (permalink / raw)
To: dan.j.williams@intel.com, linux-pci@vger.kernel.org,
linux-coco@lists.linux.dev, Edgecombe, Rick P,
yilun.xu@linux.intel.com, x86@kernel.org
Cc: Gao, Chao, Xu, Yilun, dave.hansen@linux.intel.com,
baolu.lu@linux.intel.com, kas@kernel.org, Li, Xiaoyao,
linux-kernel@vger.kernel.org, Verma, Vishal L, Jiang, Dave,
kvm@vger.kernel.org, Duan, Zhenzhong
In-Reply-To: <3b0bc0dea544b10bd2fb0304a96d2671f263829b.camel@intel.com>
On Thu, 2026-04-23 at 17:05 +0000, Edgecombe, Rick P wrote:
> On Thu, 2026-04-23 at 00:59 +0000, Huang, Kai wrote:
> > Ditto here. I don't think we should introduce any more cond_resched().
> >
> > Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> > later, albeit in which a local '_args' is used as a copy of the original 'args',
> > but that has no harm for the case where we can just use the 'args' to loop.
> >
> > I am wondering whether we can just use that here, or we just get rid of that
> > helper (then open retry by the callers of these SEAMCALL wrappers), since there
> > will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> > input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
>
> I kind of like the latter option to open code more of this stuff. The stacks of
> seamcall wrapper macros is already too much.
Agreed.
And SEAMCALL *users* can actually come up with their own version of wrapper(s)
to do the retry. E.g., currently seamcall_ir_resched() is only used for IOMMU
SEAMCALLs, and we can put this wrapper in the IOMMU code or coco/tdx-host.
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Huang, Kai @ 2026-04-23 21:55 UTC (permalink / raw)
To: yilun.xu@linux.intel.com
Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, Li, Xiaoyao,
dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
kas@kernel.org, linux-kernel@vger.kernel.org, Xu, Yilun,
Jiang, Dave, Verma, Vishal L, Duan, Zhenzhong, Gao, Chao,
Edgecombe, Rick P, linux-pci@vger.kernel.org, x86@kernel.org,
dan.j.williams@intel.com
In-Reply-To: <aepLsi/cHnw3Cfgs@yilunxu-OptiPlex-7050>
>
> >
> > Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> > later, albeit in which a local '_args' is used as a copy of the original 'args',
> > but that has no harm for the case where we can just use the 'args' to loop.
>
> No we can't. TDH_EXT_MEM_ADD is designed to use output parameter RCX
> to override/update input parameter RCX, so the caller doesn't have to
> do manual parameter update on retry call. Using seamcall_ir_resched()
> makes each retry use the original RCX, not the updated one.
OK I wish there's a comment saying there's additional output besides error code
via RAX and we just need to feed the output as input again when retrying the
SEAMCALL.
>
> >
> > I am wondering whether we can just use that here, or we just get rid of that
> > helper (then open retry by the callers of these SEAMCALL wrappers), since there
> > will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> > input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
>
> I'd like to know why some SEAMCALLs needs resume flag but others don't.
> If there is chance we don't introduce too much variants for the same thing,
> that's most friendly to OS. And "no resume flag" is my best preference.
I don't know either.
>
> For now, I can see only one SEAMCALL with resume flag in mainline,
> tdh_phymem_cache_wb(). I'd rather we treat it as an exception and no
> resume flag any more if possible.
Right, but there will be more, and setting 'resumed=1' is even different from
how tdh_phymem_cache_wb() does.
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Edgecombe, Rick P @ 2026-04-23 17:05 UTC (permalink / raw)
To: dan.j.williams@intel.com, linux-pci@vger.kernel.org,
linux-coco@lists.linux.dev, Huang, Kai, yilun.xu@linux.intel.com,
x86@kernel.org
Cc: Gao, Chao, Xu, Yilun, Jiang, Dave, baolu.lu@linux.intel.com,
kas@kernel.org, dave.hansen@linux.intel.com, Li, Xiaoyao,
Verma, Vishal L, Duan, Zhenzhong, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <89bccca9a409e02667278fd83f0b7b9064557dab.camel@intel.com>
On Thu, 2026-04-23 at 00:59 +0000, Huang, Kai wrote:
> Ditto here. I don't think we should introduce any more cond_resched().
>
> Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> later, albeit in which a local '_args' is used as a copy of the original 'args',
> but that has no harm for the case where we can just use the 'args' to loop.
>
> I am wondering whether we can just use that here, or we just get rid of that
> helper (then open retry by the callers of these SEAMCALL wrappers), since there
> will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
I kind of like the latter option to open code more of this stuff. The stacks of
seamcall wrapper macros is already too much.
^ permalink raw reply
* Re: [PATCH v2 10/31] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-04-23 16:41 UTC (permalink / raw)
To: Huang, Kai
Cc: dan.j.williams@intel.com, linux-pci@vger.kernel.org,
linux-coco@lists.linux.dev, x86@kernel.org, Gao, Chao,
Edgecombe, Rick P, Xu, Yilun, Jiang, Dave,
dave.hansen@linux.intel.com, baolu.lu@linux.intel.com,
Duan, Zhenzhong, kas@kernel.org, Verma, Vishal L, Li, Xiaoyao,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <89bccca9a409e02667278fd83f0b7b9064557dab.camel@intel.com>
On Thu, Apr 23, 2026 at 12:59:55AM +0000, Huang, Kai wrote:
> On Sat, 2026-03-28 at 00:01 +0800, Xu Yilun wrote:
> > +static int tdx_ext_mem_add(struct tdx_page_array *ext_mem)
> > +{
> > + struct tdx_module_args args = {
> > + .rcx = hpa_list_info_assign_raw(ext_mem),
> > + };
> > + u64 r;
> > +
> > + tdx_clflush_page_array(ext_mem);
> > +
> > + do {
> > + r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
> > + cond_resched();
> > + } while (r == TDX_INTERRUPTED_RESUMABLE);
>
>
> Ditto here. I don't think we should introduce any more cond_resched().
Good to me.
>
> Btw, I think technically we can reuse the seamcall_ir_resched() you introduced
> later, albeit in which a local '_args' is used as a copy of the original 'args',
> but that has no harm for the case where we can just use the 'args' to loop.
No we can't. TDH_EXT_MEM_ADD is designed to use output parameter RCX
to override/update input parameter RCX, so the caller doesn't have to
do manual parameter update on retry call. Using seamcall_ir_resched()
makes each retry use the original RCX, not the updated one.
>
> I am wondering whether we can just use that here, or we just get rid of that
> helper (then open retry by the callers of these SEAMCALL wrappers), since there
> will be more cases where we need to manually set 'resume=1' in the SEAMCALL
> input 'args' when retrying TDX_INTERRUPTED_RESUMABLE.
I'd like to know why some SEAMCALLs needs resume flag but others don't.
If there is chance we don't introduce too much variants for the same thing,
that's most friendly to OS. And "no resume flag" is my best preference.
For now, I can see only one SEAMCALL with resume flag in mainline,
tdh_phymem_cache_wb(). I'd rather we treat it as an exception and no
resume flag any more if possible.
Then we don't have to make all following efforts, they are complex...
>
> Unless you have good idea to unify them all?
>
> E.g., we have something like below in our internal KVM code, using macros to do
> 'resume=1' and retry as the caller wishes. But my understanding is Dave
> probably won't like macros. :-)
>
> (you may see broken indent/text due to text wrapper and sorry for that.)
>
> /*
> * ...
> *
> * The retry_func and update_args allow the SEAMCALL to be retried in a loop if
> * it can still return other error code when there's no race from both KVM and
> * vCPUs and can be "retried" until it succeeds.
> */
> #define tdh_do_no_vcpus_retry(tdh_func, kvm, retry_func, update_args, args...)\
> ({ \
> struct kvm_tdx *__kvm_tdx = to_kvm_tdx(kvm); \
> u64 __err; \
> \
> lockdep_assert_held_write(&kvm->mmu_lock); \
> \
> __err = retry_func(tdh_func, update_args, args); \
> if (unlikely(tdx_operand_busy(__err))) { \
> WRITE_ONCE(__kvm_tdx->wait_for_sept_zap, true); \
> kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); \
> \
> __err = retry_func(tdh_func, update_args, args); \
> \
> WRITE_ONCE(__kvm_tdx->wait_for_sept_zap, false); \
> } \
> __err; \
> })
>
> #define tdh_intr_retry(tdh_func, update_args, args...) \
> ({ \
> u64 ____err; \
> \
> do { \
> ____err = tdh_func(args); \
> \
> if ((____err & TDX_SEAMCALL_STATUS_MASK) != \
> TDX_INTERRUPTED_RESUMABLE)
> \
> break; \
> \
> update_args; \
> } while (1); \
> ____err; \
> })
>
> #define tdh_no_retry(tdh_func, update_args, args...) tdh_func(args)
>
> #define tdh_do_no_vcpus(tdh_func, kvm, args...) \
> tdh_do_no_vcpus_retry(tdh_func, kvm, tdh_no_retry, ;, args)
>
> #define tdh_do_no_vcpus_intr_retry(tdh_func, kvm, update_args, args...) \
> tdh_do_no_vcpus_retry(tdh_func, kvm, tdh_intr_retry, update_args, args)
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox