Linux Confidential Computing Development

Linux Confidential Computing Development
 help / color / mirror / Atom feed

* Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Catalin Marinas @ 2026-06-09 13:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260604083959.1265923-1-aneesh.kumar@kernel.org>

On Thu, Jun 04, 2026 at 02:09:39PM +0530, Aneesh Kumar K.V (Arm) wrote:
> This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
> dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
> are handled consistently.
> 
> Today, the direct DMA path mostly relies on force_dma_unencrypted() for
> shared/decrypted buffer handling. This series consolidates the
> force_dma_unencrypted() checks in the top-level functions and ensures
> that the remaining DMA interfaces use DMA attributes to make the correct
> decisions.

Please check Sashiko's reports, it has some good points:

https://sashiko.dev/#/patchset/20260604083959.1265923-1-aneesh.kumar@kernel.org

I think the main one is the swiotlb_tbl_map_single() changes which break
AMD SME host support. There cc_platform_has(CC_ATTR_MEM_ENCRYPT) is true
but force_dma_unencrypted() is false. Normally you'd not end up on this
path but you can have swiotlb=force.

> Aneesh Kumar K.V (Arm) (20):
>   s390: Expose protected virtualization through cc_platform_has()
>   dma-direct: swiotlb: handle swiotlb alloc/free outside
>     __dma_direct_alloc_pages
>   dma-direct: use DMA_ATTR_CC_SHARED in alloc/free paths
>   dma-pool: track decrypted atomic pools and select them via attrs
>   dma: swiotlb: pass mapping attributes by reference
>   dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
>   dma-mapping: make dma_pgprot() honor DMA_ATTR_CC_SHARED
>   dma-direct: pass attrs to dma_capable() for DMA_ATTR_CC_SHARED checks
>   dma-direct: make dma_direct_map_phys() honor DMA_ATTR_CC_SHARED
>   dma-direct: set decrypted flag for remapped DMA allocations

Patch 10 above...

>   dma-direct: select DMA address encoding from DMA_ATTR_CC_SHARED
>   dma-pool: fix page leak in atomic_pool_expand() cleanup

Patch 12...

>   dma-direct: rename ret to cpu_addr in alloc helpers
>   dma-direct: return struct page from dma_direct_alloc_from_pool()
>   iommu/dma: Check atomic pool allocation result directly

and I think patches 14, 15 are independent fixes. Some of them even have
Fixes: tags and Cc: stable. Please move them to the beginning of the
series to avoid inadvertent dependencies and make them harder to
backport. It's also easier to follow the series without random fixes for
mainline in the middle.

>   dma: swiotlb: free dynamic pools from process context
>   dma: swiotlb: handle set_memory_decrypted() failures
>   dma: free atomic pool pages by physical address
>   swiotlb: Preserve allocation virtual address for dynamic pools
>   swiotlb: remove unused SWIOTLB_FORCE flag

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v6 01/20] s390: Expose protected virtualization through cc_platform_has()
From: Catalin Marinas @ 2026-06-09 13:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, Halil Pasic,
	Matthew Rosato, Jaehoon Kim
In-Reply-To: <20260604083959.1265923-2-aneesh.kumar@kernel.org>

On Thu, Jun 04, 2026 at 02:09:40PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Protected virtualization guests use memory encryption, so advertise that to
> the rest of the kernel through cc_platform_has(CC_ATTR_MEM_ENCRYPT).
> 
> s390 already forces DMA mappings to be unencrypted for protected
> virtualization guests through force_dma_unencrypted(). Add
> ARCH_HAS_CC_PLATFORM and provide the matching cc_platform_has()
> implementation
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---

Nit: just drop the --- line if you did intend to cc those people.
Nothing wrong for them to end up in the commit log (proof that they've
been cc'ed if they did not reply ;)).

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v6 20/20] swiotlb: remove unused SWIOTLB_FORCE flag
From: Petr Tesarik @ 2026-06-09 13:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
	Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260604083959.1265923-21-aneesh.kumar@kernel.org>

On Thu,  4 Jun 2026 14:09:59 +0530
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:

> SWIOTLB_FORCE has no remaining in-tree users. Forced bouncing is now
> controlled through the swiotlb=force command line option via
> swiotlb_force_bounce.
> 
> Remove the unused flag and simplify the force_bounce initialization.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>  include/linux/swiotlb.h | 1 -
>  kernel/dma/swiotlb.c    | 3 +--
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 526f82e9da45..af88ca7182f4 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -15,7 +15,6 @@ struct page;
>  struct scatterlist;
>  
>  #define SWIOTLB_VERBOSE	(1 << 0) /* verbose initialization */
> -#define SWIOTLB_FORCE	(1 << 1) /* force bounce buffering */
>  #define SWIOTLB_ANY	(1 << 2) /* allow any memory for the buffer */

These constants are kernel-internal, so let's not leave a hole in the
bitmask... I mean, what about changing SWIOTLB_ANY to (1 << 1) after
you remove SWIOTLB_FORCE?

Other than that, LGTM.

I consider this whole series a big step towards saner handling of
encrypted/decrypted memory for DMA buffers. Thank you for your effort!

Petr T

>  
>  /*
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e4bd8c9eaeda..81cc4928e949 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -400,8 +400,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  	if (swiotlb_force_disable)
>  		return;
>  
> -	io_tlb_default_mem.force_bounce =
> -		swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
> +	io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
>  
>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>  	if (!remap)


^ permalink raw reply

* Re: [PATCH v6 14/20] dma-direct: return struct page from dma_direct_alloc_from_pool()
From: Catalin Marinas @ 2026-06-09 13:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Jiri Pirko, Jason Gunthorpe, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, stable, Michael Kelley
In-Reply-To: <20260604083959.1265923-15-aneesh.kumar@kernel.org>

On Thu, Jun 04, 2026 at 02:09:53PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Commit 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool
> helper") changed dma_direct_alloc_from_pool() to return the CPU address
> from dma_alloc_from_pool(). That fits dma_direct_alloc(), but
> dma_direct_alloc_pages() also uses the helper and expects a struct page *.
> 
> Fix this by making dma_direct_alloc_from_pool() return the struct page *
> again, and pass the CPU address back through an out-parameter for the
> dma_direct_alloc() caller.
> 
> Fixes: 5b138c534fda ("dma-direct: factor out a dma_direct_alloc_from_pool helper")
> Cc: stable@vger.kernel.org
> 
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>

Nit: remove the empty line after Cc: stable. It may confuse tooling.

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v6 14/20] dma-direct: return struct page from dma_direct_alloc_from_pool()
From: Jason Gunthorpe @ 2026-06-09 14:15 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, stable, Michael Kelley
In-Reply-To: <20260604083959.1265923-15-aneesh.kumar@kernel.org>

On Thu, Jun 04, 2026 at 02:09:53PM +0530, Aneesh Kumar K.V (Arm) wrote:
> @@ -270,9 +270,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>  	 * the atomic pools instead if we aren't allowed block.
>  	 */
>  	if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
> -	    dma_direct_use_pool(dev, gfp))
> -		return dma_direct_alloc_from_pool(dev, size, dma_handle,
> -						  gfp, attrs);
> +	    dma_direct_use_pool(dev, gfp)) {
> +		page = dma_direct_alloc_from_pool(dev, size,
> +					dma_handle, &cpu_addr,
> +					gfp, attrs);
> +		return page ? cpu_addr : NULL;
> +	}

You should probably put this at the start of the series so it can be
backported

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

To Petr's question I think this just shows nobody is really stressing
the PCI dma paths on CC VMs today.

	if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
		return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);

For instance the places even calling dma_alloc_pages() don't look like
things people would use in a CC VM.

Jason

^ permalink raw reply

* Re: [PATCH v6 04/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Jason Gunthorpe @ 2026-06-09 14:32 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
	Michael Kelley
In-Reply-To: <20260604083959.1265923-5-aneesh.kumar@kernel.org>

On Thu, Jun 04, 2026 at 02:09:43PM +0530, Aneesh Kumar K.V (Arm) wrote:
>  struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> -		void **cpu_addr, gfp_t gfp,
> +		void **cpu_addr, gfp_t gfp, unsigned long attrs,
>  		bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
>  {
> -	struct gen_pool *pool = NULL;
> +	struct dma_gen_pool *dma_pool = NULL;
>  	struct page *page;
>  	bool pool_found = false;
>  
> -	while ((pool = dma_guess_pool(pool, gfp))) {
> +	while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
> +
> +		if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> +			continue;

I don't think you should be overloading DMA_ATTR_CC_SHARED like this.

	/*
	 * DMA_ATTR_CC_SHARED is not a caller-visible dma_alloc_*()
	 * attribute. The direct allocator uses it internally after it has
	 * decided that the backing pages must be shared/decrypted, so the
	 * rest of the allocation path can consistently select DMA addresses,
	 * choose compatible pools and restore encryption on free.
	 */
	if (attrs & DMA_ATTR_CC_SHARED)
		return NULL;

	if (force_dma_unencrypted(dev)) {
		attrs |= DMA_ATTR_CC_SHARED;
		mark_mem_decrypt = true;
	}

It is fine to have a bit inside the attrs that is only used by the
internal logic, but it needs to have a clearer name
__DMA_ATTR_REQUIRE_CC_SHARED perhaps.

The sashiko note does look legit though:

	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
	    !gfpflags_allow_blocking(gfp) && !coherent) {
		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
					   gfp, attrs, NULL);
		if (!page)
			return NULL;

I don't see anything doing the force_dma_unencrypted test along this
callchain..

I guess it should be done one step up in dma_alloc_attrs() instead of
in dma_direct_alloc()?

Jason

^ permalink raw reply

* Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Jason Gunthorpe @ 2026-06-09 14:47 UTC (permalink / raw)
  To: Catalin Marinas, Alexey Kardashevskiy
  Cc: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
	linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
	Marc Zyngier, Steven Price, Suzuki K Poulose, Jiri Pirko,
	Mostafa Saleh, Petr Tesarik, Dan Williams, Xu Yilun, linuxppc-dev,
	linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <aigYbK12D8uKQvJF@arm.com>

On Tue, Jun 09, 2026 at 02:43:08PM +0100, Catalin Marinas wrote:
> On Thu, Jun 04, 2026 at 02:09:39PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
> > dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
> > are handled consistently.
> > 
> > Today, the direct DMA path mostly relies on force_dma_unencrypted() for
> > shared/decrypted buffer handling. This series consolidates the
> > force_dma_unencrypted() checks in the top-level functions and ensures
> > that the remaining DMA interfaces use DMA attributes to make the correct
> > decisions.
> 
> Please check Sashiko's reports, it has some good points:
> 
> https://sashiko.dev/#/patchset/20260604083959.1265923-1-aneesh.kumar@kernel.org
> 
> I think the main one is the swiotlb_tbl_map_single() changes which break
> AMD SME host support. There cc_platform_has(CC_ATTR_MEM_ENCRYPT) is true
> but force_dma_unencrypted() is false. Normally you'd not end up on this
> path but you can have swiotlb=force.

IMHO that's an AMD issue, not with the design of this series..

The series is right, a device that is !force_dma_decrypted() must be
considerd to be a trusted device and we must never place any DMA
mappings for a trusted device into shared memory.

That AMD has done somethine insane:

bool force_dma_unencrypted(struct device *dev)
{
	/*
	 * For SEV, all DMA must be to unencrypted addresses.
	 */
	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
		return true;

	/*
	 * For SME, all DMA must be to unencrypted addresses if the
	 * device does not support DMA to addresses that include the
	 * encryption mask.
	 */
	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
						dev->bus_dma_limit);

		if (dma_dev_mask <= dma_enc_mask)
			return true;
	}

Is an AMD issue. We already have an address mask limit system built
into the DMA API, arch code should not be co-opting the CC mechanism
to create a special pool for address limited devices.

The correct thing is to ensure the DMA API is checking any address
limits on the actual true dma_addr_t, not on an intermediate like a
phys_addr before it is adjusted with any C bit. Then it is a normal
low address swiotlb bounce like any other.

I think we can ignore this Sashiko remark, in real systems the use of
swiotlb for 64 bit devices is very rare. Though it would be good to
remove this code from AMD...

Jason

^ permalink raw reply

* SVSM Development Call June 10th, 2026
From: Jörg Rödel @ 2026-06-09 14:47 UTC (permalink / raw)
  To: coconut-svsm, linux-coco

Hi,

Here is the call for agenda items for this weeks SVSM development call.  Please
send any agenda items you have in mind as a reply to this email or raise them
in the meeting.

We will use the LF Zoom instance. Details of the meeting  can be found in our
governance repository at:

	https://github.com/coconut-svsm/governance

The link to the COCONUT-SVSM calendar is:

	https://zoom-lfx.platform.linuxfoundation.org/meetings/coconut-svsm?view=week

The meeting will be recorded and the recording eventually published.

Regards,

	Jörg

^ permalink raw reply

* Re: [PATCH 03/15] x86/virt/tdx: Make TDX Module initialize Extensions
From: Adrian Hunter @ 2026-06-09 15:14 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <20260522034128.3144354-4-yilun.xu@linux.intel.com>

On 22/05/2026 06:41, Xu Yilun wrote:
> +/* Initialize the TDX Module Extensions then Extension-SEAMCALLs can be used */

Reads slightly better without "the", so taking Tony's suggestion
one word less:

"Initialize TDX Module Extensions for Extension-SEAMCALLs"

> +static int tdx_ext_init(void)
> +{
> +	struct tdx_module_args args = {};
> +	u64 r;
> +
> +	do {
> +		r = seamcall(TDH_EXT_INIT, &args);
> +	} while (r == TDX_INTERRUPTED_RESUMABLE);
> +
> +	if (r != TDX_SUCCESS)

There seems to be TDX_PREV_FEATURES_ENABLED which is unused,
but could it turn up here?

> +		return -EFAULT;
> +
> +	return 0;
> +}
Otherwise:

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>


^ permalink raw reply

* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Sean Christopherson @ 2026-06-09 17:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <87a4t86a0l.ffs@fw13>

On Fri, Jun 05, 2026, Thomas Gleixner wrote:
> On Fri, Jun 05 2026 at 11:04, Sean Christopherson wrote:
> But we also should have a check in the TSC init code somewhere which
> validates that X86_FEATURE_CONSTANT_TSC is set when
> X86_FEATURE_TSC_KNOWN_FREQ is set. X86_FEATURE_TSC_KNOWN_FREQ is useless
> w/o X86_FEATURE_CONSTANT_TSC.

Ugh, any objection to punting on this for now?  KVM and Xen guests will trigger
TSC_KNOWN_FREQ without CONSTANT_TSC, thanks to commits:

  e10f78050323 ("kvmclock: fix TSC calibration for nested guests")
  898ec52d2ba0 ("x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()")

Hyper-V guests might as well?  Hyper-V's handling of TSC is weird, even for a
hypervisor.

Even when the frequency is provided in CPUID by the hypervisor, QEMU at least
requires a fairly explicit opt-in to advertise CONSTANT_TSC, presumably to try
to prevent users from shooting themselves in the foot.

^ permalink raw reply

* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Thomas Gleixner @ 2026-06-09 19:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <aihKj-0nP7bUbNHH@google.com>

On Tue, Jun 09 2026 at 10:17, Sean Christopherson wrote:

> On Fri, Jun 05, 2026, Thomas Gleixner wrote:
>> On Fri, Jun 05 2026 at 11:04, Sean Christopherson wrote:
>> But we also should have a check in the TSC init code somewhere which
>> validates that X86_FEATURE_CONSTANT_TSC is set when
>> X86_FEATURE_TSC_KNOWN_FREQ is set. X86_FEATURE_TSC_KNOWN_FREQ is useless
>> w/o X86_FEATURE_CONSTANT_TSC.
>
> Ugh, any objection to punting on this for now?  KVM and Xen guests will trigger
> TSC_KNOWN_FREQ without CONSTANT_TSC, thanks to commits:
>
>   e10f78050323 ("kvmclock: fix TSC calibration for nested guests")
>   898ec52d2ba0 ("x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()")
>
> Hyper-V guests might as well?  Hyper-V's handling of TSC is weird, even for a
> hypervisor.

Hypervisors are ranked by weirdness? I ranked them by insanity so far.

> Even when the frequency is provided in CPUID by the hypervisor, QEMU at least
> requires a fairly explicit opt-in to advertise CONSTANT_TSC, presumably to try
> to prevent users from shooting themselves in the foot.

Bah. We really should have enforced the dependency when we introduced
KNOWN_FREQ. But that ship has sailed.

Though for correctness sake this should be fixed at some point in the
foreseeable future.

Thanks,

        tglx

^ permalink raw reply

* Re: [PATCH v4 02/47] x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
From: Sean Christopherson @ 2026-06-09 19:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley,
	Thomas Gleixner
In-Reply-To: <20260602034916.GGah5SvARd77mkvxe3@fat_crate.local>

On Mon, Jun 01, 2026, Borislav Petkov wrote:
> On Fri, May 29, 2026 at 07:43:49AM -0700, Sean Christopherson wrote:
> > +static int cpuid_get_tsc_info(struct cpuid_tsc_info *info)
> > +{
> > +	unsigned int ecx_hz, edx;
> > +
> > +	memset(info, 0, sizeof(*info));
> 
> Let's not clear this unnecessarily...
> 
> > +
> > +	if (boot_cpu_data.cpuid_level < CPUID_LEAF_TSC)
> > +		return -ENOENT;
> 
> ... just to return here...
> 
> > +
> > +	/* CPUID 15H TSC/Crystal ratio, plus optionally Crystal Hz */
> > +	cpuid(CPUID_LEAF_TSC, &info->denominator, &info->numerator, &ecx_hz, &edx);
> > +
> > +	if (!info->denominator || !info->numerator)
> > +		return -ENOENT;
> 
> ... or here.
> 
> We wanna clear it here, when we'll return success.

Actually, if we take the approach of relying on the user to check the return
code, then there's no need to zero the struct since all fields will be explicitly
written, especially if we drop the "tsc_khz" field.  I was zeroing the field
purely as defense in depth.

^ permalink raw reply

* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Xu Yilun @ 2026-06-10  3:20 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <1783e7bd-3759-41f7-93e3-2f9e21264bd4@intel.com>

On Tue, Jun 09, 2026 at 04:06:50PM +0300, Adrian Hunter wrote:
> On 22/05/2026 06:41, Xu Yilun wrote:
> > Add reading of the global metadata for TDX Module Extensions.
> 
> For tip, isn't the expectation to explain the context first.  The
> very first patch, might be a good place to explain a bit about
> TDX Module Extensions in general.

Yes. I'm trying to add a long context for the first patch but was
suggested to move to cover-letter. I think I can add a brief
introduction at the beginning:

  TDX module introduces a new concept caled "TDX module Extension" to
  support long running / hard-irq preemptible flows inside. ...

^ permalink raw reply

* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-06-10  5:13 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <f7238ced-73a0-44f3-8a50-23eb5a2da5dc@intel.com>

On Tue, Jun 09, 2026 at 04:38:31PM +0300, Adrian Hunter wrote:
> On 22/05/2026 06:41, Xu Yilun wrote:
> > +static int tdx_ext_mem_add(struct page *root, unsigned int nr_pages)
> > +{
> > +	struct tdx_module_args args = {
> > +		.rcx = to_hpa_list_info(root, nr_pages),
> > +	};
> > +	u64 r;
> > +
> > +	tdx_clflush_hpa_list(root, nr_pages);
> > +
> > +	do {
> > +		/*
> > +		 * TDH_EXT_MEM_ADD is designed to use output parameter RCX to
> > +		 * override/update input parameter RCX, so the caller doesn't
> > +		 * have to do manual parameter update on retry call.
> > +		 */
> > +		r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
> > +	} while (r == TDX_INTERRUPTED_RESUMABLE);
> 
> Kishon already mentioned checking only the status

Could you elaborate on why we should mask? I assume the mask is only
needed when the lower bits ([31:0]) are defined to contain extra
information. TDX_INTERRUPTED_RESUMABLE is not the case so we could make
the code change simpler.

And if some non-zero bits appears there, it is a Module bug that we
should not skip.

> 
> > +
> > +	if (r != TDX_SUCCESS)
> 
> Similarly could this also be TDX_EXT_MEMORY_POOL_FULL?

I don't think we should pass the case. The Module provides the number of
required pages via metadata, host follows and feeds pages but the Module
said "Sorry, I'm already full". This is inconsistent behavior that we
should call out.

> 
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> 

^ permalink raw reply

* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Adrian Hunter @ 2026-06-10  5:43 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <aijygJ9YqskIOvB3@yilunxu-OptiPlex-7050>

On 10/06/2026 08:13, Xu Yilun wrote:
> On Tue, Jun 09, 2026 at 04:38:31PM +0300, Adrian Hunter wrote:
>> On 22/05/2026 06:41, Xu Yilun wrote:
>>> +static int tdx_ext_mem_add(struct page *root, unsigned int nr_pages)
>>> +{
>>> +	struct tdx_module_args args = {
>>> +		.rcx = to_hpa_list_info(root, nr_pages),
>>> +	};
>>> +	u64 r;
>>> +
>>> +	tdx_clflush_hpa_list(root, nr_pages);
>>> +
>>> +	do {
>>> +		/*
>>> +		 * TDH_EXT_MEM_ADD is designed to use output parameter RCX to
>>> +		 * override/update input parameter RCX, so the caller doesn't
>>> +		 * have to do manual parameter update on retry call.
>>> +		 */
>>> +		r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
>>> +	} while (r == TDX_INTERRUPTED_RESUMABLE);
>>
>> Kishon already mentioned checking only the status
> 
> Could you elaborate on why we should mask? I assume the mask is only
> needed when the lower bits ([31:0]) are defined to contain extra
> information. TDX_INTERRUPTED_RESUMABLE is not the case so we could make
> the code change simpler.
> 
> And if some non-zero bits appears there, it is a Module bug that we
> should not skip.

Agreed

> 
>>
>>> +
>>> +	if (r != TDX_SUCCESS)
>>
>> Similarly could this also be TDX_EXT_MEMORY_POOL_FULL?
> 
> I don't think we should pass the case. The Module provides the number of
> required pages via metadata, host follows and feeds pages but the Module
> said "Sorry, I'm already full". This is inconsistent behavior that we
> should call out.

TDX_EXT_MEMORY_POOL_FULL is not an error code. It is a success code,
so the question is whether it will ever show up on, say, the very last
TDH_EXT_MEM_ADD.

> 
>>
>>> +		return -EFAULT;
>>> +
>>> +	return 0;
>>> +}
>>


^ permalink raw reply

* Re: [PATCH v6 04/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Aneesh Kumar K.V @ 2026-06-10  8:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
	Michael Kelley
In-Reply-To: <20260609143242.GK2764304@ziepe.ca>

Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Thu, Jun 04, 2026 at 02:09:43PM +0530, Aneesh Kumar K.V (Arm) wrote:
>>  struct page *dma_alloc_from_pool(struct device *dev, size_t size,
>> -		void **cpu_addr, gfp_t gfp,
>> +		void **cpu_addr, gfp_t gfp, unsigned long attrs,
>>  		bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
>>  {
>> -	struct gen_pool *pool = NULL;
>> +	struct dma_gen_pool *dma_pool = NULL;
>>  	struct page *page;
>>  	bool pool_found = false;
>>  
>> -	while ((pool = dma_guess_pool(pool, gfp))) {
>> +	while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
>> +
>> +		if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
>> +			continue;
>
> I don't think you should be overloading DMA_ATTR_CC_SHARED like this.
>
> 	/*
> 	 * DMA_ATTR_CC_SHARED is not a caller-visible dma_alloc_*()
> 	 * attribute. The direct allocator uses it internally after it has
> 	 * decided that the backing pages must be shared/decrypted, so the
> 	 * rest of the allocation path can consistently select DMA addresses,
> 	 * choose compatible pools and restore encryption on free.
> 	 */
> 	if (attrs & DMA_ATTR_CC_SHARED)
> 		return NULL;
>
> 	if (force_dma_unencrypted(dev)) {
> 		attrs |= DMA_ATTR_CC_SHARED;
> 		mark_mem_decrypt = true;
> 	}
>
> It is fine to have a bit inside the attrs that is only used by the
> internal logic, but it needs to have a clearer name
> __DMA_ATTR_REQUIRE_CC_SHARED perhaps.
>

Are you suggesting adding another attribute in addition to
DMA_ATTR_CC_SHARED?

Is the idea that __DMA_ATTR_REQUIRE_CC_SHARED would be used in the
allocation path to request a CC_SHARED allocation, while
DMA_ATTR_CC_SHARED would be used in the mapping path to describe the
attribute of the address?



>
> The sashiko note does look legit though:
>
> 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> 	    !gfpflags_allow_blocking(gfp) && !coherent) {
> 		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
> 					   gfp, attrs, NULL);
> 		if (!page)
> 			return NULL;
>
> I don't see anything doing the force_dma_unencrypted test along this
> callchain..
>
> I guess it should be done one step up in dma_alloc_attrs() instead of
> in dma_direct_alloc()?
>

Yes, I'll move it there.

-aneesh

^ permalink raw reply

* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-06-10  7:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <33974159-ec7c-4c5a-845f-02ea4e67a0c4@intel.com>

> >>> +	if (r != TDX_SUCCESS)
> >>
> >> Similarly could this also be TDX_EXT_MEMORY_POOL_FULL?
> > 
> > I don't think we should pass the case. The Module provides the number of
> > required pages via metadata, host follows and feeds pages but the Module
> > said "Sorry, I'm already full". This is inconsistent behavior that we
> > should call out.
> 
> TDX_EXT_MEMORY_POOL_FULL is not an error code. It is a success code,
> so the question is whether it will ever show up on, say, the very last
> TDH_EXT_MEM_ADD.

It will not show up. I got from Module team that it shows up when the
poll is already full and host still adds more pages.

^ permalink raw reply

* Re: [PATCH 03/15] x86/virt/tdx: Make TDX Module initialize Extensions
From: Xu Yilun @ 2026-06-10  8:09 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <1f6bf040-6a90-47f8-93dd-0e36d4f256dc@intel.com>

On Tue, Jun 09, 2026 at 06:14:22PM +0300, Adrian Hunter wrote:
> On 22/05/2026 06:41, Xu Yilun wrote:
> > +/* Initialize the TDX Module Extensions then Extension-SEAMCALLs can be used */
> 
> Reads slightly better without "the", so taking Tony's suggestion
> one word less:
> 
> "Initialize TDX Module Extensions for Extension-SEAMCALLs"

OK, included.

> 
> > +static int tdx_ext_init(void)
> > +{
> > +	struct tdx_module_args args = {};
> > +	u64 r;
> > +
> > +	do {
> > +		r = seamcall(TDH_EXT_INIT, &args);
> > +	} while (r == TDX_INTERRUPTED_RESUMABLE);
> > +
> > +	if (r != TDX_SUCCESS)
> 
> There seems to be TDX_PREV_FEATURES_ENABLED which is unused,
> but could it turn up here?

“Yes, but not now.” from Module team. It is some future thing
under discussion.

^ permalink raw reply

* Re: [PATCH v6 06/20] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Aneesh Kumar K.V @ 2026-06-10  8:46 UTC (permalink / raw)
  To: Petr Tesarik
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
	Mostafa Saleh, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
	Michael Kelley
In-Reply-To: <20260609144836.4ecea34e@mordecai>

Petr Tesarik <ptesarik@suse.com> writes:

> On Thu,  4 Jun 2026 14:09:45 +0530
> "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
>

...

>>  /*
>>   * If memory encryption is supported, phys_to_dma will set the memory encryption
>>   * bit in the DMA address, and dma_to_phys will clear it.
>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>> index 29187cec90d8..4dcbf3931be1 100644
>> --- a/include/linux/swiotlb.h
>> +++ b/include/linux/swiotlb.h
>> @@ -81,6 +81,7 @@ struct io_tlb_pool {
>>  	struct list_head node;
>>  	struct rcu_head rcu;
>>  	bool transient;
>> +	bool unencrypted;
>
> IIUC this is a copy of the unencrypted member in the corresponding
> struct io_tlb_mem. In other words, if pools are allocated dynamically,
> all pools must have the same encryption state, correct?
>

That is correct. The reason we have the unencrypted member in struct
io_tlb_pool is that we need it in swiotlb_dyn_free().

When freeing memory from an unencrypted pool, we need to convert the
memory back to private/encrypted

>
>>  #endif
>>  };
>>  
>> @@ -111,6 +112,7 @@ struct io_tlb_mem {
>>  	struct dentry *debugfs;
>>  	bool force_bounce;
>>  	bool for_alloc;
>> +	bool unencrypted;
>>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>>  	bool can_grow;
>>  	u64 phys_limit;
>> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
>>  extern void swiotlb_print_info(void);

....

>> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>>   * @dev:	Device for which a memory pool is allocated.
>>   * @bytes:	Size of the buffer.
>>   * @phys_limit:	Maximum allowed physical address of the buffer.
>> + * @attrs:	DMA attributes for the allocation.
>>   * @gfp:	GFP flags for the allocation.
>>   *
>>   * Return: Allocated pages, or %NULL on allocation failure.
>>   */
>>  static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> -		u64 phys_limit, gfp_t gfp)
>> +		u64 phys_limit, unsigned long attrs, gfp_t gfp)
>
> If my assumption above is correct, then I prefer to add a struct
> io_tlb_mem *mem parameter here and calculate the allocation attributes
> inside this function, so you don't have to repeat it in the callers.
>

Will switch to that. That is, we will use struct io_tlb_mem->unencrypted
to determine the pool attribute instead of using unsigned long attrs

>
>>  {
>>  	struct page *page;
>> -	unsigned long attrs = 0;
>>  
>>  	/*
>>  	 * Allocate from the atomic pools if memory is encrypted and
>>  	 * the allocation is atomic, because decrypting may block.
>>  	 */
>> -	if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
>> +	if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
>
> You're removing the check that dev is non-NULL. This is fine, because
> the only call with dev == NULL is from swiotlb_dyn_alloc(), and that one
> uses GFP_KERNEL (i.e. allows blocking). However, if this is an intended
> optimization, I'd rather have it in a separate commit, with this
> explanation why it's OK to do it.
>
> The rest of the patch looks good to me.
>

I'll add that back.

-aneesh

^ permalink raw reply

* [Invitation] bi-weekly guest_memfd upstream call on 2026-06-11
From: David Hildenbrand (Arm) @ 2026-06-10 15:59 UTC (permalink / raw)
  To: linux-coco@lists.linux.dev, linux-mm@kvack.org, KVM

Hi,

Our next guest_memfd upstream call is scheduled for tomorrow, Thursday,
2026-06-11 at 8:00 - 9:00am (GMT-07:00) Pacific Time - Vancouver.

So far we don't have a lot of topics: I'll have some clarifying questions around
virtio-mem for CoCo, and likely it would be a good idea to talk about memory
hot(un)plug+memory ballooning for CoCo in general (what already works, what
doesn't, which requirements/limitations do different platforms have, etc). Apart
from that, I'm sure interesting stuff will come up.

We'll be using the following Google meet:
http://meet.google.com/wxp-wtju-jzw

The meeting notes can be found at [1], where we also link recordings and
collect current guest_memfd upstream proposals. If you want an google
calendar invitation that also covers all future meetings, just write me
or Ackerley a mail.

To put something to discuss onto the agenda, reply to this mail or add
them to the "Topics/questions for next meeting(s)" section in the
meeting notes as a comment.

[1]
https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?usp=sharing

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH v6 04/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Jason Gunthorpe @ 2026-06-10 16:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86, Jiri Pirko,
	Michael Kelley
In-Reply-To: <yq5afr2uzum9.fsf@kernel.org>

On Wed, Jun 10, 2026 at 01:37:26PM +0530, Aneesh Kumar K.V wrote:
> Jason Gunthorpe <jgg@ziepe.ca> writes:
> 
> > On Thu, Jun 04, 2026 at 02:09:43PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >>  struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> >> -		void **cpu_addr, gfp_t gfp,
> >> +		void **cpu_addr, gfp_t gfp, unsigned long attrs,
> >>  		bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
> >>  {
> >> -	struct gen_pool *pool = NULL;
> >> +	struct dma_gen_pool *dma_pool = NULL;
> >>  	struct page *page;
> >>  	bool pool_found = false;
> >>  
> >> -	while ((pool = dma_guess_pool(pool, gfp))) {
> >> +	while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
> >> +
> >> +		if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> >> +			continue;
> >
> > I don't think you should be overloading DMA_ATTR_CC_SHARED like this.
> >
> > 	/*
> > 	 * DMA_ATTR_CC_SHARED is not a caller-visible dma_alloc_*()
> > 	 * attribute. The direct allocator uses it internally after it has
> > 	 * decided that the backing pages must be shared/decrypted, so the
> > 	 * rest of the allocation path can consistently select DMA addresses,
> > 	 * choose compatible pools and restore encryption on free.
> > 	 */
> > 	if (attrs & DMA_ATTR_CC_SHARED)
> > 		return NULL;
> >
> > 	if (force_dma_unencrypted(dev)) {
> > 		attrs |= DMA_ATTR_CC_SHARED;
> > 		mark_mem_decrypt = true;
> > 	}
> >
> > It is fine to have a bit inside the attrs that is only used by the
> > internal logic, but it needs to have a clearer name
> > __DMA_ATTR_REQUIRE_CC_SHARED perhaps.
> >
> 
> Are you suggesting adding another attribute in addition to
> DMA_ATTR_CC_SHARED?
> 
> Is the idea that __DMA_ATTR_REQUIRE_CC_SHARED would be used in the
> allocation path to request a CC_SHARED allocation, while
> DMA_ATTR_CC_SHARED would be used in the mapping path to describe the
> attribute of the address?

Yeah, it is a thought at least

Maybe a comment is good enough.

I just find it hard to follow when we have this dual usage. Like the
code above for dma_pool->unencrypted is completely wrong if it is an
"attribute of an address". Easy to cut & paste that into the wrong
context.

Especially if you move things up higher.. having the alloc set both
CC_SHARED and REQUIRE_CC_SHARED or maybe ALLOC_CC_SHARED would make it
clearer that the alloc code lives under that callchain

Jason

^ permalink raw reply

* Re: [PATCH v7 00/42] guest_memfd: In-place conversion support
From: Ackerley Tng @ 2026-06-10 17:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng via B4 Relay, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
	oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
	shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
	forkloop, pratyush, suzuki.poulose, aneesh.kumar, liam,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <aiMVLtblIKu1DQWJ@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Thu, Jun 04, 2026, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> >> + KVM: selftests: Test conversion with elevated page refcount
>> >>     + Askar pointed out that soon vmsplice may not pin pages. Should I
>> >>       pin pages through CONFIG_GUP_TEST like in [2]? I prefer not to
>> >>       take a dependency on CONFIG_GUP_TEST.
>> >
>> > I'm not exactly excited about taking a dependency on CONFIG_GUP_TEST either, but
>> > it probably is the least awful choice.  E.g. KVM also pins pages is certain flows,
>> > but we're _also_ actively working to remove the need to pin.
>> >
>> > Hmm, maybe IORING_REGISTER_PBUF_RING?  AFAICT, it's almost literally a "pin user
>> > memory" syscall.
>> >
>>
>> Hmm that takes a dependency on io_uring, which isn't always compiled
>> in. Between CONFIG_IO_URING and CONFIG_GUP_TEST, I'd rather
>> CONFIG_GUP_TEST.
>
> Or try both?  If it's not a ridiculous amount of work.

CONFIG_GUP_TEST was tried in [1]

[1] https://lore.kernel.org/all/baa8838f623102931e755cf34c86314b305af49c.1747264138.git.ackerleytng@google.com/

It looks like this

  static void pin_pages(void *vaddr, uint64_t size)
  {
  	const struct pin_longterm_test args = {
  		.addr = (uint64_t)vaddr,
  		.size = size,
  		.flags = PIN_LONGTERM_TEST_FLAG_USE_WRITE,
  	};

  	gup_test_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
  	TEST_REQUIRE(gup_test_fd > 0);

  	TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_START, &args), 0);
  }

  static void unpin_pages(void)
  {
  	TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_STOP), 0);
  }

So in the test I'll call pin_pages(), then try to convert, see that it
fails with EAGAIN and reports the expected error_offset, then I call
unpin_pages(), then I convert again and expect success.

Are you uncomfortable with the CONFIG_GUP_TEST interface? What would you
like me to try with CONFIG_IO_URING? I'm thinking that the main
difference between the two is just down to which non-default CONFIG
option we want to take for guest_memfd tests.

^ permalink raw reply

* Re: SVSM Development Call June 10th, 2026
From: Jörg Rödel @ 2026-06-10 18:35 UTC (permalink / raw)
  To: coconut-svsm, linux-coco
In-Reply-To: <aigneULi6Pmcjexa@8bytes.org>

Meeting minutes are ready in this PR:

	https://github.com/coconut-svsm/governance/pull/112

-Joerg

^ permalink raw reply

* Re: [PATCH v7 04/42] KVM: Stub in ability to disable per-VM memory attribute tracking
From: Sean Christopherson @ 2026-06-10 22:19 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260522-gmem-inplace-conversion-v7-4-2f0fae496530@google.com>

On Fri, May 22, 2026, Ackerley Tng wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Introduce the basic infrastructure to allow per-VM memory attribute
> tracking to be disabled. This will be built-upon in a later patch, where a
> module param can disable per-VM memory attribute tracking.
> 
> Split the Kconfig option into a base KVM_MEMORY_ATTRIBUTES and the
> existing KVM_VM_MEMORY_ATTRIBUTES. The base option provides the core
> plumbing, while the latter enables the full per-VM tracking via an xarray
> and the associated ioctls.
> 
> kvm_get_memory_attributes() now performs a static call that either looks up
> kvm->mem_attr_array with CONFIG_KVM_VM_MEMORY_ATTRIBUTES is enabled, or
> just returns 0 otherwise. The static call can be patched depending on
> whether per-VM tracking is enabled by the CONFIG.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---

...

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index abb9cfa3eb04d..ee26f1d9b5fda 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -101,6 +101,17 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
>  static bool __ro_after_init allow_unsafe_mappings;
>  module_param(allow_unsafe_mappings, bool, 0444);
>  
> +#ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> +#ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> +static bool vm_memory_attributes = true;
> +#else
> +#define vm_memory_attributes false
> +#endif
> +DEFINE_STATIC_CALL_RET0(__kvm_get_memory_attributes, kvm_get_memory_attributes_t);
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(STATIC_CALL_KEY(__kvm_get_memory_attributes));
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(STATIC_CALL_TRAMP(__kvm_get_memory_attributes));
> +#endif

Fudge.  This morning's PUCK discussion about VBS made me realize that we really
don't want to kill off _all_ per-VM attributes like this, we really just want to
kill off PRIVATE.  And even if RWX protections never arrive, conceptually shoving
all attributes into guest_memfd doesn't make any sense, because it really is only
the private vs. shared state that is tied to the physical memory, things like RWX
protections aren't so tightly couple to the data.

It'll require a bit of minor surgery to these patches, but the silver lining is
that I think the end code will be slightly easier to follow.

I'll sync with you off-list to splice in the changes to your current series (I
have them sketched out).

^ permalink raw reply

* Re: [PATCH v7 06/42] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes
From: Sean Christopherson @ 2026-06-10 22:23 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260522-gmem-inplace-conversion-v7-6-2f0fae496530@google.com>

On Fri, May 22, 2026, Ackerley Tng wrote:
> Update the guest_memfd populate() flow to pull memory attributes from the
> gmem instance instead of the VM when KVM is not configured to track
> shared/private status in the VM.
> 
> Rename the per-VM API to make it clear that it retrieves per-VM
> attributes, i.e. is not suitable for use outside of flows that are
> specific to generic per-VM attributes.
> 
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

We should squash this in with the previous patch, i.e. wire up PRIVATE to gmem
in a single patch (sans the ioctl support).  I had a hell of time figure out how
the range-based lookup was supposed to work when revisiting the "wire up" patch,
until I realized populate() was handled in the next patch.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox