Linux Confidential Computing Development

Linux Confidential Computing Development
 help / color / mirror / Atom feed

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Jason Gunthorpe @ 2026-05-14 12:35 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
	linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
	Marc Zyngier, Steven Price, Suzuki K Poulose, Catalin Marinas,
	Jiri Pirko, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <agW2lzJI-20DyJVe@google.com>

> > How will pKVM signal what kind of memory the DMA needs then?
> > 
> > Does it use set_memory_decrypted()? How can it use
> > set_memory_decrypted() without offering CC_ATTR_MEM_ENCRYPT ?
> 
> pKVM (hypervisor) doesn’t signal anything.
> The VMM when running protected guests will use restricted dma-pools
> for emulated vritio devices in the guest, which gets decrypted by
> the guest kernel and hence shared with the host kernel, and then
> traffic is bounced via the pool.

That really does sound like CC and set_memory_decrypted() to me..

> It’s also worth noting that bouncing here isn't just about visibility.
> Because memory sharing operates at page granularity, bouncing sub-page
> allocations through the restricted pool prevents adjacent, sensitive
> guest data from being exposed to the untrusted host.

That's a somewhat different problem, we have the dev->trusted stuff
that is supposed to deal with this kind of security. We need it for
IOMMU based systems too, eg hot plug thunderbolt should have it.

Then CC issue is more that the DMA API can't decrypt random passed in
memory because doing so often requires changing the PTEs pointing at
the page so it would break everything if done transparently.

> > > I believe that the pool should have a way to control it’s property
> > > (encrypted or decrypted) and that takes priority over whatever
> > > attributes comes from allocation.
> > 
> > We should get here because dma_capable() fails, and then swiotlb needs
> > to return something that makes dma_capable() succeed. Yes, it should
> > return details about the thing it decided, but it shouldn't have been
> > pre-created with some idea how to make dma_capable() work.
> 
> That sounds neat, but at the end we have force_dma_unencrypted() in
> dma_capable() which is just hardcoded to true/false by the platform.

For now, the next step is it becomes per-device and dynamic during the
device lifecycle.

> How is that different from having the state static by the pool?

statically attached pools to the device are not so flexible when
devices have dynamically changing capabilities..

> > If dma_capable() can fail, then swiotlb should know exactly what to do
> > to fix it.
> 
> dma_capable() returns a bool, I don’t think it can know what exactly
> went wrong (based on address, size, attrs, dev...)

Yes, but I think the design is swiotlb is supposed to re-inspect what
is going on against the limits dma_capable checks and then select the
correct remedy..

> While we can debate the aesthetics of the setup , this is
> the exisitng behaviour for Linux, which existed for years
> and pKVM relies on and is used extensively.
> And, this patch alters that long-standing logic and introduces
> a functional regression.

Yeah, Aneesh needs to do something here, I'm pointing out it is
entirely seperate thing from the CC path we are working on which is
decoupling CC from reylying on force swiotlb.

> We can address this by either adjusting this patch or by changing
> pKVM guests to be more aligned with other CCA guests which is
> something I have been wondering about if it would help reduce
> bouncing.

Every time I look at pkvm I think it is just ARM CCA with a different
design and no access to the unique HW features..

> > If we can make that work then maybe the flows are designed correctly.
> 
> Mmm, I am not sure I understand this one, shouldn’t the device also be
> notified about the switch in memory state, if it expects to read/write
> decrypted memory, how would that work if the kernel changes it to an
> encrypted one?

Nothing on the device changes. In a CC world we put the device in a
T=0 or T=1 state before the driver loads and the expectation from the
DMA API is that the device will only use that T=x DMA type during
operation.

A T=1 state device can access all of memory, private or shared. Any
information the platform may need is encoded in the dma_addr_t or in
the S1 IOPTEs.

So we never need to tell the device driver what kind of memory the DMA
is targetting, and we NEVER expect a device in T=1 mode to have to
issue a T=0 DMA to use the DMA API.

In a pkvm world it should be the same, the S2 table for the SMMU will
control what the device can access, and if the SMMU points to a
"private" or "shared" page is not something the device needs to know
or care about.

Jason

^ permalink raw reply

* Re: [PATCH v5 0/3] Switch Arm CCA to use an auxiliary device instead of a platform device
From: Greg KH @ 2026-05-14 12:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-coco, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland,
	Sudeep Holla, Will Deacon, Steven Price, Suzuki K Poulose
In-Reply-To: <yq5ase7u5kmz.fsf@kernel.org>

On Thu, May 14, 2026 at 04:21:48PM +0530, Aneesh Kumar K.V wrote:
> Greg KH <gregkh@linuxfoundation.org> writes:
> 
> > On Thu, May 14, 2026 at 03:10:27PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> As discussed here:
> >> https://lore.kernel.org/all/20250728135216.48084-12-aneesh.kumar@kernel.org
> >> 
> >> The general feedback was that a platform device should not be used when
> >> there is no underlying platform resource to represent. The existing CCA
> >> support uses a platform device solely to anchor the TSM interface in the
> >> device hierarchy, which is not an appropriate use of a platform device.
> >> Use an auxiliary device instead to track CCA support.
> >
> > Why an aux device?  If this has no platform resources, please use the
> > faux bus support instead, that is what it is there for.  aux devices are
> > used when you are sharing a real resource among different "child"
> > drivers, and need some way to coordinate that sharing.  If you have no
> > resources, there's nothing to share, so no need for the complexity that
> > aux gives you, just use faux instead.
> >
> 
> We did discuss between faux an auxiliary devices early here
> https://lore.kernel.org/all/20251010135922.GC3833649@ziepe.ca
> 
> To summarize auxiliary device was choosen so that we can do module
> autoloading.

That's not a valid reason to use the aux driver, sorry.  If you have
hardware that triggers an auto-module-load, then this is really a
hardware driver.  If it is a "virtual" driver like this, then you need
to explicitly load it on your own.  Don't abuse apis for reasons that
they are not designed for.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Aneesh Kumar K.V @ 2026-05-14 12:48 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <agW5rhE9n2gDQ0w5@google.com>

Mostafa Saleh <smostafa@google.com> writes:

> On Thu, May 14, 2026 at 11:24:42AM +0530, Aneesh Kumar K.V wrote:
>> Mostafa Saleh <smostafa@google.com> writes:
>> 
>> > On Tue, May 12, 2026 at 02:33:59PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> >> Teach swiotlb to distinguish between encrypted and decrypted bounce
>> >> buffer pools, and make allocation and mapping paths select a pool whose
>> >> state matches the requested DMA attributes.
>> >> 
>> >> Add a decrypted flag to io_tlb_mem, initialize it for the default and
>> >> restricted pools, and propagate DMA_ATTR_CC_SHARED into swiotlb pool
>> >> allocation. Reject swiotlb alloc/map requests when the selected pool does
>> >> not match the required encrypted/decrypted state.
>> >> 
>> >> Also return DMA addresses with the matching phys_to_dma_{encrypted,
>> >> unencrypted} helper so the DMA address encoding stays consistent with the
>> >> chosen pool.
>> >> 
>> >> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> >> ---
>> >>  include/linux/dma-direct.h |  10 ++++
>> >>  include/linux/swiotlb.h    |   8 ++-
>> >>  kernel/dma/direct.c        |  14 +++--
>> >>  kernel/dma/swiotlb.c       | 108 +++++++++++++++++++++++++++----------
>> >>  4 files changed, 107 insertions(+), 33 deletions(-)
>> >> 
>> >> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>> >> index c249912456f9..94fad4e7c11e 100644
>> >> --- a/include/linux/dma-direct.h
>> >> +++ b/include/linux/dma-direct.h
>> >> @@ -77,6 +77,10 @@ static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map)
>> >>  #ifndef phys_to_dma_unencrypted
>> >>  #define phys_to_dma_unencrypted		phys_to_dma
>> >>  #endif
>> >> +
>> >> +#ifndef phys_to_dma_encrypted
>> >> +#define phys_to_dma_encrypted		phys_to_dma
>> >> +#endif
>> >>  #else
>> >>  static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
>> >>  {
>> >> @@ -90,6 +94,12 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
>> >>  {
>> >>  	return dma_addr_unencrypted(__phys_to_dma(dev, paddr));
>> >>  }
>> >> +
>> >> +static inline dma_addr_t phys_to_dma_encrypted(struct device *dev,
>> >> +		phys_addr_t paddr)
>> >> +{
>> >> +	return dma_addr_encrypted(__phys_to_dma(dev, paddr));
>> >> +}
>> >>  /*
>> >>   * If memory encryption is supported, phys_to_dma will set the memory encryption
>> >>   * bit in the DMA address, and dma_to_phys will clear it.
>> >> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>> >> index 3dae0f592063..b3fa3c6e0169 100644
>> >> --- a/include/linux/swiotlb.h
>> >> +++ b/include/linux/swiotlb.h
>> >> @@ -81,6 +81,7 @@ struct io_tlb_pool {
>> >>  	struct list_head node;
>> >>  	struct rcu_head rcu;
>> >>  	bool transient;
>> >> +	bool unencrypted;
>> >>  #endif
>> >>  };
>> >>  
>> >> @@ -111,6 +112,7 @@ struct io_tlb_mem {
>> >>  	struct dentry *debugfs;
>> >>  	bool force_bounce;
>> >>  	bool for_alloc;
>> >> +	bool unencrypted;
>> >>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>> >>  	bool can_grow;
>> >>  	u64 phys_limit;
>> >> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
>> >>  extern void swiotlb_print_info(void);
>> >>  
>> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
>> >> -struct page *swiotlb_alloc(struct device *dev, size_t size);
>> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
>> >> +		unsigned long attrs);
>> >>  bool swiotlb_free(struct device *dev, struct page *page, size_t size);
>> >>  
>> >>  static inline bool is_swiotlb_for_alloc(struct device *dev)
>> >> @@ -290,7 +293,8 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
>> >>  	return dev->dma_io_tlb_mem->for_alloc;
>> >>  }
>> >>  #else
>> >> -static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
>> >> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
>> >> +		unsigned long attrs)
>> >>  {
>> >>  	return NULL;
>> >>  }
>> >> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> >> index dc2907439b3d..97ae4fa10521 100644
>> >> --- a/kernel/dma/direct.c
>> >> +++ b/kernel/dma/direct.c
>> >> @@ -104,9 +104,10 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
>> >>  	dma_free_contiguous(dev, page, size);
>> >>  }
>> >>  
>> >> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
>> >> +static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size,
>> >> +		unsigned long attrs)
>> >>  {
>> >> -	struct page *page = swiotlb_alloc(dev, size);
>> >> +	struct page *page = swiotlb_alloc(dev, size, attrs);
>> >>  
>> >>  	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>> >>  		swiotlb_free(dev, page, size);
>> >> @@ -266,8 +267,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>> >>  						  gfp, attrs);
>> >>  
>> >>  	if (is_swiotlb_for_alloc(dev)) {
>> >> -		page = dma_direct_alloc_swiotlb(dev, size);
>> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
>> >>  		if (page) {
>> >> +			/*
>> >> +			 * swiotlb allocations comes from pool already marked
>> >> +			 * decrypted
>> >> +			 */
>> >>  			mark_mem_decrypt = false;
>> >>  			goto setup_page;
>> >>  		}
>> >> @@ -374,6 +379,7 @@ void dma_direct_free(struct device *dev, size_t size,
>> >>  		return;
>> >>  
>> >>  	if (swiotlb_find_pool(dev, dma_to_phys(dev, dma_addr)))
>> >> +		/* Swiotlb doesn't need a page attribute update on free */
>> >>  		mark_mem_encrypted = false;
>> >>  
>> >>  	if (is_vmalloc_addr(cpu_addr)) {
>> >> @@ -403,7 +409,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
>> >>  						  gfp, attrs);
>> >>  
>> >>  	if (is_swiotlb_for_alloc(dev)) {
>> >> -		page = dma_direct_alloc_swiotlb(dev, size);
>> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
>> >>  		if (!page)
>> >>  			return NULL;
>> >>  
>> >> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> >> index ab4eccbaa076..065663be282c 100644
>> >> --- a/kernel/dma/swiotlb.c
>> >> +++ b/kernel/dma/swiotlb.c
>> >> @@ -259,10 +259,21 @@ void __init swiotlb_update_mem_attributes(void)
>> >>  	struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
>> >>  	unsigned long bytes;
>> >>  
>> >> +	/*
>> >> +	 * if platform support memory encryption, swiotlb buffers are
>> >> +	 * decrypted by default.
>> >> +	 */
>> >> +	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> >> +		io_tlb_default_mem.unencrypted = true;
>> >> +	else
>> >> +		io_tlb_default_mem.unencrypted = false;
>> >> +
>> >>  	if (!mem->nslabs || mem->late_alloc)
>> >>  		return;
>> >>  	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
>> >> -	set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
>> >> +
>> >> +	if (io_tlb_default_mem.unencrypted)
>> >> +		set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
>> >>  }
>> >>  
>> >>  static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
>> >> @@ -505,8 +516,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>> >>  	if (!mem->slots)
>> >>  		goto error_slots;
>> >>  
>> >> -	set_memory_decrypted((unsigned long)vstart,
>> >> -			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
>> >> +	if (io_tlb_default_mem.unencrypted)
>> >> +		set_memory_decrypted((unsigned long)vstart,
>> >> +				     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
>> >> +
>> >>  	swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
>> >>  				 nareas);
>> >>  	add_mem_pool(&io_tlb_default_mem, mem);
>> >> @@ -539,7 +552,9 @@ void __init swiotlb_exit(void)
>> >>  	tbl_size = PAGE_ALIGN(mem->end - mem->start);
>> >>  	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>> >>  
>> >> -	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
>> >> +	if (io_tlb_default_mem.unencrypted)
>> >> +		set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
>> >> +
>> >>  	if (mem->late_alloc) {
>> >>  		area_order = get_order(array_size(sizeof(*mem->areas),
>> >>  			mem->nareas));
>> >> @@ -563,6 +578,7 @@ void __init swiotlb_exit(void)
>> >>   * @gfp:	GFP flags for the allocation.
>> >>   * @bytes:	Size of the buffer.
>> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
>> >> + * @unencrypted: true to allocate unencrypted memory, false for encrypted memory
>> >>   *
>> >>   * Allocate pages from the buddy allocator. If successful, make the allocated
>> >>   * pages decrypted that they can be used for DMA.
>> >> @@ -570,7 +586,8 @@ void __init swiotlb_exit(void)
>> >>   * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
>> >>   * if the allocated physical address was above @phys_limit.
>> >>   */
>> >> -static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>> >> +static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
>> >> +		u64 phys_limit, bool unencrypted)
>> >>  {
>> >>  	unsigned int order = get_order(bytes);
>> >>  	struct page *page;
>> >> @@ -588,13 +605,13 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>> >>  	}
>> >>  
>> >>  	vaddr = phys_to_virt(paddr);
>> >> -	if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> +	if (unencrypted && set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >>  		goto error;
>> >>  	return page;
>> >>  
>> >>  error:
>> >>  	/* Intentional leak if pages cannot be encrypted again. */
>> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> +	if (unencrypted && !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >>  		__free_pages(page, order);
>> >>  	return NULL;
>> >>  }
>> >> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>> >>   * @dev:	Device for which a memory pool is allocated.
>> >>   * @bytes:	Size of the buffer.
>> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
>> >> + * @attrs:	DMA attributes for the allocation.
>> >>   * @gfp:	GFP flags for the allocation.
>> >>   *
>> >>   * Return: Allocated pages, or %NULL on allocation failure.
>> >>   */
>> >>  static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> >> -		u64 phys_limit, gfp_t gfp)
>> >> +		u64 phys_limit, unsigned long attrs, gfp_t gfp)
>> >>  {
>> >>  	struct page *page;
>> >> -	unsigned long attrs = 0;
>> >>  
>> >>  	/*
>> >>  	 * Allocate from the atomic pools if memory is encrypted and
>> >>  	 * the allocation is atomic, because decrypting may block.
>> >>  	 */
>> >> -	if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
>> >> +	if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
>> >>  		void *vaddr;
>> >>  
>> >>  		if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
>> >>  			return NULL;
>> >>  
>> >> -		/* swiotlb considered decrypted by default */
>> >> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> >> -			attrs = DMA_ATTR_CC_SHARED;
>> >> -
>> >>  		return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
>> >>  					   attrs, dma_coherent_ok);
>> >>  	}
>> >> @@ -638,7 +651,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> >>  	else if (phys_limit <= DMA_BIT_MASK(32))
>> >>  		gfp |= __GFP_DMA32;
>> >>  
>> >> -	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
>> >> +	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit,
>> >> +					     !!(attrs & DMA_ATTR_CC_SHARED)))) {
>> >>  		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
>> >>  		    phys_limit < DMA_BIT_MASK(64) &&
>> >>  		    !(gfp & (__GFP_DMA32 | __GFP_DMA)))
>> >> @@ -657,15 +671,18 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> >>   * swiotlb_free_tlb() - free a dynamically allocated IO TLB buffer
>> >>   * @vaddr:	Virtual address of the buffer.
>> >>   * @bytes:	Size of the buffer.
>> >> + * @unencrypted: true if @vaddr was allocated decrypted and must be
>> >> + *	re-encrypted before being freed
>> >>   */
>> >> -static void swiotlb_free_tlb(void *vaddr, size_t bytes)
>> >> +static void swiotlb_free_tlb(void *vaddr, size_t bytes, bool unencrypted)
>> >>  {
>> >>  	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
>> >>  	    dma_free_from_pool(NULL, vaddr, bytes))
>> >>  		return;
>> >>  
>> >>  	/* Intentional leak if pages cannot be encrypted again. */
>> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> +	if (!unencrypted ||
>> >> +	    !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >>  		__free_pages(virt_to_page(vaddr), get_order(bytes));
>> >>  }
>> >>  
>> >> @@ -676,6 +693,7 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
>> >>   * @nslabs:	Desired (maximum) number of slabs.
>> >>   * @nareas:	Number of areas.
>> >>   * @phys_limit:	Maximum DMA buffer physical address.
>> >> + * @attrs:	DMA attributes for the allocation.
>> >>   * @gfp:	GFP flags for the allocations.
>> >>   *
>> >>   * Allocate and initialize a new IO TLB memory pool. The actual number of
>> >> @@ -686,7 +704,8 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
>> >>   */
>> >>  static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
>> >>  		unsigned long minslabs, unsigned long nslabs,
>> >> -		unsigned int nareas, u64 phys_limit, gfp_t gfp)
>> >> +		unsigned int nareas, u64 phys_limit, unsigned long attrs,
>> >> +		gfp_t gfp)
>> >>  {
>> >>  	struct io_tlb_pool *pool;
>> >>  	unsigned int slot_order;
>> >> @@ -704,9 +723,10 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
>> >>  	if (!pool)
>> >>  		goto error;
>> >>  	pool->areas = (void *)pool + sizeof(*pool);
>> >> +	pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
>> >>  
>> >>  	tlb_size = nslabs << IO_TLB_SHIFT;
>> >> -	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, gfp))) {
>> >> +	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
>> >>  		if (nslabs <= minslabs)
>> >>  			goto error_tlb;
>> >>  		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
>> >> @@ -724,7 +744,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
>> >>  	return pool;
>> >>  
>> >>  error_slots:
>> >> -	swiotlb_free_tlb(page_address(tlb), tlb_size);
>> >> +	swiotlb_free_tlb(page_address(tlb), tlb_size,
>> >> +			 !!(attrs & DMA_ATTR_CC_SHARED));
>> >>  error_tlb:
>> >>  	kfree(pool);
>> >>  error:
>> >> @@ -742,7 +763,9 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
>> >>  	struct io_tlb_pool *pool;
>> >>  
>> >>  	pool = swiotlb_alloc_pool(NULL, IO_TLB_MIN_SLABS, default_nslabs,
>> >> -				  default_nareas, mem->phys_limit, GFP_KERNEL);
>> >> +				  default_nareas, mem->phys_limit,
>> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
>> >> +				  GFP_KERNEL);
>> >>  	if (!pool) {
>> >>  		pr_warn_ratelimited("Failed to allocate new pool");
>> >>  		return;
>> >> @@ -762,7 +785,7 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
>> >>  	size_t tlb_size = pool->end - pool->start;
>> >>  
>> >>  	free_pages((unsigned long)pool->slots, get_order(slots_size));
>> >> -	swiotlb_free_tlb(pool->vaddr, tlb_size);
>> >> +	swiotlb_free_tlb(pool->vaddr, tlb_size, pool->unencrypted);
>> >>  	kfree(pool);
>> >>  }
>> >>  
>> >> @@ -1232,6 +1255,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
>> >>  	nslabs = nr_slots(alloc_size);
>> >>  	phys_limit = min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
>> >>  	pool = swiotlb_alloc_pool(dev, nslabs, nslabs, 1, phys_limit,
>> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
>> >>  				  GFP_NOWAIT);
>> >>  	if (!pool)
>> >>  		return -1;
>> >> @@ -1394,6 +1418,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> >>  		enum dma_data_direction dir, unsigned long attrs)
>> >>  {
>> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>> >> +	bool require_decrypted = false;
>> >>  	unsigned int offset;
>> >>  	struct io_tlb_pool *pool;
>> >>  	unsigned int i;
>> >> @@ -1411,6 +1436,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> >>  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> >>  		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
>> >>  
>> >> +	/*
>> >> +	 * if we are trying to swiotlb map a decrypted paddr or the paddr is encrypted
>> >> +	 * but the device is forcing decryption, use decrypted io_tlb_mem
>> >> +	 */
>> >> +	if ((attrs & DMA_ATTR_CC_SHARED) || force_dma_unencrypted(dev))
>> >> +		require_decrypted = true;
>> >> +
>> >> +	if (require_decrypted != mem->unencrypted)
>> >> +		return (phys_addr_t)DMA_MAPPING_ERROR;
>> >> +
>> >>  	/*
>> >>  	 * The default swiotlb memory pool is allocated with PAGE_SIZE
>> >>  	 * alignment. If a mapping is requested with larger alignment,
>> >> @@ -1608,8 +1643,14 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
>> >>  	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
>> >>  		return DMA_MAPPING_ERROR;
>> >>  
>> >> -	/* Ensure that the address returned is DMA'ble */
>> >> -	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
>> >> +	/*
>> >> +	 * Use the allocated io_tlb_mem encryption type to determine dma addr.
>> >> +	 */
>> >> +	if (dev->dma_io_tlb_mem->unencrypted)
>> >> +		dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
>> >> +	else
>> >> +		dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
>> >> +
>> >>  	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
>> >>  		__swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
>> >>  			attrs | DMA_ATTR_SKIP_CPU_SYNC,
>> >> @@ -1773,7 +1814,8 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
>> >>  
>> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
>> >>  
>> >> -struct page *swiotlb_alloc(struct device *dev, size_t size)
>> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
>> >> +		unsigned long attrs)
>> >>  {
>> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>> >>  	struct io_tlb_pool *pool;
>> >> @@ -1784,6 +1826,9 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
>> >>  	if (!mem)
>> >>  		return NULL;
>> >>  
>> >> +	if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
>> >> +		return NULL;
>> >> +
>> >>  	align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
>> >>  	index = swiotlb_find_slots(dev, 0, size, align, &pool);
>> >>  	if (index == -1)
>> >> @@ -1853,9 +1898,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>> >>  			kfree(mem);
>> >>  			return -ENOMEM;
>> >>  		}
>> >> +		/*
>> >> +		 * if platform supports memory encryption,
>> >> +		 * restricted mem pool is decrypted by default
>> >> +		 */
>> >> +		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
>> >> +			mem->unencrypted = true;
>> >> +			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>> >> +					     rmem->size >> PAGE_SHIFT);
>> >> +		} else {
>> >> +			mem->unencrypted = false;
>> >> +		}
>> >
>> > This breaks pKVM as it doesn’t set CC_ATTR_MEM_ENCRYPT, so all virtio
>> > traffic now fails.
>> >
>> > Also, by design, some drivers are clueless about bouncing, so
>> > I believe that the pool should have a way to control it’s property
>> > (encrypted or decrypted) and that takes priority over whatever
>> > attributes comes from allocation.
>> > And that brings us to the same point whether it’s better to return
>> > the memory along with it’s state or we pass the requested state.
>> > I think for other cases it’s fine for the device/DMA-API to dictate
>> > the attrs, but not in restricted-dma case, the firmware just knows better.
>> >
>> 
>> Is it that the pKVM guest kernel does not have awareness of
>> encrypted/decrypted DMA allocations? Instead, the firmware attaches
>> hypervisor-shared pages to the device via restricted-dma-pool? The
>> kernel then has swiotlb->for_alloc = true, and hence all DMA allocations
>> go through the restricted-dma-pool?
>
> Yes.
>
>> 
>> Given that pKVM supports pkvm_set_memory_encrypted() and
>> pkvm_set_memory_decrypted(), can we consider adding CC_ATTR_MEM_ENCRYPT
>> support to pKVM? It would also be good to investigate whether we can set
>> force_dma_unencrypted(dev) to true where needed.
>
> I was looking in to that, but it didn't work because
> force_dma_unencrypted() is broken with restricted-dma due to the
> double decryption issue, that's when I sent my first series [1]
>
> May be we should land some basic fixes for that path so we can
> convert pKVM, then we do the full rework.
>
> I will revive my old work and see if I can send a RFC.
>
> [1] https://lore.kernel.org/all/20260305170335.963568-1-smostafa@google.com/
>

With this series, can you check whether the only change needed is
something like the following?

modified   kernel/dma/swiotlb.c
@@ -1905,7 +1905,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 		 * if platform supports memory encryption,
 		 * restricted mem pool is decrypted by default
 		 */
-		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+		//if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+		if (true) {
 			mem->unencrypted = true;
 			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
 					     rmem->size >> PAGE_SHIFT);

>
>> 
>> I agree that this patch, as it stands, can break pKVM because we are now
>> missing the set_memory_decrypted() call required for pKVM to work.
>> 
>> We now mark the swiotlb io_tlb_mem as unencrypted/encrypted in the guest
>> using struct io_tlb_mem->unencrypted. I am not clear what we can use for
>> pKVM to conditionalize this so that it works for both protected and
>> unprotected guests.
>
> There is no problem with non-protected guests as they don't use memory
> encryption, my initial thought was that th encrpyted/decrypted is
> per-pool property which is decided by FW (device-tree).
>

What I meant was that we need a generic way to identify a pKVM guest, so
that we can use it in the conditional above.

-aneesh

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Sudeep Holla @ 2026-05-14 12:50 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Aneesh Kumar K.V (Arm), linux-coco, linux-arm-kernel,
	Sudeep Holla, linux-kernel, Catalin Marinas, Greg KH,
	Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland,
	Will Deacon, Steven Price
In-Reply-To: <0c88bcee-65b5-4328-87e6-e1c714c3d1ca@arm.com>

On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> Hi Aneesh
> 
> On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> > Make the SMCCC driver responsible for registering the arm-smccc platform
> > device and after confirming the relevant SMCCC function IDs, create
> > the arm_cca_guest auxiliary device.
> > 
> 
> There are a few changes squashed in to this patch.

I had similar thoughts but I didn't get into forming a reply, now I can keep
it small, thanks for that 😉.

> Please could we split the patch in the following order ?
> 
> 1. Add platform device for arm-smccc
> 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> it before the RSI changes)
> 3. Move RSI dev as Auxilliary
> 4. Add the firmware sysfs ABI.
> 

I agree with the logical split of functionality above.

> That way, first two could be merged while we figure out (3) and (4)
> 

I disagree with this though. We don't want to merge this unless (3) is
agreed. There is no point in doing that unless we agree the approach for
RSI as well.

-- 
Regards,
Sudeep

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Greg KH @ 2026-05-14 12:55 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Aneesh Kumar K.V (Arm), linux-coco, linux-arm-kernel,
	linux-kernel, Catalin Marinas, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <0c88bcee-65b5-4328-87e6-e1c714c3d1ca@arm.com>

On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> Hi Aneesh
> 
> On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> > Make the SMCCC driver responsible for registering the arm-smccc platform
> > device and after confirming the relevant SMCCC function IDs, create
> > the arm_cca_guest auxiliary device.
> > 
> 
> There are a few changes squashed in to this patch. Please could we
> split the patch in the following order ?
> 
> 1. Add platform device for arm-smccc

Do not make any more "fake" platform devices please.

> 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> it before the RSI changes)

No, move it to the faux api please.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Catalin Marinas @ 2026-05-14 13:19 UTC (permalink / raw)
  To: Greg KH
  Cc: Suzuki K Poulose, Aneesh Kumar K.V (Arm), linux-coco,
	linux-arm-kernel, linux-kernel, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <2026051420-amusement-drove-73e6@gregkh>

On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
> On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> > > Make the SMCCC driver responsible for registering the arm-smccc platform
> > > device and after confirming the relevant SMCCC function IDs, create
> > > the arm_cca_guest auxiliary device.
> > > 
> > 
> > There are a few changes squashed in to this patch. Please could we
> > split the patch in the following order ?
> > 
> > 1. Add platform device for arm-smccc
> 
> Do not make any more "fake" platform devices please.
> 
> > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> > it before the RSI changes)
> 
> No, move it to the faux api please.

So should we end up with:

  /sys/devices/faux/arm-smccc/
    smccc_trng/
    arm-rsi-dev/
      tsm/tsm0

  /sys/class/tsm/tsm0
    -> ../../devices/faux/arm-smccc/arm-rsi-dev/tsm/tsm0

  /sys/firmware/cca/
    realm_guest

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Catalin Marinas @ 2026-05-14 13:23 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: linux-coco, linux-arm-kernel, linux-kernel, Greg KH,
	Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland,
	Sudeep Holla, Will Deacon, Steven Price, Suzuki K Poulose
In-Reply-To: <20260514094030.42495-2-aneesh.kumar@kernel.org>

On Thu, May 14, 2026 at 03:10:28PM +0530, Aneesh Kumar K.V (Arm) wrote:
> diff --git a/drivers/firmware/smccc/rmm.h b/drivers/firmware/smccc/rmm.h
> new file mode 100644
> index 000000000000..a47a650d4f51
> --- /dev/null
> +++ b/drivers/firmware/smccc/rmm.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _SMCCC_RMM_H
> +#define _SMCCC_RMM_H
> +
> +#include <linux/platform_device.h>
> +
> +#ifdef CONFIG_ARM64
> +#include <asm/rsi_cmds.h>
> +void __init register_rsi_device(struct platform_device *pdev);
> +#else
> +
> +static void __init register_rsi_device(struct platform_device *pdev)

Nit: static inline here (I think Sashiko mentioned it on a previous
version.

> +{
> +
> +}

And unnecessary empty line between curly braces.

Just these notpicks for now. Suzuki and Sudeep already covered the
splitting of this patch and we need to agree on the sysfs hierarchy.

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Greg KH @ 2026-05-14 13:25 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Suzuki K Poulose, Aneesh Kumar K.V (Arm), linux-coco,
	linux-arm-kernel, linux-kernel, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <agXL12bNh4gGyK1K@arm.com>

On Thu, May 14, 2026 at 02:19:19PM +0100, Catalin Marinas wrote:
> On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> > > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> > > > Make the SMCCC driver responsible for registering the arm-smccc platform
> > > > device and after confirming the relevant SMCCC function IDs, create
> > > > the arm_cca_guest auxiliary device.
> > > > 
> > > 
> > > There are a few changes squashed in to this patch. Please could we
> > > split the patch in the following order ?
> > > 
> > > 1. Add platform device for arm-smccc
> > 
> > Do not make any more "fake" platform devices please.
> > 
> > > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> > > it before the RSI changes)
> > 
> > No, move it to the faux api please.
> 
> So should we end up with:
> 
>   /sys/devices/faux/arm-smccc/
>     smccc_trng/
>     arm-rsi-dev/

What types are these child devices?  Also faux ones?

If so, great.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Catalin Marinas @ 2026-05-14 13:46 UTC (permalink / raw)
  To: Greg KH
  Cc: Suzuki K Poulose, Aneesh Kumar K.V (Arm), linux-coco,
	linux-arm-kernel, linux-kernel, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <2026051445-magician-coffee-962f@gregkh>

On Thu, May 14, 2026 at 03:25:34PM +0200, Greg Kroah-Hartman wrote:
> On Thu, May 14, 2026 at 02:19:19PM +0100, Catalin Marinas wrote:
> > On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
> > > On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> > > > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> > > > > Make the SMCCC driver responsible for registering the arm-smccc platform
> > > > > device and after confirming the relevant SMCCC function IDs, create
> > > > > the arm_cca_guest auxiliary device.
> > > > > 
> > > > 
> > > > There are a few changes squashed in to this patch. Please could we
> > > > split the patch in the following order ?
> > > > 
> > > > 1. Add platform device for arm-smccc
> > > 
> > > Do not make any more "fake" platform devices please.
> > > 
> > > > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> > > > it before the RSI changes)
> > > 
> > > No, move it to the faux api please.
> > 
> > So should we end up with:
> > 
> >   /sys/devices/faux/arm-smccc/
> >     smccc_trng/
> >     arm-rsi-dev/
> 
> What types are these child devices?  Also faux ones?

They'd also be faux devices with this structure (in practice they are
firmware interfaces that may be backed by some hardware like in the TRNG
case, though not directly accessible to Linux).

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Mostafa Saleh @ 2026-05-14 14:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5apl2y5f96.fsf@kernel.org>

On Thu, May 14, 2026 at 06:18:05PM +0530, Aneesh Kumar K.V wrote:
> Mostafa Saleh <smostafa@google.com> writes:
> 
> > On Thu, May 14, 2026 at 11:24:42AM +0530, Aneesh Kumar K.V wrote:
> >> Mostafa Saleh <smostafa@google.com> writes:
> >> 
> >> > On Tue, May 12, 2026 at 02:33:59PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> >> Teach swiotlb to distinguish between encrypted and decrypted bounce
> >> >> buffer pools, and make allocation and mapping paths select a pool whose
> >> >> state matches the requested DMA attributes.
> >> >> 
> >> >> Add a decrypted flag to io_tlb_mem, initialize it for the default and
> >> >> restricted pools, and propagate DMA_ATTR_CC_SHARED into swiotlb pool
> >> >> allocation. Reject swiotlb alloc/map requests when the selected pool does
> >> >> not match the required encrypted/decrypted state.
> >> >> 
> >> >> Also return DMA addresses with the matching phys_to_dma_{encrypted,
> >> >> unencrypted} helper so the DMA address encoding stays consistent with the
> >> >> chosen pool.
> >> >> 
> >> >> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> >> >> ---
> >> >>  include/linux/dma-direct.h |  10 ++++
> >> >>  include/linux/swiotlb.h    |   8 ++-
> >> >>  kernel/dma/direct.c        |  14 +++--
> >> >>  kernel/dma/swiotlb.c       | 108 +++++++++++++++++++++++++++----------
> >> >>  4 files changed, 107 insertions(+), 33 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> >> >> index c249912456f9..94fad4e7c11e 100644
> >> >> --- a/include/linux/dma-direct.h
> >> >> +++ b/include/linux/dma-direct.h
> >> >> @@ -77,6 +77,10 @@ static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map)
> >> >>  #ifndef phys_to_dma_unencrypted
> >> >>  #define phys_to_dma_unencrypted		phys_to_dma
> >> >>  #endif
> >> >> +
> >> >> +#ifndef phys_to_dma_encrypted
> >> >> +#define phys_to_dma_encrypted		phys_to_dma
> >> >> +#endif
> >> >>  #else
> >> >>  static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
> >> >>  {
> >> >> @@ -90,6 +94,12 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
> >> >>  {
> >> >>  	return dma_addr_unencrypted(__phys_to_dma(dev, paddr));
> >> >>  }
> >> >> +
> >> >> +static inline dma_addr_t phys_to_dma_encrypted(struct device *dev,
> >> >> +		phys_addr_t paddr)
> >> >> +{
> >> >> +	return dma_addr_encrypted(__phys_to_dma(dev, paddr));
> >> >> +}
> >> >>  /*
> >> >>   * If memory encryption is supported, phys_to_dma will set the memory encryption
> >> >>   * bit in the DMA address, and dma_to_phys will clear it.
> >> >> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> >> >> index 3dae0f592063..b3fa3c6e0169 100644
> >> >> --- a/include/linux/swiotlb.h
> >> >> +++ b/include/linux/swiotlb.h
> >> >> @@ -81,6 +81,7 @@ struct io_tlb_pool {
> >> >>  	struct list_head node;
> >> >>  	struct rcu_head rcu;
> >> >>  	bool transient;
> >> >> +	bool unencrypted;
> >> >>  #endif
> >> >>  };
> >> >>  
> >> >> @@ -111,6 +112,7 @@ struct io_tlb_mem {
> >> >>  	struct dentry *debugfs;
> >> >>  	bool force_bounce;
> >> >>  	bool for_alloc;
> >> >> +	bool unencrypted;
> >> >>  #ifdef CONFIG_SWIOTLB_DYNAMIC
> >> >>  	bool can_grow;
> >> >>  	u64 phys_limit;
> >> >> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
> >> >>  extern void swiotlb_print_info(void);
> >> >>  
> >> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
> >> >> -struct page *swiotlb_alloc(struct device *dev, size_t size);
> >> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> >> >> +		unsigned long attrs);
> >> >>  bool swiotlb_free(struct device *dev, struct page *page, size_t size);
> >> >>  
> >> >>  static inline bool is_swiotlb_for_alloc(struct device *dev)
> >> >> @@ -290,7 +293,8 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
> >> >>  	return dev->dma_io_tlb_mem->for_alloc;
> >> >>  }
> >> >>  #else
> >> >> -static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> >> >> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
> >> >> +		unsigned long attrs)
> >> >>  {
> >> >>  	return NULL;
> >> >>  }
> >> >> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> >> >> index dc2907439b3d..97ae4fa10521 100644
> >> >> --- a/kernel/dma/direct.c
> >> >> +++ b/kernel/dma/direct.c
> >> >> @@ -104,9 +104,10 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
> >> >>  	dma_free_contiguous(dev, page, size);
> >> >>  }
> >> >>  
> >> >> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
> >> >> +static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size,
> >> >> +		unsigned long attrs)
> >> >>  {
> >> >> -	struct page *page = swiotlb_alloc(dev, size);
> >> >> +	struct page *page = swiotlb_alloc(dev, size, attrs);
> >> >>  
> >> >>  	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> >> >>  		swiotlb_free(dev, page, size);
> >> >> @@ -266,8 +267,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> >> >>  						  gfp, attrs);
> >> >>  
> >> >>  	if (is_swiotlb_for_alloc(dev)) {
> >> >> -		page = dma_direct_alloc_swiotlb(dev, size);
> >> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
> >> >>  		if (page) {
> >> >> +			/*
> >> >> +			 * swiotlb allocations comes from pool already marked
> >> >> +			 * decrypted
> >> >> +			 */
> >> >>  			mark_mem_decrypt = false;
> >> >>  			goto setup_page;
> >> >>  		}
> >> >> @@ -374,6 +379,7 @@ void dma_direct_free(struct device *dev, size_t size,
> >> >>  		return;
> >> >>  
> >> >>  	if (swiotlb_find_pool(dev, dma_to_phys(dev, dma_addr)))
> >> >> +		/* Swiotlb doesn't need a page attribute update on free */
> >> >>  		mark_mem_encrypted = false;
> >> >>  
> >> >>  	if (is_vmalloc_addr(cpu_addr)) {
> >> >> @@ -403,7 +409,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> >> >>  						  gfp, attrs);
> >> >>  
> >> >>  	if (is_swiotlb_for_alloc(dev)) {
> >> >> -		page = dma_direct_alloc_swiotlb(dev, size);
> >> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
> >> >>  		if (!page)
> >> >>  			return NULL;
> >> >>  
> >> >> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> >> >> index ab4eccbaa076..065663be282c 100644
> >> >> --- a/kernel/dma/swiotlb.c
> >> >> +++ b/kernel/dma/swiotlb.c
> >> >> @@ -259,10 +259,21 @@ void __init swiotlb_update_mem_attributes(void)
> >> >>  	struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
> >> >>  	unsigned long bytes;
> >> >>  
> >> >> +	/*
> >> >> +	 * if platform support memory encryption, swiotlb buffers are
> >> >> +	 * decrypted by default.
> >> >> +	 */
> >> >> +	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >> >> +		io_tlb_default_mem.unencrypted = true;
> >> >> +	else
> >> >> +		io_tlb_default_mem.unencrypted = false;
> >> >> +
> >> >>  	if (!mem->nslabs || mem->late_alloc)
> >> >>  		return;
> >> >>  	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
> >> >> -	set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> >> >> +
> >> >> +	if (io_tlb_default_mem.unencrypted)
> >> >> +		set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> >> >>  }
> >> >>  
> >> >>  static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> >> >> @@ -505,8 +516,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> >> >>  	if (!mem->slots)
> >> >>  		goto error_slots;
> >> >>  
> >> >> -	set_memory_decrypted((unsigned long)vstart,
> >> >> -			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> >> >> +	if (io_tlb_default_mem.unencrypted)
> >> >> +		set_memory_decrypted((unsigned long)vstart,
> >> >> +				     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> >> >> +
> >> >>  	swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
> >> >>  				 nareas);
> >> >>  	add_mem_pool(&io_tlb_default_mem, mem);
> >> >> @@ -539,7 +552,9 @@ void __init swiotlb_exit(void)
> >> >>  	tbl_size = PAGE_ALIGN(mem->end - mem->start);
> >> >>  	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
> >> >>  
> >> >> -	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> >> >> +	if (io_tlb_default_mem.unencrypted)
> >> >> +		set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> >> >> +
> >> >>  	if (mem->late_alloc) {
> >> >>  		area_order = get_order(array_size(sizeof(*mem->areas),
> >> >>  			mem->nareas));
> >> >> @@ -563,6 +578,7 @@ void __init swiotlb_exit(void)
> >> >>   * @gfp:	GFP flags for the allocation.
> >> >>   * @bytes:	Size of the buffer.
> >> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
> >> >> + * @unencrypted: true to allocate unencrypted memory, false for encrypted memory
> >> >>   *
> >> >>   * Allocate pages from the buddy allocator. If successful, make the allocated
> >> >>   * pages decrypted that they can be used for DMA.
> >> >> @@ -570,7 +586,8 @@ void __init swiotlb_exit(void)
> >> >>   * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
> >> >>   * if the allocated physical address was above @phys_limit.
> >> >>   */
> >> >> -static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> >> >> +static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
> >> >> +		u64 phys_limit, bool unencrypted)
> >> >>  {
> >> >>  	unsigned int order = get_order(bytes);
> >> >>  	struct page *page;
> >> >> @@ -588,13 +605,13 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> >> >>  	}
> >> >>  
> >> >>  	vaddr = phys_to_virt(paddr);
> >> >> -	if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >> +	if (unencrypted && set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >>  		goto error;
> >> >>  	return page;
> >> >>  
> >> >>  error:
> >> >>  	/* Intentional leak if pages cannot be encrypted again. */
> >> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >> +	if (unencrypted && !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >>  		__free_pages(page, order);
> >> >>  	return NULL;
> >> >>  }
> >> >> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> >> >>   * @dev:	Device for which a memory pool is allocated.
> >> >>   * @bytes:	Size of the buffer.
> >> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
> >> >> + * @attrs:	DMA attributes for the allocation.
> >> >>   * @gfp:	GFP flags for the allocation.
> >> >>   *
> >> >>   * Return: Allocated pages, or %NULL on allocation failure.
> >> >>   */
> >> >>  static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> >> >> -		u64 phys_limit, gfp_t gfp)
> >> >> +		u64 phys_limit, unsigned long attrs, gfp_t gfp)
> >> >>  {
> >> >>  	struct page *page;
> >> >> -	unsigned long attrs = 0;
> >> >>  
> >> >>  	/*
> >> >>  	 * Allocate from the atomic pools if memory is encrypted and
> >> >>  	 * the allocation is atomic, because decrypting may block.
> >> >>  	 */
> >> >> -	if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
> >> >> +	if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
> >> >>  		void *vaddr;
> >> >>  
> >> >>  		if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> >> >>  			return NULL;
> >> >>  
> >> >> -		/* swiotlb considered decrypted by default */
> >> >> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >> >> -			attrs = DMA_ATTR_CC_SHARED;
> >> >> -
> >> >>  		return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> >> >>  					   attrs, dma_coherent_ok);
> >> >>  	}
> >> >> @@ -638,7 +651,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> >> >>  	else if (phys_limit <= DMA_BIT_MASK(32))
> >> >>  		gfp |= __GFP_DMA32;
> >> >>  
> >> >> -	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
> >> >> +	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit,
> >> >> +					     !!(attrs & DMA_ATTR_CC_SHARED)))) {
> >> >>  		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
> >> >>  		    phys_limit < DMA_BIT_MASK(64) &&
> >> >>  		    !(gfp & (__GFP_DMA32 | __GFP_DMA)))
> >> >> @@ -657,15 +671,18 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> >> >>   * swiotlb_free_tlb() - free a dynamically allocated IO TLB buffer
> >> >>   * @vaddr:	Virtual address of the buffer.
> >> >>   * @bytes:	Size of the buffer.
> >> >> + * @unencrypted: true if @vaddr was allocated decrypted and must be
> >> >> + *	re-encrypted before being freed
> >> >>   */
> >> >> -static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> >> >> +static void swiotlb_free_tlb(void *vaddr, size_t bytes, bool unencrypted)
> >> >>  {
> >> >>  	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> >> >>  	    dma_free_from_pool(NULL, vaddr, bytes))
> >> >>  		return;
> >> >>  
> >> >>  	/* Intentional leak if pages cannot be encrypted again. */
> >> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >> +	if (!unencrypted ||
> >> >> +	    !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >>  		__free_pages(virt_to_page(vaddr), get_order(bytes));
> >> >>  }
> >> >>  
> >> >> @@ -676,6 +693,7 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> >> >>   * @nslabs:	Desired (maximum) number of slabs.
> >> >>   * @nareas:	Number of areas.
> >> >>   * @phys_limit:	Maximum DMA buffer physical address.
> >> >> + * @attrs:	DMA attributes for the allocation.
> >> >>   * @gfp:	GFP flags for the allocations.
> >> >>   *
> >> >>   * Allocate and initialize a new IO TLB memory pool. The actual number of
> >> >> @@ -686,7 +704,8 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> >> >>   */
> >> >>  static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> >> >>  		unsigned long minslabs, unsigned long nslabs,
> >> >> -		unsigned int nareas, u64 phys_limit, gfp_t gfp)
> >> >> +		unsigned int nareas, u64 phys_limit, unsigned long attrs,
> >> >> +		gfp_t gfp)
> >> >>  {
> >> >>  	struct io_tlb_pool *pool;
> >> >>  	unsigned int slot_order;
> >> >> @@ -704,9 +723,10 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> >> >>  	if (!pool)
> >> >>  		goto error;
> >> >>  	pool->areas = (void *)pool + sizeof(*pool);
> >> >> +	pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
> >> >>  
> >> >>  	tlb_size = nslabs << IO_TLB_SHIFT;
> >> >> -	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, gfp))) {
> >> >> +	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
> >> >>  		if (nslabs <= minslabs)
> >> >>  			goto error_tlb;
> >> >>  		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> >> >> @@ -724,7 +744,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> >> >>  	return pool;
> >> >>  
> >> >>  error_slots:
> >> >> -	swiotlb_free_tlb(page_address(tlb), tlb_size);
> >> >> +	swiotlb_free_tlb(page_address(tlb), tlb_size,
> >> >> +			 !!(attrs & DMA_ATTR_CC_SHARED));
> >> >>  error_tlb:
> >> >>  	kfree(pool);
> >> >>  error:
> >> >> @@ -742,7 +763,9 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
> >> >>  	struct io_tlb_pool *pool;
> >> >>  
> >> >>  	pool = swiotlb_alloc_pool(NULL, IO_TLB_MIN_SLABS, default_nslabs,
> >> >> -				  default_nareas, mem->phys_limit, GFP_KERNEL);
> >> >> +				  default_nareas, mem->phys_limit,
> >> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> >> >> +				  GFP_KERNEL);
> >> >>  	if (!pool) {
> >> >>  		pr_warn_ratelimited("Failed to allocate new pool");
> >> >>  		return;
> >> >> @@ -762,7 +785,7 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
> >> >>  	size_t tlb_size = pool->end - pool->start;
> >> >>  
> >> >>  	free_pages((unsigned long)pool->slots, get_order(slots_size));
> >> >> -	swiotlb_free_tlb(pool->vaddr, tlb_size);
> >> >> +	swiotlb_free_tlb(pool->vaddr, tlb_size, pool->unencrypted);
> >> >>  	kfree(pool);
> >> >>  }
> >> >>  
> >> >> @@ -1232,6 +1255,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> >> >>  	nslabs = nr_slots(alloc_size);
> >> >>  	phys_limit = min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
> >> >>  	pool = swiotlb_alloc_pool(dev, nslabs, nslabs, 1, phys_limit,
> >> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> >> >>  				  GFP_NOWAIT);
> >> >>  	if (!pool)
> >> >>  		return -1;
> >> >> @@ -1394,6 +1418,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >> >>  		enum dma_data_direction dir, unsigned long attrs)
> >> >>  {
> >> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >> >> +	bool require_decrypted = false;
> >> >>  	unsigned int offset;
> >> >>  	struct io_tlb_pool *pool;
> >> >>  	unsigned int i;
> >> >> @@ -1411,6 +1436,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >> >>  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >> >>  		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> >> >>  
> >> >> +	/*
> >> >> +	 * if we are trying to swiotlb map a decrypted paddr or the paddr is encrypted
> >> >> +	 * but the device is forcing decryption, use decrypted io_tlb_mem
> >> >> +	 */
> >> >> +	if ((attrs & DMA_ATTR_CC_SHARED) || force_dma_unencrypted(dev))
> >> >> +		require_decrypted = true;
> >> >> +
> >> >> +	if (require_decrypted != mem->unencrypted)
> >> >> +		return (phys_addr_t)DMA_MAPPING_ERROR;
> >> >> +
> >> >>  	/*
> >> >>  	 * The default swiotlb memory pool is allocated with PAGE_SIZE
> >> >>  	 * alignment. If a mapping is requested with larger alignment,
> >> >> @@ -1608,8 +1643,14 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> >> >>  	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
> >> >>  		return DMA_MAPPING_ERROR;
> >> >>  
> >> >> -	/* Ensure that the address returned is DMA'ble */
> >> >> -	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> >> >> +	/*
> >> >> +	 * Use the allocated io_tlb_mem encryption type to determine dma addr.
> >> >> +	 */
> >> >> +	if (dev->dma_io_tlb_mem->unencrypted)
> >> >> +		dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> >> >> +	else
> >> >> +		dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
> >> >> +
> >> >>  	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> >> >>  		__swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
> >> >>  			attrs | DMA_ATTR_SKIP_CPU_SYNC,
> >> >> @@ -1773,7 +1814,8 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
> >> >>  
> >> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
> >> >>  
> >> >> -struct page *swiotlb_alloc(struct device *dev, size_t size)
> >> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> >> >> +		unsigned long attrs)
> >> >>  {
> >> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >> >>  	struct io_tlb_pool *pool;
> >> >> @@ -1784,6 +1826,9 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> >> >>  	if (!mem)
> >> >>  		return NULL;
> >> >>  
> >> >> +	if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> >> >> +		return NULL;
> >> >> +
> >> >>  	align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
> >> >>  	index = swiotlb_find_slots(dev, 0, size, align, &pool);
> >> >>  	if (index == -1)
> >> >> @@ -1853,9 +1898,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> >> >>  			kfree(mem);
> >> >>  			return -ENOMEM;
> >> >>  		}
> >> >> +		/*
> >> >> +		 * if platform supports memory encryption,
> >> >> +		 * restricted mem pool is decrypted by default
> >> >> +		 */
> >> >> +		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> >> >> +			mem->unencrypted = true;
> >> >> +			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> >> >> +					     rmem->size >> PAGE_SHIFT);
> >> >> +		} else {
> >> >> +			mem->unencrypted = false;
> >> >> +		}
> >> >
> >> > This breaks pKVM as it doesn’t set CC_ATTR_MEM_ENCRYPT, so all virtio
> >> > traffic now fails.
> >> >
> >> > Also, by design, some drivers are clueless about bouncing, so
> >> > I believe that the pool should have a way to control it’s property
> >> > (encrypted or decrypted) and that takes priority over whatever
> >> > attributes comes from allocation.
> >> > And that brings us to the same point whether it’s better to return
> >> > the memory along with it’s state or we pass the requested state.
> >> > I think for other cases it’s fine for the device/DMA-API to dictate
> >> > the attrs, but not in restricted-dma case, the firmware just knows better.
> >> >
> >> 
> >> Is it that the pKVM guest kernel does not have awareness of
> >> encrypted/decrypted DMA allocations? Instead, the firmware attaches
> >> hypervisor-shared pages to the device via restricted-dma-pool? The
> >> kernel then has swiotlb->for_alloc = true, and hence all DMA allocations
> >> go through the restricted-dma-pool?
> >
> > Yes.
> >
> >> 
> >> Given that pKVM supports pkvm_set_memory_encrypted() and
> >> pkvm_set_memory_decrypted(), can we consider adding CC_ATTR_MEM_ENCRYPT
> >> support to pKVM? It would also be good to investigate whether we can set
> >> force_dma_unencrypted(dev) to true where needed.
> >
> > I was looking in to that, but it didn't work because
> > force_dma_unencrypted() is broken with restricted-dma due to the
> > double decryption issue, that's when I sent my first series [1]
> >
> > May be we should land some basic fixes for that path so we can
> > convert pKVM, then we do the full rework.
> >
> > I will revive my old work and see if I can send a RFC.
> >
> > [1] https://lore.kernel.org/all/20260305170335.963568-1-smostafa@google.com/
> >
> 
> With this series, can you check whether the only change needed is
> something like the following?
> 
> modified   kernel/dma/swiotlb.c
> @@ -1905,7 +1905,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>  		 * if platform supports memory encryption,
>  		 * restricted mem pool is decrypted by default
>  		 */
> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> +		//if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> +		if (true) {
>  			mem->unencrypted = true;
>  			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>  					     rmem->size >> PAGE_SHIFT);

Yes, that boots, but I will need to do more tests.

> 
> >
> >> 
> >> I agree that this patch, as it stands, can break pKVM because we are now
> >> missing the set_memory_decrypted() call required for pKVM to work.
> >> 
> >> We now mark the swiotlb io_tlb_mem as unencrypted/encrypted in the guest
> >> using struct io_tlb_mem->unencrypted. I am not clear what we can use for
> >> pKVM to conditionalize this so that it works for both protected and
> >> unprotected guests.
> >
> > There is no problem with non-protected guests as they don't use memory
> > encryption, my initial thought was that th encrpyted/decrypted is
> > per-pool property which is decided by FW (device-tree).
> >
> 
> What I meant was that we need a generic way to identify a pKVM guest, so
> that we can use it in the conditional above.

I have this patch, with that I can boot with your series unmodified,
but I will need to do more testing.

From d795b4c4ee2437587616b2b342e9996afe6d6680 Mon Sep 17 00:00:00 2001
From: Mostafa Saleh <smostafa@google.com>
Date: Thu, 14 May 2026 13:46:15 +0000
Subject: [PATCH] arm64/coco: Add pKVM as a CC platform

pKVM does support memory encryption, expose that to the rest of
the kernel through cc_platform_has()

At the moment, all devices inside the guest are emulated which
requires its memory to be shared back to the host (decrypted), so
set force_dma_unencrypted() to always return true.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 arch/arm64/include/asm/hypervisor.h           |  6 ++++++
 arch/arm64/include/asm/mem_encrypt.h          |  3 ++-
 arch/arm64/kernel/rsi.c                       | 12 ------------
 arch/arm64/mm/init.c                          | 13 +++++++++++++
 drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c |  5 +++++
 5 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index a12fd897c877..1b0e15f290be 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -10,8 +10,14 @@ void kvm_arm_target_impl_cpu_init(void);

 #ifdef CONFIG_ARM_PKVM_GUEST
 void pkvm_init_hyp_services(void);
+bool is_protected_kvm_guest(void);
 #else
 static inline void pkvm_init_hyp_services(void) { };
+
+static inline bool is_protected_kvm_guest(void)
+{
+	return false;
+}
 #endif

 static inline void kvm_arch_init_hyp_services(void)
diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
index 314b2b52025f..636f45b4d8af 100644
--- a/arch/arm64/include/asm/mem_encrypt.h
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -2,6 +2,7 @@
 #ifndef __ASM_MEM_ENCRYPT_H
 #define __ASM_MEM_ENCRYPT_H

+#include <asm/hypervisor.h>
 #include <asm/rsi.h>

 struct device;
@@ -20,7 +21,7 @@ int realm_register_memory_enc_ops(void);

 static inline bool force_dma_unencrypted(struct device *dev)
 {
-	return is_realm_world();
+	return is_realm_world() || is_protected_kvm_guest();
 }

 /*
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 92160f2e57ff..25ca75ce1a4d 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -7,7 +7,6 @@
 #include <linux/memblock.h>
 #include <linux/psci.h>
 #include <linux/swiotlb.h>
-#include <linux/cc_platform.h>
 #include <linux/platform_device.h>

 #include <asm/io.h>
@@ -23,17 +22,6 @@ EXPORT_SYMBOL(prot_ns_shared);
 DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
 EXPORT_SYMBOL(rsi_present);

-bool cc_platform_has(enum cc_attr attr)
-{
-	switch (attr) {
-	case CC_ATTR_MEM_ENCRYPT:
-		return is_realm_world();
-	default:
-		return false;
-	}
-}
-EXPORT_SYMBOL_GPL(cc_platform_has);
-
 static bool rsi_version_matches(void)
 {
 	unsigned long ver_lower, ver_higher;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index acf67c7064db..a087ac5b15f7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -12,6 +12,7 @@
 #include <linux/swap.h>
 #include <linux/init.h>
 #include <linux/cache.h>
+#include <linux/cc_platform.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
 #include <linux/initrd.h>
@@ -36,6 +37,7 @@

 #include <asm/boot.h>
 #include <asm/fixmap.h>
+#include <asm/hypervisor.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/kvm_host.h>
@@ -414,6 +416,17 @@ void dump_mem_limit(void)
 	}
 }

+bool cc_platform_has(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_MEM_ENCRYPT:
+		return is_realm_world() || is_protected_kvm_guest();
+	default:
+		return false;
+	}
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
+
 #ifdef CONFIG_EXECMEM
 static u64 module_direct_base __ro_after_init = 0;
 static u64 module_plt_base __ro_after_init = 0;
diff --git a/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c b/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
index 4230b817a80b..297e6d6019b8 100644
--- a/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
+++ b/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
@@ -95,6 +95,11 @@ static int mmio_guard_ioremap_hook(phys_addr_t phys, size_t size,
 	return 0;
 }

+bool is_protected_kvm_guest(void)
+{
+	return !!pkvm_granule;
+}
+
 void pkvm_init_hyp_services(void)
 {
 	int i;
--
2.54.0.563.g4f69b47b94-goog


Thanks,
Mostafa
> 
> -aneesh

^ permalink raw reply related

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Greg KH @ 2026-05-14 14:23 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Suzuki K Poulose, Aneesh Kumar K.V (Arm), linux-coco,
	linux-arm-kernel, linux-kernel, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <agXSI6PPf4uR7VBL@arm.com>

On Thu, May 14, 2026 at 02:46:11PM +0100, Catalin Marinas wrote:
> On Thu, May 14, 2026 at 03:25:34PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, May 14, 2026 at 02:19:19PM +0100, Catalin Marinas wrote:
> > > On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
> > > > On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> > > > > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> > > > > > Make the SMCCC driver responsible for registering the arm-smccc platform
> > > > > > device and after confirming the relevant SMCCC function IDs, create
> > > > > > the arm_cca_guest auxiliary device.
> > > > > > 
> > > > > 
> > > > > There are a few changes squashed in to this patch. Please could we
> > > > > split the patch in the following order ?
> > > > > 
> > > > > 1. Add platform device for arm-smccc
> > > > 
> > > > Do not make any more "fake" platform devices please.
> > > > 
> > > > > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> > > > > it before the RSI changes)
> > > > 
> > > > No, move it to the faux api please.
> > > 
> > > So should we end up with:
> > > 
> > >   /sys/devices/faux/arm-smccc/
> > >     smccc_trng/
> > >     arm-rsi-dev/
> > 
> > What types are these child devices?  Also faux ones?
> 
> They'd also be faux devices with this structure (in practice they are
> firmware interfaces that may be backed by some hardware like in the TRNG
> case, though not directly accessible to Linux).

Great, seems sane to me!

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Aneesh Kumar K.V @ 2026-05-14 14:37 UTC (permalink / raw)
  To: Greg KH, Suzuki K Poulose
  Cc: linux-coco, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland,
	Sudeep Holla, Will Deacon, Steven Price
In-Reply-To: <2026051420-amusement-drove-73e6@gregkh>

Greg KH <gregkh@linuxfoundation.org> writes:

> On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
>> Hi Aneesh
>> 
>> On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
>> > Make the SMCCC driver responsible for registering the arm-smccc platform
>> > device and after confirming the relevant SMCCC function IDs, create
>> > the arm_cca_guest auxiliary device.
>> > 
>> 
>> There are a few changes squashed in to this patch. Please could we
>> split the patch in the following order ?
>> 
>> 1. Add platform device for arm-smccc
>
> Do not make any more "fake" platform devices please.
>
>> 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
>> it before the RSI changes)
>
> No, move it to the faux api please.
>

Maybe I was not complete in my previous reply. I did not want to repeat
the entire thread, so I quoted the lore link for more details.

1. We have platform firmware-provided SMCCC interfaces. Based on the
support/availability of these function IDs, we want to load multiple
drivers.
2. This patch series adds a platform device to represent the
firmware-provided SMCCC resource.
3. Different SMCCC ranges are now represented as auxiliary devices.
4. Different subsystems, such as TSM, can autoload their backend drivers
based on the availability of these SMCCC ranges, which are now
represented as auxiliary devices.

You had agreed to all of this in the previous discussion here:
https://lore.kernel.org/all/2025101516-handbook-hyphen-62ec@gregkh

-aneesh

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Jason Gunthorpe @ 2026-05-14 14:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Mostafa Saleh, iommu, linux-arm-kernel, linux-kernel, linux-coco,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5apl2y5f96.fsf@kernel.org>

On Thu, May 14, 2026 at 06:18:05PM +0530, Aneesh Kumar K.V wrote:
> > There is no problem with non-protected guests as they don't use memory
> > encryption, my initial thought was that th encrpyted/decrypted is
> > per-pool property which is decided by FW (device-tree).
> 
> What I meant was that we need a generic way to identify a pKVM guest, so
> that we can use it in the conditional above.

If I understood Mostafa's remarks I think different devices in the
guest need shared/decrypted and some don't? Ie a virtio hypervisor
device needs shared while a real PCI device doesn't? Is that right?

In CC terms that would be a mixture of T=0 and T=1 devices hardwired
and signaled by firwmare..

Ideally we'd have a flow where if the arch precreates a swiotlb pool
with special parameters this overrides all other decision making. Then
this series is about making CC NOT use that flow... ??

Jason

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Aneesh Kumar K.V @ 2026-05-14 14:38 UTC (permalink / raw)
  To: Catalin Marinas, Greg KH
  Cc: Suzuki K Poulose, linux-coco, linux-arm-kernel, linux-kernel,
	Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland,
	Sudeep Holla, Will Deacon, Steven Price
In-Reply-To: <agXL12bNh4gGyK1K@arm.com>

Catalin Marinas <catalin.marinas@arm.com> writes:

> On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
>> On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
>> > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
>> > > Make the SMCCC driver responsible for registering the arm-smccc platform
>> > > device and after confirming the relevant SMCCC function IDs, create
>> > > the arm_cca_guest auxiliary device.
>> > > 
>> > 
>> > There are a few changes squashed in to this patch. Please could we
>> > split the patch in the following order ?
>> > 
>> > 1. Add platform device for arm-smccc
>> 
>> Do not make any more "fake" platform devices please.
>> 
>> > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
>> > it before the RSI changes)
>> 
>> No, move it to the faux api please.
>
> So should we end up with:
>
>   /sys/devices/faux/arm-smccc/
>     smccc_trng/
>     arm-rsi-dev/
>       tsm/tsm0
>
>   /sys/class/tsm/tsm0
>     -> ../../devices/faux/arm-smccc/arm-rsi-dev/tsm/tsm0
>
>   /sys/firmware/cca/
>     realm_guest

But we need the ability to autoload different TSM backend drivers based
on the support/availability of these SMCCC function-id ranges. faux
device don't support that.

-aneesh

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Aneesh Kumar K.V @ 2026-05-14 14:43 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <agXaby-7L7yS3Vva@google.com>

Mostafa Saleh <smostafa@google.com> writes:

> On Thu, May 14, 2026 at 06:18:05PM +0530, Aneesh Kumar K.V wrote:
>> Mostafa Saleh <smostafa@google.com> writes:
>> 
>> > On Thu, May 14, 2026 at 11:24:42AM +0530, Aneesh Kumar K.V wrote:
>> >> Mostafa Saleh <smostafa@google.com> writes:
>> >> 
>> >> > On Tue, May 12, 2026 at 02:33:59PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> >> >> Teach swiotlb to distinguish between encrypted and decrypted bounce
>> >> >> buffer pools, and make allocation and mapping paths select a pool whose
>> >> >> state matches the requested DMA attributes.
>> >> >> 
>> >> >> Add a decrypted flag to io_tlb_mem, initialize it for the default and
>> >> >> restricted pools, and propagate DMA_ATTR_CC_SHARED into swiotlb pool
>> >> >> allocation. Reject swiotlb alloc/map requests when the selected pool does
>> >> >> not match the required encrypted/decrypted state.
>> >> >> 
>> >> >> Also return DMA addresses with the matching phys_to_dma_{encrypted,
>> >> >> unencrypted} helper so the DMA address encoding stays consistent with the
>> >> >> chosen pool.
>> >> >> 
>> >> >> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> >> >> ---
>> >> >>  include/linux/dma-direct.h |  10 ++++
>> >> >>  include/linux/swiotlb.h    |   8 ++-
>> >> >>  kernel/dma/direct.c        |  14 +++--
>> >> >>  kernel/dma/swiotlb.c       | 108 +++++++++++++++++++++++++++----------
>> >> >>  4 files changed, 107 insertions(+), 33 deletions(-)
>> >> >> 
>> >> >> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>> >> >> index c249912456f9..94fad4e7c11e 100644
>> >> >> --- a/include/linux/dma-direct.h
>> >> >> +++ b/include/linux/dma-direct.h
>> >> >> @@ -77,6 +77,10 @@ static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map)
>> >> >>  #ifndef phys_to_dma_unencrypted
>> >> >>  #define phys_to_dma_unencrypted		phys_to_dma
>> >> >>  #endif
>> >> >> +
>> >> >> +#ifndef phys_to_dma_encrypted
>> >> >> +#define phys_to_dma_encrypted		phys_to_dma
>> >> >> +#endif
>> >> >>  #else
>> >> >>  static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
>> >> >>  {
>> >> >> @@ -90,6 +94,12 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
>> >> >>  {
>> >> >>  	return dma_addr_unencrypted(__phys_to_dma(dev, paddr));
>> >> >>  }
>> >> >> +
>> >> >> +static inline dma_addr_t phys_to_dma_encrypted(struct device *dev,
>> >> >> +		phys_addr_t paddr)
>> >> >> +{
>> >> >> +	return dma_addr_encrypted(__phys_to_dma(dev, paddr));
>> >> >> +}
>> >> >>  /*
>> >> >>   * If memory encryption is supported, phys_to_dma will set the memory encryption
>> >> >>   * bit in the DMA address, and dma_to_phys will clear it.
>> >> >> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>> >> >> index 3dae0f592063..b3fa3c6e0169 100644
>> >> >> --- a/include/linux/swiotlb.h
>> >> >> +++ b/include/linux/swiotlb.h
>> >> >> @@ -81,6 +81,7 @@ struct io_tlb_pool {
>> >> >>  	struct list_head node;
>> >> >>  	struct rcu_head rcu;
>> >> >>  	bool transient;
>> >> >> +	bool unencrypted;
>> >> >>  #endif
>> >> >>  };
>> >> >>  
>> >> >> @@ -111,6 +112,7 @@ struct io_tlb_mem {
>> >> >>  	struct dentry *debugfs;
>> >> >>  	bool force_bounce;
>> >> >>  	bool for_alloc;
>> >> >> +	bool unencrypted;
>> >> >>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>> >> >>  	bool can_grow;
>> >> >>  	u64 phys_limit;
>> >> >> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
>> >> >>  extern void swiotlb_print_info(void);
>> >> >>  
>> >> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
>> >> >> -struct page *swiotlb_alloc(struct device *dev, size_t size);
>> >> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
>> >> >> +		unsigned long attrs);
>> >> >>  bool swiotlb_free(struct device *dev, struct page *page, size_t size);
>> >> >>  
>> >> >>  static inline bool is_swiotlb_for_alloc(struct device *dev)
>> >> >> @@ -290,7 +293,8 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
>> >> >>  	return dev->dma_io_tlb_mem->for_alloc;
>> >> >>  }
>> >> >>  #else
>> >> >> -static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
>> >> >> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
>> >> >> +		unsigned long attrs)
>> >> >>  {
>> >> >>  	return NULL;
>> >> >>  }
>> >> >> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> >> >> index dc2907439b3d..97ae4fa10521 100644
>> >> >> --- a/kernel/dma/direct.c
>> >> >> +++ b/kernel/dma/direct.c
>> >> >> @@ -104,9 +104,10 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
>> >> >>  	dma_free_contiguous(dev, page, size);
>> >> >>  }
>> >> >>  
>> >> >> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
>> >> >> +static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size,
>> >> >> +		unsigned long attrs)
>> >> >>  {
>> >> >> -	struct page *page = swiotlb_alloc(dev, size);
>> >> >> +	struct page *page = swiotlb_alloc(dev, size, attrs);
>> >> >>  
>> >> >>  	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>> >> >>  		swiotlb_free(dev, page, size);
>> >> >> @@ -266,8 +267,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>> >> >>  						  gfp, attrs);
>> >> >>  
>> >> >>  	if (is_swiotlb_for_alloc(dev)) {
>> >> >> -		page = dma_direct_alloc_swiotlb(dev, size);
>> >> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
>> >> >>  		if (page) {
>> >> >> +			/*
>> >> >> +			 * swiotlb allocations comes from pool already marked
>> >> >> +			 * decrypted
>> >> >> +			 */
>> >> >>  			mark_mem_decrypt = false;
>> >> >>  			goto setup_page;
>> >> >>  		}
>> >> >> @@ -374,6 +379,7 @@ void dma_direct_free(struct device *dev, size_t size,
>> >> >>  		return;
>> >> >>  
>> >> >>  	if (swiotlb_find_pool(dev, dma_to_phys(dev, dma_addr)))
>> >> >> +		/* Swiotlb doesn't need a page attribute update on free */
>> >> >>  		mark_mem_encrypted = false;
>> >> >>  
>> >> >>  	if (is_vmalloc_addr(cpu_addr)) {
>> >> >> @@ -403,7 +409,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
>> >> >>  						  gfp, attrs);
>> >> >>  
>> >> >>  	if (is_swiotlb_for_alloc(dev)) {
>> >> >> -		page = dma_direct_alloc_swiotlb(dev, size);
>> >> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
>> >> >>  		if (!page)
>> >> >>  			return NULL;
>> >> >>  
>> >> >> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> >> >> index ab4eccbaa076..065663be282c 100644
>> >> >> --- a/kernel/dma/swiotlb.c
>> >> >> +++ b/kernel/dma/swiotlb.c
>> >> >> @@ -259,10 +259,21 @@ void __init swiotlb_update_mem_attributes(void)
>> >> >>  	struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
>> >> >>  	unsigned long bytes;
>> >> >>  
>> >> >> +	/*
>> >> >> +	 * if platform support memory encryption, swiotlb buffers are
>> >> >> +	 * decrypted by default.
>> >> >> +	 */
>> >> >> +	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> >> >> +		io_tlb_default_mem.unencrypted = true;
>> >> >> +	else
>> >> >> +		io_tlb_default_mem.unencrypted = false;
>> >> >> +
>> >> >>  	if (!mem->nslabs || mem->late_alloc)
>> >> >>  		return;
>> >> >>  	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
>> >> >> -	set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
>> >> >> +
>> >> >> +	if (io_tlb_default_mem.unencrypted)
>> >> >> +		set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
>> >> >>  }
>> >> >>  
>> >> >>  static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
>> >> >> @@ -505,8 +516,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>> >> >>  	if (!mem->slots)
>> >> >>  		goto error_slots;
>> >> >>  
>> >> >> -	set_memory_decrypted((unsigned long)vstart,
>> >> >> -			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
>> >> >> +	if (io_tlb_default_mem.unencrypted)
>> >> >> +		set_memory_decrypted((unsigned long)vstart,
>> >> >> +				     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
>> >> >> +
>> >> >>  	swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
>> >> >>  				 nareas);
>> >> >>  	add_mem_pool(&io_tlb_default_mem, mem);
>> >> >> @@ -539,7 +552,9 @@ void __init swiotlb_exit(void)
>> >> >>  	tbl_size = PAGE_ALIGN(mem->end - mem->start);
>> >> >>  	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>> >> >>  
>> >> >> -	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
>> >> >> +	if (io_tlb_default_mem.unencrypted)
>> >> >> +		set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
>> >> >> +
>> >> >>  	if (mem->late_alloc) {
>> >> >>  		area_order = get_order(array_size(sizeof(*mem->areas),
>> >> >>  			mem->nareas));
>> >> >> @@ -563,6 +578,7 @@ void __init swiotlb_exit(void)
>> >> >>   * @gfp:	GFP flags for the allocation.
>> >> >>   * @bytes:	Size of the buffer.
>> >> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
>> >> >> + * @unencrypted: true to allocate unencrypted memory, false for encrypted memory
>> >> >>   *
>> >> >>   * Allocate pages from the buddy allocator. If successful, make the allocated
>> >> >>   * pages decrypted that they can be used for DMA.
>> >> >> @@ -570,7 +586,8 @@ void __init swiotlb_exit(void)
>> >> >>   * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
>> >> >>   * if the allocated physical address was above @phys_limit.
>> >> >>   */
>> >> >> -static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>> >> >> +static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
>> >> >> +		u64 phys_limit, bool unencrypted)
>> >> >>  {
>> >> >>  	unsigned int order = get_order(bytes);
>> >> >>  	struct page *page;
>> >> >> @@ -588,13 +605,13 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>> >> >>  	}
>> >> >>  
>> >> >>  	vaddr = phys_to_virt(paddr);
>> >> >> -	if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> >> +	if (unencrypted && set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> >>  		goto error;
>> >> >>  	return page;
>> >> >>  
>> >> >>  error:
>> >> >>  	/* Intentional leak if pages cannot be encrypted again. */
>> >> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> >> +	if (unencrypted && !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> >>  		__free_pages(page, order);
>> >> >>  	return NULL;
>> >> >>  }
>> >> >> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
>> >> >>   * @dev:	Device for which a memory pool is allocated.
>> >> >>   * @bytes:	Size of the buffer.
>> >> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
>> >> >> + * @attrs:	DMA attributes for the allocation.
>> >> >>   * @gfp:	GFP flags for the allocation.
>> >> >>   *
>> >> >>   * Return: Allocated pages, or %NULL on allocation failure.
>> >> >>   */
>> >> >>  static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> >> >> -		u64 phys_limit, gfp_t gfp)
>> >> >> +		u64 phys_limit, unsigned long attrs, gfp_t gfp)
>> >> >>  {
>> >> >>  	struct page *page;
>> >> >> -	unsigned long attrs = 0;
>> >> >>  
>> >> >>  	/*
>> >> >>  	 * Allocate from the atomic pools if memory is encrypted and
>> >> >>  	 * the allocation is atomic, because decrypting may block.
>> >> >>  	 */
>> >> >> -	if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
>> >> >> +	if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
>> >> >>  		void *vaddr;
>> >> >>  
>> >> >>  		if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
>> >> >>  			return NULL;
>> >> >>  
>> >> >> -		/* swiotlb considered decrypted by default */
>> >> >> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> >> >> -			attrs = DMA_ATTR_CC_SHARED;
>> >> >> -
>> >> >>  		return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
>> >> >>  					   attrs, dma_coherent_ok);
>> >> >>  	}
>> >> >> @@ -638,7 +651,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> >> >>  	else if (phys_limit <= DMA_BIT_MASK(32))
>> >> >>  		gfp |= __GFP_DMA32;
>> >> >>  
>> >> >> -	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
>> >> >> +	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit,
>> >> >> +					     !!(attrs & DMA_ATTR_CC_SHARED)))) {
>> >> >>  		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
>> >> >>  		    phys_limit < DMA_BIT_MASK(64) &&
>> >> >>  		    !(gfp & (__GFP_DMA32 | __GFP_DMA)))
>> >> >> @@ -657,15 +671,18 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> >> >>   * swiotlb_free_tlb() - free a dynamically allocated IO TLB buffer
>> >> >>   * @vaddr:	Virtual address of the buffer.
>> >> >>   * @bytes:	Size of the buffer.
>> >> >> + * @unencrypted: true if @vaddr was allocated decrypted and must be
>> >> >> + *	re-encrypted before being freed
>> >> >>   */
>> >> >> -static void swiotlb_free_tlb(void *vaddr, size_t bytes)
>> >> >> +static void swiotlb_free_tlb(void *vaddr, size_t bytes, bool unencrypted)
>> >> >>  {
>> >> >>  	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
>> >> >>  	    dma_free_from_pool(NULL, vaddr, bytes))
>> >> >>  		return;
>> >> >>  
>> >> >>  	/* Intentional leak if pages cannot be encrypted again. */
>> >> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> >> +	if (!unencrypted ||
>> >> >> +	    !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
>> >> >>  		__free_pages(virt_to_page(vaddr), get_order(bytes));
>> >> >>  }
>> >> >>  
>> >> >> @@ -676,6 +693,7 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
>> >> >>   * @nslabs:	Desired (maximum) number of slabs.
>> >> >>   * @nareas:	Number of areas.
>> >> >>   * @phys_limit:	Maximum DMA buffer physical address.
>> >> >> + * @attrs:	DMA attributes for the allocation.
>> >> >>   * @gfp:	GFP flags for the allocations.
>> >> >>   *
>> >> >>   * Allocate and initialize a new IO TLB memory pool. The actual number of
>> >> >> @@ -686,7 +704,8 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
>> >> >>   */
>> >> >>  static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
>> >> >>  		unsigned long minslabs, unsigned long nslabs,
>> >> >> -		unsigned int nareas, u64 phys_limit, gfp_t gfp)
>> >> >> +		unsigned int nareas, u64 phys_limit, unsigned long attrs,
>> >> >> +		gfp_t gfp)
>> >> >>  {
>> >> >>  	struct io_tlb_pool *pool;
>> >> >>  	unsigned int slot_order;
>> >> >> @@ -704,9 +723,10 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
>> >> >>  	if (!pool)
>> >> >>  		goto error;
>> >> >>  	pool->areas = (void *)pool + sizeof(*pool);
>> >> >> +	pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
>> >> >>  
>> >> >>  	tlb_size = nslabs << IO_TLB_SHIFT;
>> >> >> -	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, gfp))) {
>> >> >> +	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
>> >> >>  		if (nslabs <= minslabs)
>> >> >>  			goto error_tlb;
>> >> >>  		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
>> >> >> @@ -724,7 +744,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
>> >> >>  	return pool;
>> >> >>  
>> >> >>  error_slots:
>> >> >> -	swiotlb_free_tlb(page_address(tlb), tlb_size);
>> >> >> +	swiotlb_free_tlb(page_address(tlb), tlb_size,
>> >> >> +			 !!(attrs & DMA_ATTR_CC_SHARED));
>> >> >>  error_tlb:
>> >> >>  	kfree(pool);
>> >> >>  error:
>> >> >> @@ -742,7 +763,9 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
>> >> >>  	struct io_tlb_pool *pool;
>> >> >>  
>> >> >>  	pool = swiotlb_alloc_pool(NULL, IO_TLB_MIN_SLABS, default_nslabs,
>> >> >> -				  default_nareas, mem->phys_limit, GFP_KERNEL);
>> >> >> +				  default_nareas, mem->phys_limit,
>> >> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
>> >> >> +				  GFP_KERNEL);
>> >> >>  	if (!pool) {
>> >> >>  		pr_warn_ratelimited("Failed to allocate new pool");
>> >> >>  		return;
>> >> >> @@ -762,7 +785,7 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
>> >> >>  	size_t tlb_size = pool->end - pool->start;
>> >> >>  
>> >> >>  	free_pages((unsigned long)pool->slots, get_order(slots_size));
>> >> >> -	swiotlb_free_tlb(pool->vaddr, tlb_size);
>> >> >> +	swiotlb_free_tlb(pool->vaddr, tlb_size, pool->unencrypted);
>> >> >>  	kfree(pool);
>> >> >>  }
>> >> >>  
>> >> >> @@ -1232,6 +1255,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
>> >> >>  	nslabs = nr_slots(alloc_size);
>> >> >>  	phys_limit = min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
>> >> >>  	pool = swiotlb_alloc_pool(dev, nslabs, nslabs, 1, phys_limit,
>> >> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
>> >> >>  				  GFP_NOWAIT);
>> >> >>  	if (!pool)
>> >> >>  		return -1;
>> >> >> @@ -1394,6 +1418,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> >> >>  		enum dma_data_direction dir, unsigned long attrs)
>> >> >>  {
>> >> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>> >> >> +	bool require_decrypted = false;
>> >> >>  	unsigned int offset;
>> >> >>  	struct io_tlb_pool *pool;
>> >> >>  	unsigned int i;
>> >> >> @@ -1411,6 +1436,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>> >> >>  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> >> >>  		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
>> >> >>  
>> >> >> +	/*
>> >> >> +	 * if we are trying to swiotlb map a decrypted paddr or the paddr is encrypted
>> >> >> +	 * but the device is forcing decryption, use decrypted io_tlb_mem
>> >> >> +	 */
>> >> >> +	if ((attrs & DMA_ATTR_CC_SHARED) || force_dma_unencrypted(dev))
>> >> >> +		require_decrypted = true;
>> >> >> +
>> >> >> +	if (require_decrypted != mem->unencrypted)
>> >> >> +		return (phys_addr_t)DMA_MAPPING_ERROR;
>> >> >> +
>> >> >>  	/*
>> >> >>  	 * The default swiotlb memory pool is allocated with PAGE_SIZE
>> >> >>  	 * alignment. If a mapping is requested with larger alignment,
>> >> >> @@ -1608,8 +1643,14 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
>> >> >>  	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
>> >> >>  		return DMA_MAPPING_ERROR;
>> >> >>  
>> >> >> -	/* Ensure that the address returned is DMA'ble */
>> >> >> -	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
>> >> >> +	/*
>> >> >> +	 * Use the allocated io_tlb_mem encryption type to determine dma addr.
>> >> >> +	 */
>> >> >> +	if (dev->dma_io_tlb_mem->unencrypted)
>> >> >> +		dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
>> >> >> +	else
>> >> >> +		dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
>> >> >> +
>> >> >>  	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
>> >> >>  		__swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
>> >> >>  			attrs | DMA_ATTR_SKIP_CPU_SYNC,
>> >> >> @@ -1773,7 +1814,8 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
>> >> >>  
>> >> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
>> >> >>  
>> >> >> -struct page *swiotlb_alloc(struct device *dev, size_t size)
>> >> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
>> >> >> +		unsigned long attrs)
>> >> >>  {
>> >> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>> >> >>  	struct io_tlb_pool *pool;
>> >> >> @@ -1784,6 +1826,9 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
>> >> >>  	if (!mem)
>> >> >>  		return NULL;
>> >> >>  
>> >> >> +	if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
>> >> >> +		return NULL;
>> >> >> +
>> >> >>  	align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
>> >> >>  	index = swiotlb_find_slots(dev, 0, size, align, &pool);
>> >> >>  	if (index == -1)
>> >> >> @@ -1853,9 +1898,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>> >> >>  			kfree(mem);
>> >> >>  			return -ENOMEM;
>> >> >>  		}
>> >> >> +		/*
>> >> >> +		 * if platform supports memory encryption,
>> >> >> +		 * restricted mem pool is decrypted by default
>> >> >> +		 */
>> >> >> +		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
>> >> >> +			mem->unencrypted = true;
>> >> >> +			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>> >> >> +					     rmem->size >> PAGE_SHIFT);
>> >> >> +		} else {
>> >> >> +			mem->unencrypted = false;
>> >> >> +		}
>> >> >
>> >> > This breaks pKVM as it doesn’t set CC_ATTR_MEM_ENCRYPT, so all virtio
>> >> > traffic now fails.
>> >> >
>> >> > Also, by design, some drivers are clueless about bouncing, so
>> >> > I believe that the pool should have a way to control it’s property
>> >> > (encrypted or decrypted) and that takes priority over whatever
>> >> > attributes comes from allocation.
>> >> > And that brings us to the same point whether it’s better to return
>> >> > the memory along with it’s state or we pass the requested state.
>> >> > I think for other cases it’s fine for the device/DMA-API to dictate
>> >> > the attrs, but not in restricted-dma case, the firmware just knows better.
>> >> >
>> >> 
>> >> Is it that the pKVM guest kernel does not have awareness of
>> >> encrypted/decrypted DMA allocations? Instead, the firmware attaches
>> >> hypervisor-shared pages to the device via restricted-dma-pool? The
>> >> kernel then has swiotlb->for_alloc = true, and hence all DMA allocations
>> >> go through the restricted-dma-pool?
>> >
>> > Yes.
>> >
>> >> 
>> >> Given that pKVM supports pkvm_set_memory_encrypted() and
>> >> pkvm_set_memory_decrypted(), can we consider adding CC_ATTR_MEM_ENCRYPT
>> >> support to pKVM? It would also be good to investigate whether we can set
>> >> force_dma_unencrypted(dev) to true where needed.
>> >
>> > I was looking in to that, but it didn't work because
>> > force_dma_unencrypted() is broken with restricted-dma due to the
>> > double decryption issue, that's when I sent my first series [1]
>> >
>> > May be we should land some basic fixes for that path so we can
>> > convert pKVM, then we do the full rework.
>> >
>> > I will revive my old work and see if I can send a RFC.
>> >
>> > [1] https://lore.kernel.org/all/20260305170335.963568-1-smostafa@google.com/
>> >
>> 
>> With this series, can you check whether the only change needed is
>> something like the following?
>> 
>> modified   kernel/dma/swiotlb.c
>> @@ -1905,7 +1905,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>  		 * if platform supports memory encryption,
>>  		 * restricted mem pool is decrypted by default
>>  		 */
>> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
>> +		//if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
>> +		if (true) {
>>  			mem->unencrypted = true;
>>  			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>>  					     rmem->size >> PAGE_SHIFT);
>
> Yes, that boots, but I will need to do more tests.
>
>> 
>> >
>> >> 
>> >> I agree that this patch, as it stands, can break pKVM because we are now
>> >> missing the set_memory_decrypted() call required for pKVM to work.
>> >> 
>> >> We now mark the swiotlb io_tlb_mem as unencrypted/encrypted in the guest
>> >> using struct io_tlb_mem->unencrypted. I am not clear what we can use for
>> >> pKVM to conditionalize this so that it works for both protected and
>> >> unprotected guests.
>> >
>> > There is no problem with non-protected guests as they don't use memory
>> > encryption, my initial thought was that th encrpyted/decrypted is
>> > per-pool property which is decided by FW (device-tree).
>> >
>> 
>> What I meant was that we need a generic way to identify a pKVM guest, so
>> that we can use it in the conditional above.
>
> I have this patch, with that I can boot with your series unmodified,
> but I will need to do more testing.
>

Thanks, I can add this to the series once you complete the required testing.

>
> From d795b4c4ee2437587616b2b342e9996afe6d6680 Mon Sep 17 00:00:00 2001
> From: Mostafa Saleh <smostafa@google.com>
> Date: Thu, 14 May 2026 13:46:15 +0000
> Subject: [PATCH] arm64/coco: Add pKVM as a CC platform
>
> pKVM does support memory encryption, expose that to the rest of
> the kernel through cc_platform_has()
>
> At the moment, all devices inside the guest are emulated which
> requires its memory to be shared back to the host (decrypted), so
> set force_dma_unencrypted() to always return true.
>
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  arch/arm64/include/asm/hypervisor.h           |  6 ++++++
>  arch/arm64/include/asm/mem_encrypt.h          |  3 ++-
>  arch/arm64/kernel/rsi.c                       | 12 ------------
>  arch/arm64/mm/init.c                          | 13 +++++++++++++
>  drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c |  5 +++++
>  5 files changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
> index a12fd897c877..1b0e15f290be 100644
> --- a/arch/arm64/include/asm/hypervisor.h
> +++ b/arch/arm64/include/asm/hypervisor.h
> @@ -10,8 +10,14 @@ void kvm_arm_target_impl_cpu_init(void);
>
>  #ifdef CONFIG_ARM_PKVM_GUEST
>  void pkvm_init_hyp_services(void);
> +bool is_protected_kvm_guest(void);
>  #else
>  static inline void pkvm_init_hyp_services(void) { };
> +
> +static inline bool is_protected_kvm_guest(void)
> +{
> +	return false;
> +}
>  #endif
>
>  static inline void kvm_arch_init_hyp_services(void)
> diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
> index 314b2b52025f..636f45b4d8af 100644
> --- a/arch/arm64/include/asm/mem_encrypt.h
> +++ b/arch/arm64/include/asm/mem_encrypt.h
> @@ -2,6 +2,7 @@
>  #ifndef __ASM_MEM_ENCRYPT_H
>  #define __ASM_MEM_ENCRYPT_H
>
> +#include <asm/hypervisor.h>
>  #include <asm/rsi.h>
>
>  struct device;
> @@ -20,7 +21,7 @@ int realm_register_memory_enc_ops(void);
>
>  static inline bool force_dma_unencrypted(struct device *dev)
>  {
> -	return is_realm_world();
> +	return is_realm_world() || is_protected_kvm_guest();
>  }
>
>  /*
> diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
> index 92160f2e57ff..25ca75ce1a4d 100644
> --- a/arch/arm64/kernel/rsi.c
> +++ b/arch/arm64/kernel/rsi.c
> @@ -7,7 +7,6 @@
>  #include <linux/memblock.h>
>  #include <linux/psci.h>
>  #include <linux/swiotlb.h>
> -#include <linux/cc_platform.h>
>  #include <linux/platform_device.h>
>
>  #include <asm/io.h>
> @@ -23,17 +22,6 @@ EXPORT_SYMBOL(prot_ns_shared);
>  DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
>  EXPORT_SYMBOL(rsi_present);
>
> -bool cc_platform_has(enum cc_attr attr)
> -{
> -	switch (attr) {
> -	case CC_ATTR_MEM_ENCRYPT:
> -		return is_realm_world();
> -	default:
> -		return false;
> -	}
> -}
> -EXPORT_SYMBOL_GPL(cc_platform_has);
> -
>  static bool rsi_version_matches(void)
>  {
>  	unsigned long ver_lower, ver_higher;
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index acf67c7064db..a087ac5b15f7 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -12,6 +12,7 @@
>  #include <linux/swap.h>
>  #include <linux/init.h>
>  #include <linux/cache.h>
> +#include <linux/cc_platform.h>
>  #include <linux/mman.h>
>  #include <linux/nodemask.h>
>  #include <linux/initrd.h>
> @@ -36,6 +37,7 @@
>
>  #include <asm/boot.h>
>  #include <asm/fixmap.h>
> +#include <asm/hypervisor.h>
>  #include <asm/kasan.h>
>  #include <asm/kernel-pgtable.h>
>  #include <asm/kvm_host.h>
> @@ -414,6 +416,17 @@ void dump_mem_limit(void)
>  	}
>  }
>
> +bool cc_platform_has(enum cc_attr attr)
> +{
> +	switch (attr) {
> +	case CC_ATTR_MEM_ENCRYPT:
> +		return is_realm_world() || is_protected_kvm_guest();
> +	default:
> +		return false;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(cc_platform_has);
> +
>  #ifdef CONFIG_EXECMEM
>  static u64 module_direct_base __ro_after_init = 0;
>  static u64 module_plt_base __ro_after_init = 0;
> diff --git a/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c b/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
> index 4230b817a80b..297e6d6019b8 100644
> --- a/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
> +++ b/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
> @@ -95,6 +95,11 @@ static int mmio_guard_ioremap_hook(phys_addr_t phys, size_t size,
>  	return 0;
>  }
>
> +bool is_protected_kvm_guest(void)
> +{
> +	return !!pkvm_granule;
> +}
> +
>  void pkvm_init_hyp_services(void)
>  {
>  	int i;


-aneesh

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Mostafa Saleh @ 2026-05-14 14:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
	linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
	Marc Zyngier, Steven Price, Suzuki K Poulose, Catalin Marinas,
	Jiri Pirko, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260514123529.GZ7702@ziepe.ca>

On Thu, May 14, 2026 at 09:35:29AM -0300, Jason Gunthorpe wrote:
> > > How will pKVM signal what kind of memory the DMA needs then?
> > > 
> > > Does it use set_memory_decrypted()? How can it use
> > > set_memory_decrypted() without offering CC_ATTR_MEM_ENCRYPT ?
> > 
> > pKVM (hypervisor) doesn’t signal anything.
> > The VMM when running protected guests will use restricted dma-pools
> > for emulated vritio devices in the guest, which gets decrypted by
> > the guest kernel and hence shared with the host kernel, and then
> > traffic is bounced via the pool.
> 
> That really does sound like CC and set_memory_decrypted() to me..
> 
> > It’s also worth noting that bouncing here isn't just about visibility.
> > Because memory sharing operates at page granularity, bouncing sub-page
> > allocations through the restricted pool prevents adjacent, sensitive
> > guest data from being exposed to the untrusted host.
> 
> That's a somewhat different problem, we have the dev->trusted stuff
> that is supposed to deal with this kind of security. We need it for
> IOMMU based systems too, eg hot plug thunderbolt should have it.

I see that it is used only for dma-iommu and for PCI devices.
However, I think that should be a problem with other CCA solutions
with emulated devices as they are untrusted. As I'd expect they
would have virtio devices.

> 
> Then CC issue is more that the DMA API can't decrypt random passed in
> memory because doing so often requires changing the PTEs pointing at
> the page so it would break everything if done transparently.
> 
> > > > I believe that the pool should have a way to control it’s property
> > > > (encrypted or decrypted) and that takes priority over whatever
> > > > attributes comes from allocation.
> > > 
> > > We should get here because dma_capable() fails, and then swiotlb needs
> > > to return something that makes dma_capable() succeed. Yes, it should
> > > return details about the thing it decided, but it shouldn't have been
> > > pre-created with some idea how to make dma_capable() work.
> > 
> > That sounds neat, but at the end we have force_dma_unencrypted() in
> > dma_capable() which is just hardcoded to true/false by the platform.
> 
> For now, the next step is it becomes per-device and dynamic during the
> device lifecycle.
> 
> > How is that different from having the state static by the pool?
> 
> statically attached pools to the device are not so flexible when
> devices have dynamically changing capabilities..

Pools can be per-device also. A device can have mutiple pools with
different memory attrs, which then can be matched by the DMA code
at runtime, it's not as flexible, but removes some complexity from
the guest code.

> 
> > > If dma_capable() can fail, then swiotlb should know exactly what to do
> > > to fix it.
> > 
> > dma_capable() returns a bool, I don’t think it can know what exactly
> > went wrong (based on address, size, attrs, dev...)
> 
> Yes, but I think the design is swiotlb is supposed to re-inspect what
> is going on against the limits dma_capable checks and then select the
> correct remedy..

I see, but that’s not part of this series, and probably would require
some rework so dma_capable() can return an error code (ERANGE, EPERM...)
so that caller can deal with that.

> 
> > While we can debate the aesthetics of the setup , this is
> > the exisitng behaviour for Linux, which existed for years
> > and pKVM relies on and is used extensively.
> > And, this patch alters that long-standing logic and introduces
> > a functional regression.
> 
> Yeah, Aneesh needs to do something here, I'm pointing out it is
> entirely seperate thing from the CC path we are working on which is
> decoupling CC from reylying on force swiotlb.

I am looking into converting pKVM to use the CC stuff, I replied with
a patch to Aneesh in this thread. However, I need to do more testing
and make sure there are not any unwanted consequences.

> 
> > We can address this by either adjusting this patch or by changing
> > pKVM guests to be more aligned with other CCA guests which is
> > something I have been wondering about if it would help reduce
> > bouncing.
> 
> Every time I look at pkvm I think it is just ARM CCA with a different
> design and no access to the unique HW features..
> 
> > > If we can make that work then maybe the flows are designed correctly.
> > 
> > Mmm, I am not sure I understand this one, shouldn’t the device also be
> > notified about the switch in memory state, if it expects to read/write
> > decrypted memory, how would that work if the kernel changes it to an
> > encrypted one?
> 
> Nothing on the device changes. In a CC world we put the device in a
> T=0 or T=1 state before the driver loads and the expectation from the
> DMA API is that the device will only use that T=x DMA type during
> operation.
> 
> A T=1 state device can access all of memory, private or shared. Any
> information the platform may need is encoded in the dma_addr_t or in
> the S1 IOPTEs.
> 
> So we never need to tell the device driver what kind of memory the DMA
> is targetting, and we NEVER expect a device in T=1 mode to have to
> issue a T=0 DMA to use the DMA API.
> 
> In a pkvm world it should be the same, the S2 table for the SMMU will
> control what the device can access, and if the SMMU points to a
> "private" or "shared" page is not something the device needs to know
> or care about.

I see that's because dma-iommu chooses the attrs for iommu_map().

In pKVM, dma_addr_t and IOPTE are the same for private and shared,
so nothing differs in that case.
We don’t expect pass-through devices to interact with shared
memory (T=0) at the moment.
However, I can see use cases for that, where the host and the guest
collaborate with device passthrough and require zero copy.

One other interesting case for device-passthrough is non-coherent
devices which then require private pools for bouncing.

Thanks,
Mostafa

> 
> Jason

^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Mostafa Saleh @ 2026-05-14 15:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Aneesh Kumar K.V, iommu, linux-arm-kernel, linux-kernel,
	linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
	Marc Zyngier, Steven Price, Suzuki K Poulose, Catalin Marinas,
	Jiri Pirko, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260514143733.GB7702@ziepe.ca>

On Thu, May 14, 2026 at 11:37:33AM -0300, Jason Gunthorpe wrote:
> On Thu, May 14, 2026 at 06:18:05PM +0530, Aneesh Kumar K.V wrote:
> > > There is no problem with non-protected guests as they don't use memory
> > > encryption, my initial thought was that th encrpyted/decrypted is
> > > per-pool property which is decided by FW (device-tree).
> > 
> > What I meant was that we need a generic way to identify a pKVM guest, so
> > that we can use it in the conditional above.
> 
> If I understood Mostafa's remarks I think different devices in the
> guest need shared/decrypted and some don't? Ie a virtio hypervisor
> device needs shared while a real PCI device doesn't? Is that right?

In upstream, device passthrough is not supported, but that case is
supported in Android and we plan to upstream it (it currently
depends on the SMMUv3 series first)

> 
> In CC terms that would be a mixture of T=0 and T=1 devices hardwired
> and signaled by firwmare..
> 
> Ideally we'd have a flow where if the arch precreates a swiotlb pool
> with special parameters this overrides all other decision making. Then
> this series is about making CC NOT use that flow... ??

Yes, I believe that will be needed, we do this at android by a per-pool
property added in the device tree.

Thanks,
Mostafa

> 
> Jason

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Catalin Marinas @ 2026-05-14 17:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Greg KH, Suzuki K Poulose, linux-coco, linux-arm-kernel,
	linux-kernel, Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi,
	Mark Rutland, Sudeep Holla, Will Deacon, Steven Price
In-Reply-To: <yq5ajyt65a5y.fsf@kernel.org>

On Thu, May 14, 2026 at 08:08:01PM +0530, Aneesh Kumar K.V wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
> >> On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> >> > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> >> > > Make the SMCCC driver responsible for registering the arm-smccc platform
> >> > > device and after confirming the relevant SMCCC function IDs, create
> >> > > the arm_cca_guest auxiliary device.
> >> > > 
> >> > 
> >> > There are a few changes squashed in to this patch. Please could we
> >> > split the patch in the following order ?
> >> > 
> >> > 1. Add platform device for arm-smccc
> >> 
> >> Do not make any more "fake" platform devices please.
> >> 
> >> > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> >> > it before the RSI changes)
> >> 
> >> No, move it to the faux api please.
> >
> > So should we end up with:
> >
> >   /sys/devices/faux/arm-smccc/
> >     smccc_trng/
> >     arm-rsi-dev/
> >       tsm/tsm0
> >
> >   /sys/class/tsm/tsm0
> >     -> ../../devices/faux/arm-smccc/arm-rsi-dev/tsm/tsm0
> >
> >   /sys/firmware/cca/
> >     realm_guest
> 
> But we need the ability to autoload different TSM backend drivers based
> on the support/availability of these SMCCC function-id ranges. faux
> device don't support that.

It breaks this but can we not have some systemd rule that checks
/sys/firmware/cca/realm_guest and modprobes arm_cca_guest?

Alternatively, we could do a request_module("arm_cca_guest") if
RSI is available when we check it in smccc_devices_init(). Or make it
even fancier with a request_module("arm-smccc-service-...") (some ID
range while arm-cca-guest.c has a corresponding MODULE_ALIAS() for that
SMCCC range.

-- 
Catalin

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Greg KH @ 2026-05-14 17:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Catalin Marinas, Suzuki K Poulose, linux-coco, linux-arm-kernel,
	linux-kernel, Jeremy Linton, Jonathan Cameron, Lorenzo Pieralisi,
	Mark Rutland, Sudeep Holla, Will Deacon, Steven Price
In-Reply-To: <yq5ajyt65a5y.fsf@kernel.org>

On Thu, May 14, 2026 at 08:08:01PM +0530, Aneesh Kumar K.V wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> 
> > On Thu, May 14, 2026 at 02:55:48PM +0200, Greg Kroah-Hartman wrote:
> >> On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> >> > On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> >> > > Make the SMCCC driver responsible for registering the arm-smccc platform
> >> > > device and after confirming the relevant SMCCC function IDs, create
> >> > > the arm_cca_guest auxiliary device.
> >> > > 
> >> > 
> >> > There are a few changes squashed in to this patch. Please could we
> >> > split the patch in the following order ?
> >> > 
> >> > 1. Add platform device for arm-smccc
> >> 
> >> Do not make any more "fake" platform devices please.
> >> 
> >> > 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> >> > it before the RSI changes)
> >> 
> >> No, move it to the faux api please.
> >
> > So should we end up with:
> >
> >   /sys/devices/faux/arm-smccc/
> >     smccc_trng/
> >     arm-rsi-dev/
> >       tsm/tsm0
> >
> >   /sys/class/tsm/tsm0
> >     -> ../../devices/faux/arm-smccc/arm-rsi-dev/tsm/tsm0
> >
> >   /sys/firmware/cca/
> >     realm_guest
> 
> But we need the ability to autoload different TSM backend drivers based
> on the support/availability of these SMCCC function-id ranges. faux
> device don't support that.

If you really need to "autoload" things, then do it like any other
normal bus or class does this, and send the proper uevents as needed.
Don't abuse the auxbus code for this please!

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v5 1/3] firmware: smccc: coco: Manage arm-smccc platform device and CCA auxiliary drivers
From: Greg KH @ 2026-05-14 17:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Suzuki K Poulose, linux-coco, linux-arm-kernel, linux-kernel,
	Catalin Marinas, Jeremy Linton, Jonathan Cameron,
	Lorenzo Pieralisi, Mark Rutland, Sudeep Holla, Will Deacon,
	Steven Price
In-Reply-To: <yq5alddm5a6w.fsf@kernel.org>

On Thu, May 14, 2026 at 08:07:27PM +0530, Aneesh Kumar K.V wrote:
> Greg KH <gregkh@linuxfoundation.org> writes:
> 
> > On Thu, May 14, 2026 at 12:04:13PM +0100, Suzuki K Poulose wrote:
> >> Hi Aneesh
> >> 
> >> On 14/05/2026 10:40, Aneesh Kumar K.V (Arm) wrote:
> >> > Make the SMCCC driver responsible for registering the arm-smccc platform
> >> > device and after confirming the relevant SMCCC function IDs, create
> >> > the arm_cca_guest auxiliary device.
> >> > 
> >> 
> >> There are a few changes squashed in to this patch. Please could we
> >> split the patch in the following order ?
> >> 
> >> 1. Add platform device for arm-smccc
> >
> > Do not make any more "fake" platform devices please.
> >
> >> 2. Move TRNG to Auxilliary Device - (Even though it is a later patch, move
> >> it before the RSI changes)
> >
> > No, move it to the faux api please.
> >
> 
> 
> Maybe I was not complete in my previous reply. I did not want to repeat
> the entire thread, so I quoted the lore link for more details.
> 
> 1. We have platform firmware-provided SMCCC interfaces. Based on the
> support/availability of these function IDs, we want to load multiple
> drivers.
> 2. This patch series adds a platform device to represent the
> firmware-provided SMCCC resource.
> 3. Different SMCCC ranges are now represented as auxiliary devices.
> 4. Different subsystems, such as TSM, can autoload their backend drivers
> based on the availability of these SMCCC ranges, which are now
> represented as auxiliary devices.
> 
> You had agreed to all of this in the previous discussion here:
> https://lore.kernel.org/all/2025101516-handbook-hyphen-62ec@gregkh

Then why did someone say "this is a fake platform device with no actual
resources"?  That's what I was triggering off of.

Again, if you have actual platform resources, GREAT, use a platform
device and aux.  If you do not, then do NOT use a platform device.

totally confused,

greg k-h

^ permalink raw reply

* [PATCH v2 00/15] KVM: x86: Clean up kvm_<reg>_{read,write}() mess
From: Sean Christopherson @ 2026-05-14 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
	Kiryl Shutsemau, David Woodhouse, Paul Durrant
  Cc: Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang, Binbin Wu

Add proper, explicit "raw" versions of kvm_<reg>_{read,write}(), along
with "e" versions (for hardcoded 32-bit accesses), and convert the
existing kvm_<reg>_{read,write}() APIs into mode-aware variants.

This was prompted by commit 435741a4e766 ("KVM: SVM: Properly check RAX
on #GP intercept of SVM instructions"), where using kvm_rax_read() to
get EAX/RAX would have (*very* surprisingly) been wrong as it's actually
a "raw" variant that doesn't truncate accesses when the guest is in 32-bit
mode.

Aside from my dislike of inconsistent APIs, I really want to avoid carrying
code that's subtly relying on using kvm_register_read(...) when accessing a
hardcoded register.

Fix a handful of minor warts along the way.

Oh, and introduce regs.{c,h}, which just a "minor" addendum.  Yosry pointed
out that moving _more_ code into x86.h was rather gross (especially since the
code split was super arbitrary), and it turns out that create regs.{c,h} isn't
all that hard.  In the future, I think we can also add msr.{c,h}, so I very
deliberately didn't include that functionality in regs.{c,h}.

v2:
 - Collect tags. [Yosry, Kai
 - Fix some truly egregious goofs. [Binbin]
 - Rename kvm_cache_regs.h => regs.h, add regs.c. [Yosry, though he'll
   probably yell at me for saying this was his suggestion :-) ]
 - Drop superfluous casting/masking of e*x() usage. [Kai]

v1: https://lore.kernel.org/all/20260409235622.2052730-1-seanjc@google.com

Sean Christopherson (15):
  KVM: SVM: Truncate INVLPGA address in compatibility mode
  KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode
    hypercall
  KVM: x86/xen: Don't truncate RAX when handling hypercall from
    protected guest
  KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of
    64-bit mode
  KVM: x86: Trace hypercall register *after* truncating values for
    32-bit
  KVM: x86: Rename kvm_cache_regs.h => regs.h
  KVM: x86: Move inlined CR and DR helpers from x86.h to regs.h
  KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
  KVM: x86: Drop non-raw kvm_<reg>_write() helpers
  KVM: nSVM: Use kvm_rax_read() now that it's mode-aware
  Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions
    outside of 64-bit mode"
  KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels
  KVM: x86: Move update_cr8_intercept() to lapic.c
  KVM: x86: Move kvm_pv_async_pf_enabled() to x86.h (as an inline)
  KVM: x86: Move the bulk of register specific code from x86.c to regs.c

 arch/x86/include/asm/kvm_host.h           |   2 -
 arch/x86/kvm/Makefile                     |   4 +-
 arch/x86/kvm/cpuid.c                      |  12 +-
 arch/x86/kvm/emulate.c                    |   2 +-
 arch/x86/kvm/hyperv.c                     |  21 +-
 arch/x86/kvm/hyperv.h                     |   4 +-
 arch/x86/kvm/lapic.c                      |  28 +-
 arch/x86/kvm/lapic.h                      |   1 +
 arch/x86/kvm/mmu.h                        |   2 +-
 arch/x86/kvm/mmu/mmu.c                    |   2 +-
 arch/x86/kvm/regs.c                       | 829 +++++++++++++++++++
 arch/x86/kvm/{kvm_cache_regs.h => regs.h} | 203 ++++-
 arch/x86/kvm/smm.c                        |   2 +-
 arch/x86/kvm/svm/nested.c                 |   8 +-
 arch/x86/kvm/svm/svm.c                    |  19 +-
 arch/x86/kvm/svm/svm.h                    |   2 +-
 arch/x86/kvm/vmx/nested.c                 |   8 +-
 arch/x86/kvm/vmx/nested.h                 |   2 +-
 arch/x86/kvm/vmx/sgx.c                    |   6 +-
 arch/x86/kvm/vmx/tdx.c                    |  18 +-
 arch/x86/kvm/vmx/vmx.c                    |   2 +-
 arch/x86/kvm/vmx/vmx.h                    |   2 +-
 arch/x86/kvm/x86.c                        | 935 +---------------------
 arch/x86/kvm/x86.h                        | 116 +--
 arch/x86/kvm/xen.c                        |  39 +-
 25 files changed, 1162 insertions(+), 1107 deletions(-)
 create mode 100644 arch/x86/kvm/regs.c
 rename arch/x86/kvm/{kvm_cache_regs.h => regs.h} (58%)


base-commit: a9512a611bd030088f13477258d1f8103cceaa40
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply

* [PATCH v2 01/15] KVM: SVM: Truncate INVLPGA address in compatibility mode
From: Sean Christopherson @ 2026-05-14 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
	Kiryl Shutsemau, David Woodhouse, Paul Durrant
  Cc: Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang, Binbin Wu
In-Reply-To: <20260514215355.1648463-1-seanjc@google.com>

Check for full 64-bit mode, not just long mode, when truncating the
virtual address as part of INVLPGA emulation.  Compatibility mode doesn't
support 64-bit addressing.

Note, the FIXME still applies, e.g. if the guest deliberately targeted
EAX while in 64-bit via an address size override.  That flaw isn't worth
fixing as it would require decoding the code stream, which would open a
an entirely different can of worms, and in practice no sane guest would
shove garbage into RAX[63:32] and execute INVLPGA.

Note #2, VMSAVE, VMLOAD, and VMRUN all suffer from the same architectural
flaw of not providing the full linear address in a VMCB exit information
field, because, quoting the APM verbatim:

  the linear address is available directly from the guest rAX register

(VMSAVE, VMLOAD, and VMRUN take a physical address, but they're behavior
with respect to rAX is otherwise identical).

Fixes: bc9eff67fc35 ("KVM: SVM: Use default rAX size for INVLPGA emulation")
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e74fcde6155e..4ad87f8df392 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2415,7 +2415,7 @@ static int invlpga_interception(struct kvm_vcpu *vcpu)
 		return 1;

 	/* FIXME: Handle an address size prefix. */
-	if (!is_long_mode(vcpu))
+	if (!is_64_bit_mode(vcpu))
 		gva = (u32)gva;

 	trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
-- 
2.54.0.563.g4f69b47b94-goog

^ permalink raw reply related

* [PATCH v2 02/15] KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall
From: Sean Christopherson @ 2026-05-14 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
	Kiryl Shutsemau, David Woodhouse, Paul Durrant
  Cc: Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang, Binbin Wu
In-Reply-To: <20260514215355.1648463-1-seanjc@google.com>

Bug the VM if 32-bit KVM attempts to handle a 64-bit hypercall, primarily
so that a future change to set "input" in mode-specific code doesn't
trigger a false positive warn=>error:

  arch/x86/kvm/xen.c:1687:6: error: variable 'input' is used uninitialized
                                    whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
   1687 |         if (!longmode) {
        |             ^~~~~~~~~
  arch/x86/kvm/xen.c:1708:31: note: uninitialized use occurs here
   1708 |         trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
        |                                      ^~~~~
  x86/kvm/xen.c:1687:2: note: remove the 'if' if its condition is always true
   1687 |         if (!longmode) {
        |         ^~~~~~~~~~~~~~
  arch/x86/kvm/xen.c:1677:11: note: initialize the variable 'input' to silence this warning
   1677 |         u64 input, params[6], r = -ENOSYS;
        |                  ^
  1 error generated.

Note, params[] also has the same flaw, but -Wsometimes-uninitialized
doesn't seem to be enforced for arrays, presumably because it's difficult
to avoid false positives on specific entries.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/xen.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 91fd3673c09a..6d9be74bb673 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1694,16 +1694,19 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
 		params[4] = (u32)kvm_rdi_read(vcpu);
 		params[5] = (u32)kvm_rbp_read(vcpu);
 	}
-#ifdef CONFIG_X86_64
 	else {
+#ifdef CONFIG_X86_64
 		params[0] = (u64)kvm_rdi_read(vcpu);
 		params[1] = (u64)kvm_rsi_read(vcpu);
 		params[2] = (u64)kvm_rdx_read(vcpu);
 		params[3] = (u64)kvm_r10_read(vcpu);
 		params[4] = (u64)kvm_r8_read(vcpu);
 		params[5] = (u64)kvm_r9_read(vcpu);
-	}
+#else
+		KVM_BUG_ON(1, vcpu->kvm);
+		return -EIO;
 #endif
+	}
 	cpl = kvm_x86_call(get_cpl)(vcpu);
 	trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
 				params[3], params[4], params[5]);
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: Sean Christopherson @ 2026-05-14 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
	Kiryl Shutsemau, David Woodhouse, Paul Durrant
  Cc: Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang, Binbin Wu
In-Reply-To: <20260514215355.1648463-1-seanjc@google.com>

Don't truncate RAX when handling a Xen hypercall for a guest with protected
state, as KVM's ABI is to assume the guest is in 64-bit for such cases
(the guest leaving garbage in 63:32 after a transition to 32-bit mode is
far less likely than 63:32 being necessary to complete the hypercall).

Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/xen.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 6d9be74bb673..895095dc684e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1678,15 +1678,14 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
 	bool handled = false;
 	u8 cpl;
 
-	input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
-
 	/* Hyper-V hypercalls get bit 31 set in EAX */
-	if ((input & 0x80000000) &&
+	if ((kvm_rax_read(vcpu) & 0x80000000) &&
 	    kvm_hv_hypercall_enabled(vcpu))
 		return kvm_hv_hypercall(vcpu);
 
 	longmode = is_64_bit_hypercall(vcpu);
 	if (!longmode) {
+		input = (u32)kvm_rax_read(vcpu);
 		params[0] = (u32)kvm_rbx_read(vcpu);
 		params[1] = (u32)kvm_rcx_read(vcpu);
 		params[2] = (u32)kvm_rdx_read(vcpu);
@@ -1696,6 +1695,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
 	}
 	else {
 #ifdef CONFIG_X86_64
+		input = (u64)kvm_rax_read(vcpu);
 		params[0] = (u64)kvm_rdi_read(vcpu);
 		params[1] = (u64)kvm_rsi_read(vcpu);
 		params[2] = (u64)kvm_rdx_read(vcpu);
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

* [PATCH v2 04/15] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
From: Sean Christopherson @ 2026-05-14 21:53 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
	Kiryl Shutsemau, David Woodhouse, Paul Durrant
  Cc: Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
	Yosry Ahmed, Kai Huang, Binbin Wu
In-Reply-To: <20260514215355.1648463-1-seanjc@google.com>

When getting register values for ENCLS emulation, use kvm_register_read()
instead of kvm_<reg>_read() so that bits 63:32 of the register are dropped
if the guest is in 32-bit mode.

Note, the misleading/surprising behavior of kvm_<reg>_read() being "raw"
variants under the hood will be addressed once all non-benign bugs are
fixed.

Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Fixes: b6f084ca5538 ("KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)")
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/sgx.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index df1d0cf76947..4c61fc33f764 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -225,8 +225,8 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
 	struct x86_exception ex;
 	int r;
 
-	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
-	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
+	if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 32, 32, &pageinfo_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva))
 		return 1;
 
 	/*
@@ -302,9 +302,9 @@ static int handle_encls_einit(struct kvm_vcpu *vcpu)
 	gpa_t sig_gpa, secs_gpa, token_gpa;
 	int ret, trapnr;
 
-	if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
-	    sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
-	    sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
+	if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 1808, 4096, &sig_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva) ||
+	    sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RDX), 304, 512, &token_gva))
 		return 1;
 
 	/*
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox