LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Mostafa Saleh @ 2026-05-14 14:43 UTC
  To: Jason Gunthorpe
  Cc: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
	linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
	Marc Zyngier, Steven Price, Suzuki K Poulose, Catalin Marinas,
	Jiri Pirko, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260514123529.GZ7702@ziepe.ca>

On Thu, May 14, 2026 at 09:35:29AM -0300, Jason Gunthorpe wrote:
> > > How will pKVM signal what kind of memory the DMA needs then?
> > > 
> > > Does it use set_memory_decrypted()? How can it use
> > > set_memory_decrypted() without offering CC_ATTR_MEM_ENCRYPT ?
> > 
> > pKVM (hypervisor) doesn’t signal anything.
> > The VMM when running protected guests will use restricted dma-pools
> > for emulated virtio devices in the guest, which gets decrypted by
> > the guest kernel and hence shared with the host kernel, and then
> > traffic is bounced via the pool.
> 
> That really does sound like CC and set_memory_decrypted() to me..
> 
> > It’s also worth noting that bouncing here isn't just about visibility.
> > Because memory sharing operates at page granularity, bouncing sub-page
> > allocations through the restricted pool prevents adjacent, sensitive
> > guest data from being exposed to the untrusted host.
> 
> That's a somewhat different problem, we have the dev->trusted stuff
> that is supposed to deal with this kind of security. We need it for
> IOMMU based systems too, eg hot plug thunderbolt should have it.

I see that dev->trusted is used only by dma-iommu and only for PCI
devices. However, I think the same problem applies to other CCA
solutions with emulated devices, since those devices are untrusted
and I'd expect them to be virtio devices.

> 
> Then the CC issue is more that the DMA API can't decrypt random passed in
> memory because doing so often requires changing the PTEs pointing at
> the page so it would break everything if done transparently.
> 
> > > > I believe that the pool should have a way to control its property
> > > > (encrypted or decrypted) and that takes priority over whatever
> > > > attributes comes from allocation.
> > > 
> > > We should get here because dma_capable() fails, and then swiotlb needs
> > > to return something that makes dma_capable() succeed. Yes, it should
> > > return details about the thing it decided, but it shouldn't have been
> > > pre-created with some idea how to make dma_capable() work.
> > 
> > That sounds neat, but at the end we have force_dma_unencrypted() in
> > dma_capable() which is just hardcoded to true/false by the platform.
> 
> For now, the next step is it becomes per-device and dynamic during the
> device lifecycle.
> 
> > How is that different from having the state static by the pool?
> 
> statically attached pools to the device are not so flexible when
> devices have dynamically changing capabilities..

Pools can also be per-device. A device can have multiple pools with
different memory attrs, which the DMA code can then match at runtime.
It's not as flexible, but it removes some complexity from the guest
code.
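
To illustrate the matching idea, here is a minimal user-space sketch.
It is not kernel code: struct model_pool, struct model_device,
model_pick_pool() and MODEL_ATTR_CC_SHARED are hypothetical names made
up for this example, and the flag only stands in for
DMA_ATTR_CC_SHARED.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DMA_ATTR_CC_SHARED; the value is illustrative. */
#define MODEL_ATTR_CC_SHARED (1UL << 0)

/* Hypothetical, simplified model of a bounce pool; not struct io_tlb_mem. */
struct model_pool {
	const char *name;
	bool unencrypted;	/* true if the pool is shared with the host */
};

/* Hypothetical per-device list of pools. */
struct model_device {
	const struct model_pool *pools;
	size_t nr_pools;
};

/*
 * Return the first pool whose encryption state matches the request,
 * or NULL if the device has no suitable pool.
 */
static const struct model_pool *
model_pick_pool(const struct model_device *dev, unsigned long attrs)
{
	bool want_unencrypted = !!(attrs & MODEL_ATTR_CC_SHARED);
	size_t i;

	for (i = 0; i < dev->nr_pools; i++) {
		if (dev->pools[i].unencrypted == want_unencrypted)
			return &dev->pools[i];
	}
	return NULL;
}
```

With one private and one shared pool attached to a device, a request
carrying the shared flag would pick the shared pool, and a request
without it would pick the private one. A device with only one pool
gets NULL for the other state, which is where the reduced flexibility
shows up.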

> 
> > > If dma_capable() can fail, then swiotlb should know exactly what to do
> > > to fix it.
> > 
> > dma_capable() returns a bool, I don’t think it can know what exactly
> > went wrong (based on address, size, attrs, dev...)
> 
> Yes, but I think the design is swiotlb is supposed to re-inspect what
> is going on against the limits dma_capable checks and then select the
> correct remedy..

I see, but that’s not part of this series, and it would probably require
some rework so that dma_capable() can return an error code (ERANGE,
EPERM...) that the caller can act on.
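
As a rough sketch of what such a rework might look like (user-space
model only; struct model_dev and model_dma_check() are hypothetical,
and the real dma_capable() checks more than this):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified device description; not struct device. */
struct model_dev {
	uint64_t dma_limit;		/* highest DMA address the device can reach */
	bool needs_unencrypted;		/* device can only access shared memory */
};

/*
 * Hypothetical variant of dma_capable() that reports why a buffer is
 * unusable instead of returning a bare bool:
 *   0        buffer can be used directly
 *   -ERANGE  buffer lies (partly) above the device's addressing limit
 *   -EPERM   buffer has the wrong encryption state for the device
 */
static int model_dma_check(const struct model_dev *dev, uint64_t addr,
			   uint64_t size, bool buf_unencrypted)
{
	/* Overflow-safe form of "addr + size - 1 > dma_limit". */
	if (addr > dev->dma_limit ||
	    (size && size - 1 > dev->dma_limit - addr))
		return -ERANGE;
	if (dev->needs_unencrypted && !buf_unencrypted)
		return -EPERM;
	return 0;
}
```

A caller such as swiotlb could then pick the remedy from the return
value, for example bouncing into a lower buffer on -ERANGE and into a
shared pool on -EPERM, rather than re-deriving the reason itself.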

> 
> > While we can debate the aesthetics of the setup, this is
> > the existing behaviour for Linux, which existed for years
> > and pKVM relies on and is used extensively.
> > And, this patch alters that long-standing logic and introduces
> > a functional regression.
> 
> Yeah, Aneesh needs to do something here, I'm pointing out it is
> entirely separate thing from the CC path we are working on which is
> decoupling CC from relying on force swiotlb.

I am looking into converting pKVM to use the CC stuff; I replied with
a patch to Aneesh in this thread. However, I need to do more testing
to make sure there are no unwanted consequences.

> 
> > We can address this by either adjusting this patch or by changing
> > pKVM guests to be more aligned with other CCA guests which is
> > something I have been wondering about if it would help reduce
> > bouncing.
> 
> Every time I look at pkvm I think it is just ARM CCA with a different
> design and no access to the unique HW features..
> 
> > > If we can make that work then maybe the flows are designed correctly.
> > 
> > Mmm, I am not sure I understand this one, shouldn’t the device also be
> > notified about the switch in memory state, if it expects to read/write
> > decrypted memory, how would that work if the kernel changes it to an
> > encrypted one?
> 
> Nothing on the device changes. In a CC world we put the device in a
> T=0 or T=1 state before the driver loads and the expectation from the
> DMA API is that the device will only use that T=x DMA type during
> operation.
> 
> A T=1 state device can access all of memory, private or shared. Any
> information the platform may need is encoded in the dma_addr_t or in
> the S1 IOPTEs.
> 
> So we never need to tell the device driver what kind of memory the DMA
> is targeting, and we NEVER expect a device in T=1 mode to have to
> issue a T=0 DMA to use the DMA API.
> 
> In a pkvm world it should be the same, the S2 table for the SMMU will
> control what the device can access, and if the SMMU points to a
> "private" or "shared" page is not something the device needs to know
> or care about.

I see, that's because dma-iommu chooses the attrs for iommu_map().

In pKVM, dma_addr_t and IOPTE are the same for private and shared,
so nothing differs in that case.
We don’t expect pass-through devices to interact with shared
memory (T=0) at the moment.
However, I can see use cases for that, where the host and the guest
collaborate with device passthrough and require zero copy.

One other interesting case for device-passthrough is non-coherent
devices which then require private pools for bouncing.

Thanks,
Mostafa

> 
> Jason



* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Mostafa Saleh @ 2026-05-14 14:21 UTC
  To: Aneesh Kumar K.V
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jiri Pirko, Jason Gunthorpe,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5apl2y5f96.fsf@kernel.org>

On Thu, May 14, 2026 at 06:18:05PM +0530, Aneesh Kumar K.V wrote:
> Mostafa Saleh <smostafa@google.com> writes:
> 
> > On Thu, May 14, 2026 at 11:24:42AM +0530, Aneesh Kumar K.V wrote:
> >> Mostafa Saleh <smostafa@google.com> writes:
> >> 
> >> > On Tue, May 12, 2026 at 02:33:59PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> >> Teach swiotlb to distinguish between encrypted and decrypted bounce
> >> >> buffer pools, and make allocation and mapping paths select a pool whose
> >> >> state matches the requested DMA attributes.
> >> >> 
> >> >> Add a decrypted flag to io_tlb_mem, initialize it for the default and
> >> >> restricted pools, and propagate DMA_ATTR_CC_SHARED into swiotlb pool
> >> >> allocation. Reject swiotlb alloc/map requests when the selected pool does
> >> >> not match the required encrypted/decrypted state.
> >> >> 
> >> >> Also return DMA addresses with the matching phys_to_dma_{encrypted,
> >> >> unencrypted} helper so the DMA address encoding stays consistent with the
> >> >> chosen pool.
> >> >> 
> >> >> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> >> >> ---
> >> >>  include/linux/dma-direct.h |  10 ++++
> >> >>  include/linux/swiotlb.h    |   8 ++-
> >> >>  kernel/dma/direct.c        |  14 +++--
> >> >>  kernel/dma/swiotlb.c       | 108 +++++++++++++++++++++++++++----------
> >> >>  4 files changed, 107 insertions(+), 33 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> >> >> index c249912456f9..94fad4e7c11e 100644
> >> >> --- a/include/linux/dma-direct.h
> >> >> +++ b/include/linux/dma-direct.h
> >> >> @@ -77,6 +77,10 @@ static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map)
> >> >>  #ifndef phys_to_dma_unencrypted
> >> >>  #define phys_to_dma_unencrypted		phys_to_dma
> >> >>  #endif
> >> >> +
> >> >> +#ifndef phys_to_dma_encrypted
> >> >> +#define phys_to_dma_encrypted		phys_to_dma
> >> >> +#endif
> >> >>  #else
> >> >>  static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
> >> >>  {
> >> >> @@ -90,6 +94,12 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
> >> >>  {
> >> >>  	return dma_addr_unencrypted(__phys_to_dma(dev, paddr));
> >> >>  }
> >> >> +
> >> >> +static inline dma_addr_t phys_to_dma_encrypted(struct device *dev,
> >> >> +		phys_addr_t paddr)
> >> >> +{
> >> >> +	return dma_addr_encrypted(__phys_to_dma(dev, paddr));
> >> >> +}
> >> >>  /*
> >> >>   * If memory encryption is supported, phys_to_dma will set the memory encryption
> >> >>   * bit in the DMA address, and dma_to_phys will clear it.
> >> >> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> >> >> index 3dae0f592063..b3fa3c6e0169 100644
> >> >> --- a/include/linux/swiotlb.h
> >> >> +++ b/include/linux/swiotlb.h
> >> >> @@ -81,6 +81,7 @@ struct io_tlb_pool {
> >> >>  	struct list_head node;
> >> >>  	struct rcu_head rcu;
> >> >>  	bool transient;
> >> >> +	bool unencrypted;
> >> >>  #endif
> >> >>  };
> >> >>  
> >> >> @@ -111,6 +112,7 @@ struct io_tlb_mem {
> >> >>  	struct dentry *debugfs;
> >> >>  	bool force_bounce;
> >> >>  	bool for_alloc;
> >> >> +	bool unencrypted;
> >> >>  #ifdef CONFIG_SWIOTLB_DYNAMIC
> >> >>  	bool can_grow;
> >> >>  	u64 phys_limit;
> >> >> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
> >> >>  extern void swiotlb_print_info(void);
> >> >>  
> >> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
> >> >> -struct page *swiotlb_alloc(struct device *dev, size_t size);
> >> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> >> >> +		unsigned long attrs);
> >> >>  bool swiotlb_free(struct device *dev, struct page *page, size_t size);
> >> >>  
> >> >>  static inline bool is_swiotlb_for_alloc(struct device *dev)
> >> >> @@ -290,7 +293,8 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
> >> >>  	return dev->dma_io_tlb_mem->for_alloc;
> >> >>  }
> >> >>  #else
> >> >> -static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> >> >> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
> >> >> +		unsigned long attrs)
> >> >>  {
> >> >>  	return NULL;
> >> >>  }
> >> >> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> >> >> index dc2907439b3d..97ae4fa10521 100644
> >> >> --- a/kernel/dma/direct.c
> >> >> +++ b/kernel/dma/direct.c
> >> >> @@ -104,9 +104,10 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
> >> >>  	dma_free_contiguous(dev, page, size);
> >> >>  }
> >> >>  
> >> >> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
> >> >> +static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size,
> >> >> +		unsigned long attrs)
> >> >>  {
> >> >> -	struct page *page = swiotlb_alloc(dev, size);
> >> >> +	struct page *page = swiotlb_alloc(dev, size, attrs);
> >> >>  
> >> >>  	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> >> >>  		swiotlb_free(dev, page, size);
> >> >> @@ -266,8 +267,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> >> >>  						  gfp, attrs);
> >> >>  
> >> >>  	if (is_swiotlb_for_alloc(dev)) {
> >> >> -		page = dma_direct_alloc_swiotlb(dev, size);
> >> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
> >> >>  		if (page) {
> >> >> +			/*
> >> >> +			 * swiotlb allocations comes from pool already marked
> >> >> +			 * decrypted
> >> >> +			 */
> >> >>  			mark_mem_decrypt = false;
> >> >>  			goto setup_page;
> >> >>  		}
> >> >> @@ -374,6 +379,7 @@ void dma_direct_free(struct device *dev, size_t size,
> >> >>  		return;
> >> >>  
> >> >>  	if (swiotlb_find_pool(dev, dma_to_phys(dev, dma_addr)))
> >> >> +		/* Swiotlb doesn't need a page attribute update on free */
> >> >>  		mark_mem_encrypted = false;
> >> >>  
> >> >>  	if (is_vmalloc_addr(cpu_addr)) {
> >> >> @@ -403,7 +409,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> >> >>  						  gfp, attrs);
> >> >>  
> >> >>  	if (is_swiotlb_for_alloc(dev)) {
> >> >> -		page = dma_direct_alloc_swiotlb(dev, size);
> >> >> +		page = dma_direct_alloc_swiotlb(dev, size, attrs);
> >> >>  		if (!page)
> >> >>  			return NULL;
> >> >>  
> >> >> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> >> >> index ab4eccbaa076..065663be282c 100644
> >> >> --- a/kernel/dma/swiotlb.c
> >> >> +++ b/kernel/dma/swiotlb.c
> >> >> @@ -259,10 +259,21 @@ void __init swiotlb_update_mem_attributes(void)
> >> >>  	struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
> >> >>  	unsigned long bytes;
> >> >>  
> >> >> +	/*
> >> >> +	 * if platform support memory encryption, swiotlb buffers are
> >> >> +	 * decrypted by default.
> >> >> +	 */
> >> >> +	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >> >> +		io_tlb_default_mem.unencrypted = true;
> >> >> +	else
> >> >> +		io_tlb_default_mem.unencrypted = false;
> >> >> +
> >> >>  	if (!mem->nslabs || mem->late_alloc)
> >> >>  		return;
> >> >>  	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
> >> >> -	set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> >> >> +
> >> >> +	if (io_tlb_default_mem.unencrypted)
> >> >> +		set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> >> >>  }
> >> >>  
> >> >>  static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> >> >> @@ -505,8 +516,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> >> >>  	if (!mem->slots)
> >> >>  		goto error_slots;
> >> >>  
> >> >> -	set_memory_decrypted((unsigned long)vstart,
> >> >> -			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> >> >> +	if (io_tlb_default_mem.unencrypted)
> >> >> +		set_memory_decrypted((unsigned long)vstart,
> >> >> +				     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> >> >> +
> >> >>  	swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
> >> >>  				 nareas);
> >> >>  	add_mem_pool(&io_tlb_default_mem, mem);
> >> >> @@ -539,7 +552,9 @@ void __init swiotlb_exit(void)
> >> >>  	tbl_size = PAGE_ALIGN(mem->end - mem->start);
> >> >>  	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
> >> >>  
> >> >> -	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> >> >> +	if (io_tlb_default_mem.unencrypted)
> >> >> +		set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> >> >> +
> >> >>  	if (mem->late_alloc) {
> >> >>  		area_order = get_order(array_size(sizeof(*mem->areas),
> >> >>  			mem->nareas));
> >> >> @@ -563,6 +578,7 @@ void __init swiotlb_exit(void)
> >> >>   * @gfp:	GFP flags for the allocation.
> >> >>   * @bytes:	Size of the buffer.
> >> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
> >> >> + * @unencrypted: true to allocate unencrypted memory, false for encrypted memory
> >> >>   *
> >> >>   * Allocate pages from the buddy allocator. If successful, make the allocated
> >> >>   * pages decrypted that they can be used for DMA.
> >> >> @@ -570,7 +586,8 @@ void __init swiotlb_exit(void)
> >> >>   * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
> >> >>   * if the allocated physical address was above @phys_limit.
> >> >>   */
> >> >> -static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> >> >> +static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
> >> >> +		u64 phys_limit, bool unencrypted)
> >> >>  {
> >> >>  	unsigned int order = get_order(bytes);
> >> >>  	struct page *page;
> >> >> @@ -588,13 +605,13 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> >> >>  	}
> >> >>  
> >> >>  	vaddr = phys_to_virt(paddr);
> >> >> -	if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >> +	if (unencrypted && set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >>  		goto error;
> >> >>  	return page;
> >> >>  
> >> >>  error:
> >> >>  	/* Intentional leak if pages cannot be encrypted again. */
> >> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >> +	if (unencrypted && !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >>  		__free_pages(page, order);
> >> >>  	return NULL;
> >> >>  }
> >> >> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> >> >>   * @dev:	Device for which a memory pool is allocated.
> >> >>   * @bytes:	Size of the buffer.
> >> >>   * @phys_limit:	Maximum allowed physical address of the buffer.
> >> >> + * @attrs:	DMA attributes for the allocation.
> >> >>   * @gfp:	GFP flags for the allocation.
> >> >>   *
> >> >>   * Return: Allocated pages, or %NULL on allocation failure.
> >> >>   */
> >> >>  static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> >> >> -		u64 phys_limit, gfp_t gfp)
> >> >> +		u64 phys_limit, unsigned long attrs, gfp_t gfp)
> >> >>  {
> >> >>  	struct page *page;
> >> >> -	unsigned long attrs = 0;
> >> >>  
> >> >>  	/*
> >> >>  	 * Allocate from the atomic pools if memory is encrypted and
> >> >>  	 * the allocation is atomic, because decrypting may block.
> >> >>  	 */
> >> >> -	if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
> >> >> +	if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
> >> >>  		void *vaddr;
> >> >>  
> >> >>  		if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> >> >>  			return NULL;
> >> >>  
> >> >> -		/* swiotlb considered decrypted by default */
> >> >> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >> >> -			attrs = DMA_ATTR_CC_SHARED;
> >> >> -
> >> >>  		return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> >> >>  					   attrs, dma_coherent_ok);
> >> >>  	}
> >> >> @@ -638,7 +651,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> >> >>  	else if (phys_limit <= DMA_BIT_MASK(32))
> >> >>  		gfp |= __GFP_DMA32;
> >> >>  
> >> >> -	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
> >> >> +	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit,
> >> >> +					     !!(attrs & DMA_ATTR_CC_SHARED)))) {
> >> >>  		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
> >> >>  		    phys_limit < DMA_BIT_MASK(64) &&
> >> >>  		    !(gfp & (__GFP_DMA32 | __GFP_DMA)))
> >> >> @@ -657,15 +671,18 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> >> >>   * swiotlb_free_tlb() - free a dynamically allocated IO TLB buffer
> >> >>   * @vaddr:	Virtual address of the buffer.
> >> >>   * @bytes:	Size of the buffer.
> >> >> + * @unencrypted: true if @vaddr was allocated decrypted and must be
> >> >> + *	re-encrypted before being freed
> >> >>   */
> >> >> -static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> >> >> +static void swiotlb_free_tlb(void *vaddr, size_t bytes, bool unencrypted)
> >> >>  {
> >> >>  	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> >> >>  	    dma_free_from_pool(NULL, vaddr, bytes))
> >> >>  		return;
> >> >>  
> >> >>  	/* Intentional leak if pages cannot be encrypted again. */
> >> >> -	if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >> +	if (!unencrypted ||
> >> >> +	    !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> >> >>  		__free_pages(virt_to_page(vaddr), get_order(bytes));
> >> >>  }
> >> >>  
> >> >> @@ -676,6 +693,7 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> >> >>   * @nslabs:	Desired (maximum) number of slabs.
> >> >>   * @nareas:	Number of areas.
> >> >>   * @phys_limit:	Maximum DMA buffer physical address.
> >> >> + * @attrs:	DMA attributes for the allocation.
> >> >>   * @gfp:	GFP flags for the allocations.
> >> >>   *
> >> >>   * Allocate and initialize a new IO TLB memory pool. The actual number of
> >> >> @@ -686,7 +704,8 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> >> >>   */
> >> >>  static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> >> >>  		unsigned long minslabs, unsigned long nslabs,
> >> >> -		unsigned int nareas, u64 phys_limit, gfp_t gfp)
> >> >> +		unsigned int nareas, u64 phys_limit, unsigned long attrs,
> >> >> +		gfp_t gfp)
> >> >>  {
> >> >>  	struct io_tlb_pool *pool;
> >> >>  	unsigned int slot_order;
> >> >> @@ -704,9 +723,10 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> >> >>  	if (!pool)
> >> >>  		goto error;
> >> >>  	pool->areas = (void *)pool + sizeof(*pool);
> >> >> +	pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
> >> >>  
> >> >>  	tlb_size = nslabs << IO_TLB_SHIFT;
> >> >> -	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, gfp))) {
> >> >> +	while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
> >> >>  		if (nslabs <= minslabs)
> >> >>  			goto error_tlb;
> >> >>  		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> >> >> @@ -724,7 +744,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> >> >>  	return pool;
> >> >>  
> >> >>  error_slots:
> >> >> -	swiotlb_free_tlb(page_address(tlb), tlb_size);
> >> >> +	swiotlb_free_tlb(page_address(tlb), tlb_size,
> >> >> +			 !!(attrs & DMA_ATTR_CC_SHARED));
> >> >>  error_tlb:
> >> >>  	kfree(pool);
> >> >>  error:
> >> >> @@ -742,7 +763,9 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
> >> >>  	struct io_tlb_pool *pool;
> >> >>  
> >> >>  	pool = swiotlb_alloc_pool(NULL, IO_TLB_MIN_SLABS, default_nslabs,
> >> >> -				  default_nareas, mem->phys_limit, GFP_KERNEL);
> >> >> +				  default_nareas, mem->phys_limit,
> >> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> >> >> +				  GFP_KERNEL);
> >> >>  	if (!pool) {
> >> >>  		pr_warn_ratelimited("Failed to allocate new pool");
> >> >>  		return;
> >> >> @@ -762,7 +785,7 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
> >> >>  	size_t tlb_size = pool->end - pool->start;
> >> >>  
> >> >>  	free_pages((unsigned long)pool->slots, get_order(slots_size));
> >> >> -	swiotlb_free_tlb(pool->vaddr, tlb_size);
> >> >> +	swiotlb_free_tlb(pool->vaddr, tlb_size, pool->unencrypted);
> >> >>  	kfree(pool);
> >> >>  }
> >> >>  
> >> >> @@ -1232,6 +1255,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> >> >>  	nslabs = nr_slots(alloc_size);
> >> >>  	phys_limit = min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
> >> >>  	pool = swiotlb_alloc_pool(dev, nslabs, nslabs, 1, phys_limit,
> >> >> +				  mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> >> >>  				  GFP_NOWAIT);
> >> >>  	if (!pool)
> >> >>  		return -1;
> >> >> @@ -1394,6 +1418,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >> >>  		enum dma_data_direction dir, unsigned long attrs)
> >> >>  {
> >> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >> >> +	bool require_decrypted = false;
> >> >>  	unsigned int offset;
> >> >>  	struct io_tlb_pool *pool;
> >> >>  	unsigned int i;
> >> >> @@ -1411,6 +1436,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >> >>  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >> >>  		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> >> >>  
> >> >> +	/*
> >> >> +	 * if we are trying to swiotlb map a decrypted paddr or the paddr is encrypted
> >> >> +	 * but the device is forcing decryption, use decrypted io_tlb_mem
> >> >> +	 */
> >> >> +	if ((attrs & DMA_ATTR_CC_SHARED) || force_dma_unencrypted(dev))
> >> >> +		require_decrypted = true;
> >> >> +
> >> >> +	if (require_decrypted != mem->unencrypted)
> >> >> +		return (phys_addr_t)DMA_MAPPING_ERROR;
> >> >> +
> >> >>  	/*
> >> >>  	 * The default swiotlb memory pool is allocated with PAGE_SIZE
> >> >>  	 * alignment. If a mapping is requested with larger alignment,
> >> >> @@ -1608,8 +1643,14 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> >> >>  	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
> >> >>  		return DMA_MAPPING_ERROR;
> >> >>  
> >> >> -	/* Ensure that the address returned is DMA'ble */
> >> >> -	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> >> >> +	/*
> >> >> +	 * Use the allocated io_tlb_mem encryption type to determine dma addr.
> >> >> +	 */
> >> >> +	if (dev->dma_io_tlb_mem->unencrypted)
> >> >> +		dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> >> >> +	else
> >> >> +		dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
> >> >> +
> >> >>  	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> >> >>  		__swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
> >> >>  			attrs | DMA_ATTR_SKIP_CPU_SYNC,
> >> >> @@ -1773,7 +1814,8 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
> >> >>  
> >> >>  #ifdef CONFIG_DMA_RESTRICTED_POOL
> >> >>  
> >> >> -struct page *swiotlb_alloc(struct device *dev, size_t size)
> >> >> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> >> >> +		unsigned long attrs)
> >> >>  {
> >> >>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >> >>  	struct io_tlb_pool *pool;
> >> >> @@ -1784,6 +1826,9 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> >> >>  	if (!mem)
> >> >>  		return NULL;
> >> >>  
> >> >> +	if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> >> >> +		return NULL;
> >> >> +
> >> >>  	align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
> >> >>  	index = swiotlb_find_slots(dev, 0, size, align, &pool);
> >> >>  	if (index == -1)
> >> >> @@ -1853,9 +1898,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> >> >>  			kfree(mem);
> >> >>  			return -ENOMEM;
> >> >>  		}
> >> >> +		/*
> >> >> +		 * if platform supports memory encryption,
> >> >> +		 * restricted mem pool is decrypted by default
> >> >> +		 */
> >> >> +		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> >> >> +			mem->unencrypted = true;
> >> >> +			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> >> >> +					     rmem->size >> PAGE_SHIFT);
> >> >> +		} else {
> >> >> +			mem->unencrypted = false;
> >> >> +		}
> >> >
> >> > This breaks pKVM as it doesn’t set CC_ATTR_MEM_ENCRYPT, so all virtio
> >> > traffic now fails.
> >> >
> >> > Also, by design, some drivers are clueless about bouncing, so
> >> > I believe that the pool should have a way to control its property
> >> > (encrypted or decrypted) and that takes priority over whatever
> >> > attributes comes from allocation.
> >> > And that brings us to the same point whether it’s better to return
> >> > the memory along with its state or we pass the requested state.
> >> > I think for other cases it’s fine for the device/DMA-API to dictate
> >> > the attrs, but not in restricted-dma case, the firmware just knows better.
> >> >
> >> 
> >> Is it that the pKVM guest kernel does not have awareness of
> >> encrypted/decrypted DMA allocations? Instead, the firmware attaches
> >> hypervisor-shared pages to the device via restricted-dma-pool? The
> >> kernel then has swiotlb->for_alloc = true, and hence all DMA allocations
> >> go through the restricted-dma-pool?
> >
> > Yes.
> >
> >> 
> >> Given that pKVM supports pkvm_set_memory_encrypted() and
> >> pkvm_set_memory_decrypted(), can we consider adding CC_ATTR_MEM_ENCRYPT
> >> support to pKVM? It would also be good to investigate whether we can set
> >> force_dma_unencrypted(dev) to true where needed.
> >
> > I was looking into that, but it didn't work because
> > force_dma_unencrypted() is broken with restricted-dma due to the
> > double decryption issue, that's when I sent my first series [1]
> >
> > Maybe we should land some basic fixes for that path so we can
> > convert pKVM, then we do the full rework.
> >
> > I will revive my old work and see if I can send an RFC.
> >
> > [1] https://lore.kernel.org/all/20260305170335.963568-1-smostafa@google.com/
> >
> 
> With this series, can you check whether the only change needed is
> something like the following?
> 
> modified   kernel/dma/swiotlb.c
> @@ -1905,7 +1905,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>  		 * if platform supports memory encryption,
>  		 * restricted mem pool is decrypted by default
>  		 */
> -		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> +		//if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> +		if (true) {
>  			mem->unencrypted = true;
>  			set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
>  					     rmem->size >> PAGE_SHIFT);

Yes, that boots, but I will need to do more tests.
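
To spell out why the condition matters for pKVM, here is a small
user-space model of the branch in the hunk above. It is only a sketch:
struct model_rmem_pool and model_rmem_init() are hypothetical names,
and decrypt_called merely records whether set_memory_decrypted() would
have run.

```c
#include <stdbool.h>

/* Hypothetical model of the state touched by the hunk; not struct io_tlb_mem. */
struct model_rmem_pool {
	bool unencrypted;	/* mirrors mem->unencrypted */
	bool decrypt_called;	/* whether set_memory_decrypted() would run */
};

/*
 * Model of the restricted-pool init branch. @gate stands for the
 * condition in the if statement: cc_platform_has(CC_ATTR_MEM_ENCRYPT)
 * in the series, or the constant true in the experiment above.
 */
static struct model_rmem_pool model_rmem_init(bool gate)
{
	struct model_rmem_pool pool = { false, false };

	if (gate) {
		pool.unencrypted = true;
		pool.decrypt_called = true;
	}
	return pool;
}
```

With gate false, which is what a pKVM guest sees today because it does
not report CC_ATTR_MEM_ENCRYPT, the pool is never shared with the host;
with gate true it is, which matches the boot result above.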

> 
> >
> >> 
> >> I agree that this patch, as it stands, can break pKVM because we are now
> >> missing the set_memory_decrypted() call required for pKVM to work.
> >> 
> >> We now mark the swiotlb io_tlb_mem as unencrypted/encrypted in the guest
> >> using struct io_tlb_mem->unencrypted. I am not clear what we can use for
> >> pKVM to conditionalize this so that it works for both protected and
> >> unprotected guests.
> >
> > There is no problem with non-protected guests as they don't use memory
> > encryption; my initial thought was that encrypted/decrypted is a
> > per-pool property which is decided by FW (device-tree).
> >
> 
> What I meant was that we need a generic way to identify a pKVM guest, so
> that we can use it in the conditional above.

I have the patch below; with it I can boot with your series unmodified,
but I will need to do more testing.

From d795b4c4ee2437587616b2b342e9996afe6d6680 Mon Sep 17 00:00:00 2001
From: Mostafa Saleh <smostafa@google.com>
Date: Thu, 14 May 2026 13:46:15 +0000
Subject: [PATCH] arm64/coco: Add pKVM as a CC platform

pKVM does support memory encryption; expose that to the rest of
the kernel through cc_platform_has().

At the moment, all devices inside the guest are emulated, which
requires their memory to be shared back to the host (decrypted), so
make force_dma_unencrypted() always return true.

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 arch/arm64/include/asm/hypervisor.h           |  6 ++++++
 arch/arm64/include/asm/mem_encrypt.h          |  3 ++-
 arch/arm64/kernel/rsi.c                       | 12 ------------
 arch/arm64/mm/init.c                          | 13 +++++++++++++
 drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c |  5 +++++
 5 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index a12fd897c877..1b0e15f290be 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -10,8 +10,14 @@ void kvm_arm_target_impl_cpu_init(void);

 #ifdef CONFIG_ARM_PKVM_GUEST
 void pkvm_init_hyp_services(void);
+bool is_protected_kvm_guest(void);
 #else
 static inline void pkvm_init_hyp_services(void) { };
+
+static inline bool is_protected_kvm_guest(void)
+{
+	return false;
+}
 #endif

 static inline void kvm_arch_init_hyp_services(void)
diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
index 314b2b52025f..636f45b4d8af 100644
--- a/arch/arm64/include/asm/mem_encrypt.h
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -2,6 +2,7 @@
 #ifndef __ASM_MEM_ENCRYPT_H
 #define __ASM_MEM_ENCRYPT_H

+#include <asm/hypervisor.h>
 #include <asm/rsi.h>

 struct device;
@@ -20,7 +21,7 @@ int realm_register_memory_enc_ops(void);

 static inline bool force_dma_unencrypted(struct device *dev)
 {
-	return is_realm_world();
+	return is_realm_world() || is_protected_kvm_guest();
 }

 /*
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 92160f2e57ff..25ca75ce1a4d 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -7,7 +7,6 @@
 #include <linux/memblock.h>
 #include <linux/psci.h>
 #include <linux/swiotlb.h>
-#include <linux/cc_platform.h>
 #include <linux/platform_device.h>

 #include <asm/io.h>
@@ -23,17 +22,6 @@ EXPORT_SYMBOL(prot_ns_shared);
 DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
 EXPORT_SYMBOL(rsi_present);

-bool cc_platform_has(enum cc_attr attr)
-{
-	switch (attr) {
-	case CC_ATTR_MEM_ENCRYPT:
-		return is_realm_world();
-	default:
-		return false;
-	}
-}
-EXPORT_SYMBOL_GPL(cc_platform_has);
-
 static bool rsi_version_matches(void)
 {
 	unsigned long ver_lower, ver_higher;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index acf67c7064db..a087ac5b15f7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -12,6 +12,7 @@
 #include <linux/swap.h>
 #include <linux/init.h>
 #include <linux/cache.h>
+#include <linux/cc_platform.h>
 #include <linux/mman.h>
 #include <linux/nodemask.h>
 #include <linux/initrd.h>
@@ -36,6 +37,7 @@

 #include <asm/boot.h>
 #include <asm/fixmap.h>
+#include <asm/hypervisor.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/kvm_host.h>
@@ -414,6 +416,17 @@ void dump_mem_limit(void)
 	}
 }

+bool cc_platform_has(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_MEM_ENCRYPT:
+		return is_realm_world() || is_protected_kvm_guest();
+	default:
+		return false;
+	}
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
+
 #ifdef CONFIG_EXECMEM
 static u64 module_direct_base __ro_after_init = 0;
 static u64 module_plt_base __ro_after_init = 0;
diff --git a/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c b/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
index 4230b817a80b..297e6d6019b8 100644
--- a/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
+++ b/drivers/virt/coco/pkvm-guest/arm-pkvm-guest.c
@@ -95,6 +95,11 @@ static int mmio_guard_ioremap_hook(phys_addr_t phys, size_t size,
 	return 0;
 }

+bool is_protected_kvm_guest(void)
+{
+	return !!pkvm_granule;
+}
+
 void pkvm_init_hyp_services(void)
 {
 	int i;
--
2.54.0.563.g4f69b47b94-goog


Thanks,
Mostafa
> 
> -aneesh
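
For readers skimming the thread, the decision logic in the patch above can
be modelled outside the kernel. The following is only a user-space sketch:
the struct and every `model_*` name are hypothetical, and the assumption
that a non-zero granule means "running as a pKVM guest" is taken from the
`!!pkvm_granule` check in the patch, not from anything verified here.

```c
#include <stdbool.h>

/* User-space model only: these fields stand in for the kernel's real checks. */
enum cc_attr_model {
	CC_ATTR_MEM_ENCRYPT_MODEL,
	CC_ATTR_OTHER_MODEL,
};

struct guest_model {
	bool realm_world;           /* models is_realm_world() */
	unsigned long pkvm_granule; /* models pkvm_granule (assumed 0 outside pKVM) */
};

/* Mirrors the patch: a non-zero granule identifies a protected KVM guest. */
static bool model_is_protected_kvm_guest(const struct guest_model *g)
{
	return g->pkvm_granule != 0;
}

/* Mirrors the relocated cc_platform_has(): only one attribute is reported. */
static bool model_cc_platform_has(const struct guest_model *g,
				  enum cc_attr_model attr)
{
	switch (attr) {
	case CC_ATTR_MEM_ENCRYPT_MODEL:
		return g->realm_world || model_is_protected_kvm_guest(g);
	default:
		return false;
	}
}

/* Mirrors force_dma_unencrypted(): DMA memory must be shared with the host. */
static bool model_force_dma_unencrypted(const struct guest_model *g)
{
	return g->realm_world || model_is_protected_kvm_guest(g);
}
```

In this model a plain guest (no realm, granule 0) reports no memory
encryption, while either a realm or a pKVM guest reports it and also forces
unencrypted DMA.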



* Re: [PATCH 01/19] btrfs: require at least 4 devices for RAID 6
From: David Sterba @ 2026-05-15 14:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: kreijack, Goffredo Baroncelli, Christoph Hellwig, Andrew Morton,
	Catalin Marinas, Will Deacon, Ard Biesheuvel, Huacai Chen,
	WANG Xuerui, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Herbert Xu, Dan Williams, Chris Mason,
	David Sterba, Arnd Bergmann, Song Liu, Yu Kuai, Li Nan,
	linux-kernel, linux-arm-kernel, loongarch, linuxppc-dev,
	linux-riscv, linux-s390, linux-crypto, linux-btrfs, linux-arch,
	linux-raid
In-Reply-To: <0507CCEF-0548-442F-8703-1D006B5E068B@zytor.com>

On Thu, May 14, 2026 at 12:57:53PM -0700, H. Peter Anvin wrote:
> On May 14, 2026 12:51:59 PM PDT, Goffredo Baroncelli <kreijack@libero.it> wrote:
> >On 13/05/2026 07.47, Christoph Hellwig wrote:
> >> On Tue, May 12, 2026 at 01:42:31PM +0200, David Sterba wrote:
> >
> >> 
> >>> The degenerate modes of
> >>> raid0, 5, or 6 are explicit as a possible middle step when converting
> >>> profiles.  We can use a fallback implementation for this case if the
> >>> accelerated implementations cannot do it.
> >> 
> >> This is not about a degenerated mode.  For a degenerated RAID 6, parity
> >> generation uses the RAID 5 XOR routines as the second parity will be
> >> missing.  This is about generating two parities for a single data disk,
> >> which must be explicitly selected.
> >> 
> >
> >I think David's concern is: "what happens to an already existing
> >3-disk btrfs raid6 filesystem when the user upgrades the kernel?"
> >(I am thinking of when a new BG needs to be allocated)...
> 
> That's what I'm saying – it should invoke the RAID-1 code under the cover (as with 3 disks, D = P = Q.)

Thanks, it was not clear to me what you meant. For the two edge cases
the code should do a simple memcpy for both parity calculation and
recovery.
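
To make the "D = P = Q" point concrete, here is a small stand-alone sketch
(not the kernel's lib/raid6 code; all function names are made up for
illustration). It assumes the usual RAID-6 definition over GF(2^8) with
polynomial 0x11d, where P is the XOR of the data blocks and Q is the sum of
g^i * D_i. With a single data disk the only term is g^0 * D_0, so both
parities degenerate to a copy of the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multiply by the generator {02} in GF(2^8), reduction polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Generic syndrome: P = XOR of all D_i, Q = sum of g^i * D_i, evaluated in
 * Horner form from the highest data disk down. Requires ndata >= 1.
 */
static void gen_syndrome(size_t ndata, size_t len,
			 const uint8_t *const *data, uint8_t *p, uint8_t *q)
{
	for (size_t b = 0; b < len; b++) {
		uint8_t wp = data[ndata - 1][b];
		uint8_t wq = wp;

		for (size_t d = ndata - 1; d-- > 0; ) {
			wp ^= data[d][b];
			wq = (uint8_t)(gf_mul2(wq) ^ data[d][b]);
		}
		p[b] = wp;
		q[b] = wq;
	}
}

/* Edge case with one data disk: both parities are plain copies of D_0. */
static void gen_syndrome_single(size_t len, const uint8_t *d0,
				uint8_t *p, uint8_t *q)
{
	memcpy(p, d0, len);
	memcpy(q, d0, len);
}
```

Calling gen_syndrome() with ndata == 1 never enters the inner loop, so it
produces the same P and Q as the memcpy variant; recovery in that layout is
likewise a copy from whichever of the three blocks survives.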



* Re: [PATCH 4/5] x86/pci: Use official API to iterate over PCI buses
From: Dave Hansen @ 2026-05-15 15:13 UTC (permalink / raw)
  To: Gerd Bayer, Richard Henderson, Matt Turner, Magnus Lindholm,
	Russell King, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Bjorn Helgaas,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: Yinghai Lu, linux-alpha, linux-kernel, linux-arm-kernel,
	linuxppc-dev, linux-pci
In-Reply-To: <20260515-priv_root_buses-v1-4-f8e393c57390@linux.ibm.com>

On 5/15/26 07:22, Gerd Bayer wrote:
>  static int __init pcibios_assign_resources(void)
>  {
> -	struct pci_bus *bus;
> +	struct pci_bus *bus = NULL;
>  
>  	if (!(pci_probe & PCI_ASSIGN_ROMS))
> -		list_for_each_entry(bus, &pci_root_buses, node)
> +		while ((bus = pci_find_next_bus(bus)) != NULL)
>  			pcibios_allocate_rom_resources(bus);

What's with the 'bus = NULL'? I thought there was some crazy macro magic
going on or something, but pci_find_next_bus() looks like a normal
function that's just taking a pointer and not _modifying_ the pointer value.

Also, wouldn't this be a more readable way of writing what you have?

	while (bus = pci_find_next_bus(bus))

For that matter isn't the kernel idiom for these things:

	for_each_pci_bus(bus) {
		// do bus stuff
	}

I'm kinda surprised there isn't one of those already.
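
For illustration, such an iterator could look roughly like the sketch
below. It is a user-space model, not kernel code: `struct pci_bus_model`,
`model_find_next_bus()` and `for_each_pci_bus_model()` are hypothetical
names. The one thing it borrows from the real API is the calling
convention of pci_find_next_bus(), which takes the previously returned bus
as a cursor and treats NULL as "start a new search". That convention is
also why the open-coded loop needs `bus = NULL` before the first call.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct pci_bus; only what the sketch needs. */
struct pci_bus_model {
	int number;
	struct pci_bus_model *next;
};

/* Head of the modelled root bus list. */
static struct pci_bus_model *model_root_buses;

/* Same cursor convention as pci_find_next_bus(): NULL starts a new search. */
static struct pci_bus_model *
model_find_next_bus(const struct pci_bus_model *from)
{
	return from ? from->next : model_root_buses;
}

/* Hypothetical helper; not claimed to exist in the kernel under any name. */
#define for_each_pci_bus_model(bus)				\
	for ((bus) = model_find_next_bus(NULL); (bus);		\
	     (bus) = model_find_next_bus(bus))

/* Example user: the cursor needs no explicit initialisation here. */
static int count_buses(void)
{
	struct pci_bus_model *bus;
	int n = 0;

	for_each_pci_bus_model(bus)
		n++;
	return n;
}
```

The macro hides the NULL seed inside the loop initialiser, so callers
cannot forget it.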



* [PATCH v3] powerpc/pseries/iommu: Add TCEs for 16GB pages when RAM is pre-mapped
From: Gaurav Batra @ 2026-05-15 15:51 UTC (permalink / raw)
  To: maddy
  Cc: linuxppc-dev, ritesh.list, sbhat, vaibhav, donettom, harshpb,
	Gaurav Batra

On PowerPC, if the Dynamic DMA Window (DDW) is big enough, RAM is
pre-mapped. To determine the size of RAM, the PAPR+ property
"ibm,lrdr-capacity" is used. This OF property dictates the maximum size
of RAM an LPAR can have, including DR-added memory.

On PowerPC, 16GB pages can be allocated at machine level and then
assigned to LPARs. These 16GB pages are added to LPAR memory at boot
time. The address range for these 16GB pages is above the maximum RAM an
LPAR can have (ibm,lrdr-capacity). In the current implementation, these
16GB pages are excluded from the pre-mapped TCEs. A driver can have DMA
buffers allocated from 16GB pages, so the platform raises an EEH when
DMA is attempted on buffers in the 16GB memory range.

commit 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier incorrectly
adds TCEs for pmemory")

Prior to the above patch, memblock_end_of_DRAM() was being used to
determine the MAX memory of an LPAR. This included 16GB pages as well.
The issue with using memblock_end_of_DRAM() is that when pmemory is
converted to RAM via daxctl command, the DDW engine will incorrectly try
to add TCEs for pmemory as well.

Below is the address distribution of RAM, 16GB pages and pmemory for an
LPAR with a max memory of 256GB, 64GB of allocated memory, two 16GB
pages and 8GB of assigned pmemory.

RANGE                                 SIZE  STATE REMOVABLE     BLOCK
0x0000000000000000-0x0000000fffffffff  64G online       yes     0-255
0x0000004000000000-0x00000047ffffffff  32G online       yes 1024-1151

cat /sys/bus/nd/devices/region0/resource
0x40100000000
cat /sys/bus/nd/devices/region0/size
8589934592

The fix is to revert the code changes introduced by the above patch and
to stash away the MAX memory of an LPAR, including 16GB pages, at LPAR
boot time. This value is then used whenever TCEs need to be pre-mapped,
in enable_ddw() or iommu_mem_notifier().

Fixes: 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier incorrectly adds TCEs for pmemory")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
---

Change log:

V2 -> V3

1. Harsh: Remove R-b tags from the change log

   Response: Incorporated changes

2. Harsh: Change WARN_ON() to WARN_ONCE()

   Response: Incorporated changes

3. Harsh: Fix indentation

   Response: Incorporated changes

4. Harsh: Replace comment with a log if limit < arg->nr_pages ? 

   Response: Doesn't seem to be needed since the WARN_ONCE() will log this
   scenario. I removed the comment instead.

V1 -> V2

1. Harsh: Not only start_pfn, but end_pfn also needs to be within allowed
   range, which may require clamping arg->nr_pages if crossing the limits.

   Response: Incorporated changes.

 arch/powerpc/platforms/pseries/iommu.c | 58 ++++++++++++++++++--------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 3e1f915fe4f6..7bbe070006fa 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -69,6 +69,8 @@ static struct iommu_table *iommu_pseries_alloc_table(int node)
 	return tbl;
 }
 
+static phys_addr_t pseries_ddw_max_ram;
+
 #ifdef CONFIG_IOMMU_API
 static struct iommu_table_group_ops spapr_tce_table_group_ops;
 #endif
@@ -1285,13 +1287,17 @@ static LIST_HEAD(failed_ddw_pdn_list);
 
 static phys_addr_t ddw_memory_hotplug_max(void)
 {
-	resource_size_t max_addr;
+	resource_size_t max_addr = memory_hotplug_max();
+	struct device_node *memory;
 
-#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
-	max_addr = hot_add_drconf_memory_max();
-#else
-	max_addr = memblock_end_of_DRAM();
-#endif
+	for_each_node_by_type(memory, "memory") {
+		struct resource res;
+
+		if (of_address_to_resource(memory, 0, &res))
+			continue;
+
+		max_addr = max_t(resource_size_t, max_addr, res.end + 1);
+	}
 
 	return max_addr;
 }
@@ -1446,7 +1452,7 @@ static struct property *ddw_property_create(const char *propname, u32 liobn, u64
 static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn, u64 dma_mask)
 {
 	int len = 0, ret;
-	int max_ram_len = order_base_2(ddw_memory_hotplug_max());
+	int max_ram_len = order_base_2(pseries_ddw_max_ram);
 	struct ddw_query_response query;
 	struct ddw_create_response create;
 	int page_shift;
@@ -1668,7 +1674,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn, u64 dma_mas
 
 	if (direct_mapping) {
 		/* DDW maps the whole partition, so enable direct DMA mapping */
-		ret = walk_system_ram_range(0, ddw_memory_hotplug_max() >> PAGE_SHIFT,
+		ret = walk_system_ram_range(0, pseries_ddw_max_ram >> PAGE_SHIFT,
 					    win64->value, tce_setrange_multi_pSeriesLP_walk);
 		if (ret) {
 			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
@@ -2419,23 +2425,35 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
 {
 	struct dma_win *window;
 	struct memory_notify *arg = data;
+	unsigned long limit = arg->nr_pages;
+	unsigned long max_ram_pages = pseries_ddw_max_ram >> PAGE_SHIFT;
 	int ret = 0;
 
 	/* This notifier can get called when onlining persistent memory as well.
 	 * TCEs are not pre-mapped for persistent memory. Persistent memory will
-	 * always be above ddw_memory_hotplug_max()
+	 * always be above pseries_ddw_max_ram
 	 */
+	if (arg->start_pfn >= max_ram_pages)
+		return NOTIFY_OK;
+
+	/* RAM is being DLPAR'ed. The range should never exceed max ram.
+	 * Just in case, clamp the range and throw a warning.
+	 */
+	if (arg->start_pfn + limit > max_ram_pages) {
+		limit = max_ram_pages - arg->start_pfn;
+		WARN_ONCE(1, "Limiting Page Range %lx - %lx to Max Mem Pages: %lx\n",
+					arg->start_pfn, arg->start_pfn + arg->nr_pages,
+					max_ram_pages);
+	}
 
 	switch (action) {
 	case MEM_GOING_ONLINE:
 		spin_lock(&dma_win_list_lock);
 		list_for_each_entry(window, &dma_win_list, list) {
-			if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
-				ddw_memory_hotplug_max()) {
+			if (window->direct) {
 				ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
-						arg->nr_pages, window->prop);
+						limit, window->prop);
 			}
-			/* XXX log error */
 		}
 		spin_unlock(&dma_win_list_lock);
 		break;
@@ -2443,12 +2461,10 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
 	case MEM_OFFLINE:
 		spin_lock(&dma_win_list_lock);
 		list_for_each_entry(window, &dma_win_list, list) {
-			if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
-				ddw_memory_hotplug_max()) {
+			if (window->direct) {
 				ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
-						arg->nr_pages, window->prop);
+						limit, window->prop);
 			}
-			/* XXX log error */
 		}
 		spin_unlock(&dma_win_list_lock);
 		break;
@@ -2532,6 +2548,14 @@ void __init iommu_init_early_pSeries(void)
 	register_memory_notifier(&iommu_mem_nb);
 
 	set_pci_dma_ops(&dma_iommu_ops);
+
+	/* During init determine the max memory an LPAR can have and set it. This
+	 * will be used for pre-mapping RAM in DDW. memblock_end_of_DRAM() can
+	 * change during the running of LPAR - daxctl can add pmemory as
+	 * "system-ram". This memory range should not be pre-mapped in DDW since
+	 * the address of pmemory can be much higher than the DDW size.
+	 */
+	pseries_ddw_max_ram = ddw_memory_hotplug_max();
 }
 
 static int __init disable_multitce(char *str)

base-commit: 6d35786de28116ecf78797a62b84e6bf3c45aa5a
-- 
2.39.3
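
As a worked illustration of the patch above, the boot-time maximum and the
notifier clamp can be modelled in user space. This is a sketch under
assumptions, not the kernel code: the `model_*` names and struct are
hypothetical, and a 64K page size (shift of 16) is assumed purely to make
the numbers concrete.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 16 /* assumes 64K pages; illustrative only */

struct mem_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Highest end address over the boot-time memory ranges, never below the
 * hotplug maximum (the value derived from ibm,lrdr-capacity).
 */
static uint64_t model_ddw_max_ram(uint64_t hotplug_max,
				  const struct mem_range *r, int n)
{
	uint64_t max_addr = hotplug_max;

	for (int i = 0; i < n; i++) {
		uint64_t end = r[i].start + r[i].size;

		if (end > max_addr)
			max_addr = end;
	}
	return max_addr;
}

/*
 * Number of pages the notifier would map or unmap: 0 when the range starts
 * at or above max RAM (e.g. pmemory), clamped when it straddles the limit.
 */
static uint64_t model_clamp_pages(uint64_t start_pfn, uint64_t nr_pages,
				  uint64_t max_ram)
{
	uint64_t max_ram_pages = max_ram >> MODEL_PAGE_SHIFT;

	if (start_pfn >= max_ram_pages)
		return 0;
	if (start_pfn + nr_pages > max_ram_pages)
		return max_ram_pages - start_pfn;
	return nr_pages;
}
```

With the layout from the commit message (hotplug maximum 0x4000000000, RAM
at 0 for 64G, 16GB pages at 0x4000000000 for 32G) the model yields a
maximum of 0x4800000000, i.e. 0x480000 pages. The pmemory region at
0x40100000000 starts at pfn 0x4010000, which is above that limit, so it is
skipped, while a range inside the 16GB pages is mapped in full.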




* [RFC 0/4] KVM: selftests: add powerpc support
From: Ritesh Harjani (IBM) @ 2026-05-15 16:04 UTC (permalink / raw)
  To: kvm
  Cc: linuxppc-dev, Madhavan Srinivasan, Harsh Prateek Bora,
	Christophe Leroy, Venkat Rao Bagalkote, Nicholas Piggin,
	linux-kernel, Ritesh Harjani (IBM)

Hi All,

This series primarily adds KVM selftests support for powerpc (64-bit,
Book3S, radix MMU).

This patch series is originally Nick's work. I have mainly just rebased it on
the latest upstream tree. Since the rebase required a few changes to all four
patches, I have dropped the earlier Acked-by from Michael Ellerman.

Since the last series was posted three years ago [1], I am resetting the
version to RFC and posting an early version (a few tests still pending) to
get early review comments. BTW, I ran this on P9 (PowerNV) with radix
and haven't found any regressions so far.

Note that I am planning to run these selftests with different configurations
on PowerPC as well and will share the test results soon. This rebase was done
as part of a larger effort to improve the selftests infrastructure for the
Linux on PowerPC tree. Thanks to Harsh and Maddy for their help on this.
 
[1]: https://lore.kernel.org/all/20231120122920.293076-1-npiggin@gmail.com/

Nicholas Piggin (4):
  KVM: selftests: Move pgd_created check into virt_pgd_alloc
  KVM: selftests: Add aligned guest physical page allocator
  KVM: PPC: selftests: add support for powerpc
  KVM: PPC: selftests: powerpc enable kvm_create_max_vcpus test

 MAINTAINERS                                   |   2 +
 tools/testing/selftests/kvm/Makefile          |   2 +-
 tools/testing/selftests/kvm/Makefile.kvm      |  10 +
 .../testing/selftests/kvm/include/kvm_util.h  |  34 +-
 .../selftests/kvm/include/powerpc/hcall.h     |  17 +
 .../kvm/include/powerpc/kvm_util_arch.h       |  22 +
 .../selftests/kvm/include/powerpc/ppc_asm.h   |  32 ++
 .../selftests/kvm/include/powerpc/processor.h |  38 ++
 .../selftests/kvm/include/powerpc/ucall.h     |  21 +
 .../selftests/kvm/kvm_create_max_vcpus.c      |   9 +
 .../selftests/kvm/lib/arm64/processor.c       |   4 -
 tools/testing/selftests/kvm/lib/guest_modes.c |  20 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |  41 +-
 .../selftests/kvm/lib/loongarch/processor.c   |   4 -
 .../selftests/kvm/lib/powerpc/handlers.S      |  93 ++++
 .../testing/selftests/kvm/lib/powerpc/hcall.c |  45 ++
 .../selftests/kvm/lib/powerpc/processor.c     | 481 ++++++++++++++++++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  22 +
 .../selftests/kvm/lib/riscv/processor.c       |   4 -
 .../selftests/kvm/lib/s390/processor.c        |   4 -
 .../testing/selftests/kvm/lib/x86/processor.c |   9 +-
 21 files changed, 869 insertions(+), 45 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/kvm_util_arch.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ucall.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/handlers.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c

--
2.39.5




* [RFC 1/4] KVM: selftests: Move pgd_created check into virt_pgd_alloc
From: Ritesh Harjani (IBM) @ 2026-05-15 16:04 UTC (permalink / raw)
  To: kvm
  Cc: linuxppc-dev, Madhavan Srinivasan, Harsh Prateek Bora,
	Christophe Leroy, Venkat Rao Bagalkote, Nicholas Piggin,
	linux-kernel, Ritesh Harjani (IBM)
In-Reply-To: <cover.1778857539.git.ritesh.list@gmail.com>

From: Nicholas Piggin <npiggin@gmail.com>

All virt_arch_pgd_alloc() implementations do the same test-and-set of
pgd_created. Move this into common code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Rebased to latest mainline tree]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 tools/testing/selftests/kvm/include/kvm_util.h        | 5 +++++
 tools/testing/selftests/kvm/lib/arm64/processor.c     | 4 ----
 tools/testing/selftests/kvm/lib/loongarch/processor.c | 4 ----
 tools/testing/selftests/kvm/lib/riscv/processor.c     | 4 ----
 tools/testing/selftests/kvm/lib/s390/processor.c      | 4 ----
 tools/testing/selftests/kvm/lib/x86/processor.c       | 9 +++------
 6 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 2ecaaa0e9965..3666a8530f31 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -1197,7 +1197,12 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm);
 
 static inline void virt_pgd_alloc(struct kvm_vm *vm)
 {
+	if (vm->mmu.pgd_created)
+		return;
+
 	virt_arch_pgd_alloc(vm);
+
+	vm->mmu.pgd_created = true;
 }
 
 /*
diff --git a/tools/testing/selftests/kvm/lib/arm64/processor.c b/tools/testing/selftests/kvm/lib/arm64/processor.c
index 01325bf4d36f..498fbcb0ea16 100644
--- a/tools/testing/selftests/kvm/lib/arm64/processor.c
+++ b/tools/testing/selftests/kvm/lib/arm64/processor.c
@@ -112,13 +112,9 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
 {
 	size_t nr_pages = vm_page_align(vm, ptrs_per_pgd(vm) * 8) / vm->page_size;
 
-	if (vm->mmu.pgd_created)
-		return;
-
 	vm->mmu.pgd = vm_phy_pages_alloc(vm, nr_pages,
 					 KVM_GUEST_PAGE_TABLE_MIN_PADDR,
 					 vm->memslots[MEM_REGION_PT]);
-	vm->mmu.pgd_created = true;
 }
 
 static void _virt_pg_map(struct kvm_vm *vm, gva_t gva, gpa_t gpa,
diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
index 64d91fb76522..207055db5f5d 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -51,9 +51,6 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
 	int i;
 	gpa_t child, table;
 
-	if (vm->mmu.pgd_created)
-		return;
-
 	child = table = 0;
 	for (i = 0; i < vm->mmu.pgtable_levels; i++) {
 		invalid_pgtable[i] = child;
@@ -64,7 +61,6 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
 		child = table;
 	}
 	vm->mmu.pgd = table;
-	vm->mmu.pgd_created = true;
 }
 
 static int virt_pte_none(u64 *ptep, int level)
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
index ded5429f3448..75a5d4c46001 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -66,13 +66,9 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
 {
 	size_t nr_pages = vm_page_align(vm, ptrs_per_pte(vm) * 8) / vm->page_size;
 
-	if (vm->mmu.pgd_created)
-		return;
-
 	vm->mmu.pgd = vm_phy_pages_alloc(vm, nr_pages,
 					 KVM_GUEST_PAGE_TABLE_MIN_PADDR,
 					 vm->memslots[MEM_REGION_PT]);
-	vm->mmu.pgd_created = true;
 }
 
 void virt_arch_pg_map(struct kvm_vm *vm, gva_t gva, gpa_t gpa)
diff --git a/tools/testing/selftests/kvm/lib/s390/processor.c b/tools/testing/selftests/kvm/lib/s390/processor.c
index a9adb3782b35..342b7c92463e 100644
--- a/tools/testing/selftests/kvm/lib/s390/processor.c
+++ b/tools/testing/selftests/kvm/lib/s390/processor.c
@@ -17,16 +17,12 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
 	TEST_ASSERT(vm->page_size == PAGE_SIZE, "Unsupported page size: 0x%x",
 		    vm->page_size);
 
-	if (vm->mmu.pgd_created)
-		return;
-
 	gpa = vm_phy_pages_alloc(vm, PAGES_PER_REGION,
 				   KVM_GUEST_PAGE_TABLE_MIN_PADDR,
 				   vm->memslots[MEM_REGION_PT]);
 	memset(addr_gpa2hva(vm, gpa), 0xff, PAGES_PER_REGION * vm->page_size);
 
 	vm->mmu.pgd = gpa;
-	vm->mmu.pgd_created = true;
 }
 
 /*
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b51467d70f6e..e420afdfbcfb 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -164,12 +164,9 @@ bool kvm_is_tdp_enabled(void)
 static void virt_mmu_init(struct kvm_vm *vm, struct kvm_mmu *mmu,
 			  struct pte_masks *pte_masks)
 {
-	/* If needed, create the top-level page table. */
-	if (!mmu->pgd_created) {
-		mmu->pgd = vm_alloc_page_table(vm);
-		mmu->pgd_created = true;
-		mmu->arch.pte_masks = *pte_masks;
-	}
+	/* Create the top-level page table. */
+	mmu->pgd = vm_alloc_page_table(vm);
+	mmu->arch.pte_masks = *pte_masks;
 
 	TEST_ASSERT(mmu->pgtable_levels == 4 || mmu->pgtable_levels == 5,
 		    "Selftests MMU only supports 4-level and 5-level paging, not %u-level paging",
-- 
2.39.5




* [RFC 2/4] KVM: selftests: Add aligned guest physical page allocator
From: Ritesh Harjani (IBM) @ 2026-05-15 16:04 UTC (permalink / raw)
  To: kvm
  Cc: linuxppc-dev, Madhavan Srinivasan, Harsh Prateek Bora,
	Christophe Leroy, Venkat Rao Bagalkote, Nicholas Piggin,
	linux-kernel, Ritesh Harjani (IBM)
In-Reply-To: <cover.1778857539.git.ritesh.list@gmail.com>

From: Nicholas Piggin <npiggin@gmail.com>

powerpc will require this to allocate MMU tables in guest memory that
are larger than guest base page size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Rebased to latest mainline tree]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 .../testing/selftests/kvm/include/kvm_util.h  | 20 +++++++++--
 tools/testing/selftests/kvm/lib/kvm_util.c    | 33 +++++++++----------
 2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 3666a8530f31..c515c918c2c9 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -991,8 +991,8 @@ void kvm_gsi_routing_write(struct kvm_vm *vm, struct kvm_irq_routing *routing);
 const char *exit_reason_str(unsigned int exit_reason);
 
 gpa_t vm_phy_page_alloc(struct kvm_vm *vm, gpa_t min_gpa, u32 memslot);
-gpa_t __vm_phy_pages_alloc(struct kvm_vm *vm, size_t num, gpa_t min_gpa,
-			   u32 memslot, bool protected);
+gpa_t __vm_phy_pages_alloc(struct kvm_vm *vm, size_t num, size_t align,
+			   gpa_t min_gpa, u32 memslot, bool protected);
 gpa_t vm_alloc_page_table(struct kvm_vm *vm);
 
 static inline gpa_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
@@ -1003,10 +1003,24 @@ static inline gpa_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
 	 * protected memory, as the majority of memory for such VMs is
 	 * protected, i.e. using shared memory is effectively opt-in.
 	 */
-	return __vm_phy_pages_alloc(vm, num, min_gpa, memslot,
+	return __vm_phy_pages_alloc(vm, num, 1, min_gpa, memslot,
 				    vm_arch_has_protected_memory(vm));
 }
 
+static inline gpa_t vm_phy_pages_alloc_align(struct kvm_vm *vm, size_t num,
+					     size_t align, gpa_t min_gpa,
+					     u32 memslot)
+{
+	/*
+	 * By default, allocate memory as protected for VMs that support
+	 * protected memory, as the majority of memory for such VMs is
+	 * protected, i.e. using shared memory is effectively opt-in.
+	 */
+	return __vm_phy_pages_alloc(vm, num, align, min_gpa, memslot,
+				    vm_arch_has_protected_memory(vm));
+}
+
+
 /*
  * ____vm_create() does KVM_CREATE_VM and little else.  __vm_create() also
  * loads the test binary into guest memory and creates an IRQ chip (x86 only).
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 2a76eca7029d..cdb004c9ba56 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1442,7 +1442,7 @@ static gva_t ____vm_alloc(struct kvm_vm *vm, size_t sz, gva_t min_gva,
 	u64 pages = (sz >> vm->page_shift) + ((sz % vm->page_size) != 0);
 
 	virt_pgd_alloc(vm);
-	gpa_t gpa = __vm_phy_pages_alloc(vm, pages,
+	gpa_t gpa = __vm_phy_pages_alloc(vm, pages, 1,
 					   KVM_UTIL_MIN_PFN * vm->page_size,
 					   vm->memslots[type], protected);
 
@@ -2021,7 +2021,7 @@ const char *exit_reason_str(unsigned int exit_reason)
  * and their base address is returned. A TEST_ASSERT failure occurs if
  * not enough pages are available at or above min_gpa.
  */
-gpa_t __vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
+gpa_t __vm_phy_pages_alloc(struct kvm_vm *vm, size_t num, size_t align,
 			   gpa_t min_gpa, u32 memslot,
 			   bool protected)
 {
@@ -2039,23 +2039,22 @@ gpa_t __vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
 	TEST_ASSERT(!protected || region->protected_phy_pages,
 		    "Region doesn't support protected memory");
 
-	base = pg = min_gpa >> vm->page_shift;
-	do {
-		for (; pg < base + num; ++pg) {
-			if (!sparsebit_is_set(region->unused_phy_pages, pg)) {
-				base = pg = sparsebit_next_set(region->unused_phy_pages, pg);
-				break;
+	base = min_gpa >> vm->page_shift;
+again:
+	base = (base + align - 1) & ~(align - 1);
+	for (pg = base; pg < base + num; ++pg) {
+		if (!sparsebit_is_set(region->unused_phy_pages, pg)) {
+			base = sparsebit_next_set(region->unused_phy_pages, pg);
+			if (!base) {
+				fprintf(stderr, "No guest physical page available, "
+					"min_gpa: 0x%lx page_size: 0x%x memslot: %u\n",
+					min_gpa, vm->page_size, memslot);
+				fputs("---- vm dump ----\n", stderr);
+				vm_dump(stderr, vm, 2);
+				abort();
 			}
+			goto again;
 		}
-	} while (pg && pg != base + num);
-
-	if (pg == 0) {
-		fprintf(stderr, "No guest physical page available, "
-			"min_gpa: 0x%lx page_size: 0x%x memslot: %u\n",
-			min_gpa, vm->page_size, memslot);
-		fputs("---- vm dump ----\n", stderr);
-		vm_dump(stderr, vm, 2);
-		abort();
 	}
 
 	for (pg = base; pg < base + num; ++pg) {
-- 
2.39.5




* [RFC 3/4] KVM: PPC: selftests: add support for powerpc
From: Ritesh Harjani (IBM) @ 2026-05-15 16:04 UTC (permalink / raw)
  To: kvm
  Cc: linuxppc-dev, Madhavan Srinivasan, Harsh Prateek Bora,
	Christophe Leroy, Venkat Rao Bagalkote, Nicholas Piggin,
	linux-kernel, Ritesh Harjani (IBM)
In-Reply-To: <cover.1778857539.git.ritesh.list@gmail.com>

From: Nicholas Piggin <npiggin@gmail.com>

Implement KVM selftests support for powerpc (Book3S-64).

ucalls are implemented with an unsupported PAPR hcall number which will
always cause KVM to exit to userspace.

Virtual memory is implemented for the radix MMU, and only a base page
size is supported (both 4K and 64K).

Guest interrupts are taken in real-mode, so require a page allocated at
gRA 0x0. Interrupt entry is complicated because gVA:gRA is not 1:1
mapped (like the kernel is), so the MMU can not just be switched on and
off.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Rebased to latest mainline tree]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 MAINTAINERS                                   |   2 +
 tools/testing/selftests/kvm/Makefile          |   2 +-
 tools/testing/selftests/kvm/Makefile.kvm      |  10 +
 .../testing/selftests/kvm/include/kvm_util.h  |   9 +
 .../selftests/kvm/include/powerpc/hcall.h     |  17 +
 .../kvm/include/powerpc/kvm_util_arch.h       |  22 +
 .../selftests/kvm/include/powerpc/ppc_asm.h   |  32 ++
 .../selftests/kvm/include/powerpc/processor.h |  38 ++
 .../selftests/kvm/include/powerpc/ucall.h     |  21 +
 tools/testing/selftests/kvm/lib/guest_modes.c |  20 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |   8 +
 .../selftests/kvm/lib/powerpc/handlers.S      |  93 ++++
 .../testing/selftests/kvm/lib/powerpc/hcall.c |  45 ++
 .../selftests/kvm/lib/powerpc/processor.c     | 481 ++++++++++++++++++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  22 +
 15 files changed, 819 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/kvm_util_arch.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ucall.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/handlers.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6aa3fe2ee1bb..9d0a0cb32811 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14115,6 +14115,8 @@ F:	arch/powerpc/include/asm/kvm*
 F:	arch/powerpc/include/uapi/asm/kvm*
 F:	arch/powerpc/kernel/kvm*
 F:	arch/powerpc/kvm/
+F:	tools/testing/selftests/kvm/*/powerpc/
+F:	tools/testing/selftests/kvm/powerpc/
 
 KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
 M:	Anup Patel <anup@brainfault.org>
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index f2b223072b62..03d91f00092f 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -3,7 +3,7 @@ top_srcdir = ../../../..
 include $(top_srcdir)/scripts/subarch.include
 ARCH            ?= $(SUBARCH)
 
-ifeq ($(ARCH),$(filter $(ARCH),arm64 s390 riscv x86 x86_64 loongarch))
+ifeq ($(ARCH),$(filter $(ARCH),arm64 s390 riscv x86 x86_64 loongarch powerpc))
 # Top-level selftests allows ARCH=x86_64 :-(
 ifeq ($(ARCH),x86_64)
 	override ARCH := x86
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 9118a5a51b89..825bea7f851d 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -52,6 +52,11 @@ LIBKVM_loongarch += lib/loongarch/processor.c
 LIBKVM_loongarch += lib/loongarch/ucall.c
 LIBKVM_loongarch += lib/loongarch/exception.S
 
+LIBKVM_powerpc += lib/powerpc/handlers.S
+LIBKVM_powerpc += lib/powerpc/processor.c
+LIBKVM_powerpc += lib/powerpc/ucall.c
+LIBKVM_powerpc += lib/powerpc/hcall.c
+
 # Non-compiled test targets
 TEST_PROGS_x86 += x86/nx_huge_pages_test.sh
 
@@ -239,6 +244,11 @@ TEST_GEN_PROGS_loongarch += memslot_perf_test
 TEST_GEN_PROGS_loongarch += set_memory_region_test
 TEST_GEN_PROGS_loongarch += steal_time
 
+TEST_GEN_PROGS_powerpc = $(TEST_GEN_PROGS_COMMON)
+TEST_GEN_PROGS_powerpc += access_tracking_perf_test
+TEST_GEN_PROGS_powerpc += dirty_log_perf_test
+TEST_GEN_PROGS_powerpc += hardware_disable_test
+
 SPLIT_TESTS += arch_timer
 SPLIT_TESTS += get-reg-list
 
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index c515c918c2c9..10f03a182c8b 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -209,6 +209,9 @@ enum vm_guest_mode {
 	VM_MODE_P41V48_4K,
 	VM_MODE_P41V39_4K,
 
+	VM_MODE_P52V52_4K,	/* For powerpc64 */
+	VM_MODE_P52V52_64K,
+
 	NUM_VM_MODES,
 };
 
@@ -268,6 +271,12 @@ extern enum vm_guest_mode vm_mode_default;
 #define MIN_PAGE_SHIFT			12U
 #define ptes_per_page(page_size)	((page_size) / 8)
 
+#elif defined(__powerpc64__)
+
+#define VM_MODE_DEFAULT			vm_mode_default
+#define MIN_PAGE_SHIFT			12U
+#define ptes_per_page(page_size)	((page_size) / 8)
+
 #endif
 
 #define VM_SHAPE_DEFAULT	VM_SHAPE(VM_MODE_DEFAULT)
diff --git a/tools/testing/selftests/kvm/include/powerpc/hcall.h b/tools/testing/selftests/kvm/include/powerpc/hcall.h
new file mode 100644
index 000000000000..4028baa6c5d8
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/powerpc/hcall.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * powerpc hcall defines
+ */
+#ifndef SELFTEST_KVM_HCALL_H
+#define SELFTEST_KVM_HCALL_H
+
+#include <linux/compiler.h>
+
+/* Ucalls use unimplemented PAPR hcall 0 which exits KVM */
+#define H_UCALL	0
+
+int64_t hcall0(uint64_t token);
+int64_t hcall1(uint64_t token, uint64_t arg1);
+int64_t hcall2(uint64_t token, uint64_t arg1, uint64_t arg2);
+
+#endif
diff --git a/tools/testing/selftests/kvm/include/powerpc/kvm_util_arch.h b/tools/testing/selftests/kvm/include/powerpc/kvm_util_arch.h
new file mode 100644
index 000000000000..5d45c25cd299
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/powerpc/kvm_util_arch.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_UTIL_ARCH_H
+#define SELFTEST_KVM_UTIL_ARCH_H
+
+#include <stdint.h>
+
+#include "kvm_util_types.h"
+
+struct kvm_mmu_arch {};
+
+/* Page table fragment cache for guest page tables < page size */
+struct vm_pt_frag_cache {
+	gpa_t page;
+	size_t page_nr_used;
+};
+
+struct kvm_vm_arch {
+	gpa_t prtb; /* process table */
+	struct vm_pt_frag_cache pt_frag_cache[2]; /* 256B and 4KB PT caches */
+};
+
+#endif  /* SELFTEST_KVM_UTIL_ARCH_H */
diff --git a/tools/testing/selftests/kvm/include/powerpc/ppc_asm.h b/tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
new file mode 100644
index 000000000000..b9df64659792
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * powerpc asm specific defines
+ */
+#ifndef SELFTEST_KVM_PPC_ASM_H
+#define SELFTEST_KVM_PPC_ASM_H
+
+#define STACK_FRAME_MIN_SIZE	112 /* Could be 32 on ELFv2 */
+#define STACK_REDZONE_SIZE	512
+
+#define INT_FRAME_SIZE		(STACK_FRAME_MIN_SIZE + STACK_REDZONE_SIZE)
+
+#define SPR_SRR0	0x01a
+#define SPR_SRR1	0x01b
+#define SPR_CFAR	0x01c
+
+#define MSR_SF		0x8000000000000000ULL
+#define MSR_HV		0x1000000000000000ULL
+#define MSR_VEC		0x0000000002000000ULL
+#define MSR_VSX		0x0000000000800000ULL
+#define MSR_EE		0x0000000000008000ULL
+#define MSR_PR		0x0000000000004000ULL
+#define MSR_FP		0x0000000000002000ULL
+#define MSR_ME		0x0000000000001000ULL
+#define MSR_IR		0x0000000000000020ULL
+#define MSR_DR		0x0000000000000010ULL
+#define MSR_RI		0x0000000000000002ULL
+#define MSR_LE		0x0000000000000001ULL
+
+#define LPCR_ILE	0x0000000002000000ULL
+
+#endif
diff --git a/tools/testing/selftests/kvm/include/powerpc/processor.h b/tools/testing/selftests/kvm/include/powerpc/processor.h
new file mode 100644
index 000000000000..cb75b77c33bb
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/powerpc/processor.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * powerpc processor specific defines
+ */
+#ifndef SELFTEST_KVM_PROCESSOR_H
+#define SELFTEST_KVM_PROCESSOR_H
+
+#include <linux/compiler.h>
+#include "ppc_asm.h"
+
+extern unsigned char __interrupts_start[];
+extern unsigned char __interrupts_end[];
+
+struct kvm_vm;
+struct kvm_vcpu;
+
+struct ex_regs {
+	uint64_t	gprs[32];
+	uint64_t	nia;
+	uint64_t	msr;
+	uint64_t	cfar;
+	uint64_t	lr;
+	uint64_t	ctr;
+	uint64_t	xer;
+	uint32_t	cr;
+	uint32_t	trap;
+	uint64_t	vaddr; /* vaddr of this struct */
+};
+
+void vm_install_exception_handler(struct kvm_vm *vm, int vector,
+			void (*handler)(struct ex_regs *));
+
+static inline void cpu_relax(void)
+{
+	asm volatile("" ::: "memory");
+}
+
+#endif
diff --git a/tools/testing/selftests/kvm/include/powerpc/ucall.h b/tools/testing/selftests/kvm/include/powerpc/ucall.h
new file mode 100644
index 000000000000..e0dbe91e8848
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/powerpc/ucall.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_UCALL_H
+#define SELFTEST_KVM_UCALL_H
+
+#include "hcall.h"
+
+#define UCALL_EXIT_REASON	KVM_EXIT_PAPR_HCALL
+
+#define UCALL_R4_UCALL	0x5715 /* regular ucall, r5 contains ucall pointer */
+#define UCALL_R4_SIMPLE	0x0000 /* simple exit usable by asm with no ucall data */
+
+static inline void ucall_arch_init(struct kvm_vm *vm, gpa_t mmio_gpa)
+{
+}
+
+static inline void ucall_arch_do_ucall(gva_t uc)
+{
+	hcall2(H_UCALL, UCALL_R4_UCALL, (uintptr_t)(uc));
+}
+
+#endif
diff --git a/tools/testing/selftests/kvm/lib/guest_modes.c b/tools/testing/selftests/kvm/lib/guest_modes.c
index 7a96c43b5704..439766fad693 100644
--- a/tools/testing/selftests/kvm/lib/guest_modes.c
+++ b/tools/testing/selftests/kvm/lib/guest_modes.c
@@ -4,16 +4,20 @@
  */
 #include "guest_modes.h"
 
-#if defined(__aarch64__) || defined(__riscv)
+#if defined(__aarch64__) || defined(__riscv) || defined(__powerpc64__)
 #include "processor.h"
 enum vm_guest_mode vm_mode_default;
 #endif
 
+#if defined(__powerpc64__)
+#include <unistd.h>
+#endif
+
 struct guest_mode guest_modes[NUM_VM_MODES];
 
 void guest_modes_append_default(void)
 {
-#if !defined(__aarch64__) && !defined(__riscv)
+#if !defined(__aarch64__) && !defined(__riscv) && !defined(__powerpc64__)
 	guest_mode_append(VM_MODE_DEFAULT, true);
 #endif
 
@@ -108,6 +112,18 @@ void guest_modes_append_default(void)
 		TEST_ASSERT(vm_mode_default != NUM_VM_MODES, "No supported mode!");
 	}
 #endif
+#ifdef __powerpc64__
+	{
+		TEST_REQUIRE(kvm_has_cap(KVM_CAP_PPC_MMU_RADIX));
+		/* Radix guest EA and RA are 52-bit on POWER9 and POWER10 */
+		if (sysconf(_SC_PAGESIZE) == 4096)
+			vm_mode_default = VM_MODE_P52V52_4K;
+		else
+			vm_mode_default = VM_MODE_P52V52_64K;
+		guest_mode_append(VM_MODE_P52V52_4K, true);
+		guest_mode_append(VM_MODE_P52V52_64K, true);
+	}
+#endif
 }
 
 void for_each_guest_mode(void (*func)(enum vm_guest_mode, void *), void *arg)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index cdb004c9ba56..0dc67c1502cf 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -218,6 +218,8 @@ const char *vm_guest_mode_string(u32 i)
 		[VM_MODE_P41V57_4K]	= "PA-bits:41,  VA-bits:57,  4K pages",
 		[VM_MODE_P41V48_4K]	= "PA-bits:41,  VA-bits:48,  4K pages",
 		[VM_MODE_P41V39_4K]	= "PA-bits:41,  VA-bits:39,  4K pages",
+		[VM_MODE_P52V52_4K]	= "PA-bits:52,  VA-bits:52,  4K pages",
+		[VM_MODE_P52V52_64K]	= "PA-bits:52,  VA-bits:52, 64K pages",
 	};
 	_Static_assert(sizeof(strings)/sizeof(char *) == NUM_VM_MODES,
 		       "Missing new mode strings?");
@@ -254,6 +256,8 @@ const struct vm_guest_mode_params vm_guest_mode_params[] = {
 	[VM_MODE_P41V57_4K]	= { 41, 57,  0x1000, 12 },
 	[VM_MODE_P41V48_4K]	= { 41, 48,  0x1000, 12 },
 	[VM_MODE_P41V39_4K]	= { 41, 39,  0x1000, 12 },
+	[VM_MODE_P52V52_4K]	= { 52, 52,  0x1000, 12 },
+	[VM_MODE_P52V52_64K]	= { 52, 52, 0x10000, 16 },
 };
 _Static_assert(sizeof(vm_guest_mode_params)/sizeof(struct vm_guest_mode_params) == NUM_VM_MODES,
 	       "Missing new mode params?");
@@ -371,6 +375,10 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 	case VM_MODE_P41V39_4K:
 		vm->mmu.pgtable_levels = 3;
 		break;
+	case VM_MODE_P52V52_4K:
+	case VM_MODE_P52V52_64K:
+		vm->mmu.pgtable_levels = 4;
+		break;
 	default:
 		TEST_FAIL("Unknown guest mode: 0x%x", vm->mode);
 	}
diff --git a/tools/testing/selftests/kvm/lib/powerpc/handlers.S b/tools/testing/selftests/kvm/lib/powerpc/handlers.S
new file mode 100644
index 000000000000..b860f6a520a1
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/powerpc/handlers.S
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <ppc_asm.h>
+
+.macro INTERRUPT vec
+. = __interrupts_start + \vec
+	std	%r0,(0*8)(%r13)
+	std	%r3,(3*8)(%r13)
+	mfspr	%r0,SPR_CFAR
+	li	%r3,\vec
+	b	handle_interrupt
+.endm
+
+.balign 0x1000
+.global __interrupts_start
+__interrupts_start:
+INTERRUPT 0x100
+INTERRUPT 0x200
+INTERRUPT 0x300
+INTERRUPT 0x380
+INTERRUPT 0x400
+INTERRUPT 0x480
+INTERRUPT 0x500
+INTERRUPT 0x600
+INTERRUPT 0x700
+INTERRUPT 0x800
+INTERRUPT 0x900
+INTERRUPT 0xa00
+INTERRUPT 0xc00
+INTERRUPT 0xd00
+INTERRUPT 0xf00
+INTERRUPT 0xf20
+INTERRUPT 0xf40
+INTERRUPT 0xf60
+
+virt_handle_interrupt:
+	stdu	%r1,-INT_FRAME_SIZE(%r1)
+	mr	%r3,%r31
+	bl	route_interrupt
+	ld	%r4,(32*8)(%r31) /* NIA */
+	ld	%r5,(33*8)(%r31) /* MSR */
+	ld	%r6,(35*8)(%r31) /* LR */
+	ld	%r7,(36*8)(%r31) /* CTR */
+	ld	%r8,(37*8)(%r31) /* XER */
+	lwz	%r9,(38*8)(%r31) /* CR */
+	mtspr	SPR_SRR0,%r4
+	mtspr	SPR_SRR1,%r5
+	mtlr	%r6
+	mtctr	%r7
+	mtxer	%r8
+	mtcr	%r9
+reg=4
+	ld	%r0,(0*8)(%r31)
+	ld	%r3,(3*8)(%r31)
+.rept 28
+	ld	reg,(reg*8)(%r31)
+	reg=reg+1
+.endr
+	addi	%r1,%r1,INT_FRAME_SIZE
+	rfid
+
+virt_handle_interrupt_p:
+	.llong virt_handle_interrupt
+
+handle_interrupt:
+reg=4
+.rept 28
+	std	reg,(reg*8)(%r13)
+	reg=reg+1
+.endr
+	mfspr	%r4,SPR_SRR0
+	mfspr	%r5,SPR_SRR1
+	mflr	%r6
+	mfctr	%r7
+	mfxer	%r8
+	mfcr	%r9
+	std	%r4,(32*8)(%r13) /* NIA */
+	std	%r5,(33*8)(%r13) /* MSR */
+	std	%r0,(34*8)(%r13) /* CFAR */
+	std	%r6,(35*8)(%r13) /* LR */
+	std	%r7,(36*8)(%r13) /* CTR */
+	std	%r8,(37*8)(%r13) /* XER */
+	stw	%r9,(38*8 + 0)(%r13) /* CR */
+	stw	%r3,(38*8 + 4)(%r13) /* TRAP */
+
+	ld	%r31,(39*8)(%r13) /* vaddr */
+	ld	%r4,virt_handle_interrupt_p - __interrupts_start(0)
+	mtspr	SPR_SRR0,%r4
+	/* Reuse SRR1 */
+
+	rfid
+.global __interrupts_end
+__interrupts_end:
diff --git a/tools/testing/selftests/kvm/lib/powerpc/hcall.c b/tools/testing/selftests/kvm/lib/powerpc/hcall.c
new file mode 100644
index 000000000000..23a56aabad42
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/powerpc/hcall.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PAPR (pseries) hcall support.
+ */
+#include "kvm_util.h"
+#include "hcall.h"
+
+int64_t hcall0(uint64_t token)
+{
+	register uintptr_t r3 asm ("r3") = token;
+
+	asm volatile("sc 1" : "+r"(r3) :
+			    : "r0", "r4", "r5", "r6", "r7", "r8", "r9",
+			      "r10","r11", "r12", "ctr", "xer",
+			      "memory");
+
+	return r3;
+}
+
+int64_t hcall1(uint64_t token, uint64_t arg1)
+{
+	register uintptr_t r3 asm ("r3") = token;
+	register uintptr_t r4 asm ("r4") = arg1;
+
+	asm volatile("sc 1" : "+r"(r3), "+r"(r4) :
+			    : "r0", "r5", "r6", "r7", "r8", "r9",
+			      "r10","r11", "r12", "ctr", "xer",
+			      "memory");
+
+	return r3;
+}
+
+int64_t hcall2(uint64_t token, uint64_t arg1, uint64_t arg2)
+{
+	register uintptr_t r3 asm ("r3") = token;
+	register uintptr_t r4 asm ("r4") = arg1;
+	register uintptr_t r5 asm ("r5") = arg2;
+
+	asm volatile("sc 1" : "+r"(r3), "+r"(r4), "+r"(r5) :
+			    : "r0", "r6", "r7", "r8", "r9",
+			      "r10","r11", "r12", "ctr", "xer",
+			      "memory");
+
+	return r3;
+}
diff --git a/tools/testing/selftests/kvm/lib/powerpc/processor.c b/tools/testing/selftests/kvm/lib/powerpc/processor.c
new file mode 100644
index 000000000000..a345844cf941
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/powerpc/processor.c
@@ -0,0 +1,481 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM selftest powerpc library code - CPU-related functions (page tables...)
+ */
+
+#include <linux/sizes.h>
+
+#include "processor.h"
+#include "kvm_util.h"
+#include "ucall_common.h"
+#include "guest_modes.h"
+#include "hcall.h"
+
+#define RADIX_TREE_SIZE ((0x2UL << 61) | (0x5UL << 5)) /* 52-bits */
+#define RADIX_PGD_INDEX_SIZE 13
+
+static void set_proc_table(struct kvm_vm *vm, int pid, uint64_t dw0, uint64_t dw1)
+{
+	uint64_t *proc_table;
+
+	proc_table = addr_gpa2hva(vm, vm->arch.prtb);
+	proc_table[pid * 2 + 0] = cpu_to_be64(dw0);
+	proc_table[pid * 2 + 1] = cpu_to_be64(dw1);
+}
+
+static void set_radix_proc_table(struct kvm_vm *vm, int pid, gpa_t pgd)
+{
+	set_proc_table(vm, pid, pgd | RADIX_TREE_SIZE | RADIX_PGD_INDEX_SIZE, 0);
+}
+
+void virt_arch_pgd_alloc(struct kvm_vm *vm)
+{
+	struct kvm_ppc_mmuv3_cfg mmu_cfg;
+	gpa_t prtb, pgtb;
+	size_t pgd_pages;
+
+	TEST_ASSERT((vm->mode == VM_MODE_P52V52_4K) ||
+		    (vm->mode == VM_MODE_P52V52_64K),
+		    "Unsupported guest mode, mode: 0x%x", vm->mode);
+
+	prtb = vm_phy_page_alloc(vm, KVM_GUEST_PAGE_TABLE_MIN_PADDR,
+				 vm->memslots[MEM_REGION_PT]);
+	vm->arch.prtb = prtb;
+
+	pgd_pages = (1UL << (RADIX_PGD_INDEX_SIZE + 3)) >> vm->page_shift;
+	if (!pgd_pages)
+		pgd_pages = 1;
+	pgtb = vm_phy_pages_alloc_align(vm, pgd_pages, pgd_pages,
+					KVM_GUEST_PAGE_TABLE_MIN_PADDR,
+					vm->memslots[MEM_REGION_PT]);
+	vm->mmu.pgd = pgtb;
+
+	/* Set the base page directory in the proc table */
+	set_radix_proc_table(vm, 0, pgtb);
+
+	if (vm->mode == VM_MODE_P52V52_4K)
+		mmu_cfg.process_table = prtb | 0x8000000000000000UL | 0x0; /* 4K size */
+	else /* vm->mode == VM_MODE_P52V52_64K */
+		mmu_cfg.process_table = prtb | 0x8000000000000000UL | 0x4; /* 64K size */
+	mmu_cfg.flags = KVM_PPC_MMUV3_RADIX | KVM_PPC_MMUV3_GTSE;
+
+	vm_ioctl(vm, KVM_PPC_CONFIGURE_V3_MMU, &mmu_cfg);
+}
+
+static int pt_shift(struct kvm_vm *vm, int level)
+{
+	switch (level) {
+	case 1:
+		return 13;
+	case 2:
+	case 3:
+		return 9;
+	case 4:
+		if (vm->mode == VM_MODE_P52V52_4K)
+			return 9;
+		else /* vm->mode == VM_MODE_P52V52_64K */
+			return 5;
+	default:
+		TEST_ASSERT(false, "Invalid page table level %d\n", level);
+		return 0;
+	}
+}
+
+static uint64_t pt_entry_coverage(struct kvm_vm *vm, int level)
+{
+	uint64_t size = vm->page_size;
+
+	if (level == 4)
+		return size;
+	size <<= pt_shift(vm, 4);
+	if (level == 3)
+		return size;
+	size <<= pt_shift(vm, 3);
+	if (level == 2)
+		return size;
+	size <<= pt_shift(vm, 2);
+	return size;
+}
+
+static int pt_idx(struct kvm_vm *vm, uint64_t vaddr, int level, uint64_t *nls)
+{
+	switch (level) {
+	case 1:
+		if (nls)
+			*nls = 0x9;
+		return (vaddr >> 39) & 0x1fff;
+	case 2:
+		if (nls)
+			*nls = 0x9;
+		return (vaddr >> 30) & 0x1ff;
+	case 3:
+		if (vm->mode == VM_MODE_P52V52_4K) {
+			if (nls)
+				*nls = 0x9;
+		} else { /* vm->mode == VM_MODE_P52V52_64K */
+			if (nls)
+				*nls = 0x5;
+		}
+		return (vaddr >> 21) & 0x1ff;
+	case 4:
+		if (vm->mode == VM_MODE_P52V52_4K)
+			return (vaddr >> 12) & 0x1ff;
+		else /* vm->mode == VM_MODE_P52V52_64K */
+			return (vaddr >> 16) & 0x1f;
+	default:
+		TEST_ASSERT(false, "Invalid page table level %d\n", level);
+		return 0;
+	}
+}
+
+static uint64_t *virt_get_pte(struct kvm_vm *vm, gpa_t pt,
+			  uint64_t vaddr, int level, uint64_t *nls)
+{
+	int idx = pt_idx(vm, vaddr, level, nls);
+	uint64_t *ptep = addr_gpa2hva(vm, pt + idx * 8);
+
+	return ptep;
+}
+
+#define PTE_VALID	0x8000000000000000ull
+#define PTE_LEAF	0x4000000000000000ull
+#define PTE_REFERENCED	0x0000000000000100ull
+#define PTE_CHANGED	0x0000000000000080ull
+#define PTE_PRIV	0x0000000000000008ull
+#define PTE_READ	0x0000000000000004ull
+#define PTE_RW		0x0000000000000002ull
+#define PTE_EXEC	0x0000000000000001ull
+#define PTE_PAGE_MASK	0x01fffffffffff000ull
+
+#define PDE_VALID	PTE_VALID
+#define PDE_NLS		0x0000000000000011ull
+#define PDE_PT_MASK	0x0fffffffffffff00ull
+
+static gpa_t __vm_alloc_pt(struct kvm_vm *vm, uint64_t pt_shift)
+{
+	gpa_t pt;
+
+	if (pt_shift >= vm->page_shift) {
+		size_t pt_pages = 1ULL << (pt_shift - vm->page_shift);
+
+		pt = vm_phy_pages_alloc_align(vm, pt_pages, pt_pages,
+					KVM_GUEST_PAGE_TABLE_MIN_PADDR,
+					vm->memslots[MEM_REGION_PT]);
+	} else {
+		struct vm_pt_frag_cache *pt_frag_cache;
+
+		if (pt_shift == 8) {
+			pt_frag_cache = &vm->arch.pt_frag_cache[0];
+		} else if (pt_shift == 12) {
+			pt_frag_cache = &vm->arch.pt_frag_cache[1];
+		} else {
+			TEST_ASSERT(0, "Invalid pt_shift:%lu\n", pt_shift);
+			return 0;
+		}
+
+		if (!pt_frag_cache->page) {
+			pt_frag_cache->page = vm_phy_pages_alloc_align(vm, 1, 1,
+						KVM_GUEST_PAGE_TABLE_MIN_PADDR,
+						vm->memslots[MEM_REGION_PT]);
+		}
+		pt = pt_frag_cache->page + pt_frag_cache->page_nr_used;
+		pt_frag_cache->page_nr_used += (1 << pt_shift);
+		if (pt_frag_cache->page_nr_used == vm->page_size) {
+			pt_frag_cache->page = 0;
+			pt_frag_cache->page_nr_used = 0;
+		}
+	}
+
+	return pt;
+}
+
+void virt_arch_pg_map(struct kvm_vm *vm, uint64_t gva, uint64_t gpa)
+{
+	gpa_t pt = vm->mmu.pgd;
+	uint64_t *ptep, pte;
+	int level;
+
+	for (level = 1; level <= 3; level++) {
+		uint64_t nls;
+		uint64_t *pdep = virt_get_pte(vm, pt, gva, level, &nls);
+		uint64_t pde = be64_to_cpu(*pdep);
+
+		if (pde) {
+			TEST_ASSERT((pde & PDE_VALID) && !(pde & PTE_LEAF),
+				    "Invalid PDE at level: %u gva: 0x%lx pde:0x%lx\n",
+				    level, gva, pde);
+			pt = pde & PDE_PT_MASK;
+			continue;
+		}
+
+		pt = __vm_alloc_pt(vm, nls + 3);
+		pde = PDE_VALID | nls | pt;
+		*pdep = cpu_to_be64(pde);
+	}
+
+	ptep = virt_get_pte(vm, pt, gva, level, NULL);
+	pte = be64_to_cpu(*ptep);
+
+	TEST_ASSERT(!pte, "PTE already present at level: %u gva: 0x%lx pte:0x%lx\n",
+		    level, gva, pte);
+
+	pte = PTE_VALID | PTE_LEAF | PTE_REFERENCED | PTE_CHANGED | PTE_PRIV |
+	      PTE_READ | PTE_RW | PTE_EXEC | (gpa & PTE_PAGE_MASK);
+	*ptep = cpu_to_be64(pte);
+}
+
+gpa_t addr_arch_gva2gpa(struct kvm_vm *vm, gva_t gva)
+{
+	gpa_t pt = vm->mmu.pgd;
+	uint64_t *ptep, pte;
+	int level;
+
+	for (level = 1; level <= 3; level++) {
+		uint64_t nls;
+		uint64_t *pdep = virt_get_pte(vm, pt, gva, level, &nls);
+		uint64_t pde = be64_to_cpu(*pdep);
+
+		TEST_ASSERT((pde & PDE_VALID) && !(pde & PTE_LEAF),
+			"PDE not present at level: %u gva: 0x%lx pde:0x%lx\n",
+			level, gva, pde);
+		pt = pde & PDE_PT_MASK;
+	}
+
+	ptep = virt_get_pte(vm, pt, gva, level, NULL);
+	pte = be64_to_cpu(*ptep);
+
+	TEST_ASSERT(pte,
+		"PTE not present at level: %u gva: 0x%lx pte:0x%lx\n",
+		level, gva, pte);
+
+	TEST_ASSERT((pte & PTE_VALID) && (pte & PTE_LEAF) &&
+		    (pte & PTE_READ) && (pte & PTE_RW) && (pte & PTE_EXEC),
+		    "PTE not valid at level: %u gva: 0x%lx pte:0x%lx\n",
+		    level, gva, pte);
+
+	return (pte & PTE_PAGE_MASK) + (gva & (vm->page_size - 1));
+}
+
+static void virt_dump_pt(FILE *stream, struct kvm_vm *vm, gpa_t pt,
+			 gva_t va, int level, uint8_t indent)
+{
+	int size, idx;
+
+	size = 1U << (pt_shift(vm, level) + 3);
+
+	for (idx = 0; idx < size; idx += 8, va += pt_entry_coverage(vm, level)) {
+		uint64_t *page_table = addr_gpa2hva(vm, pt + idx);
+		uint64_t pte = be64_to_cpu(*page_table);
+
+		if (!(pte & PTE_VALID))
+			continue;
+
+		if (pte & PTE_LEAF) {
+			fprintf(stream,
+				"%*s PTE[%d] gVA:0x%016lx -> gRA:0x%016llx\n",
+				indent, "", idx / 8, va, pte & PTE_PAGE_MASK);
+		} else {
+			fprintf(stream, "%*sPDE%d[%d] gVA:0x%016lx\n",
+				indent, "", level, idx / 8, va);
+			virt_dump_pt(stream, vm, pte & PDE_PT_MASK, va,
+				     level + 1, indent + 2);
+		}
+	}
+
+}
+
+void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
+{
+	gpa_t pt = vm->mmu.pgd;
+
+	if (!vm->mmu.pgd_created)
+		return;
+
+	virt_dump_pt(stream, vm, pt, 0, 1, indent);
+}
+
+static unsigned long get_r2(void)
+{
+	unsigned long r2;
+
+	asm("mr %0,%%r2" : "=r"(r2));
+
+	return r2;
+}
+
+void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
+{
+	struct kvm_regs regs;
+
+	vcpu_regs_get(vcpu, &regs);
+	regs.pc = (uintptr_t)guest_code;
+	regs.gpr[12] = (uintptr_t)guest_code;
+	vcpu_regs_set(vcpu, &regs);
+}
+
+struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
+{
+	const size_t stack_size = SZ_64K;
+	gva_t stack_vaddr, ex_regs_vaddr;
+	gpa_t ex_regs_paddr;
+	struct ex_regs *ex_regs;
+	struct kvm_regs regs;
+	struct kvm_vcpu *vcpu;
+	uint64_t lpcr;
+
+	stack_vaddr = __vm_alloc(vm, stack_size,
+				       DEFAULT_GUEST_STACK_VADDR_MIN,
+				       MEM_REGION_DATA);
+
+	ex_regs_vaddr = __vm_alloc(vm, stack_size,
+				       DEFAULT_GUEST_STACK_VADDR_MIN,
+				       MEM_REGION_DATA);
+	ex_regs_paddr = addr_gva2gpa(vm, ex_regs_vaddr);
+	ex_regs = addr_gpa2hva(vm, ex_regs_paddr);
+	ex_regs->vaddr = ex_regs_vaddr;
+
+	vcpu = __vm_vcpu_add(vm, vcpu_id);
+
+	vcpu_enable_cap(vcpu, KVM_CAP_PPC_PAPR, 1);
+
+	/* Setup guest registers */
+	vcpu_regs_get(vcpu, &regs);
+	lpcr = vcpu_get_reg(vcpu, KVM_REG_PPC_LPCR_64);
+
+	regs.gpr[1] = stack_vaddr + stack_size - 256;
+	regs.gpr[2] = (uintptr_t)get_r2();
+	regs.gpr[13] = (uintptr_t)ex_regs_paddr;
+
+	regs.msr = MSR_SF | MSR_VEC | MSR_VSX | MSR_FP |
+		   MSR_ME | MSR_IR | MSR_DR | MSR_RI;
+
+	if (BYTE_ORDER == LITTLE_ENDIAN) {
+		regs.msr |= MSR_LE;
+		lpcr |= LPCR_ILE;
+	} else {
+		lpcr &= ~LPCR_ILE;
+	}
+
+	vcpu_regs_set(vcpu, &regs);
+	vcpu_set_reg(vcpu, KVM_REG_PPC_LPCR_64, lpcr);
+
+	return vcpu;
+}
+
+void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
+{
+	va_list ap;
+	struct kvm_regs regs;
+	int i;
+
+	TEST_ASSERT(num >= 1 && num <= 5, "Unsupported number of args: %u\n",
+		    num);
+
+	va_start(ap, num);
+	vcpu_regs_get(vcpu, &regs);
+
+	for (i = 0; i < num; i++)
+		regs.gpr[i + 3] = va_arg(ap, uint64_t);
+
+	vcpu_regs_set(vcpu, &regs);
+	va_end(ap);
+}
+
+void vcpu_arch_dump(FILE *stream, struct kvm_vcpu *vcpu, uint8_t indent)
+{
+	struct kvm_regs regs;
+
+	vcpu_regs_get(vcpu, &regs);
+
+	fprintf(stream, "%*sNIA: 0x%016llx  MSR: 0x%016llx\n",
+			indent, "", regs.pc, regs.msr);
+	fprintf(stream, "%*sLR:  0x%016llx  CTR :0x%016llx\n",
+			indent, "", regs.lr, regs.ctr);
+	fprintf(stream, "%*sCR:  0x%08llx          XER :0x%016llx\n",
+			indent, "", regs.cr, regs.xer);
+}
+
+void kvm_arch_vm_post_create(struct kvm_vm *vm, unsigned int nr_vcpus)
+{
+	gpa_t excp_paddr;
+	void *mem;
+
+	excp_paddr = vm_phy_page_alloc(vm, 0, vm->memslots[MEM_REGION_DATA]);
+
+	TEST_ASSERT(excp_paddr == 0,
+		    "Interrupt vectors not allocated at gPA address 0: (0x%lx)",
+		    excp_paddr);
+
+	mem = addr_gpa2hva(vm, excp_paddr);
+	memcpy(mem, __interrupts_start, __interrupts_end - __interrupts_start);
+}
+
+void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+
+	if (get_ucall(vcpu, &uc) == UCALL_UNHANDLED) {
+		gpa_t ex_regs_paddr;
+		struct ex_regs *ex_regs;
+		struct kvm_regs regs;
+
+		vcpu_regs_get(vcpu, &regs);
+		ex_regs_paddr = (gpa_t)regs.gpr[13];
+		ex_regs = addr_gpa2hva(vcpu->vm, ex_regs_paddr);
+
+		TEST_FAIL("Unexpected interrupt in guest NIA:0x%016lx MSR:0x%016lx TRAP:0x%04x",
+			  ex_regs->nia, ex_regs->msr, ex_regs->trap);
+	}
+}
+
+struct handler {
+	void (*fn)(struct ex_regs *regs);
+	int trap;
+};
+
+#define NR_HANDLERS	10
+static struct handler handlers[NR_HANDLERS];
+
+void route_interrupt(struct ex_regs *regs)
+{
+	int i;
+
+	for (i = 0; i < NR_HANDLERS; i++) {
+		if (handlers[i].trap == regs->trap) {
+			handlers[i].fn(regs);
+			return;
+		}
+	}
+
+	ucall(UCALL_UNHANDLED, 0);
+}
+
+void vm_install_exception_handler(struct kvm_vm *vm, int trap,
+			       void (*fn)(struct ex_regs *))
+{
+	int i;
+
+	for (i = 0; i < NR_HANDLERS; i++) {
+		if (!handlers[i].trap || handlers[i].trap == trap) {
+			if (fn == NULL)
+				trap = 0; /* Clear handler */
+			handlers[i].trap = trap;
+			handlers[i].fn = fn;
+			sync_global_to_guest(vm, handlers[i]);
+			return;
+		}
+	}
+
+	TEST_FAIL("Out of exception handlers");
+}
+
+void kvm_selftest_arch_init(void)
+{
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_PPC_MMU_RADIX));
+
+	/*
+	 * powerpc default mode is set by host page size and not static,
+	 * so start by computing that early.
+	 */
+	guest_modes_append_default();
+}
diff --git a/tools/testing/selftests/kvm/lib/powerpc/ucall.c b/tools/testing/selftests/kvm/lib/powerpc/ucall.c
new file mode 100644
index 000000000000..3481a7a0b850
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/powerpc/ucall.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ucall support. A ucall is a "hypercall to host userspace".
+ */
+#include "kvm_util.h"
+#include "ucall_common.h"
+#include "hcall.h"
+
+void *ucall_arch_get_ucall(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+
+	if (run->exit_reason == UCALL_EXIT_REASON &&
+	    run->papr_hcall.nr == H_UCALL) {
+		struct kvm_regs regs;
+
+		vcpu_regs_get(vcpu, &regs);
+		if (regs.gpr[4] == UCALL_R4_UCALL)
+			return (void *)regs.gpr[5];
+	}
+	return NULL;
+}
-- 
2.39.5
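
The radix index split used by pt_idx() in lib/powerpc/processor.c above can
be checked in isolation. The following is a minimal standalone sketch, not
part of the patch: radix_pt_idx() and enum radix_page_size are hypothetical
names introduced here, while the shifts and masks are copied from pt_idx().
It assumes a 52-bit effective address as in the VM_MODE_P52V52_* modes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; the patch keys off vm->mode instead. */
enum radix_page_size { RADIX_4K, RADIX_64K };

/* Index widths plus the page offset must cover the 52-bit effective address. */
_Static_assert(13 + 9 + 9 + 9 + 12 == 52, "4K layout must cover 52 bits");
_Static_assert(13 + 9 + 9 + 5 + 16 == 52, "64K layout must cover 52 bits");

/* Return the table index that vaddr selects at the given level (1..4). */
static inline unsigned int radix_pt_idx(uint64_t vaddr, int level,
					enum radix_page_size ps)
{
	assert(level >= 1 && level <= 4);

	switch (level) {
	case 1:
		return (vaddr >> 39) & 0x1fff;	/* 13-bit top-level index */
	case 2:
		return (vaddr >> 30) & 0x1ff;	/* 9-bit index */
	case 3:
		return (vaddr >> 21) & 0x1ff;	/* 9-bit index */
	default:
		if (ps == RADIX_4K)
			return (vaddr >> 12) & 0x1ff;	/* 9 bits above a 12-bit offset */
		return (vaddr >> 16) & 0x1f;		/* 5 bits above a 16-bit offset */
	}
}
```

Only the last level differs between the two page sizes, which is why the
patch allocates 256-byte (5 + 3 bits) last-level tables for 64K pages and
4KB (9 + 3 bits) tables for 4K pages.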




* [RFC 4/4] KVM: PPC: selftests: powerpc enable kvm_create_max_vcpus test
From: Ritesh Harjani (IBM) @ 2026-05-15 16:04 UTC (permalink / raw)
  To: kvm
  Cc: linuxppc-dev, Madhavan Srinivasan, Harsh Prateek Bora,
	Christophe Leroy, Venkat Rao Bagalkote, Nicholas Piggin,
	linux-kernel, Ritesh Harjani (IBM)
In-Reply-To: <cover.1778857539.git.ritesh.list@gmail.com>

From: Nicholas Piggin <npiggin@gmail.com>

powerpc's maximum permitted vCPU ID depends on the VM's SMT mode, and
the maximum reported by KVM_CAP_MAX_VCPU_ID exceeds a simple non-SMT
VM's limit.

The powerpc KVM selftest port uses non-SMT VMs, so add a workaround
to the kvm_create_max_vcpus test case to limit vCPU IDs to
KVM_CAP_MAX_VCPUS on powerpc.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Rebased to latest mainline tree]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
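
The effect of the workaround can be sketched as a small standalone helper.
This is an illustration only: effective_max_vcpu_id() and its non_smt_powerpc
parameter are hypothetical and do not appear in the patch, which instead
overrides kvm_max_vcpu_id under #ifdef __powerpc64__.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: pick the vCPU ID bound the
 * test should exercise. For a non-SMT powerpc VM the bound is the maximum
 * number of vCPUs; otherwise the reported maximum ID is used unchanged.
 */
static inline int effective_max_vcpu_id(int max_vcpus, int max_vcpu_id,
					bool non_smt_powerpc)
{
	/* Mirrors the test's existing check that the ID cap is >= the vCPU cap. */
	assert(max_vcpu_id >= max_vcpus);

	if (non_smt_powerpc)
		return max_vcpus;
	return max_vcpu_id;
}
```

With the bound clamped this way, the "kvm_max_vcpu_id > kvm_max_vcpus" check
that follows in main() is false on powerpc, so only the first
test_vcpu_creation() call runs there.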
 tools/testing/selftests/kvm/kvm_create_max_vcpus.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
index c5310736ed06..a82c13d6cdf5 100644
--- a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
+++ b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
@@ -56,6 +56,15 @@ int main(int argc, char *argv[])
 		    "KVM_MAX_VCPU_IDS (%d) must be at least as large as KVM_MAX_VCPUS (%d).",
 		    kvm_max_vcpu_id, kvm_max_vcpus);
 
+#ifdef __powerpc64__
+	/*
+	 * powerpc has a particular format for the vcpu ID that depends on
+	 * the guest SMT mode, and the max ID cap is too large for non-SMT
+	 * modes, where the maximum ID is the same as the maximum vCPUs.
+	 */
+	kvm_max_vcpu_id = kvm_max_vcpus;
+#endif
+
 	test_vcpu_creation(0, kvm_max_vcpus);
 
 	if (kvm_max_vcpu_id > kvm_max_vcpus)
-- 
2.39.5




* [powerpc:merge] BUILD SUCCESS 1546b4ea65764226eacfb337b141e7563b336a75
From: kernel test robot @ 2026-05-15 19:56 UTC (permalink / raw)
  To: Madhavan Srinivasan; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 1546b4ea65764226eacfb337b141e7563b336a75  Automatic merge of 'fixes' into merge (2026-05-15 12:09)

elapsed time: 793m

configs tested: 313
configs skipped: 6

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha                             allnoconfig    gcc-15.2.0
alpha                            allyesconfig    gcc-15.2.0
alpha                               defconfig    gcc-15.2.0
arc                              allmodconfig    clang-16
arc                              allmodconfig    gcc-15.2.0
arc                               allnoconfig    gcc-15.2.0
arc                              allyesconfig    clang-23
arc                              allyesconfig    gcc-15.2.0
arc                                 defconfig    gcc-15.2.0
arc                            randconfig-001    gcc-8.5.0
arc                   randconfig-001-20260515    gcc-13.4.0
arc                   randconfig-001-20260515    gcc-8.5.0
arc                            randconfig-002    gcc-8.5.0
arc                   randconfig-002-20260515    gcc-8.5.0
arm                               allnoconfig    clang-23
arm                               allnoconfig    gcc-15.2.0
arm                              allyesconfig    clang-16
arm                              allyesconfig    gcc-15.2.0
arm                                 defconfig    clang-23
arm                                 defconfig    gcc-15.2.0
arm                            randconfig-001    gcc-8.5.0
arm                   randconfig-001-20260515    clang-23
arm                   randconfig-001-20260515    gcc-8.5.0
arm                            randconfig-002    gcc-8.5.0
arm                   randconfig-002-20260515    clang-23
arm                   randconfig-002-20260515    gcc-8.5.0
arm                            randconfig-003    gcc-8.5.0
arm                   randconfig-003-20260515    clang-23
arm                   randconfig-003-20260515    gcc-8.5.0
arm                            randconfig-004    gcc-8.5.0
arm                   randconfig-004-20260515    gcc-14.3.0
arm                   randconfig-004-20260515    gcc-8.5.0
arm64                            allmodconfig    clang-19
arm64                            allmodconfig    clang-23
arm64                             allnoconfig    gcc-15.2.0
arm64                               defconfig    gcc-15.2.0
arm64                 randconfig-001-20260515    clang-16
arm64                 randconfig-001-20260515    gcc-11.5.0
arm64                 randconfig-001-20260516    gcc-9.5.0
arm64                 randconfig-002-20260515    gcc-10.5.0
arm64                 randconfig-002-20260515    gcc-11.5.0
arm64                 randconfig-002-20260516    gcc-9.5.0
arm64                 randconfig-003-20260515    gcc-11.5.0
arm64                 randconfig-003-20260516    gcc-9.5.0
arm64                 randconfig-004-20260515    gcc-11.5.0
arm64                 randconfig-004-20260516    gcc-9.5.0
csky                             allmodconfig    gcc-15.2.0
csky                              allnoconfig    gcc-15.2.0
csky                                defconfig    gcc-15.2.0
csky                  randconfig-001-20260515    gcc-10.5.0
csky                  randconfig-001-20260515    gcc-11.5.0
csky                  randconfig-001-20260516    gcc-9.5.0
csky                  randconfig-002-20260515    gcc-11.5.0
csky                  randconfig-002-20260515    gcc-15.2.0
csky                  randconfig-002-20260516    gcc-9.5.0
hexagon                          allmodconfig    clang-17
hexagon                          allmodconfig    gcc-15.2.0
hexagon                           allnoconfig    clang-23
hexagon                           allnoconfig    gcc-15.2.0
hexagon                             defconfig    clang-23
hexagon                             defconfig    gcc-15.2.0
hexagon               randconfig-001-20260515    clang-23
hexagon               randconfig-001-20260516    gcc-11.5.0
hexagon               randconfig-002-20260515    clang-23
hexagon               randconfig-002-20260516    gcc-11.5.0
i386                             allmodconfig    clang-20
i386                             allmodconfig    gcc-14
i386                              allnoconfig    gcc-14
i386                              allnoconfig    gcc-15.2.0
i386                             allyesconfig    clang-20
i386                             allyesconfig    gcc-14
i386        buildonly-randconfig-001-20260515    gcc-14
i386        buildonly-randconfig-001-20260516    clang-20
i386        buildonly-randconfig-002-20260515    gcc-14
i386        buildonly-randconfig-002-20260516    clang-20
i386        buildonly-randconfig-003-20260515    gcc-12
i386        buildonly-randconfig-003-20260515    gcc-14
i386        buildonly-randconfig-003-20260516    clang-20
i386        buildonly-randconfig-004-20260515    gcc-12
i386        buildonly-randconfig-004-20260515    gcc-14
i386        buildonly-randconfig-004-20260516    clang-20
i386        buildonly-randconfig-005-20260515    gcc-14
i386        buildonly-randconfig-005-20260516    clang-20
i386        buildonly-randconfig-006-20260515    clang-20
i386        buildonly-randconfig-006-20260515    gcc-14
i386        buildonly-randconfig-006-20260516    clang-20
i386                                defconfig    clang-20
i386                                defconfig    gcc-15.2.0
i386                  randconfig-001-20260515    clang-20
i386                  randconfig-002-20260515    clang-20
i386                  randconfig-003-20260515    clang-20
i386                  randconfig-004-20260515    clang-20
i386                  randconfig-005-20260515    clang-20
i386                  randconfig-006-20260515    gcc-14
i386                  randconfig-007-20260515    clang-20
i386                  randconfig-011-20260515    clang-20
i386                  randconfig-011-20260515    gcc-14
i386                  randconfig-012-20260515    gcc-14
i386                  randconfig-013-20260515    gcc-14
i386                  randconfig-014-20260515    clang-20
i386                  randconfig-014-20260515    gcc-14
i386                  randconfig-015-20260515    clang-20
i386                  randconfig-015-20260515    gcc-14
i386                  randconfig-016-20260515    clang-20
i386                  randconfig-016-20260515    gcc-14
i386                  randconfig-017-20260515    clang-20
i386                  randconfig-017-20260515    gcc-14
loongarch                        allmodconfig    clang-19
loongarch                        allmodconfig    clang-23
loongarch                         allnoconfig    clang-23
loongarch                         allnoconfig    gcc-15.2.0
loongarch                           defconfig    clang-19
loongarch                loongson32_defconfig    clang-23
loongarch             randconfig-001-20260515    gcc-15.2.0
loongarch             randconfig-001-20260516    gcc-11.5.0
loongarch             randconfig-002-20260515    clang-23
loongarch             randconfig-002-20260516    gcc-11.5.0
m68k                             allmodconfig    gcc-15.2.0
m68k                              allnoconfig    gcc-15.2.0
m68k                             allyesconfig    clang-16
m68k                             allyesconfig    gcc-15.2.0
m68k                                defconfig    clang-19
m68k                                defconfig    gcc-15.2.0
microblaze                        allnoconfig    gcc-15.2.0
microblaze                       allyesconfig    gcc-15.2.0
microblaze                          defconfig    clang-19
microblaze                          defconfig    gcc-15.2.0
mips                             allmodconfig    gcc-15.2.0
mips                              allnoconfig    gcc-15.2.0
mips                             allyesconfig    gcc-15.2.0
mips                 decstation_r4k_defconfig    gcc-15.2.0
nios2                            allmodconfig    clang-23
nios2                             allnoconfig    clang-23
nios2                             allnoconfig    gcc-11.5.0
nios2                               defconfig    clang-19
nios2                               defconfig    gcc-11.5.0
nios2                 randconfig-001-20260515    gcc-11.5.0
nios2                 randconfig-001-20260516    gcc-11.5.0
nios2                 randconfig-002-20260515    gcc-8.5.0
nios2                 randconfig-002-20260516    gcc-11.5.0
openrisc                         allmodconfig    clang-23
openrisc                          allnoconfig    clang-23
openrisc                          allnoconfig    gcc-15.2.0
openrisc                            defconfig    gcc-15.2.0
parisc                           allmodconfig    gcc-15.2.0
parisc                            allnoconfig    clang-23
parisc                            allnoconfig    gcc-15.2.0
parisc                           allyesconfig    clang-19
parisc                           allyesconfig    gcc-15.2.0
parisc                              defconfig    gcc-15.2.0
parisc                         randconfig-001    gcc-8.5.0
parisc                randconfig-001-20260515    gcc-10.5.0
parisc                randconfig-001-20260515    gcc-8.5.0
parisc                randconfig-001-20260516    gcc-12.5.0
parisc                         randconfig-002    gcc-8.5.0
parisc                randconfig-002-20260515    gcc-15.2.0
parisc                randconfig-002-20260515    gcc-8.5.0
parisc                randconfig-002-20260516    gcc-12.5.0
parisc64                            defconfig    clang-19
parisc64                            defconfig    gcc-15.2.0
powerpc                          allmodconfig    gcc-15.2.0
powerpc                           allnoconfig    clang-23
powerpc                           allnoconfig    gcc-15.2.0
powerpc                        randconfig-001    gcc-8.5.0
powerpc               randconfig-001-20260515    clang-16
powerpc               randconfig-001-20260515    gcc-8.5.0
powerpc               randconfig-001-20260516    gcc-12.5.0
powerpc                        randconfig-002    gcc-8.5.0
powerpc               randconfig-002-20260515    gcc-8.5.0
powerpc               randconfig-002-20260516    gcc-12.5.0
powerpc64                      randconfig-001    gcc-8.5.0
powerpc64             randconfig-001-20260515    gcc-14.3.0
powerpc64             randconfig-001-20260515    gcc-8.5.0
powerpc64             randconfig-001-20260516    gcc-12.5.0
powerpc64                      randconfig-002    gcc-8.5.0
powerpc64             randconfig-002-20260515    clang-23
powerpc64             randconfig-002-20260515    gcc-8.5.0
powerpc64             randconfig-002-20260516    gcc-12.5.0
riscv                            allmodconfig    clang-23
riscv                             allnoconfig    clang-23
riscv                             allnoconfig    gcc-15.2.0
riscv                            allyesconfig    clang-16
riscv                               defconfig    clang-23
riscv                               defconfig    gcc-15.2.0
riscv                          randconfig-001    gcc-8.5.0
riscv                 randconfig-001-20260515    clang-23
riscv                 randconfig-001-20260515    gcc-15.2.0
riscv                 randconfig-001-20260516    gcc-15.2.0
riscv                          randconfig-002    clang-23
riscv                 randconfig-002-20260515    clang-23
riscv                 randconfig-002-20260515    gcc-15.2.0
riscv                 randconfig-002-20260516    gcc-15.2.0
s390                             allmodconfig    clang-18
s390                             allmodconfig    clang-19
s390                              allnoconfig    clang-23
s390                             allyesconfig    gcc-15.2.0
s390                                defconfig    clang-23
s390                                defconfig    gcc-15.2.0
s390                           randconfig-001    gcc-11.5.0
s390                  randconfig-001-20260515    clang-18
s390                  randconfig-001-20260515    gcc-15.2.0
s390                  randconfig-001-20260516    gcc-15.2.0
s390                           randconfig-002    clang-23
s390                  randconfig-002-20260515    clang-23
s390                  randconfig-002-20260515    gcc-15.2.0
s390                  randconfig-002-20260516    gcc-15.2.0
sh                               allmodconfig    gcc-15.2.0
sh                                allnoconfig    clang-23
sh                                allnoconfig    gcc-15.2.0
sh                               allyesconfig    clang-19
sh                               allyesconfig    gcc-15.2.0
sh                                  defconfig    gcc-14
sh                                  defconfig    gcc-15.2.0
sh                             randconfig-001    gcc-15.2.0
sh                    randconfig-001-20260515    gcc-13.4.0
sh                    randconfig-001-20260515    gcc-15.2.0
sh                    randconfig-001-20260516    gcc-15.2.0
sh                             randconfig-002    gcc-14.3.0
sh                    randconfig-002-20260515    gcc-15.2.0
sh                    randconfig-002-20260516    gcc-15.2.0
sparc                             allnoconfig    clang-23
sparc                             allnoconfig    gcc-15.2.0
sparc                               defconfig    gcc-15.2.0
sparc                 randconfig-001-20260515    gcc-8.5.0
sparc                 randconfig-001-20260516    gcc-8.5.0
sparc                 randconfig-002-20260515    gcc-15.2.0
sparc                 randconfig-002-20260516    gcc-8.5.0
sparc64                          allmodconfig    clang-23
sparc64                             defconfig    clang-20
sparc64                             defconfig    gcc-14
sparc64               randconfig-001-20260515    clang-20
sparc64               randconfig-001-20260516    gcc-8.5.0
sparc64               randconfig-002-20260515    clang-20
sparc64               randconfig-002-20260516    gcc-8.5.0
um                               allmodconfig    clang-19
um                                allnoconfig    clang-23
um                               allyesconfig    gcc-14
um                               allyesconfig    gcc-15.2.0
um                                  defconfig    clang-23
um                                  defconfig    gcc-14
um                             i386_defconfig    gcc-14
um                    randconfig-001-20260515    gcc-14
um                    randconfig-001-20260516    gcc-8.5.0
um                    randconfig-002-20260515    gcc-14
um                    randconfig-002-20260516    gcc-8.5.0
um                           x86_64_defconfig    clang-23
um                           x86_64_defconfig    gcc-14
x86_64                           allmodconfig    clang-20
x86_64                            allnoconfig    clang-20
x86_64                            allnoconfig    clang-23
x86_64                           allyesconfig    clang-20
x86_64      buildonly-randconfig-001-20260515    clang-20
x86_64      buildonly-randconfig-001-20260515    gcc-14
x86_64      buildonly-randconfig-002-20260515    gcc-14
x86_64      buildonly-randconfig-003-20260515    gcc-14
x86_64      buildonly-randconfig-004-20260515    gcc-12
x86_64      buildonly-randconfig-004-20260515    gcc-14
x86_64      buildonly-randconfig-005-20260515    gcc-14
x86_64      buildonly-randconfig-006-20260515    clang-20
x86_64      buildonly-randconfig-006-20260515    gcc-14
x86_64                              defconfig    gcc-14
x86_64                                  kexec    clang-20
x86_64                         randconfig-001    gcc-14
x86_64                randconfig-001-20260515    clang-20
x86_64                         randconfig-002    gcc-14
x86_64                randconfig-002-20260515    clang-20
x86_64                         randconfig-003    clang-20
x86_64                randconfig-003-20260515    gcc-13
x86_64                         randconfig-004    clang-20
x86_64                randconfig-004-20260515    clang-20
x86_64                         randconfig-005    gcc-14
x86_64                randconfig-005-20260515    clang-20
x86_64                         randconfig-006    clang-20
x86_64                randconfig-006-20260515    clang-20
x86_64                         randconfig-011    clang-20
x86_64                randconfig-011-20260515    clang-20
x86_64                         randconfig-012    clang-20
x86_64                randconfig-012-20260515    clang-20
x86_64                         randconfig-013    clang-20
x86_64                randconfig-013-20260515    clang-20
x86_64                         randconfig-014    clang-20
x86_64                randconfig-014-20260515    clang-20
x86_64                         randconfig-015    clang-20
x86_64                randconfig-015-20260515    clang-20
x86_64                randconfig-015-20260515    gcc-14
x86_64                         randconfig-016    clang-20
x86_64                randconfig-016-20260515    clang-20
x86_64                randconfig-016-20260515    gcc-14
x86_64                randconfig-071-20260515    clang-20
x86_64                randconfig-071-20260515    gcc-12
x86_64                randconfig-072-20260515    gcc-12
x86_64                randconfig-072-20260515    gcc-14
x86_64                randconfig-073-20260515    gcc-12
x86_64                randconfig-073-20260515    gcc-14
x86_64                randconfig-074-20260515    clang-20
x86_64                randconfig-074-20260515    gcc-12
x86_64                randconfig-075-20260515    gcc-12
x86_64                randconfig-076-20260515    clang-20
x86_64                randconfig-076-20260515    gcc-12
x86_64                               rhel-9.4    clang-20
x86_64                           rhel-9.4-bpf    gcc-14
x86_64                          rhel-9.4-func    clang-20
x86_64                    rhel-9.4-kselftests    clang-20
x86_64                         rhel-9.4-kunit    gcc-14
x86_64                           rhel-9.4-ltp    gcc-14
x86_64                          rhel-9.4-rust    clang-20
xtensa                            allnoconfig    clang-23
xtensa                            allnoconfig    gcc-15.2.0
xtensa                           allyesconfig    clang-23
xtensa                randconfig-001-20260515    gcc-9.5.0
xtensa                randconfig-001-20260516    gcc-8.5.0
xtensa                randconfig-002-20260515    gcc-11.5.0
xtensa                randconfig-002-20260516    gcc-8.5.0

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply

* Re: [PATCH 01/19] btrfs: require at least 4 devices for RAID 6
From: H. Peter Anvin @ 2026-05-15 19:59 UTC (permalink / raw)
  To: Christoph Hellwig, kreijack
  Cc: David Sterba, Andrew Morton, Catalin Marinas, Will Deacon,
	Ard Biesheuvel, Huacai Chen, WANG Xuerui, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Herbert Xu,
	Dan Williams, Chris Mason, David Sterba, Arnd Bergmann, Song Liu,
	Yu Kuai, Li Nan, linux-kernel, linux-arm-kernel, loongarch,
	linuxppc-dev, linux-riscv, linux-s390, linux-crypto, linux-btrfs,
	linux-arch, linux-raid
In-Reply-To: <20260515043705.GA3855@lst.de>

On May 14, 2026 9:37:05 PM PDT, Christoph Hellwig <hch@lst.de> wrote:
>On Thu, May 14, 2026 at 09:51:59PM +0200, Goffredo Baroncelli wrote:
>> I think that the David concern is : "what happens for an already
>> existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?"
>> (I am thinking when a new BG needs to be allocated)...
>
>Then it will cleanly fail to mount instead of constantly corrupting data
>and memory with every write, yes.  Which clearly suggest that such
>file systems don't exist in the wild.
>
>But if btrfs wants to keep supporting this I'll just add a _unsafe
>version without the check in the core library.

I don't think this is a good idea. Error out; it is the btrfs maintainers' job to ensure user data isn't lost. 

The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. 

I would not be surprised if the kernel crashed or corrupted the page cache in that case.
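
For illustration only, here is a minimal user-space sketch of the kind of "error out" check being argued for. The function name, the constant, and the pointer layout are hypothetical and are not taken from the kernel's raid6 library; the only fact assumed from the thread is that fewer than 4 units is unsupported.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical constant: P and Q plus at least two data units. */
#define SKETCH_RAID6_MIN_DISKS 4

/*
 * Hypothetical entry check, not the kernel's actual raid6 API.
 * Assumed layout: ptrs[0 .. disks-3] are data blocks,
 * ptrs[disks-2] is P and ptrs[disks-1] is Q.
 * Returns 0 if the request may proceed, -EINVAL otherwise.
 */
int sketch_raid6_check_request(int disks, size_t bytes, void **ptrs)
{
	/* The degenerate 3-unit case is rejected instead of being computed. */
	if (disks < SKETCH_RAID6_MIN_DISKS)
		return -EINVAL;
	if (bytes == 0 || ptrs == NULL)
		return -EINVAL;
	return 0;
}
```

A caller would check the return value before touching any parity buffers, so an unsupported geometry fails cleanly rather than reaching the syndrome code.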


^ permalink raw reply

* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Jason Gunthorpe @ 2026-05-15 22:51 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
	linux-coco, Robin Murphy, Marek Szyprowski, Will Deacon,
	Marc Zyngier, Steven Price, Suzuki K Poulose, Catalin Marinas,
	Jiri Pirko, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <agXfm3mS_M3fvRrN@google.com>

On Thu, May 14, 2026 at 02:43:39PM +0000, Mostafa Saleh wrote:
> > That's a somewhat different problem, we have the dev->trusted stuff
> > that is supposed to deal with this kind of security. We need it for
> > IOMMU based systems too, eg hot plug thunderbolt should have it.
> 
> I see that it is used only for dma-iommu and for PCI devices.
> However, I think that should be a problem with other CCA solutions
> with emulated devices as they are untrusted. As I'd expect they
> would have virtio devices.

Yes, any security solution with an out of TCB device should be using either
memory encryption so the kernel already bounces or this trusted stuff
and a force strict dma-iommu so the dma layer is careful.

This is more a matter of userspace policy: which devices they want in or
out of their TCB. Like you may accept the device into T=1 but then still
want to keep it out of your TCB with the vIOMMU, I can see good
arguments for something like that.

> > > While we can debate the aesthetics of the setup, this is
> > > the existing behaviour for Linux, which existed for years
> > > and pKVM relies on and is used extensively.
> > > And, this patch alters that long-standing logic and introduces
> > > a functional regression.
> > 
> > Yeah, Aneesh needs to do something here, I'm pointing out it is
> > an entirely separate thing from the CC path we are working on which is
> > decoupling CC from relying on force swiotlb.
> 
> I am looking into converting pKVM to use the CC stuff, I replied with
> a patch to Aneesh in this thread. However, I need to do more testing
> and make sure there are not any unwanted consequences.

Yeah, it is a nice patch and I think it will help reduce the
complexity if it aligns to CCA type stuff.

> > In a pkvm world it should be the same, the S2 table for the SMMU will
> > control what the device can access, and if the SMMU points to a
> > "private" or "shared" page is not something the device needs to know
> > or care about.
> 
> I see that's because dma-iommu chooses the attrs for iommu_map().

Long term the DMA API path through the dma-iommu will pass the
ATTR_CC_SHARED through to iommu_map so when the arch requires a
different IOPTE it can construct it.
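
A rough user-space sketch of that idea follows. Everything in it is hypothetical: the attribute bit, the IOPTE bit positions, and the `sketch_arch` structure are made up for illustration and do not correspond to any real IOMMU format or to the iommu_map() signature. It only shows an arch-specific constructor choosing a different IOPTE when the shared attribute is passed down, and producing an identical IOPTE when the architecture does not distinguish the two.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define SKETCH_ATTR_CC_SHARED	(1UL << 0)	/* stand-in for DMA_ATTR_CC_SHARED */
#define SKETCH_IOPTE_VALID	(1ULL << 0)
#define SKETCH_IOPTE_SHARED	(1ULL << 63)	/* made-up "shared" bit */

struct sketch_arch {
	/* Nonzero if shared mappings need a distinct IOPTE encoding. */
	int iopte_has_shared_bit;
};

/* Build an IOPTE for a 4K page; attrs is passed through from the DMA layer. */
uint64_t sketch_make_iopte(const struct sketch_arch *arch, uint64_t paddr,
			   unsigned long attrs)
{
	uint64_t pte = (paddr & ~0xfffULL) | SKETCH_IOPTE_VALID;

	if (arch->iopte_has_shared_bit && (attrs & SKETCH_ATTR_CC_SHARED))
		pte |= SKETCH_IOPTE_SHARED;
	return pte;
}
```

With `iopte_has_shared_bit` set to 0 the attribute has no effect on the result, which is the situation described for pKVM in the quoted text below.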

> In pKVM, dma_addr_t and IOPTE are the same for private and shared,
> so nothing differs in that case.

Yes, so you don't have to worry.

> We don’t expect pass-through devices to interact with shared
> memory (T=0) at the moment.
> However, I can see use cases for that, where the host and the guest
> collaborate with device passthrough and require zero copy.

Once you add the CC patch it becomes immediately possible though
because the user can allocate a CC shared DMA HEAP and feed that all
over the place.

> One other interesting case for device-passthrough is non-coherent
> devices which then require private pools for bouncing.

Why does shared/private matter for bouncing? Why do you need to bounce
at all? Do cmo's not work in pkvm guests?

Jason


^ permalink raw reply

* [PATCH] powerpc/44x: Set GPIO chip firmware node
From: Rosen Penev @ 2026-05-15 23:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Rosen Penev, Bartosz Golaszewski,
	Linus Walleij, open list

The PPC4xx GPIO driver stopped assigning an explicit firmware node
to the gpio_chip when it moved away from of_mm_gpiochip_add_data().

Restore that association from the platform device so OF GPIO lookup
can match phandles to the registered gpiochip.

Tested on: Cisco MX60W. No more probe deferral.

Assisted-by: Codex:GPT-5.5
Fixes: 1044dbaf2a77 ("powerpc/44x: Change GPIO driver to a proper platform driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 arch/powerpc/platforms/44x/gpio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/44x/gpio.c b/arch/powerpc/platforms/44x/gpio.c
index d5824b7747b3..4d5176aa6895 100644
--- a/arch/powerpc/platforms/44x/gpio.c
+++ b/arch/powerpc/platforms/44x/gpio.c
@@ -169,6 +169,8 @@ static int ppc4xx_gpio_probe(struct platform_device *ofdev)
 
 	gc = &chip->gc;
 
+	gc->parent = dev;
+	gc->fwnode = dev_fwnode(dev);
 	gc->base = -1;
 	gc->ngpio = 32;
 	gc->direction_input = ppc4xx_gpio_dir_in;
-- 
2.54.0
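
As an aside for readers of the archive, the failure mode described in the commit message can be modelled in user space as below. This is only a sketch under stated assumptions: the `sketch_*` types and functions are hypothetical and are not gpiolib's API. The one value taken from the kernel is the number 517, which is what `EPROBE_DEFER` is defined as.

```c
#include <stddef.h>

#define SKETCH_EPROBE_DEFER 517	/* same numeric value as the kernel's EPROBE_DEFER */

struct sketch_fwnode {
	const char *name;
};

struct sketch_gpio_chip {
	const char *label;
	const struct sketch_fwnode *fwnode;	/* NULL models the regression */
};

/* Return the registered chip whose firmware node matches, or NULL. */
const struct sketch_gpio_chip *
sketch_find_chip(const struct sketch_gpio_chip *chips, size_t n,
		 const struct sketch_fwnode *wanted)
{
	for (size_t i = 0; i < n; i++)
		if (chips[i].fwnode != NULL && chips[i].fwnode == wanted)
			return &chips[i];
	return NULL;
}

/* A consumer resolving a phandle: no matching chip means "try again later". */
int sketch_consumer_probe(const struct sketch_gpio_chip *chips, size_t n,
			  const struct sketch_fwnode *wanted)
{
	return sketch_find_chip(chips, n, wanted) ? 0 : -SKETCH_EPROBE_DEFER;
}
```

In this model a chip registered without a firmware node can never be matched, so every consumer keeps deferring; setting the node at registration time lets the lookup succeed.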



^ permalink raw reply related

* [PATCH] powerpc/44x: Use platform resource helper for GPIO MMIO
From: Rosen Penev @ 2026-05-15 23:25 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), open list

Map the PPC4xx GPIO register block through the platform device
resource instead of reparsing the firmware node directly.

The GPIO node now probes as a platform device, so use the
platform helper to keep resource handling aligned with the converted
driver model and to report mapping failures with the platform device
context.

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 arch/powerpc/platforms/44x/gpio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/44x/gpio.c b/arch/powerpc/platforms/44x/gpio.c
index 2bc5cc260894..4d5176aa6895 100644
--- a/arch/powerpc/platforms/44x/gpio.c
+++ b/arch/powerpc/platforms/44x/gpio.c
@@ -182,7 +182,7 @@ static int ppc4xx_gpio_probe(struct platform_device *ofdev)
 	if (!gc->label)
 		return -ENOMEM;
 
-	chip->regs = devm_of_iomap(dev, np, 0, NULL);
+	chip->regs = devm_platform_ioremap_resource(ofdev, 0);
 	if (IS_ERR(chip->regs))
 		return PTR_ERR(chip->regs);
 
-- 
2.54.0



^ permalink raw reply related

* Re: [PATCH 01/19] btrfs: require at least 4 devices for RAID 6
From: Goffredo Baroncelli @ 2026-05-15 16:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Andrew Morton, Catalin Marinas, Will Deacon,
	Ard Biesheuvel, Huacai Chen, WANG Xuerui, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Herbert Xu, Dan Williams, Chris Mason, David Sterba,
	Arnd Bergmann, Song Liu, Yu Kuai, Li Nan, linux-kernel,
	linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv,
	linux-s390, linux-crypto, linux-btrfs, linux-arch, linux-raid
In-Reply-To: <20260515043705.GA3855@lst.de>

On 15/05/2026 06.37, Christoph Hellwig wrote:
> On Thu, May 14, 2026 at 09:51:59PM +0200, Goffredo Baroncelli wrote:
>> I think that the David concern is : "what happens for an already
>> existing btrfs raid6 3 disks filesystem when the user upgrade the kernel ?"
>> (I am thinking when a new BG needs to be allocated)...
> 
> Then it will cleanly fail to mount instead of constantly corrupting data
> and memory with every write, yes.  Which clearly suggest that such
> file systems don't exist in the wild.
> 
> But if btrfs wants to keep supporting this I'll just add a _unsafe
> version without the check in the core library.
> 

I am not arguing about this part. My point is that the change shouldn't have impacted the
BTRFS interface towards the user (as patch 01/19 does); instead, the change should
have modified the interface between the raid code and btrfs (e.g. by doing a memcpy....), or at least the
cover letter should have warned that the raid6 code requires a number of disks >= 4, pointing
to BTRFS as a "client doing wrong things".

At least the message was received: don't rely on the raid6 code when the number of disks is
less than 4.
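
A minimal sketch of what such a caller-side memcpy fallback could look like, assuming that is what is meant above. With a single data block D0, the RAID-6 definitions give P = D0 (XOR over one block) and Q = g^0 * D0 = D0, so both parity blocks are plain copies. The helper name, return convention, and pointer layout are hypothetical and are not btrfs or raid6 library code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical caller-side helper.
 * Assumed layout: ptrs[0 .. disks-3] are data, ptrs[disks-2] is P,
 * ptrs[disks-1] is Q.
 * Returns 1 if parity was produced here, 0 if the caller should use the
 * regular raid6 code, -EINVAL for an impossible geometry.
 */
int sketch_raid6_small_array(int disks, size_t bytes, void **ptrs)
{
	if (disks < 3)
		return -EINVAL;
	if (disks > 3)
		return 0;	/* 4 or more units: hand off to the raid6 code */

	memcpy(ptrs[1], ptrs[0], bytes);	/* P = D0 */
	memcpy(ptrs[2], ptrs[0], bytes);	/* Q = g^0 * D0 = D0 */
	return 1;
}
```

The design choice here is to keep the degenerate geometry entirely on the caller's side, so the raid6 code only ever sees 4 or more units.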

BR
GB

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply

* [GIT PULL] Please pull powerpc/linux.git powerpc-7.1-3 tag
From: Madhavan Srinivasan @ 2026-05-16  5:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: aboorvad, allyheev, bvanassche, christophe.leroy, julianbraha,
	linusw, linux-kernel, linuxppc-dev, make24, mpe, naveen, npiggin,
	sayalip, sshegde

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Linus,

Please pull some more powerpc fixes for 7.1:

The following changes since commit f583bd5f64d40e083dde5bb22846c4d93e59d471:

   powerpc/pasemi: Drop redundant res assignment (2026-05-06 07:49:19 +0530)

are available in the git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-7.1-3

for you to fetch changes up to 31467b23823ffec1f6fff407f8e3ca9af8b7491a:

   powerpc/time: Remove redundant preempt_disable|enable() calls from 
arch_irq_work_raise() (2026-05-14 11:15:26 +0530)

- ------------------------------------------------------------------
powerpc fixes for 7.1 #3

  - fix preempt count leak in sysfs show paths

  - Fix error handling in pika_dtm_thread

  - Remove pmac_low_i2c_{lock,unlock}()

  - Enable all windfarms by default

  - fix dead default for GUEST_STATE_BUFFER_TEST

  - Remove redundant preempt_disable|enable() calls from 
arch_irq_work_raise()

Thanks to:
Aboorva Devarajan, Ally Heev, Amit Machhiwal, Bart Van Assche, Christophe
Leroy, Christophe Leroy (CS GROUP), Dan Carpenter, Gautam Menghani, Harsh
Prateek Bora, Julian Braha, Krzysztof Kozlowski, Linus Walleij, Ma Ke, 
Ritesh
Harjani (IBM), Sayali Patil

- ------------------------------------------------------------------
Aboorva Devarajan (1):
       powerpc/hv-gpci: fix preempt count leak in sysfs show paths

Ally Heev (1):
       powerpc: 82xx: fix uninitialized pointers with free attribute

Bart Van Assche (1):
       powerpc/powermac: Remove pmac_low_i2c_{lock,unlock}()

Julian Braha (1):
       powerpc: fix dead default for GUEST_STATE_BUFFER_TEST

Linus Walleij (1):
       powerpc/g5: Enable all windfarms by default

Ma Ke (1):
       powerpc/warp: Fix error handling in pika_dtm_thread

Sayali Patil (1):
       powerpc/time: Remove redundant preempt_disable|enable() calls 
from arch_irq_work_raise()


  arch/powerpc/Kconfig.debug                |  3 +-
  arch/powerpc/configs/g5_defconfig         |  2 ++
  arch/powerpc/include/asm/pmac_low_i2c.h   |  4 ---
  arch/powerpc/kernel/time.c                |  6 ++--
  arch/powerpc/perf/hv-gpci.c               | 24 +++++++++-----
  arch/powerpc/platforms/44x/warp.c         |  2 ++
  arch/powerpc/platforms/82xx/km82xx.c      |  4 +--
  arch/powerpc/platforms/powermac/low_i2c.c | 34 --------------------
  8 files changed, 27 insertions(+), 52 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmoIBlwACgkQpnEsdPSH
ZJTIcQ//RUVp6NY4tj5f2NsBZqo62rUAVWCaCFYSgBe0o0oSLU16uRUdP8TJCtYh
As4gP+sgNR2gP5bvURX9HwryB/7hQaJ/nLxehCY6WTg3C1I163BQZqzvcahBrw6Z
D5A5YhAi8ECFgwkWLf8AMShzER65krnFLnWmldmxuvbvGy7KJAUTgQZ6T+HilIoa
I01abDV7Lsv6Tu7bKjiuJLilGZjxWHbLII2MEmvK6OKTrF4od+VuC8cXsXDbG+Ks
9OA9jxz804Lo1fTvBHiZFLZYShaexnzoSmf6XDKP+/aPjg9HW6cYCm0AgukVdoW/
B/khmVdt5HcjJbnaoioifh3GXEZHOZO5GjsImXdQpdRJe/rQMkG99uyoh6t5kA0Q
c9k8UAT8FOWwtk1KzHn3098jZ7szGgPcFH36LCYPWUOnI+w9wFVA3lVKnwW0CKY0
RgAhPqcfwxwtCtNs+76eAZvYM89b9GFtv+v9Z4e5EaQdAUrG1zrBOo90byUuTdDy
M12CZnnqB46KdZIekPHhrRuDQ5teUCY4FqV5302WYZvf+qtb2UfY/rwrHUfl1yQJ
78tDAn/mryI17a+0DBkUo5mZlA0/u6WCn6wiJazunbVlvgRVM7+1nEmFNAuZlkSi
K3eBhzWT/WigrsvCYh2o15GyhYoNgHR5V3U0wBLW7ZEO8Gu4meA=
=ipJ5
-----END PGP SIGNATURE-----


^ permalink raw reply

* Re: [PATCH v4 03/13] dma-pool: track decrypted atomic pools and select them via attrs
From: Alexey Kardashevskiy @ 2026-05-16 12:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
	linux-coco
  Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Jason Gunthorpe, Mostafa Saleh, Petr Tesarik, Dan Williams,
	Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260512090408.794195-4-aneesh.kumar@kernel.org>

On 12/5/26 19:03, Aneesh Kumar K.V (Arm) wrote:
> Teach the atomic DMA pool code to distinguish between encrypted and
> decrypted pools, and make pool allocation select the matching pool based
> on DMA attributes.
> 
> Introduce a dma_gen_pool wrapper that records whether a pool is
> decrypted, initialize that state when the atomic pools are created, and
> use it when expanding and resizing the pools.  Update dma_alloc_from_pool()
> to take attrs and skip pools whose encrypted/decrypted state does not
> match DMA_ATTR_CC_SHARED.  Update dma_free_from_pool() accordingly.
> 
> Also pass DMA_ATTR_CC_SHARED from the swiotlb atomic allocation path
> so decrypted swiotlb allocations are taken from the correct atomic pool.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>   drivers/iommu/dma-iommu.c   |   2 +-
>   include/linux/dma-map-ops.h |   2 +-
>   kernel/dma/direct.c         |  11 ++-
>   kernel/dma/pool.c           | 163 +++++++++++++++++++++++-------------
>   kernel/dma/swiotlb.c        |   7 +-
>   5 files changed, 122 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 54d96e847f16..c2595bee3d41 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1673,7 +1673,7 @@ void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
>   	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
>   	    !gfpflags_allow_blocking(gfp) && !coherent)
>   		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
> -					       gfp, NULL);
> +					   gfp, attrs, NULL);
>   	else
>   		cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
>   	if (!cpu_addr)
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 6a1832a73cad..696b2c3a2305 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -212,7 +212,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot,
>   void dma_common_free_remap(void *cpu_addr, size_t size);
>   
>   struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> -		void **cpu_addr, gfp_t flags,
> +		void **cpu_addr, gfp_t flags, unsigned long attrs,
>   		bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
>   bool dma_free_from_pool(struct device *dev, void *start, size_t size);
>   
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 0c2e1f8436ce..dc2907439b3d 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -162,7 +162,7 @@ static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
>   }
>   
>   static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
> -		dma_addr_t *dma_handle, gfp_t gfp)
> +		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
>   {
>   	struct page *page;
>   	u64 phys_limit;
> @@ -172,7 +172,8 @@ static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
>   		return NULL;
>   
>   	gfp |= dma_direct_optimal_gfp_mask(dev, &phys_limit);
> -	page = dma_alloc_from_pool(dev, size, &ret, gfp, dma_coherent_ok);
> +	page = dma_alloc_from_pool(dev, size, &ret, gfp, attrs,
> +				   dma_coherent_ok);
>   	if (!page)
>   		return NULL;
>   	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
> @@ -261,7 +262,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   	 */
>   	if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
>   	    dma_direct_use_pool(dev, gfp))
> -		return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
> +		return dma_direct_alloc_from_pool(dev, size, dma_handle,
> +						  gfp, attrs);
>   
>   	if (is_swiotlb_for_alloc(dev)) {
>   		page = dma_direct_alloc_swiotlb(dev, size);
> @@ -397,7 +399,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
>   		attrs |= DMA_ATTR_CC_SHARED;
>   
>   	if ((attrs & DMA_ATTR_CC_SHARED) && dma_direct_use_pool(dev, gfp))
> -		return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
> +		return dma_direct_alloc_from_pool(dev, size, dma_handle,
> +						  gfp, attrs);
>   
>   	if (is_swiotlb_for_alloc(dev)) {
>   		page = dma_direct_alloc_swiotlb(dev, size);
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 2b2fbb709242..75f0eba48a23 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -12,12 +12,18 @@
>   #include <linux/set_memory.h>
>   #include <linux/slab.h>
>   #include <linux/workqueue.h>
> +#include <linux/cc_platform.h>
>   
> -static struct gen_pool *atomic_pool_dma __ro_after_init;
> +struct dma_gen_pool {
> +	bool unencrypted;
> +	struct gen_pool *pool;
> +};
> +
> +static struct dma_gen_pool atomic_pool_dma __ro_after_init;
>   static unsigned long pool_size_dma;
> -static struct gen_pool *atomic_pool_dma32 __ro_after_init;
> +static struct dma_gen_pool atomic_pool_dma32 __ro_after_init;
>   static unsigned long pool_size_dma32;
> -static struct gen_pool *atomic_pool_kernel __ro_after_init;
> +static struct dma_gen_pool atomic_pool_kernel __ro_after_init;
>   static unsigned long pool_size_kernel;
>   
>   /* Size can be defined by the coherent_pool command line */
> @@ -76,7 +82,7 @@ static bool cma_in_zone(gfp_t gfp)
>   	return true;
>   }
>   
> -static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
> +static int atomic_pool_expand(struct dma_gen_pool *dma_pool, size_t pool_size,
>   			      gfp_t gfp)
>   {
>   	unsigned int order;
> @@ -113,12 +119,15 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>   	 * Memory in the atomic DMA pools must be unencrypted, the pools do not
>   	 * shrink so no re-encryption occurs in dma_direct_free().
>   	 */
> -	ret = set_memory_decrypted((unsigned long)page_to_virt(page),
> +	if (dma_pool->unencrypted) {
> +		ret = set_memory_decrypted((unsigned long)page_to_virt(page),
>   				   1 << order);
> -	if (ret)
> -		goto remove_mapping;
> -	ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
> -				pool_size, NUMA_NO_NODE);
> +		if (ret)
> +			goto remove_mapping;
> +	}
> +
> +	ret = gen_pool_add_virt(dma_pool->pool, (unsigned long)addr,
> +				page_to_phys(page), pool_size, NUMA_NO_NODE);
>   	if (ret)
>   		goto encrypt_mapping;
>   
> @@ -126,11 +135,15 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>   	return 0;
>   
>   encrypt_mapping:
> -	ret = set_memory_encrypted((unsigned long)page_to_virt(page),
> -				   1 << order);
> -	if (WARN_ON_ONCE(ret)) {
> -		/* Decrypt succeeded but encrypt failed, purposely leak */
> -		goto out;
> +	if (dma_pool->unencrypted) {
> +		int rc;
> +
> +		rc = set_memory_encrypted((unsigned long)page_to_virt(page),
> +					  1 << order);
> +		if (WARN_ON_ONCE(rc)) {
> +			/* Decrypt succeeded but encrypt failed, purposely leak */
> +			goto out;
> +		}
>   	}
>   remove_mapping:
>   #ifdef CONFIG_DMA_DIRECT_REMAP
> @@ -142,46 +155,52 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>   	return ret;
>   }
>   
> -static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
> +static void atomic_pool_resize(struct dma_gen_pool *dma_pool, gfp_t gfp)
>   {
> -	if (pool && gen_pool_avail(pool) < atomic_pool_size)
> -		atomic_pool_expand(pool, gen_pool_size(pool), gfp);
> +	if (dma_pool->pool && gen_pool_avail(dma_pool->pool) < atomic_pool_size)
> +		atomic_pool_expand(dma_pool, gen_pool_size(dma_pool->pool), gfp);
>   }
>   
>   static void atomic_pool_work_fn(struct work_struct *work)
>   {
>   	if (IS_ENABLED(CONFIG_ZONE_DMA))
> -		atomic_pool_resize(atomic_pool_dma,
> +		atomic_pool_resize(&atomic_pool_dma,
>   				   GFP_KERNEL | GFP_DMA);
>   	if (IS_ENABLED(CONFIG_ZONE_DMA32))
> -		atomic_pool_resize(atomic_pool_dma32,
> +		atomic_pool_resize(&atomic_pool_dma32,
>   				   GFP_KERNEL | GFP_DMA32);
> -	atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
> +	atomic_pool_resize(&atomic_pool_kernel, GFP_KERNEL);
>   }
>   
> -static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
> -						      gfp_t gfp)
> +static __init struct dma_gen_pool *__dma_atomic_pool_init(struct dma_gen_pool *dma_pool,
> +		size_t pool_size, gfp_t gfp)
>   {
> -	struct gen_pool *pool;
>   	int ret;
>   
> -	pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
> -	if (!pool)
> +	dma_pool->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
> +	if (!dma_pool->pool)
>   		return NULL;
>   
> -	gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
> +	gen_pool_set_algo(dma_pool->pool, gen_pool_first_fit_order_align, NULL);
> +
> +	/* if platform is using memory encryption atomic pools are by default decrypted. */
> +	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> +		dma_pool->unencrypted = true;
> +	else
> +		dma_pool->unencrypted = false;
>   
> -	ret = atomic_pool_expand(pool, pool_size, gfp);
> +	ret = atomic_pool_expand(dma_pool, pool_size, gfp);
>   	if (ret) {
> -		gen_pool_destroy(pool);
> +		gen_pool_destroy(dma_pool->pool);
> +		dma_pool->pool = NULL;
>   		pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n",
>   		       pool_size >> 10, &gfp);
>   		return NULL;
>   	}
>   
>   	pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
> -		gen_pool_size(pool) >> 10, &gfp);
> -	return pool;
> +		gen_pool_size(dma_pool->pool) >> 10, &gfp);
> +	return dma_pool;
>   }
>   
>   #ifdef CONFIG_ZONE_DMA32
> @@ -207,21 +226,22 @@ static int __init dma_atomic_pool_init(void)
>   
>   	/* All memory might be in the DMA zone(s) to begin with */
>   	if (has_managed_zone(ZONE_NORMAL)) {
> -		atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
> -						    GFP_KERNEL);
> -		if (!atomic_pool_kernel)
> +		__dma_atomic_pool_init(&atomic_pool_kernel, atomic_pool_size, GFP_KERNEL);
> +		if (!atomic_pool_kernel.pool)
>   			ret = -ENOMEM;
>   	}
> +
>   	if (has_managed_dma()) {
> -		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
> -						GFP_KERNEL | GFP_DMA);
> -		if (!atomic_pool_dma)
> +		__dma_atomic_pool_init(&atomic_pool_dma, atomic_pool_size,
> +				       GFP_KERNEL | GFP_DMA);
> +		if (!atomic_pool_dma.pool)
>   			ret = -ENOMEM;
>   	}
> +
>   	if (has_managed_dma32) {
> -		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
> -						GFP_KERNEL | GFP_DMA32);
> -		if (!atomic_pool_dma32)
> +		__dma_atomic_pool_init(&atomic_pool_dma32, atomic_pool_size,
> +				       GFP_KERNEL | GFP_DMA32);
> +		if (!atomic_pool_dma32.pool)
>   			ret = -ENOMEM;
>   	}
>   
> @@ -230,19 +250,44 @@ static int __init dma_atomic_pool_init(void)
>   }
>   postcore_initcall(dma_atomic_pool_init);
>   
> -static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
> +static inline struct dma_gen_pool *__dma_guess_pool(struct dma_gen_pool *first,
> +		struct dma_gen_pool *second, struct dma_gen_pool *third)
> +{
> +	if (first->pool)
> +		return first;
> +	if (second && second->pool)
> +		return second;
> +	if (third && third->pool)
> +		return third;
> +	return NULL;
> +}
> +
> +static inline struct dma_gen_pool *dma_guess_pool(struct dma_gen_pool *prev,
> +		gfp_t gfp)
>   {
> -	if (prev == NULL) {
> +	if (!prev) {
>   		if (gfp & GFP_DMA)
> -			return atomic_pool_dma ?: atomic_pool_dma32 ?: atomic_pool_kernel;
> +			return __dma_guess_pool(&atomic_pool_dma,
> +						&atomic_pool_dma32,
> +						&atomic_pool_kernel);
> +
>   		if (gfp & GFP_DMA32)
> -			return atomic_pool_dma32 ?: atomic_pool_dma ?: atomic_pool_kernel;
> -		return atomic_pool_kernel ?: atomic_pool_dma32 ?: atomic_pool_dma;
> +			return __dma_guess_pool(&atomic_pool_dma32,
> +						&atomic_pool_dma,
> +						&atomic_pool_kernel);
> +
> +		return __dma_guess_pool(&atomic_pool_kernel,
> +					&atomic_pool_dma32,
> +					&atomic_pool_dma);
>   	}
> -	if (prev == atomic_pool_kernel)
> -		return atomic_pool_dma32 ? atomic_pool_dma32 : atomic_pool_dma;
> -	if (prev == atomic_pool_dma32)
> -		return atomic_pool_dma;
> +
> +	if (prev == &atomic_pool_kernel)
> +		return __dma_guess_pool(&atomic_pool_dma32,
> +					&atomic_pool_dma, NULL);
> +
> +	if (prev == &atomic_pool_dma32)
> +		return __dma_guess_pool(&atomic_pool_dma, NULL, NULL);
> +
>   	return NULL;
>   }
>   
> @@ -272,16 +317,20 @@ static struct page *__dma_alloc_from_pool(struct device *dev, size_t size,
>   }
>   
>   struct page *dma_alloc_from_pool(struct device *dev, size_t size,
> -		void **cpu_addr, gfp_t gfp,
> +		void **cpu_addr, gfp_t gfp, unsigned long attrs,
>   		bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
>   {
> -	struct gen_pool *pool = NULL;
> +	struct dma_gen_pool *dma_pool = NULL;
>   	struct page *page;
>   	bool pool_found = false;
>   
> -	while ((pool = dma_guess_pool(pool, gfp))) {
> +	while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
> +
> +		if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> +			continue;
> +
>   		pool_found = true;
> -		page = __dma_alloc_from_pool(dev, size, pool, cpu_addr,
> +		page = __dma_alloc_from_pool(dev, size, dma_pool->pool, cpu_addr,
>   					     phys_addr_ok);
>   		if (page)
>   			return page;
> @@ -296,12 +345,14 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
>   
>   bool dma_free_from_pool(struct device *dev, void *start, size_t size)
>   {
> -	struct gen_pool *pool = NULL;
> +	struct dma_gen_pool *dma_pool = NULL;
> +
> +	while ((dma_pool = dma_guess_pool(dma_pool, 0))) {
>   
> -	while ((pool = dma_guess_pool(pool, 0))) {
> -		if (!gen_pool_has_addr(pool, (unsigned long)start, size))
> +		if (!gen_pool_has_addr(dma_pool->pool, (unsigned long)start, size))


v3 of this patch just crashed here with dma_pool != NULL but dma_pool->pool == NULL. Still debugging. Thanks,


>   			continue;
> -		gen_pool_free(pool, (unsigned long)start, size);
> +
> +		gen_pool_free(dma_pool->pool, (unsigned long)start, size);
>   		return true;
>   	}
>   
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 1abd3e6146f4..ab4eccbaa076 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -612,6 +612,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>   		u64 phys_limit, gfp_t gfp)
>   {
>   	struct page *page;
> +	unsigned long attrs = 0;
>   
>   	/*
>   	 * Allocate from the atomic pools if memory is encrypted and
> @@ -623,8 +624,12 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>   		if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
>   			return NULL;
>   
> +		/* swiotlb considered decrypted by default */
> +		if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> +			attrs = DMA_ATTR_CC_SHARED;
> +
>   		return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> -					   dma_coherent_ok);
> +					   attrs, dma_coherent_ok);
>   	}
>   
>   	gfp &= ~GFP_ZONEMASK;

-- 
Alexey
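
[Editorial sketch, not part of the original message.] The crash noted inline above (a non-NULL dma_pool wrapper whose ->pool member is NULL) and the encryption-state check in dma_alloc_from_pool() can be modelled in user space as below. This is a minimal sketch under stated assumptions: struct gen_pool is an empty stand-in, DEMO_ATTR_CC_SHARED is a hypothetical bit rather than the kernel's value, and demo_pick_pool() is an illustrative helper, not a function from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct gen_pool; only its presence matters. */
struct gen_pool { int unused; };

/* Same shape as the wrapper introduced by the quoted patch. */
struct dma_gen_pool {
	bool unencrypted;
	struct gen_pool *pool;	/* NULL when the pool was never initialised */
};

/* Hypothetical attribute bit for this sketch, not the kernel's value. */
#define DEMO_ATTR_CC_SHARED (1UL << 0)

/*
 * Return the first wrapper in pools[start..n) that is initialised and
 * whose encryption state matches the request, or NULL if none qualifies.
 * Checking p->pool before use is the guard whose absence would produce
 * the "dma_pool != NULL but dma_pool->pool == NULL" dereference.
 */
static struct dma_gen_pool *demo_pick_pool(struct dma_gen_pool **pools,
					   size_t n, size_t start,
					   unsigned long attrs)
{
	bool want_shared = !!(attrs & DEMO_ATTR_CC_SHARED);

	for (size_t i = start; i < n; i++) {
		struct dma_gen_pool *p = pools[i];

		if (!p || !p->pool)
			continue;	/* wrapper exists, backing pool does not */
		if (p->unencrypted != want_shared)
			continue;	/* wrong encryption state for this request */
		return p;
	}
	return NULL;
}
```

Because the wrappers are now static structures rather than pointers, a plain NULL test on the wrapper no longer says whether the pool exists; every walker, including the free path, has to test ->pool as well.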




* Re: [PATCH] powerpc/44x: Set GPIO chip firmware node
From: Linus Walleij @ 2026-05-16 14:55 UTC (permalink / raw)
  To: Rosen Penev
  Cc: linuxppc-dev, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Bartosz Golaszewski,
	open list
In-Reply-To: <20260515231913.1154801-1-rosenp@gmail.com>

On Sat, May 16, 2026 at 1:19 AM Rosen Penev <rosenp@gmail.com> wrote:

> The PPC4xx GPIO driver stopped assigning an explicit firmware node
> to the gpio_chip when it moved away from of_mm_gpiochip_add_data().
>
> Restore that association from the platform device so OF GPIO lookup
> can match phandles to the registered gpiochip.
>
> Tested on: Cisco MX60W. No more probe deferral.
>
> Assisted-by: Codex:GPT-5.5
> Fixes: 1044dbaf2a77 ("powerpc/44x: Change GPIO driver to a proper platform driver")
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

Reviewed-by: Linus Walleij <linusw@kernel.org>

Yours,
Linus Walleij



* [PATCH] powerpc: define __LITTLE_ENDIAN and __BIG_ENDIAN for math-emu
From: Mingcong Bai @ 2026-05-17  4:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Xi Ruoyao, Kexy Biscuit, Mingcong Bai, stable, kernel test robot,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), linuxppc-dev

Similar to commit b929926f01f2 ("sh: define __BIG_ENDIAN for math-emu"),
define __LITTLE_ENDIAN and __BIG_ENDIAN as 0 to mitigate build-time
warnings:

  ./include/math-emu/double.h:59:21: error: ‘__BIG_ENDIAN’ is not defined, evaluates to ‘0’ [-Werror=undef]
     59 | #if __BYTE_ORDER == __BIG_ENDIAN
        |

Cc: stable@vger.kernel.org
Fixes: 13da9e200fe4 ("Revert "endian: #define __BYTE_ORDER"")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507301656.7FEX6J5W-lkp@intel.com/
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
---
 arch/powerpc/include/asm/sfp-machine.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/sfp-machine.h b/arch/powerpc/include/asm/sfp-machine.h
index 8b957aabb826d..db8525605c026 100644
--- a/arch/powerpc/include/asm/sfp-machine.h
+++ b/arch/powerpc/include/asm/sfp-machine.h
@@ -319,10 +319,12 @@
 #define abort()								\
 	return 0
 
-#ifdef __BIG_ENDIAN
+#ifdef __BIG_ENDIAN__
 #define __BYTE_ORDER __BIG_ENDIAN
+#define __LITTLE_ENDIAN 0
 #else
 #define __BYTE_ORDER __LITTLE_ENDIAN
+#define __BIG_ENDIAN 0
 #endif
 
 /* Exception flags. */
-- 
2.52.0
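
[Editorial sketch, not part of the original patch.] The warning quoted in the commit message comes from the C preprocessor rule that an undefined identifier inside #if evaluates to 0, which -Wundef reports. The fragment below imitates a big-endian build with DEMO_-prefixed macros (all hypothetical names chosen for this sketch, so it does not touch the real __BYTE_ORDER macros): defining the "other" endianness constant as 0 keeps both comparisons well defined without changing their result.

```c
/* Pretend big-endian target: only the big-endian constant has a real value. */
#define DEMO_BIG_ENDIAN 4321
#define DEMO_BYTE_ORDER DEMO_BIG_ENDIAN

/*
 * Without this definition the second #if below would still compare
 * against 0, but -Wundef would warn that DEMO_LITTLE_ENDIAN is not
 * defined. Defining it as 0 makes the same comparison explicit.
 */
#define DEMO_LITTLE_ENDIAN 0

#if DEMO_BYTE_ORDER == DEMO_BIG_ENDIAN
#define DEMO_IS_BIG 1
#else
#define DEMO_IS_BIG 0
#endif

#if DEMO_BYTE_ORDER == DEMO_LITTLE_ENDIAN
#define DEMO_IS_LITTLE 1
#else
#define DEMO_IS_LITTLE 0
#endif

/* Expose the preprocessor decisions so they can be checked at run time. */
static int demo_is_big_endian(void) { return DEMO_IS_BIG; }
static int demo_is_little_endian(void) { return DEMO_IS_LITTLE; }
```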




* Re: [PATCH] powerpc/44x: Set GPIO chip firmware node
From: Rosen Penev @ 2026-05-17  5:44 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linuxppc-dev, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Bartosz Golaszewski,
	open list
In-Reply-To: <CAD++jLnuJDs=Zkba5H0L0Wrny+Wej2YLTWF9mCrb1PBownw=kA@mail.gmail.com>

On Sat, May 16, 2026 at 7:55 AM Linus Walleij <linusw@kernel.org> wrote:
>
> On Sat, May 16, 2026 at 1:19 AM Rosen Penev <rosenp@gmail.com> wrote:
>
> > The PPC4xx GPIO driver stopped assigning an explicit firmware node
> > to the gpio_chip when it moved away from of_mm_gpiochip_add_data().
> >
> > Restore that association from the platform device so OF GPIO lookup
> > can match phandles to the registered gpiochip.
> >
> > Tested on: Cisco MX60W. No more probe deferral.
> >
> > Assisted-by: Codex:GPT-5.5
> > Fixes: 1044dbaf2a77 ("powerpc/44x: Change GPIO driver to a proper platform driver")
> > Signed-off-by: Rosen Penev <rosenp@gmail.com>
>
> Reviewed-by: Linus Walleij <linusw@kernel.org>
dev was needed, not fwnode, which makes sense as this is specified in the DTS.

https://patch.msgid.link/20260427-gpio-mmio-more-v3-1-fe1882351424@kernel.org
fixes this but that's not in a released kernel.
>
> Yours,
> Linus Walleij



* Re: [PATCH v4 00/13] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Jiri Pirko @ 2026-05-17  6:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm)
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
	Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
	Suzuki K Poulose, Catalin Marinas, Jason Gunthorpe, Mostafa Saleh,
	Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260512090408.794195-1-aneesh.kumar@kernel.org>

Tue, May 12, 2026 at 11:03:55AM +0200, aneesh.kumar@kernel.org wrote:
>This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
>dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
>are handled consistently.
>
>Today, the direct DMA path mostly relies on force_dma_unencrypted() for
>shared/decrypted buffer handling. This series consolidates the
>force_dma_unencrypted() checks in the top-level functions and ensures
>that the remaining DMA interfaces use DMA attributes to make the correct
>decisions.

FWIW, the patchset in general looks good to me. I tested this with my
system_cc_shared dmabuf flow, works flawlessly.

Thanks!



* [PATCHv2] powerpc/44x: Set GPIO chip parent
From: Rosen Penev @ 2026-05-17  6:37 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Linus Walleij, open list

The PPC4xx GPIO driver stopped assigning an explicit parent
to the gpio_chip when it moved away from of_mm_gpiochip_add_data().

Restore that association from the platform device so OF GPIO lookup
can match phandles to the registered gpiochip.

Tested on: Cisco MX60W. No more probe deferral.

Assisted-by: Codex:GPT-5.5
Fixes: 1044dbaf2a77 ("powerpc/44x: Change GPIO driver to a proper platform driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 arch/powerpc/platforms/44x/gpio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/44x/gpio.c b/arch/powerpc/platforms/44x/gpio.c
index aea0d913b59d..4413a94cf7a6 100644
--- a/arch/powerpc/platforms/44x/gpio.c
+++ b/arch/powerpc/platforms/44x/gpio.c
@@ -169,6 +169,7 @@ static int ppc4xx_gpio_probe(struct platform_device *ofdev)
 
 	gc = &chip->gc;
 
+	gc->parent = dev;
 	gc->base = -1;
 	gc->ngpio = 32;
 	gc->direction_input = ppc4xx_gpio_dir_in;
-- 
2.54.0




* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Barry Song @ 2026-05-17  8:45 UTC (permalink / raw)
  To: Matthew Wilcox, surenb
  Cc: akpm, linux-mm, david, ljs, liam, vbabka, rppt, mhocko, jack,
	pfalcato, wanglian, chentao, lianux.mm, kunwu.chan, liyangouwen1,
	chrisl, kasong, shikemeng, nphamcs, bhe, youngjun.park,
	linux-arm-kernel, linux-kernel, loongarch, linuxppc-dev,
	linux-riscv, linux-s390, Nanzhe Zhao
In-Reply-To: <afTpoL3FklpQZNMM@casper.infradead.org>

On Sat, May 2, 2026 at 1:58 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, May 02, 2026 at 01:44:34AM +0800, Barry Song wrote:
> > On Fri, May 1, 2026 at 10:57 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, May 01, 2026 at 06:49:58AM +0800, Barry Song wrote:
> > > > 1. There is no deterministic latency for I/O completion. It depends on
> > > > both the hardware and the software stack (bio/request queues and the
> > > > block scheduler). Sometimes the latency is short; at other times it can
> > > > be quite long. In such cases, a high-priority thread performing operations
> > > > such as mprotect, unmap, prctl_set_vma, or madvise may be forced to wait
> > > > for an unpredictable amount of time.
> > >
> > > But does that actually happen?  I find it hard to believe that thread A
> > > unmaps a VMA while thread B is in the middle of taking a page fault in
> > > that same VMA.  mprotect() and madvise() are more likely to happen, but
> > > it still seems really unlikely to me.
> >
> > It doesn’t have to involve unmapping or applying mprotect to
> > the entire VMA—just a portion of it is sufficient.
>
> Yes, but that still fails to answer "does this actually happen".  How much
> performance is all this complexity in the page fault handler buying us?
> If you don't answer this question, I'm just going to go in and rip it
> all out.
>

Hi Matthew (and Lorenzo, Jan, and anyone else who may be
waiting for answers),

As promised during LSF/MM/BPF, we conducted thorough
testing on Android phones to determine whether performing
I/O in `filemap_fault()` can block `vma_start_write()`.
I wanted to give a quick update on this question.

Nanzhe at Xiaomi created tracing scripts and ran various
applications on Android devices with I/O performed under
the VMA lock in `filemap_fault()`. We found that:

1. There are very few cases where unmap() is blocked by
   page faults. I assume this is due to buggy user code
   or poor synchronization between reads and unmap(),
   so I assume it is not a problem.

2. We observed many cases where `vma_start_write()`
   is blocked by page-fault I/O in some applications.
   The blocking occurs in the `dup_mmap()` path during
   fork().

With Suren's commit fb49c455323ff ("fork: lock VMAs of
the parent process when forking"), we now always hold
`vma_write_lock()` for each VMA. Note that the
`mmap_lock` write lock is also held, which could lead to
chained waiting if page-fault I/O is performed without
releasing the VMA lock.

My gut feeling is that Suren's commit may be overshooting,
so my rough idea is that we might want to do something like
the following (we haven't tested it yet and it might be
wrong):

diff --git a/mm/mmap.c b/mm/mmap.c
index 2311ae7c2ff4..5ddaf297f31a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1762,7 +1762,13 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
        for_each_vma(vmi, mpnt) {
                struct file *file;

-               retval = vma_start_write_killable(mpnt);
+               /*
+                * For anonymous or writable private VMAs, prevent
+                * concurrent CoW faults.
+                */
+               if (!mpnt->vm_file || (!(mpnt->vm_flags & VM_SHARED) &&
+                                       (mpnt->vm_flags & VM_WRITE)))
+                       retval = vma_start_write_killable(mpnt);
                if (retval < 0)
                        goto loop_out;
                if (mpnt->vm_flags & VM_DONTCOPY) {

Based on the above, we may want to re-check whether fork()
can be blocked by page faults. At the same time, if Suren,
you, or anyone else has any comments, please feel free to
share them.

Best Regards
Barry
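
[Editorial sketch, not part of the original message.] The condition in the untested diff above can be read as a predicate over a VMA: take the per-VMA write lock at fork time only when the mapping is anonymous or is a private, writable file mapping. The user-space model below assumes a cut-down struct demo_vma and DEMO_VM_* flag bits that are stand-ins for this sketch; demo_fork_needs_write_lock() is an illustrative name, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits for this sketch only. */
#define DEMO_VM_WRITE  0x2UL
#define DEMO_VM_SHARED 0x8UL

/* Cut-down model of a VMA: just the two fields the condition reads. */
struct demo_vma {
	const void *vm_file;	/* NULL for anonymous mappings */
	unsigned long vm_flags;
};

/*
 * True when the proposed dup_mmap() change would still write-lock the
 * VMA: anonymous memory, or a private mapping that can be written and
 * may therefore take copy-on-write faults.
 */
static bool demo_fork_needs_write_lock(const struct demo_vma *vma)
{
	bool anonymous = vma->vm_file == NULL;
	bool private_writable = !(vma->vm_flags & DEMO_VM_SHARED) &&
				(vma->vm_flags & DEMO_VM_WRITE);

	return anonymous || private_writable;
}
```

Under this reading, read-only file mappings and shared file mappings skip the write lock, which are the cases where a fault doing I/O would otherwise stall fork().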



* Re: [PATCHv2] powerpc/44x: Set GPIO chip parent
From: Linus Walleij @ 2026-05-17 10:47 UTC (permalink / raw)
  To: Rosen Penev
  Cc: linuxppc-dev, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), open list
In-Reply-To: <20260517063754.21819-1-rosenp@gmail.com>

On Sun, May 17, 2026 at 8:38 AM Rosen Penev <rosenp@gmail.com> wrote:

> The PPC4xx GPIO driver stopped assigning an explicit parent
> to the gpio_chip when it moved away from of_mm_gpiochip_add_data().
>
> Restore that association from the platform device so OF GPIO lookup
> can match phandles to the registered gpiochip.
>
> Tested on: Cisco MX60W. No more probe deferral.
>
> Assisted-by: Codex:GPT-5.5
> Fixes: 1044dbaf2a77 ("powerpc/44x: Change GPIO driver to a proper platform driver")
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

Reviewed-by: Linus Walleij <linusw@kernel.org>

Yours,
Linus Walleij



