public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	linux-coco@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, Catalin Marinas <catalin.marinas@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Steven Price <steven.price@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Thomas Gleixner <tglx@kernel.org>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v4 2/3] swiotlb: dma: its: Enforce host page-size alignment for shared buffers
Date: Mon, 27 Apr 2026 10:27:23 +0100	[thread overview]
Message-ID: <86zf2ozrb8.wl-maz@kernel.org> (raw)
In-Reply-To: <20260427063108.909019-3-aneesh.kumar@kernel.org>

On Mon, 27 Apr 2026 07:31:07 +0100,
"Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
> 
> When running private-memory guests, the guest kernel must apply additional
> constraints when allocating buffers that are shared with the hypervisor.
> 
> These shared buffers are also accessed by the host kernel and therefore
> must be aligned to the host’s page size, and have a size that is a multiple
> of the host page size.
> 
> On non-secure hosts, set_guest_memory_attributes() tracks memory at the
> host PAGE_SIZE granularity. This creates a mismatch when the guest applies
> attributes at 4K boundaries while the host uses 64K pages. In such cases,
> set_guest_memory_attributes() call returns -EINVAL, preventing the
> conversion of memory regions from private to shared.
> 
> Architectures such as Arm can tolerate realm physical address space
> (protected memory) PFNs being mapped as shared memory, as incorrect
> accesses are detected and reported as GPC faults. However, relying on this
> mechanism is unsafe and can still lead to kernel crashes.
> 
> This is particularly likely when guest_memfd allocations are mmapped and
> accessed from userspace. Once exposed to userspace, we cannot guarantee
> that applications will only access the intended 4K shared region rather
> than the full 64K page mapped into their address space. Such userspace
> addresses may also be passed back into the kernel and accessed via the
> linear map, resulting in a GPC fault and a kernel crash.
> 
> With CCA, although Stage-2 mappings managed by the RMM still operate at a
> 4K granularity, shared pages must nonetheless be aligned to the
> host-managed page size and sized as whole host pages to avoid the issues
> described above.

I thought that was being fixed, and that there was now a strong
guarantee that RMM and host are aligned on the page size. Even more,
S2 is totally irrelevant here. The only thing that matters is the host
page size vs the guest page size. Nothing else.

> 
> Introduce a new helper, mem_decrypt_align(), to allow callers to enforce
> the required alignment and size constraints for shared buffers.
> 
> The architecture-specific implementation of mem_decrypt_align() will be
> provided in a follow-up patch.
> 
> Note on restricted-dma-pool:
> rmem_swiotlb_device_init() uses reserved-memory regions described by
> firmware. Those regions are not changed in-kernel to satisfy host granule
> alignment. This is intentional: we do not expect restricted-dma-pool
> allocations to be used with CCA. If restricted-dma-pool is intended for CCA
> shared use, firmware must provide base/size aligned to the host IPA-change
> granule.
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>  arch/arm64/mm/mem_encrypt.c      | 19 +++++++++++++++----
>  drivers/irqchip/irq-gic-v3-its.c | 20 +++++++++++++-------
>  include/linux/mem_encrypt.h      | 14 ++++++++++++++
>  kernel/dma/contiguous.c          | 10 ++++++++++
>  kernel/dma/direct.c              | 16 ++++++++++++++--
>  kernel/dma/pool.c                |  4 +++-
>  kernel/dma/swiotlb.c             | 21 +++++++++++++--------
>  7 files changed, 82 insertions(+), 22 deletions(-)
> 

[...]

> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 291d7668cc8d..239d7e3bc16f 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -213,16 +213,17 @@ static gfp_t gfp_flags_quirk;
>  static struct page *its_alloc_pages_node(int node, gfp_t gfp,
>  					 unsigned int order)
>  {
> +	unsigned int new_order;
>  	struct page *page;
>  	int ret = 0;
>  
> -	page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
> -
> +	new_order = get_order(mem_decrypt_align((PAGE_SIZE << order)));
> +	page = alloc_pages_node(node, gfp | gfp_flags_quirk, new_order);
>  	if (!page)
>  		return NULL;
>  
>  	ret = set_memory_decrypted((unsigned long)page_address(page),
> -				   1 << order);
> +				   1 << new_order);
>  	/*
>  	 * If set_memory_decrypted() fails then we don't know what state the
>  	 * page is in, so we can't free it. Instead we leak it.
> @@ -241,13 +242,16 @@ static struct page *its_alloc_pages(gfp_t gfp, unsigned int order)
>  
>  static void its_free_pages(void *addr, unsigned int order)
>  {
> +	int new_order;
> +
> +	new_order = get_order(mem_decrypt_align((PAGE_SIZE << order)));
>  	/*
>  	 * If the memory cannot be encrypted again then we must leak the pages.
>  	 * set_memory_encrypted() will already have WARNed.
>  	 */
> -	if (set_memory_encrypted((unsigned long)addr, 1 << order))
> +	if (set_memory_encrypted((unsigned long)addr, 1 << new_order))
>  		return;
> -	free_pages((unsigned long)addr, order);
> +	free_pages((unsigned long)addr, new_order);
>  }
>

Here's the non-obfuscated version of the two hunks above (and let it
be on the record that New Order is a terrible, overrated band):

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 291d7668cc8da..a4d555aaee241 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -216,6 +216,7 @@ static struct page *its_alloc_pages_node(int node, gfp_t gfp,
 	struct page *page;
 	int ret = 0;
 
+	order = get_order(mem_decrypt_align(PAGE_SIZE << order));
 	page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
 
 	if (!page)
@@ -245,6 +246,7 @@ static void its_free_pages(void *addr, unsigned int order)
 	 * If the memory cannot be encrypted again then we must leak the pages.
 	 * set_memory_encrypted() will already have WARNed.
 	 */
+	order = get_order(mem_decrypt_align(PAGE_SIZE << order));
 	if (set_memory_encrypted((unsigned long)addr, 1 << order))
 		return;
 	free_pages((unsigned long)addr, order);

>  static struct gen_pool *itt_pool;
> @@ -268,11 +272,13 @@ static void *itt_alloc_pool(int node, int size)
>  		if (addr)
>  			break;
>  
> -		page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
> +		page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
> +					    get_order(mem_decrypt_granule_size()));

You already taught its_alloc_pages_node() about the decrypt granule
size stuff. I don't think we need to see more of it (and you don't
mess with the call that is just above it).

>  		if (!page)
>  			break;
>  
> -		gen_pool_add(itt_pool, (unsigned long)page_address(page), PAGE_SIZE, node);
> +		gen_pool_add(itt_pool, (unsigned long)page_address(page),
> +			     mem_decrypt_granule_size(), node);

I'd rather see something like mem_decrypt_align(PAGE_SIZE), which
keeps the intent clear.

	M.

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2026-04-27  9:27 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-27  6:31 [PATCH v4 0/3] Enforce host page-size alignment for shared buffers Aneesh Kumar K.V (Arm)
2026-04-27  6:31 ` [PATCH v4 1/3] dma-direct: swiotlb: handle swiotlb alloc/free outside __dma_direct_alloc_pages Aneesh Kumar K.V (Arm)
2026-04-27  6:31 ` [PATCH v4 2/3] swiotlb: dma: its: Enforce host page-size alignment for shared buffers Aneesh Kumar K.V (Arm)
2026-04-27  9:27   ` Marc Zyngier [this message]
2026-04-27 13:38     ` Jason Gunthorpe
2026-04-27 13:49   ` Jason Gunthorpe
2026-04-27  6:31 ` [PATCH v4 3/3] coco: guest: arm64: Query host IPA-change alignment via RHI Aneesh Kumar K.V (Arm)
2026-04-27 10:33   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86zf2ozrb8.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=robin.murphy@arm.com \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox