public inbox for linux-kernel@vger.kernel.org
From: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	linux-coco@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, Catalin Marinas <catalin.marinas@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Steven Price <steven.price@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Thomas Gleixner <tglx@kernel.org>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v4 2/3] swiotlb: dma: its: Enforce host page-size alignment for shared buffers
Date: Tue, 28 Apr 2026 17:50:53 +0530
Message-ID: <yq5aa4un1dju.fsf@kernel.org>
In-Reply-To: <86zf2ozrb8.wl-maz@kernel.org>

Marc Zyngier <maz@kernel.org> writes:

> On Mon, 27 Apr 2026 07:31:07 +0100,
> "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
>> 
>> When running private-memory guests, the guest kernel must apply additional
>> constraints when allocating buffers that are shared with the hypervisor.
>> 
>> These shared buffers are also accessed by the host kernel and therefore
>> must be aligned to the host’s page size, and have a size that is a multiple
>> of the host page size.
>> 
>> On non-secure hosts, set_guest_memory_attributes() tracks memory at the
>> host PAGE_SIZE granularity. This creates a mismatch when the guest applies
>> attributes at 4K boundaries while the host uses 64K pages. In such cases,
>> the set_guest_memory_attributes() call returns -EINVAL, preventing the
>> conversion of memory regions from private to shared.
>> 
>> Architectures such as Arm can tolerate realm physical address space
>> (protected memory) PFNs being mapped as shared memory, as incorrect
>> accesses are detected and reported as GPC faults. However, relying on this
>> mechanism is unsafe and can still lead to kernel crashes.
>> 
>> This is particularly likely when guest_memfd allocations are mmapped and
>> accessed from userspace. Once exposed to userspace, we cannot guarantee
>> that applications will only access the intended 4K shared region rather
>> than the full 64K page mapped into their address space. Such userspace
>> addresses may also be passed back into the kernel and accessed via the
>> linear map, resulting in a GPC fault and a kernel crash.
>> 
>> With CCA, although Stage-2 mappings managed by the RMM still operate at a
>> 4K granularity, shared pages must nonetheless be aligned to the
>> host-managed page size and sized as whole host pages to avoid the issues
>> described above.
>
> I thought that was being fixed, and that there was now a strong
> guarantee that RMM and host are aligned on the page size. Even more,
> S2 is totally irrelevant here. The only thing that matters is the host
> page size vs the guest page size. Nothing else.
>

Yes, the latest RMM update includes the ability to change the granule
size.

The section above in the commit message was intended to explain that the
S2 mapping size is irrelevant. I agree it is not clear as written, so I
will reword it to improve clarity.

>
>> 
>> Introduce a new helper, mem_decrypt_align(), to allow callers to enforce
>> the required alignment and size constraints for shared buffers.
>> 
>> The architecture-specific implementation of mem_decrypt_align() will be
>> provided in a follow-up patch.
>> 
>> Note on restricted-dma-pool:
>> rmem_swiotlb_device_init() uses reserved-memory regions described by
>> firmware. Those regions are not changed in-kernel to satisfy host granule
>> alignment. This is intentional: we do not expect restricted-dma-pool
>> allocations to be used with CCA. If restricted-dma-pool is intended for CCA
>> shared use, firmware must provide base/size aligned to the host IPA-change
>> granule.
>> 
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> ---
>>  arch/arm64/mm/mem_encrypt.c      | 19 +++++++++++++++----
>>  drivers/irqchip/irq-gic-v3-its.c | 20 +++++++++++++-------
>>  include/linux/mem_encrypt.h      | 14 ++++++++++++++
>>  kernel/dma/contiguous.c          | 10 ++++++++++
>>  kernel/dma/direct.c              | 16 ++++++++++++++--
>>  kernel/dma/pool.c                |  4 +++-
>>  kernel/dma/swiotlb.c             | 21 +++++++++++++--------
>>  7 files changed, 82 insertions(+), 22 deletions(-)
>> 
>
> [...]
>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 291d7668cc8d..239d7e3bc16f 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -213,16 +213,17 @@ static gfp_t gfp_flags_quirk;
>>  static struct page *its_alloc_pages_node(int node, gfp_t gfp,
>>  					 unsigned int order)
>>  {
>> +	unsigned int new_order;
>>  	struct page *page;
>>  	int ret = 0;
>>  
>> -	page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
>> -
>> +	new_order = get_order(mem_decrypt_align((PAGE_SIZE << order)));
>> +	page = alloc_pages_node(node, gfp | gfp_flags_quirk, new_order);
>>  	if (!page)
>>  		return NULL;
>>  
>>  	ret = set_memory_decrypted((unsigned long)page_address(page),
>> -				   1 << order);
>> +				   1 << new_order);
>>  	/*
>>  	 * If set_memory_decrypted() fails then we don't know what state the
>>  	 * page is in, so we can't free it. Instead we leak it.
>> @@ -241,13 +242,16 @@ static struct page *its_alloc_pages(gfp_t gfp, unsigned int order)
>>  
>>  static void its_free_pages(void *addr, unsigned int order)
>>  {
>> +	int new_order;
>> +
>> +	new_order = get_order(mem_decrypt_align((PAGE_SIZE << order)));
>>  	/*
>>  	 * If the memory cannot be encrypted again then we must leak the pages.
>>  	 * set_memory_encrypted() will already have WARNed.
>>  	 */
>> -	if (set_memory_encrypted((unsigned long)addr, 1 << order))
>> +	if (set_memory_encrypted((unsigned long)addr, 1 << new_order))
>>  		return;
>> -	free_pages((unsigned long)addr, order);
>> +	free_pages((unsigned long)addr, new_order);
>>  }
>>
>
> Here's the non-obfuscated version of the two hunks above (and let it
> be on the record that New Order is a terrible, overrated band):
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 291d7668cc8da..a4d555aaee241 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -216,6 +216,7 @@ static struct page *its_alloc_pages_node(int node, gfp_t gfp,
>  	struct page *page;
>  	int ret = 0;
>  
> +	order = get_order(mem_decrypt_align(PAGE_SIZE << order));
>  	page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
>  
>  	if (!page)
> @@ -245,6 +246,7 @@ static void its_free_pages(void *addr, unsigned int order)
>  	 * If the memory cannot be encrypted again then we must leak the pages.
>  	 * set_memory_encrypted() will already have WARNed.
>  	 */
> +	order = get_order(mem_decrypt_align(PAGE_SIZE << order));
>  	if (set_memory_encrypted((unsigned long)addr, 1 << order))
>  		return;
>  	free_pages((unsigned long)addr, order);
>

I will include this in the next revision.
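
For reference, the effect of that one-liner can be sketched in
userspace C. This is only an illustration under stated assumptions: a
4K guest PAGE_SIZE, a 64K host granule, and simplified stand-ins. The
*_sketch names and HOST_GRANULE are hypothetical and are not the kernel
implementations of mem_decrypt_align() or get_order().

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 4K guest pages, 64K host granule. */
#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  ((size_t)1 << GUEST_PAGE_SHIFT)
#define HOST_GRANULE     ((size_t)64 * 1024)

/* Hypothetical stand-in for mem_decrypt_align(): round up to the host granule. */
static size_t mem_decrypt_align_sketch(size_t size)
{
	/* The mask trick below only works for power-of-two granules. */
	assert((HOST_GRANULE & (HOST_GRANULE - 1)) == 0);
	return (size + HOST_GRANULE - 1) & ~(HOST_GRANULE - 1);
}

/* Simplified get_order(): smallest order with (GUEST_PAGE_SIZE << order) >= size. */
static unsigned int get_order_sketch(size_t size)
{
	unsigned int order = 0;

	while ((GUEST_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Mirrors: order = get_order(mem_decrypt_align(PAGE_SIZE << order)); */
static unsigned int adjusted_order_sketch(unsigned int order)
{
	return get_order_sketch(mem_decrypt_align_sketch(GUEST_PAGE_SIZE << order));
}
```

Under these assumptions, orders 0 through 4 all become order 4 (one 64K
host page), and order 5 stays at 5 because 128K is already a multiple
of 64K. Since the same expression is evaluated in both
its_alloc_pages_node() and its_free_pages(), the alloc and free paths
agree on the adjusted order.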


>>  static struct gen_pool *itt_pool;
>> @@ -268,11 +272,13 @@ static void *itt_alloc_pool(int node, int size)
>>  		if (addr)
>>  			break;
>>  
>> -		page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
>> +		page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
>> +					    get_order(mem_decrypt_granule_size()));
>
> You already taught its_alloc_pages_node() about the decrypt granule
> size stuff. I don't think we need to see more of it (and you don't
> mess with the call that is just above it).
>
>>  		if (!page)
>>  			break;
>>  
>> -		gen_pool_add(itt_pool, (unsigned long)page_address(page), PAGE_SIZE, node);
>> +		gen_pool_add(itt_pool, (unsigned long)page_address(page),
>> +			     mem_decrypt_granule_size(), node);
>
> I'd rather see something like mem_decrypt_align(PAGE_SIZE), which
> keeps the intent clear.
>

The helper was added based on feedback from a previous version. I assume
you are suggesting that only this caller should switch?


-aneesh
