From: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-coco@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.linux.dev, Catalin Marinas <catalin.marinas@arm.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
Steven Price <steven.price@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Thomas Gleixner <tglx@kernel.org>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v4 2/3] swiotlb: dma: its: Enforce host page-size alignment for shared buffers
Date: Tue, 28 Apr 2026 17:50:53 +0530
Message-ID: <yq5aa4un1dju.fsf@kernel.org>
In-Reply-To: <86zf2ozrb8.wl-maz@kernel.org>

Marc Zyngier <maz@kernel.org> writes:
> On Mon, 27 Apr 2026 07:31:07 +0100,
> "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> wrote:
>>
>> When running private-memory guests, the guest kernel must apply additional
>> constraints when allocating buffers that are shared with the hypervisor.
>>
>> These shared buffers are also accessed by the host kernel and therefore
>> must be aligned to the host’s page size, and have a size that is a multiple
>> of the host page size.
>>
>> On non-secure hosts, set_guest_memory_attributes() tracks memory at the
>> host PAGE_SIZE granularity. This creates a mismatch when the guest applies
>> attributes at 4K boundaries while the host uses 64K pages. In such cases,
>> the set_guest_memory_attributes() call returns -EINVAL, preventing the
>> conversion of memory regions from private to shared.
>>
>> Architectures such as Arm can tolerate realm physical address space
>> (protected memory) PFNs being mapped as shared memory, as incorrect
>> accesses are detected and reported as GPC faults. However, relying on this
>> mechanism is unsafe and can still lead to kernel crashes.
>>
>> This is particularly likely when guest_memfd allocations are mmapped and
>> accessed from userspace. Once exposed to userspace, we cannot guarantee
>> that applications will only access the intended 4K shared region rather
>> than the full 64K page mapped into their address space. Such userspace
>> addresses may also be passed back into the kernel and accessed via the
>> linear map, resulting in a GPC fault and a kernel crash.
>>
>> With CCA, although Stage-2 mappings managed by the RMM still operate at a
>> 4K granularity, shared pages must nonetheless be aligned to the
>> host-managed page size and sized as whole host pages to avoid the issues
>> described above.
>
> I thought that was being fixed, and that there was now a strong
> guarantee that RMM and host are aligned on the page size. Even more,
> S2 is totally irrelevant here. The only thing that matters is the host
> page size vs the guest page size. Nothing else.
>
Yes, the latest RMM update includes the ability to change the granule
size.

That section of the commit message was intended to make exactly that
point: the S2 mapping size is irrelevant, and only the host page size
relative to the guest page size matters. I agree it is not clear as
written and will reword it.
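
For reference, a minimal sketch of the helper's intended semantics,
assuming the architecture reports the host granule via
mem_decrypt_granule_size() as used elsewhere in this series:

	/* Round a shared-buffer size up to the host page granule. */
	static inline unsigned long mem_decrypt_align(unsigned long size)
	{
		return ALIGN(size, mem_decrypt_granule_size());
	}

With a 4K guest on a 64K host, a 4K shared buffer is then allocated
and converted as a whole 64K region.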
>
>>
>> Introduce a new helper, mem_decrypt_align(), to allow callers to enforce
>> the required alignment and size constraints for shared buffers.
>>
>> The architecture-specific implementation of mem_decrypt_align() will be
>> provided in a follow-up patch.
>>
>> Note on restricted-dma-pool:
>> rmem_swiotlb_device_init() uses reserved-memory regions described by
>> firmware. Those regions are not changed in-kernel to satisfy host granule
>> alignment. This is intentional: we do not expect restricted-dma-pool
>> allocations to be used with CCA. If restricted-dma-pool is intended for CCA
>> shared use, firmware must provide base/size aligned to the host IPA-change
>> granule.
>>
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> ---
>> arch/arm64/mm/mem_encrypt.c | 19 +++++++++++++++----
>> drivers/irqchip/irq-gic-v3-its.c | 20 +++++++++++++-------
>> include/linux/mem_encrypt.h | 14 ++++++++++++++
>> kernel/dma/contiguous.c | 10 ++++++++++
>> kernel/dma/direct.c | 16 ++++++++++++++--
>> kernel/dma/pool.c | 4 +++-
>> kernel/dma/swiotlb.c | 21 +++++++++++++--------
>> 7 files changed, 82 insertions(+), 22 deletions(-)
>>
>
> [...]
>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 291d7668cc8d..239d7e3bc16f 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -213,16 +213,17 @@ static gfp_t gfp_flags_quirk;
>> static struct page *its_alloc_pages_node(int node, gfp_t gfp,
>> unsigned int order)
>> {
>> + unsigned int new_order;
>> struct page *page;
>> int ret = 0;
>>
>> - page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
>> -
>> + new_order = get_order(mem_decrypt_align((PAGE_SIZE << order)));
>> + page = alloc_pages_node(node, gfp | gfp_flags_quirk, new_order);
>> if (!page)
>> return NULL;
>>
>> ret = set_memory_decrypted((unsigned long)page_address(page),
>> - 1 << order);
>> + 1 << new_order);
>> /*
>> * If set_memory_decrypted() fails then we don't know what state the
>> * page is in, so we can't free it. Instead we leak it.
>> @@ -241,13 +242,16 @@ static struct page *its_alloc_pages(gfp_t gfp, unsigned int order)
>>
>> static void its_free_pages(void *addr, unsigned int order)
>> {
>> + int new_order;
>> +
>> + new_order = get_order(mem_decrypt_align((PAGE_SIZE << order)));
>> /*
>> * If the memory cannot be encrypted again then we must leak the pages.
>> * set_memory_encrypted() will already have WARNed.
>> */
>> - if (set_memory_encrypted((unsigned long)addr, 1 << order))
>> + if (set_memory_encrypted((unsigned long)addr, 1 << new_order))
>> return;
>> - free_pages((unsigned long)addr, order);
>> + free_pages((unsigned long)addr, new_order);
>> }
>>
>
> Here's the non-obfuscated version of the two hunks above (and let it
> be on the record that New Order is a terrible, overrated band):
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 291d7668cc8da..a4d555aaee241 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -216,6 +216,7 @@ static struct page *its_alloc_pages_node(int node, gfp_t gfp,
> struct page *page;
> int ret = 0;
>
> + order = get_order(mem_decrypt_align(PAGE_SIZE << order));
> page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
>
> if (!page)
> @@ -245,6 +246,7 @@ static void its_free_pages(void *addr, unsigned int order)
> * If the memory cannot be encrypted again then we must leak the pages.
> * set_memory_encrypted() will already have WARNed.
> */
> + order = get_order(mem_decrypt_align(PAGE_SIZE << order));
> if (set_memory_encrypted((unsigned long)addr, 1 << order))
> return;
> free_pages((unsigned long)addr, order);
>
I will include this in the next revision.
>> static struct gen_pool *itt_pool;
>> @@ -268,11 +272,13 @@ static void *itt_alloc_pool(int node, int size)
>> if (addr)
>> break;
>>
>> - page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
>> + page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
>> + get_order(mem_decrypt_granule_size()));
>
> You already taught its_alloc_pages_node() about the decrypt granule
> size stuff. I don't think we need to see more of it (and you don't
> mess with the call that is just above it).
>
>> if (!page)
>> break;
>>
>> - gen_pool_add(itt_pool, (unsigned long)page_address(page), PAGE_SIZE, node);
>> + gen_pool_add(itt_pool, (unsigned long)page_address(page),
>> + mem_decrypt_granule_size(), node);
>
> I'd rather see something like mem_decrypt_align(PAGE_SIZE), which
> keeps the intent clear.
>
The mem_decrypt_granule_size() helper was added based on feedback on a
previous version. I take it you are suggesting that only this call site
should switch to mem_decrypt_align(PAGE_SIZE)?
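
If so, something like this (an untested sketch of just that call site):

	page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
	if (!page)
		break;

	/* Each pool chunk covers one host granule. */
	gen_pool_add(itt_pool, (unsigned long)page_address(page),
		     mem_decrypt_align(PAGE_SIZE), node);

with its_alloc_pages_node() rounding the order up internally, as in
your version above.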
-aneesh