* RE: [PATCH v2 4/6] Drivers: hv: Mark shared memory as decrypted for CCA Realms
From: Kameron Carr @ 2026-06-26 11:08 UTC (permalink / raw)
To: 'Michael Kelley', kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch
In-Reply-To: <SN6PR02MB4157D5F94B5C5F35020FF047D4EC2@SN6PR02MB4157.namprd02.prod.outlook.com>
On Thursday, June 25, 2026 11:59 AM, Michael Kelley wrote:
> From: Kameron Carr <kameroncarr@linux.microsoft.com> Sent: Thursday,
> June 25, 2026 10:35 AM
> > We need to round up the memory allocated for the input/output pages to
> > the nearest PAGE_SIZE, since set_memory_decrypted() requires the size to
> > be a multiple of PAGE_SIZE. This only has an effect on ARM VMs that are
> > using PAGE_SIZE larger than 4K.
>
> I think this change resulted from a Sashiko comment. My understanding is
> that the ARM CCA architecture only supports CCA guests with 4 KiB page
> size. Is that still the case, or has that restriction been lifted in a
later version
> of the architecture? I'm in favor of handling the larger page sizes, if
only for
> future proofing. But I wondered whether your intent is to always support
> > 4 KiB page sizes even if CCA doesn't support them now. Another way to
> put it: In reviewing code, should I flag issues related to page sizes > 4
KiB?
I think you might be right. I'm looking at RMM spec 2.0 beta 2, and the RMI
can have granule size 4KB, 16KB, 64KB, but the RSI is restricted to granule
size
4KB.
I'm open to suggestion on best way to move forward.
> > @@ -499,14 +500,16 @@ int hv_common_cpu_init(unsigned int cpu)
> > }
> >
> > if (!ms_hyperv.paravisor_present &&
> > - (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> > - ret = set_memory_decrypted((unsigned long)mem,
> pgcount);
> > + (hv_isolation_type_snp() || hv_isolation_type_tdx() ||
> > + hv_isolation_type_cca())) {
> > + ret = set_memory_decrypted((unsigned
> long)kasan_reset_tag(mem),
> > + alloc_size >> PAGE_SHIFT);
>
> I don't know enough about KASAN or the tag situation on ARM64
> to comment on this change. But this same sequence of allocating
> memory and then decrypting it occurs in other places in Hyper-V
> code. It seems like those places would also need the same
> kasan_reset_tag() call.
I'm not sure of the exact behavior of PAGE_END when there are
KASAN tags, but it looks like tags could mess with the address
comparison.
I do see that __virt_to_phys_nodebug() and virt_addr_valid() in
arch/arm64/include/asm/memory.h both reset tags before calling
__is_lm_address().
If there are many calls that follow this pattern, it may be better to
add the tag reset in __set_memory_enc_dec().
I can undo this change.
Regards,
Kameron
^ permalink raw reply
* Re: [PATCH net v2 1/2] net: mana: Sync page pool RX frags for CPU
From: Simon Horman @ 2026-06-26 14:50 UTC (permalink / raw)
To: Dexuan Cui
Cc: kys, haiyangz, wei.liu, longli, andrew+netdev, davem, edumazet,
kuba, pabeni, kotaranov, ernis, dipayanroy, kees, jacob.e.keller,
ssengar, linux-hyperv, netdev, linux-kernel, linux-rdma, stable
In-Reply-To: <20260624222605.1794719-2-decui@microsoft.com>
On Wed, Jun 24, 2026 at 03:26:04PM -0700, Dexuan Cui wrote:
> MANA allocates RX buffers from page pool fragments when frag_count is
> greater than 1. In that case the buffers remain DMA mapped by page pool
> and the RX completion path does not call dma_unmap_single(). As a result,
> the implicit sync-for-CPU normally performed by dma_unmap_single() is
> missing before the packet data is passed to the networking stack.
>
> This breaks RX on configurations which require explicit DMA syncing, for
> example when booted with swiotlb=force.
>
> Fix this by recording the page pool page and DMA sync offset when the RX
> buffer is allocated, and syncing the received packet range for CPU access
> before handing the RX buffer to the stack.
>
> Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.")
> Cc: stable@vger.kernel.org
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>
> Changes since v1:
> v1 is split into two patches in the v2.
> Add Haiyang's Reviewed-by.
>
> drivers/net/ethernet/microsoft/mana/mana_en.c | 39 +++++++++++++++----
> include/net/mana/mana.h | 8 ++++
> 2 files changed, 40 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index c9b1df1ed109..1875bffd82b7 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2044,12 +2044,16 @@ static void mana_rx_skb(void *buf_va, bool from_pool,
> }
>
> static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
> - dma_addr_t *da, bool *from_pool)
> + dma_addr_t *da, bool *from_pool,
> + struct page **pp_page, u32 *dma_sync_offset)
> {
> struct page *page;
> u32 offset;
> void *va;
> +
> *from_pool = false;
> + *pp_page = NULL;
> + *dma_sync_offset = 0;
>
> /* Don't use fragments for jumbo frames or XDP where it's 1 fragment
> * per page.
> @@ -2087,31 +2091,47 @@ static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
> va = page_to_virt(page) + offset;
> *da = page_pool_get_dma_addr(page) + offset + rxq->headroom;
> *from_pool = true;
> + *pp_page = page;
> + *dma_sync_offset = offset + rxq->headroom;
>
> return va;
> }
>
> /* Allocate frag for rx buffer, and save the old buf */
> static void mana_refill_rx_oob(struct device *dev, struct mana_rxq *rxq,
> - struct mana_recv_buf_oob *rxoob, void **old_buf,
> - bool *old_fp)
> + struct mana_recv_buf_oob *rxoob, u32 pktlen,
> + void **old_buf, bool *old_fp)
> {
> + struct page *pp_page;
> + u32 dma_sync_offset;
> bool from_pool;
> dma_addr_t da;
> void *va;
>
> - va = mana_get_rxfrag(rxq, dev, &da, &from_pool);
> + va = mana_get_rxfrag(rxq, dev, &da, &from_pool, &pp_page,
> + &dma_sync_offset);
> if (!va)
> return;
> - if (!rxoob->from_pool || rxq->frag_count == 1)
> + if (!rxoob->from_pool || rxq->frag_count == 1) {
> dma_unmap_single(dev, rxoob->sgl[0].address, rxq->datasize,
> DMA_FROM_DEVICE);
> + } else {
> + /* The page pool maps the whole page and only syncs for device
> + * automatically (PP_FLAG_DMA_SYNC_DEV). Sync the received bytes
> + * for the CPU before they are read: this is required if DMA
> + * is incoherent or bounce buffers are used.
> + */
> + page_pool_dma_sync_for_cpu(rxq->page_pool, rxoob->pp_page,
> + rxoob->dma_sync_offset, pktlen);
> + }
Hi,
I'm sorry to be bothersome but I think that the order of the two patches
that comprise this series should be reversed. Or if that is not possible,
go back to a single patch.
Because, as flagged by https://netdev-ai.bots.linux.dev/sashiko/
Is pktlen here bounded before it reaches page_pool_dma_sync_for_cpu()?
The value originates from oob->ppi[i].pkt_len in mana_process_rx_cqe()
and is forwarded straight into this call with no comparison against
rxq->datasize or (rxq->alloc_size - rxoob->dma_sync_offset).
When SWIOTLB is in use (the swiotlb=force case explicitly called out in
the commit message), page_pool_dma_sync_for_cpu() reaches
dma_sync_single_range_for_cpu() and copies dma_sync_size bytes from the
bounce buffer back into the original page.
Since alloc_size can be smaller than PAGE_SIZE and multiple fragments
share a single page_pool page, can a pktlen larger than the fragment
extent here cause the copy-back to spill past this fragment into
neighbouring fragments that belong to other rxoobs still in flight?
If so, those neighbours may already have been or may shortly be passed
up via napi_gro_receive() in mana_rx_skb(), so the over-sync would
silently overwrite their payloads before the eventual skb_put() in
mana_build_skb() trips skb_over_panic() on this oversized packet.
Would it make sense to validate pktlen against rxq->datasize before
calling mana_refill_rx_oob()? The follow-up patch in this series,
"net: mana: Validate the packet length reported by the NIC" (commit
6c707fe658d6), adds exactly that check:
if (unlikely(pktlen > rxq->datasize))
...
Could that validation be folded into this patch so that the sync-for-CPU
introduced here cannot be steered with an attacker-controlled length,
particularly given that the motivating scenario (swiotlb=force) is the
Confidential VM case where the hypervisor-supplied CQE is untrusted?
^ permalink raw reply
* RE: [PATCH v2 4/6] Drivers: hv: Mark shared memory as decrypted for CCA Realms
From: Michael Kelley @ 2026-06-26 15:04 UTC (permalink / raw)
To: Kameron Carr, Michael Kelley, kys@microsoft.com,
haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
longli@microsoft.com
Cc: catalin.marinas@arm.com, will@kernel.org, mark.rutland@arm.com,
lpieralisi@kernel.org, sudeep.holla@kernel.org, arnd@arndb.de,
thuth@redhat.com, linux-hyperv@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <000801dd055c$2e375050$8aa5f0f0$@linux.microsoft.com>
From: Kameron Carr <kameroncarr@linux.microsoft.com> Sent: Friday, June 26, 2026 4:09 AM
>
> On Thursday, June 25, 2026 11:59 AM, Michael Kelley wrote:
> > From: Kameron Carr <kameroncarr@linux.microsoft.com> Sent: Thursday,
> > June 25, 2026 10:35 AM
> > > We need to round up the memory allocated for the input/output pages to
> > > the nearest PAGE_SIZE, since set_memory_decrypted() requires the size to
> > > be a multiple of PAGE_SIZE. This only has an effect on ARM VMs that are
> > > using PAGE_SIZE larger than 4K.
> >
> > I think this change resulted from a Sashiko comment. My understanding is
> > that the ARM CCA architecture only supports CCA guests with 4 KiB page
> > size. Is that still the case, or has that restriction been lifted in a later version
> > of the architecture? I'm in favor of handling the larger page sizes, if only for
> > future proofing. But I wondered whether your intent is to always support
> > > 4 KiB page sizes even if CCA doesn't support them now. Another way to
> > put it: In reviewing code, should I flag issues related to page sizes > 4 KiB?
>
> I think you might be right. I'm looking at RMM spec 2.0 beta 2, and the RMI
> can have granule size 4KB, 16KB, 64KB, but the RSI is restricted to granule size
> 4KB.
>
> I'm open to suggestion on best way to move forward.
The best approach probably depends on whether the 4 KiB restriction is
likely to be lifted in a future version of the CCA architecture, and I don't have
any insight into that.
If it is likely to be lifted, then doing the initial implementation to support
larger page sizes probably makes sense (which is what you've done here).
It's less work than going back and adding later. But the commit message
and/or code comments should indicate that the larger page size support
is future-proofing work, so that someone doesn't get the wrong idea that
it should work with larger page sizes now.
The alternate approach is to not do any larger page size support now,
and to explicitly state that the code is assuming the current restriction
of 4 KiB page size only.
Whichever approach is chosen should be used consistently so there's
not a mishmash.
>
> > > @@ -499,14 +500,16 @@ int hv_common_cpu_init(unsigned int cpu)
> > > }
> > >
> > > if (!ms_hyperv.paravisor_present &&
> > > - (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> > > - ret = set_memory_decrypted((unsigned long)mem, pgcount);
> > > + (hv_isolation_type_snp() || hv_isolation_type_tdx() ||
> > > + hv_isolation_type_cca())) {
> > > + ret = set_memory_decrypted((unsigned long)kasan_reset_tag(mem),
> > > + alloc_size >> PAGE_SHIFT);
> >
> > I don't know enough about KASAN or the tag situation on ARM64
> > to comment on this change. But this same sequence of allocating
> > memory and then decrypting it occurs in other places in Hyper-V
> > code. It seems like those places would also need the same
> > kasan_reset_tag() call.
>
> I'm not sure of the exact behavior of PAGE_END when there are
> KASAN tags, but it looks like tags could mess with the address
> comparison.
>
> I do see that __virt_to_phys_nodebug() and virt_addr_valid() in
> arch/arm64/include/asm/memory.h both reset tags before calling
> __is_lm_address().
>
> If there are many calls that follow this pattern, it may be better to
> add the tag reset in __set_memory_enc_dec().
I had the same thought.
>
> I can undo this change.
>
Unfortunately, I don't know whether undoing it is right or not. I
haven't taken the time to go learn about the whole scheme and its
implications.
Michael
^ permalink raw reply
page: | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox