From: Catalin Marinas <catalin.marinas@arm.com>
To: john.cox@raspberrypi.com
Cc: Will Deacon <will@kernel.org>,
linux-arm-kernel@lists.infradead.org,
Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH] arm64/dma-mapping: Fix arch_sync_dma_for_device to respect dir parameter
Date: Wed, 20 Aug 2025 14:25:27 +0100
Message-ID: <aKXMx5RZCy6cuc33@arm.com>
In-Reply-To: <20250820-arm64-dma-direction-fix-v1-1-818a4ca8f879@raspberrypi.com>

On Wed, Aug 20, 2025 at 11:28:06AM +0100, John Cox via B4 Relay wrote:
> All other architectures do different cache operations depending on the
> dir parameter. Fix arm64 to do the same.

I suspect that's a bug in the users of the DMA API. We shouldn't modify
the arm64 implementation to cope with them.

> This fixes udmabuf operations when syncing for read e.g. when the CPU
> reads back a V4L2 decoded frame buffer.
>
> Signed-off-by: John Cox <john.cox@raspberrypi.com>
> ---
> This patch makes the arch_sync_dma_for_device function on arm64
> do different things depending on the value of the dir parameter. In
> particular it does a cache invalidate operation if the dir flag is
> set to DMA_FROM_DEVICE. The current code does a writeback without
> invalidate under all circumstances. Nearly all other architectures do
> an invalidate if the direction is FROM_DEVICE which seems like the
> correct thing to do to me.

So does arm64 but in the arch_sync_dma_for_cpu(). That's the correct
place to do it, otherwise after arch_sync_dma_for_device() you may have
speculative loads by the CPU populating the caches with stale data
before the device finished writing.

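To illustrate the ordering problem, here is a toy user-space model, not
kernel code. It assumes a single memory word shadowed by a single cache
line; struct toy_line and all toy_* helpers are invented for this sketch
and only mimic the effect of an invalidate and of a speculative load.

```c
#include <stdbool.h>

/* Toy model: one memory word shadowed by one cache line. Not kernel code. */
struct toy_line {
	int mem;	/* value in RAM, which is what the device writes */
	int cached;	/* value held in the CPU cache line */
	bool valid;	/* true when the cache line holds data */
};

/* CPU load (demand or speculative): fills the line from RAM on a miss. */
static int toy_cpu_load(struct toy_line *l)
{
	if (!l->valid) {
		l->cached = l->mem;
		l->valid = true;
	}
	return l->cached;
}

/* Non-coherent device write: updates RAM, never the cache. */
static void toy_device_write(struct toy_line *l, int v)
{
	l->mem = v;
}

/* Invalidate: drop the line so the next load refetches from RAM. */
static void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
}

/* Invalidate only before the DMA; returns what the CPU finally reads. */
static int toy_inval_before_dma_only(void)
{
	struct toy_line l = { .mem = 0, .cached = 0, .valid = true };

	toy_invalidate(&l);		/* hypothetical for-device invalidate */
	(void)toy_cpu_load(&l);		/* speculative load refills with old 0 */
	toy_device_write(&l, 42);	/* device finishes writing */
	return toy_cpu_load(&l);	/* hits the stale line */
}

/* Invalidate after the DMA, as the sync for-CPU does. */
static int toy_inval_after_dma(void)
{
	struct toy_line l = { .mem = 0, .cached = 0, .valid = true };

	(void)toy_cpu_load(&l);		/* speculative load, harmless here */
	toy_device_write(&l, 42);	/* device finishes writing */
	toy_invalidate(&l);		/* for-CPU invalidate */
	return toy_cpu_load(&l);	/* refetches from RAM */
}
```

In this model toy_inval_before_dma_only() still returns the old value,
while toy_inval_after_dma() returns what the device wrote.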
> This patch fixes a problem I was having with udmabuf allocated
> dmabufs. It also fixes a very similar problem I had with dma_heap
> allocated dmabuf but that occurred very much less frequently and I
> haven't traced exactly what was going on there.
>
> My problem (on a Raspberry Pi5):
>
> [Userland]
> Alloc memory with memfd_create + ftruncate
> Derive dmabuf from memfd with udmabuf
> Close memfd
> Queue dmabuf into V4L2 with QBUF
> <decode a video frame>
> Extract dmabuf from V4L2 with DQBUF
> Map dmabuf for read with mmap
> Sync for read with DMA_BUF_IOCTL_SYNC with (DMA_BUF_SYNC_START |
> DMA_BUF_SYNC_READ)
> Read buffer
> Sync end
> Unmap
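
For reference, the sync steps in the sequence above could look roughly
like this from userspace. This is a minimal sketch using the
DMA_BUF_IOCTL_SYNC uapi from <linux/dma-buf.h>; the helper names
dmabuf_sync() and dmabuf_read_bracketed() and the read_fn callback are
assumptions made for illustration, not taken from the reporter's code.

```c
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Issue one DMA_BUF_IOCTL_SYNC with the given flags on a dmabuf fd. */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
	struct dma_buf_sync sync = { .flags = flags };

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

/*
 * Bracket a CPU read of an already mmap()ed dmabuf with sync start/end.
 * read_fn is a hypothetical caller-supplied callback doing the read.
 */
static int dmabuf_read_bracketed(int dmabuf_fd, void (*read_fn)(void *),
				 void *ctx)
{
	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) < 0)
		return -1;

	read_fn(ctx);

	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The SYNC_START call is what ends up in the exporter's begin_cpu_access
callback, so that is where the for-CPU cache maintenance has to happen.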

Between the device writing to the buffer and the "read buffer" step
above, is there a call to arch_sync_dma_for_cpu()? A quick look at
begin_cpu_udmabuf() shows a dma_sync_sgtable_for_cpu(), though there is
a branch where this is skipped. get_sg_table() seems to do a DMA map
which I think ends up in arch_sync_dma_for_device() but the sync
for-CPU is skipped.

An attempt at a udmabuf fix (untested):

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 40399c26e6be..9ab4a6c01143 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -256,10 +256,11 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
 			ret = PTR_ERR(ubuf->sg);
 			ubuf->sg = NULL;
 		}
-	} else {
-		dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
 	}
 
+	if (ubuf->sg)
+		dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
+
 	return ret;
 }

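The control-flow change in that diff can be modelled in a few lines of
user-space C. This is only a sketch: struct toy_ubuf and the
toy_begin_cpu_*() functions are made up, the counter stands in for
dma_sync_sgtable_for_cpu(), and the get_sg_table() error path is left
out.

```c
#include <stdbool.h>

/* Toy stand-in for struct udmabuf; tracks only what the sketch needs. */
struct toy_ubuf {
	bool mapped;	/* stands in for ubuf->sg != NULL */
	int cpu_syncs;	/* number of sync for-CPU calls so far */
};

/* Current flow: the first call maps the buffer and skips the sync for CPU. */
static void toy_begin_cpu_old(struct toy_ubuf *u)
{
	if (!u->mapped)
		u->mapped = true;	/* get_sg_table(): DMA map only */
	else
		u->cpu_syncs++;		/* dma_sync_sgtable_for_cpu() */
}

/* Proposed flow: sync for CPU whenever a mapping exists. */
static void toy_begin_cpu_new(struct toy_ubuf *u)
{
	if (!u->mapped)
		u->mapped = true;	/* get_sg_table(): DMA map only */

	if (u->mapped)
		u->cpu_syncs++;		/* dma_sync_sgtable_for_cpu() */
}
```

With a fresh buffer, the first toy_begin_cpu_old() call leaves cpu_syncs
at zero, whereas the first toy_begin_cpu_new() call already performs one
sync.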
> I get old (zero) data out of the "Read buffer" stage in some cache
> lines sometimes.
> It doesn't matter which way round the mmap & sync are.
>
> I am aware that there is a patchset going through for udmabuf that may
> well fix the udmabuf case above, but given that this patch fixes
> something similar in dma_heap/system too I think it is still worth
> having.
> ---
> arch/arm64/mm/dma-mapping.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index b2b5792b2caaf81ccfc3204c94395bb0faeabddd..51c43c1f563015139e365ed86f0f5f0d9483fa7f 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -16,8 +16,22 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>  			      enum dma_data_direction dir)
>  {
>  	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	unsigned long end = start + size;
>  
> -	dcache_clean_poc(start, start + size);
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		dcache_clean_inval_poc(start, end);
> +		break;
> +	case DMA_TO_DEVICE:
> +		dcache_clean_poc(start, end);
> +		break;
> +	case DMA_FROM_DEVICE:
> +		dcache_inval_poc(start, end);
> +		break;
> +	case DMA_NONE:
> +	default:
> +		break;
> +	}
>  }

As explained above, that's not the right fix. We need to identify what's
missing on the ioctl() paths.

-- 
Catalin