From: Catalin Marinas <catalin.marinas@arm.com>
To: john.cox@raspberrypi.com
Cc: Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH] arm64/dma-mapping: Fix arch_sync_dma_for_device to respect dir parameter
Date: Wed, 20 Aug 2025 14:25:27 +0100
Message-ID: <aKXMx5RZCy6cuc33@arm.com>
In-Reply-To: <20250820-arm64-dma-direction-fix-v1-1-818a4ca8f879@raspberrypi.com>

On Wed, Aug 20, 2025 at 11:28:06AM +0100, John Cox via B4 Relay wrote:
> All other architectures do different cache operations depending on the
> dir parameter. Fix arm64 to do the same.

I suspect that's a bug in the users of the DMA API. We shouldn't modify
the arm64 implementation to cope with them.

> This fixes udmabuf operations when syncing for read e.g. when the CPU
> reads back a V4L2 decoded frame buffer.
> 
> Signed-off-by: John Cox <john.cox@raspberrypi.com>
> ---
> This patch makes the arch_sync_dma_for_device function on arm64
> do different things depending on the value of the dir parameter. In
> particular it does a cache invalidate operation if the dir flag is
> set to DMA_FROM_DEVICE. The current code does a writeback without
> invalidate under all circumstances. Nearly all other architectures do
> an invalidate if the direction is FROM_DEVICE which seems like the
> correct thing to do to me.

So does arm64, but in arch_sync_dma_for_cpu(). That's the correct place
to do it; otherwise, after arch_sync_dma_for_device(), speculative loads
by the CPU may populate the caches with stale data before the device has
finished writing.
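
To illustrate the ordering problem, here is a toy userspace model with a
single non-coherent cache line. It is only a sketch: struct toy and all
the helper names are invented for illustration, and it does not reflect
the real arm64 cache maintenance code. It assumes a speculative load can
be modelled as an ordinary cpu_load() issued while the DMA is in flight.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): one memory word and one cache line over it. */
struct toy {
	uint32_t mem;	/* RAM contents, as seen by the device */
	uint32_t line;	/* CPU's cached copy */
	bool valid;
	bool dirty;
};

static void cpu_store(struct toy *t, uint32_t v)
{
	t->line = v;
	t->valid = true;
	t->dirty = true;
}

/* A load (demand or speculative) fills the line from RAM on a miss. */
static uint32_t cpu_load(struct toy *t)
{
	if (!t->valid) {
		t->line = t->mem;
		t->valid = true;
		t->dirty = false;
	}
	return t->line;
}

static void cache_clean(struct toy *t)	/* write back, keep the line */
{
	if (t->valid && t->dirty) {
		t->mem = t->line;
		t->dirty = false;
	}
}

static void cache_inval(struct toy *t)	/* drop the line */
{
	t->valid = false;
	t->dirty = false;
}

static void device_write(struct toy *t, uint32_t v)
{
	t->mem = v;	/* DMA bypasses the CPU cache */
}

/* The CPU zeroes the buffer and the zero reaches RAM. */
static void prepare(struct toy *t)
{
	cpu_store(t, 0);
	cache_clean(t);
}

/* Invalidate only before the DMA, nothing afterwards. */
static uint32_t read_with_inval_before_dma(void)
{
	struct toy t = { 0 };

	prepare(&t);
	cache_inval(&t);		/* "sync for device" */
	(void)cpu_load(&t);		/* speculative load during DMA */
	device_write(&t, 0xf00d);
	return cpu_load(&t);		/* hits the stale line */
}

/* Clean before the DMA, invalidate after it completes. */
static uint32_t read_with_inval_after_dma(void)
{
	struct toy t = { 0 };

	prepare(&t);
	cache_clean(&t);		/* "sync for device" */
	(void)cpu_load(&t);		/* speculative load during DMA */
	device_write(&t, 0xf00d);
	cache_inval(&t);		/* "sync for CPU" */
	return cpu_load(&t);		/* refetches from RAM */
}
```

In this model the first variant reads back the old zero, while the
second one sees the value written by the device.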

> This patch fixes a problem I was having with udmabuf allocated
> dmabufs. It also fixes a very similar problem I had with dma_heap
> allocated dmabuf but that occurred very much less frequently and I
> haven't traced exactly what was going on there.
> 
> My problem (on a Raspberry Pi5):
> 
> [Userland]
> Alloc memory with memfd_create + ftruncate
> Derive dmabuf from memfd with udmabuf
> Close memfd
> Queue dmabuf into V4L2 with QBUF
> <decode a video frame>
> Extract dmabuf from V4L2 with DQBUF
> Map dmabuf for read with mmap
> Sync for read with DMA_BUF_IOCTL_SYNC with (DMA_BUF_SYNC_START |
> DMA_BUF_SYNC_READ)
> Read buffer
> Sync end
> Unmap
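
For reference, the sync bracket around the "Read buffer" step in the
sequence above could be sketched in userspace C roughly as follows. This
assumes the uapi header linux/dma-buf.h is available; the helper names
dmabuf_sync(), dmabuf_begin_read() and dmabuf_end_read() are made up for
illustration and are not taken from the original report.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: issue DMA_BUF_IOCTL_SYNC, retrying if interrupted. */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;	/* 0 on success, -1 with errno set on failure */
}

/* Call before the CPU reads the mapped buffer. */
static int dmabuf_begin_read(int dmabuf_fd)
{
	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
}

/* Call once the CPU has finished reading. */
static int dmabuf_end_read(int dmabuf_fd)
{
	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The begin call is the one that reaches the exporter's begin_cpu_access
hook, so that is where any missing for-CPU cache maintenance would show
up.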

Between the device writing to the buffer and the "read buffer" step
above, is there a call to arch_sync_dma_for_cpu()? A quick look at
begin_cpu_udmabuf() shows a dma_sync_sgtable_for_cpu(), though there is
a branch where this is skipped. get_sg_table() seems to do a DMA map
which I think ends up in arch_sync_dma_for_device() but the sync
for-CPU is skipped.

An attempt at a udmabuf fix (untested):

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 40399c26e6be..9ab4a6c01143 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -256,10 +256,11 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
 			ret = PTR_ERR(ubuf->sg);
 			ubuf->sg = NULL;
 		}
-	} else {
-		dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
 	}
 
+	if (ubuf->sg)
+		dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
+
 	return ret;
 }
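
The control-flow change in the diff above can be modelled in plain
userspace C. This is only a sketch under assumptions: struct toy_ubuf,
toy_map() and the two begin_cpu_*() functions are hypothetical stand-ins,
the leading "if (!ubuf->sg)" test is inferred from the hunk context
rather than shown in it, and a counter stands in for the real
dma_sync_sgtable_for_cpu() call.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant udmabuf state. */
struct toy_ubuf {
	bool has_sg;		/* models ubuf->sg != NULL */
	int for_cpu_syncs;	/* counts modelled for-CPU syncs */
};

/* Models get_sg_table(): map on first use, optionally failing. */
static int toy_map(struct toy_ubuf *u, bool map_fails)
{
	if (map_fails)
		return -1;
	u->has_sg = true;
	return 0;
}

/* Before the change: the first call maps but never syncs for the CPU. */
static int begin_cpu_old(struct toy_ubuf *u, bool map_fails)
{
	int ret = 0;

	if (!u->has_sg)
		ret = toy_map(u, map_fails);
	else
		u->for_cpu_syncs++;

	return ret;
}

/* After the change: sync whenever a valid table exists. */
static int begin_cpu_new(struct toy_ubuf *u, bool map_fails)
{
	int ret = 0;

	if (!u->has_sg)
		ret = toy_map(u, map_fails);

	if (u->has_sg)
		u->for_cpu_syncs++;

	return ret;
}
```

In the model, the first begin_cpu_old() call leaves for_cpu_syncs at 0,
while the first begin_cpu_new() call bumps it to 1 unless the mapping
failed.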

> I get old (zero) data out of the "Read buffer" stage in some cache
> lines sometimes.
> It doesn't matter which way round the mmap & sync are.
> 
> I am aware that there is a patchset going through for udmabuf that may
> well fix the udmabuf case above, but given that this patch fixes
> something similar in dma_heap/system too I think it is still worth
> having.
> ---
>  arch/arm64/mm/dma-mapping.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index b2b5792b2caaf81ccfc3204c94395bb0faeabddd..51c43c1f563015139e365ed86f0f5f0d9483fa7f 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -16,8 +16,22 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>  			      enum dma_data_direction dir)
>  {
>  	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	unsigned long end = start + size;
>  
> -	dcache_clean_poc(start, start + size);
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		dcache_clean_inval_poc(start, end);
> +		break;
> +	case DMA_TO_DEVICE:
> +		dcache_clean_poc(start, end);
> +		break;
> +	case DMA_FROM_DEVICE:
> +		dcache_inval_poc(start, end);
> +		break;
> +	case DMA_NONE:
> +	default:
> +		break;
> +	}
>  }

As explained above, that's not the right fix. We need to identify what's
missing on the ioctl() paths.

-- 
Catalin


