linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1
@ 2025-01-28 20:35 Mikhail Rudenko
  2025-01-28 20:35 ` [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko
  2025-01-28 20:35 ` [PATCH v3 2/2] media: rkisp1: Allow non-coherent video capture buffers Mikhail Rudenko
  0 siblings, 2 replies; 7+ messages in thread
From: Mikhail Rudenko @ 2025-01-28 20:35 UTC (permalink / raw)
  To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab,
	Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil,
	Sergey Senozhatsky
  Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel,
	Mauro Carvalho Chehab, Mikhail Rudenko, stable

This small series adds support for non-coherent video capture buffers
on Rockchip ISP V1. Patch 1 fixes cache management for dmabuf's
allocated by dma-contig allocator. Patch 2 allows non-coherent
allocations on the rkisp1 capture queue. Some timing measurements are
provided in the commit message of patch 2.

Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
---
Changes in v3:
- ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access
- invalidate/flush kernel mappings as appropriate if they exist
- use dma_sync_sgtable_* instead of dma_sync_sg_*
- Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@gmail.com

Changes in v2:
- Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers.
- Add cache management timing information to patch 2 commit message.
- Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@gmail.com

---
Mikhail Rudenko (2):
      media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
      media: rkisp1: Allow non-coherent video capture buffers

 .../media/common/videobuf2/videobuf2-dma-contig.c  | 22 ++++++++++++++++++++++
 .../platform/rockchip/rkisp1/rkisp1-capture.c      |  1 +
 2 files changed, 23 insertions(+)
---
base-commit: c4b7779abc6633677e6edb79e2809f4f61fde157
change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba

Best regards,
-- 
Mikhail Rudenko <mike.rudenko@gmail.com>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
  2025-01-28 20:35 [PATCH v3 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko
@ 2025-01-28 20:35 ` Mikhail Rudenko
  2025-02-07 16:22   ` Tomasz Figa
  2025-02-28  3:30   ` Nicolas Dufresne
  2025-01-28 20:35 ` [PATCH v3 2/2] media: rkisp1: Allow non-coherent video capture buffers Mikhail Rudenko
  1 sibling, 2 replies; 7+ messages in thread
From: Mikhail Rudenko @ 2025-01-28 20:35 UTC (permalink / raw)
  To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab,
	Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil,
	Sergey Senozhatsky
  Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel,
	Mauro Carvalho Chehab, Mikhail Rudenko, stable

When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
commit 129134e5415d ("media: media/v4l2: remove
V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
no-ops, making cache maintenance for non-coherent dmabufs allocated by
dma-contig impossible.

Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
{flush,invalidate}_kernel_vmap_range calls to
vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
buffers.

Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
---
 .../media/common/videobuf2/videobuf2-dma-contig.c  | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28ce0641fe726e63 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -427,6 +427,17 @@ static int
 vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
 				   enum dma_data_direction direction)
 {
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!buf->non_coherent_mem)
+		return 0;
+
+	if (buf->vaddr)
+		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
+
+	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
+
 	return 0;
 }
 
@@ -434,6 +445,17 @@ static int
 vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
 				 enum dma_data_direction direction)
 {
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!buf->non_coherent_mem)
+		return 0;
+
+	if (buf->vaddr)
+		flush_kernel_vmap_range(buf->vaddr, buf->size);
+
+	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
+
 	return 0;
 }
 

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/2] media: rkisp1: Allow non-coherent video capture buffers
  2025-01-28 20:35 [PATCH v3 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko
  2025-01-28 20:35 ` [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko
@ 2025-01-28 20:35 ` Mikhail Rudenko
  1 sibling, 0 replies; 7+ messages in thread
From: Mikhail Rudenko @ 2025-01-28 20:35 UTC (permalink / raw)
  To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab,
	Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil,
	Sergey Senozhatsky
  Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel,
	Mauro Carvalho Chehab, Mikhail Rudenko

Currently, the rkisp1 driver always uses coherent DMA allocations for
video capture buffers. However, on some platforms, using non-coherent
buffers can improve performance, especially when CPU processing of
MMAP'ed video buffers is required.

For example, on the Rockchip RK3399 running at maximum CPU frequency,
the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a
malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using
non-coherent DMA allocation. When doing cache management with
DMA_BUF_IOCTL_SYNC, DMA_BUF_SYNC_START and DMA_BUF_SYNC_END operations
take around 270 us each.

This change allows userspace to request the allocation of non-coherent
buffers. Note that the behavior for existing users will remain unchanged
unless they explicitly set the V4L2_MEMORY_FLAG_NON_COHERENT flag when
allocating buffers.

Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
---
 drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
index 6dcefd144d5abe358323e37ac6133c6134ac636e..c94f7d1d73a92646457a27da20726ec6f92e7717 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
@@ -1563,6 +1563,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
 	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
 	q->lock = &node->vlock;
 	q->dev = cap->rkisp1->dev;
+	q->allow_cache_hints = 1;
 	ret = vb2_queue_init(q);
 	if (ret) {
 		dev_err(cap->rkisp1->dev,

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
  2025-01-28 20:35 ` [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko
@ 2025-02-07 16:22   ` Tomasz Figa
  2025-02-28  3:30   ` Nicolas Dufresne
  1 sibling, 0 replies; 7+ messages in thread
From: Tomasz Figa @ 2025-02-07 16:22 UTC (permalink / raw)
  To: Mikhail Rudenko, Christoph Hellwig, Robin Murphy, Hans Verkuil
  Cc: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab,
	Heiko Stuebner, Marek Szyprowski, Sergey Senozhatsky, linux-media,
	linux-rockchip, linux-arm-kernel, linux-kernel,
	Mauro Carvalho Chehab, stable

On Tue, Jan 28, 2025 at 9:36 PM Mikhail Rudenko <mike.rudenko@gmail.com> wrote:
>
> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
> commit 129134e5415d ("media: media/v4l2: remove
> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
> no-ops, making cache maintenance for non-coherent dmabufs allocated by
> dma-contig impossible.
>
> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
> {flush,invalidate}_kernel_vmap_range calls to
> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
> buffers.
>
> Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
> ---
>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28ce0641fe726e63 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -427,6 +427,17 @@ static int
>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>                                    enum dma_data_direction direction)
>  {
> +       struct vb2_dc_buf *buf = dbuf->priv;
> +       struct sg_table *sgt = buf->dma_sgt;
> +
> +       if (!buf->non_coherent_mem)
> +               return 0;
> +
> +       if (buf->vaddr)
> +               invalidate_kernel_vmap_range(buf->vaddr, buf->size);
> +
> +       dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> +
>         return 0;
>  }
>
> @@ -434,6 +445,17 @@ static int
>  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>                                  enum dma_data_direction direction)
>  {
> +       struct vb2_dc_buf *buf = dbuf->priv;
> +       struct sg_table *sgt = buf->dma_sgt;
> +
> +       if (!buf->non_coherent_mem)
> +               return 0;
> +
> +       if (buf->vaddr)
> +               flush_kernel_vmap_range(buf->vaddr, buf->size);
> +
> +       dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> +
>         return 0;
>  }

I took some time (over)thinking the kernel vmap range synchronization,
because these functions can be called both from the kernel space using
respective dma_buf_*() kAPI and also from the user space using the
DMA_BUF_SYNC IOCTLs, so we could in theory have the multiple
invocations racing with each other, but then I realized that we don't
really provide any guarantees for concurrent writes and reads from the
CPU, so I believe this should work fine. Sorry for the delay.

Acked-by: Tomasz Figa <tfiga@chromium.org>

Let me add @Christoph Hellwig and @Robin Murphy just in case I'm wrong
on that, though... Hans, let's give them some time to take a look
before applying this.

Best regards,
Tomasz


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
  2025-01-28 20:35 ` [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko
  2025-02-07 16:22   ` Tomasz Figa
@ 2025-02-28  3:30   ` Nicolas Dufresne
  2025-02-28 10:25     ` Tomasz Figa
  1 sibling, 1 reply; 7+ messages in thread
From: Nicolas Dufresne @ 2025-02-28  3:30 UTC (permalink / raw)
  To: Mikhail Rudenko, Dafna Hirschfeld, Laurent Pinchart,
	Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa,
	Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky
  Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel,
	Mauro Carvalho Chehab, stable

Le mardi 28 janvier 2025 à 23:35 +0300, Mikhail Rudenko a écrit :
> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
> commit 129134e5415d ("media: media/v4l2: remove
> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
> no-ops, making cache maintenance for non-coherent dmabufs allocated
> by
> dma-contig impossible.
> 
> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
> {flush,invalidate}_kernel_vmap_range calls to
> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
> buffers.
> 
> Fixes: c0acf9cfeee0 ("media: videobuf2: handle
> V4L2_MEMORY_FLAG_NON_COHERENT flag")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
> ---
>  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
> ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index
> bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28c
> e0641fe726e63 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -427,6 +427,17 @@ static int
>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>  				   enum dma_data_direction
> direction)
>  {
> +	struct vb2_dc_buf *buf = dbuf->priv;
> +	struct sg_table *sgt = buf->dma_sgt;
> +
> +	if (!buf->non_coherent_mem)
> +		return 0;
> +
> +	if (buf->vaddr)
> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);

Am I correct that this is mostly to prevent the kernel from reading
back old data from the cache after an application or other driver did
CPU writes ? If so, can't we restrict that to DMA_TO_DEVICE and
DMA_BIDIRECTIONAL ?

As for pending kernel writes, they should have been flushed before the
buffer is made available for dequeue. And any access while a buffer is
queued is concurrent access, which is expected to have undefined
behaviour.

> +
> +	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);

Isn't there a link to make between buf->dma_dir and direcction before
calling this ? Also, shouldn't we use direction insead of buf->dma_dir
to possibly limit the scope ?

> +
>  	return 0;
>  }
>  
> @@ -434,6 +445,17 @@ static int
>  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>  				 enum dma_data_direction direction)
>  {
> +	struct vb2_dc_buf *buf = dbuf->priv;
> +	struct sg_table *sgt = buf->dma_sgt;
> +
> +	if (!buf->non_coherent_mem)
> +		return 0;
> +
> +	if (buf->vaddr)
> +		flush_kernel_vmap_range(buf->vaddr, buf->size);
> +
> +	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);

Similar questions for the end_cpu_access implementation.

Nicolas

> +
>  	return 0;
>  }
>  
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
  2025-02-28  3:30   ` Nicolas Dufresne
@ 2025-02-28 10:25     ` Tomasz Figa
  2025-02-28 11:41       ` Mikhail Rudenko
  0 siblings, 1 reply; 7+ messages in thread
From: Tomasz Figa @ 2025-02-28 10:25 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Mikhail Rudenko, Dafna Hirschfeld, Laurent Pinchart,
	Mauro Carvalho Chehab, Heiko Stuebner, Marek Szyprowski,
	Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip,
	linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable

On Fri, Feb 28, 2025 at 12:30 PM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Le mardi 28 janvier 2025 à 23:35 +0300, Mikhail Rudenko a écrit :
> > When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
> > commit 129134e5415d ("media: media/v4l2: remove
> > V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
> > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
> > no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
> > introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
> > V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
> > no-ops, making cache maintenance for non-coherent dmabufs allocated
> > by
> > dma-contig impossible.
> >
> > Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
> > {flush,invalidate}_kernel_vmap_range calls to
> > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
> > buffers.
> >
> > Fixes: c0acf9cfeee0 ("media: videobuf2: handle
> > V4L2_MEMORY_FLAG_NON_COHERENT flag")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
> > ---
> >  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
> > ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index
> > bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28c
> > e0641fe726e63 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -427,6 +427,17 @@ static int
> >  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
> >                                  enum dma_data_direction
> > direction)
> >  {
> > +     struct vb2_dc_buf *buf = dbuf->priv;
> > +     struct sg_table *sgt = buf->dma_sgt;
> > +
> > +     if (!buf->non_coherent_mem)
> > +             return 0;
> > +
> > +     if (buf->vaddr)
> > +             invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>
> Am I correct that this is mostly to prevent the kernel from reading
> back old data from the cache after an application or other driver did
> CPU writes ? If so, can't we restrict that to DMA_TO_DEVICE and
> DMA_BIDIRECTIONAL ?

Note that this function must also synchronize between the user-space
and kernel mappings, where the DMA direction doesn't really matter.
Also it's unlikely for it to be called when not needed - why would one
begin a CPU access before the DMA, when the DMA is FROM_DEVICE?

>
> As for pending kernel writes, they should have been flushed before the
> buffer is made available for dequeue.

There is no implicit flushing for imported DMA-bufs. All the flushing
needs to be executed directly by the CPU accessors by surrounding the
access with begin and end CPU access, be it in the kernel or
userspace.

> And any access while a buffer is
> queued is concurrent access, which is expected to have undefined
> behaviour.
>

Correct.

> > +
> > +     dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
>
> Isn't there a link to make between buf->dma_dir and direcction before
> calling this ? Also, shouldn't we use direction insead of buf->dma_dir
> to possibly limit the scope ?

Oh, yes, that's a good catch. It should be |direction| passed here and
not |buf->dma_dir|, since the former determines what CPU access will
be done.

>
> > +
> >       return 0;
> >  }
> >
> > @@ -434,6 +445,17 @@ static int
> >  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
> >                                enum dma_data_direction direction)
> >  {
> > +     struct vb2_dc_buf *buf = dbuf->priv;
> > +     struct sg_table *sgt = buf->dma_sgt;
> > +
> > +     if (!buf->non_coherent_mem)
> > +             return 0;
> > +
> > +     if (buf->vaddr)
> > +             flush_kernel_vmap_range(buf->vaddr, buf->size);
> > +
> > +     dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
>
> Similar questions for the end_cpu_access implementation.

Yeah, same here.

>
> Nicolas
>
> > +
> >       return 0;
> >  }
> >
> >
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
  2025-02-28 10:25     ` Tomasz Figa
@ 2025-02-28 11:41       ` Mikhail Rudenko
  0 siblings, 0 replies; 7+ messages in thread
From: Mikhail Rudenko @ 2025-02-28 11:41 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Nicolas Dufresne, Dafna Hirschfeld, Laurent Pinchart,
	Mauro Carvalho Chehab, Heiko Stuebner, Marek Szyprowski,
	Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip,
	linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable


On 2025-02-28 at 19:25 +09, Tomasz Figa <tfiga@chromium.org> wrote:

> On Fri, Feb 28, 2025 at 12:30 PM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>>
>> Le mardi 28 janvier 2025 à 23:35 +0300, Mikhail Rudenko a écrit :
>> > When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in
>> > commit 129134e5415d ("media: media/v4l2: remove
>> > V4L2_FLAG_MEMORY_NON_CONSISTENT flag"),
>> > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made
>> > no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was
>> > introduced in commit c0acf9cfeee0 ("media: videobuf2: handle
>> > V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained
>> > no-ops, making cache maintenance for non-coherent dmabufs allocated
>> > by
>> > dma-contig impossible.
>> >
>> > Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and
>> > {flush,invalidate}_kernel_vmap_range calls to
>> > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent
>> > buffers.
>> >
>> > Fixes: c0acf9cfeee0 ("media: videobuf2: handle
>> > V4L2_MEMORY_FLAG_NON_COHERENT flag")
>> > Cc: stable@vger.kernel.org
>> > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
>> > ---
>> >  .../media/common/videobuf2/videobuf2-dma-contig.c  | 22
>> > ++++++++++++++++++++++
>> >  1 file changed, 22 insertions(+)
>> >
>> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> > index
>> > bb0b7fa67b539aa73ad5ccf3c3bc318e26f8a4cb..146d7997a0da5989fb081a6f28c
>> > e0641fe726e63 100644
>> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> > @@ -427,6 +427,17 @@ static int
>> >  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>> >                                  enum dma_data_direction
>> > direction)
>> >  {
>> > +     struct vb2_dc_buf *buf = dbuf->priv;
>> > +     struct sg_table *sgt = buf->dma_sgt;
>> > +
>> > +     if (!buf->non_coherent_mem)
>> > +             return 0;
>> > +
>> > +     if (buf->vaddr)
>> > +             invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>>
>> Am I correct that this is mostly to prevent the kernel from reading
>> back old data from the cache after an application or other driver did
>> CPU writes ? If so, can't we restrict that to DMA_TO_DEVICE and
>> DMA_BIDIRECTIONAL ?
>
> Note that this function must also synchronize between the user-space
> and kernel mappings, where the DMA direction doesn't really matter.
> Also it's unlikely for it to be called when not needed - why would one
> begin a CPU access before the DMA, when the DMA is FROM_DEVICE?
>
>>
>> As for pending kernel writes, they should have been flushed before the
>> buffer is made available for dequeue.
>
> There is no implicit flushing for imported DMA-bufs. All the flushing
> needs to be executed directly by the CPU accessors by surrounding the
> access with begin and end CPU access, be it in the kernel or
> userspace.
>
>> And any access while a buffer is
>> queued is concurrent access, which is expected to have undefined
>> behaviour.
>>
>
> Correct.
>
>> > +
>> > +     dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
>>
>> Isn't there a link to make between buf->dma_dir and direcction before
>> calling this ? Also, shouldn't we use direction insead of buf->dma_dir
>> to possibly limit the scope ?
>
> Oh, yes, that's a good catch.

Indeed, will fix this in v4.

> It should be |direction| passed here and
> not |buf->dma_dir|, since the former determines what CPU access will
> be done.
>
>>
>> > +
>> >       return 0;
>> >  }
>> >
>> > @@ -434,6 +445,17 @@ static int
>> >  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>> >                                enum dma_data_direction direction)
>> >  {
>> > +     struct vb2_dc_buf *buf = dbuf->priv;
>> > +     struct sg_table *sgt = buf->dma_sgt;
>> > +
>> > +     if (!buf->non_coherent_mem)
>> > +             return 0;
>> > +
>> > +     if (buf->vaddr)
>> > +             flush_kernel_vmap_range(buf->vaddr, buf->size);
>> > +
>> > +     dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
>>
>> Similar questions for the end_cpu_access implementation.
>
> Yeah, same here.

Noted, will fix.

>>
>> Nicolas
>>
>> > +
>> >       return 0;
>> >  }
>> >
>> >
>>


--
Best regards,
Mikhail Rudenko


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-02-28 12:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-28 20:35 [PATCH v3 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko
2025-01-28 20:35 ` [PATCH v3 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko
2025-02-07 16:22   ` Tomasz Figa
2025-02-28  3:30   ` Nicolas Dufresne
2025-02-28 10:25     ` Tomasz Figa
2025-02-28 11:41       ` Mikhail Rudenko
2025-01-28 20:35 ` [PATCH v3 2/2] media: rkisp1: Allow non-coherent video capture buffers Mikhail Rudenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).