* [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1
@ 2025-03-03 11:40 Mikhail Rudenko
2025-03-03 11:40 ` [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Mikhail Rudenko @ 2025-03-03 11:40 UTC (permalink / raw)
To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab,
Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil,
Sergey Senozhatsky
Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel,
Mauro Carvalho Chehab, Mikhail Rudenko, stable
This small series adds support for non-coherent video capture buffers
on Rockchip ISP V1. Patch 1 fixes cache management for dmabuf's
allocated by dma-contig allocator. Patch 2 allows non-coherent
allocations on the rkisp1 capture queue. Some timing measurements are
provided in the commit message of patch 2.
Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com>
---
Changes in v4:
- rebase to media/next
- use `direction` instead of `buf->dma_dir` in dma_sync_sgtable_*
- Link to v3: https://lore.kernel.org/r/20250128-b4-rkisp-noncoherent-v3-0-baf39c997d2a@gmail.com
Changes in v3:
- ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access
- invalidate/flush kernel mappings as appropriate if they exist
- use dma_sync_sgtable_* instead of dma_sync_sg_*
- Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@gmail.com
Changes in v2:
- Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers.
- Add cache management timing information to patch 2 commit message.
- Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@gmail.com
---
Mikhail Rudenko (2):
media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
media: rkisp1: Allow non-coherent video capture buffers
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++
.../platform/rockchip/rkisp1/rkisp1-capture.c | 1 +
2 files changed, 23 insertions(+)
---
base-commit: b2c4bf0c102084e77ed1b12090d77a76469a6814
change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba
Best regards,
--
Mikhail Rudenko <mike.rudenko@gmail.com>
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig 2025-03-03 11:40 [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko @ 2025-03-03 11:40 ` Mikhail Rudenko 2025-03-03 15:24 ` Nicolas Dufresne 2025-03-03 11:40 ` [PATCH v4 2/2] media: rkisp1: Allow non-coherent video capture buffers Mikhail Rudenko 2026-03-23 16:03 ` [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Jacopo Mondi 2 siblings, 1 reply; 10+ messages in thread From: Mikhail Rudenko @ 2025-03-03 11:40 UTC (permalink / raw) To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, Mikhail Rudenko, stable When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in commit 129134e5415d ("media: media/v4l2: remove V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was introduced in commit c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained no-ops, making cache maintenance for non-coherent dmabufs allocated by dma-contig impossible. Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and {flush,invalidate}_kernel_vmap_range calls to vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent buffers. Fixes: c0acf9cfeee0 ("media: videobuf2: handle V4L2_MEMORY_FLAG_NON_COHERENT flag") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> --- .../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7bc888a84a95d5 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -427,6 +427,17 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { + struct vb2_dc_buf *buf = dbuf->priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!buf->non_coherent_mem) + return 0; + + if (buf->vaddr) + invalidate_kernel_vmap_range(buf->vaddr, buf->size); + + dma_sync_sgtable_for_cpu(buf->dev, sgt, direction); + return 0; } @@ -434,6 +445,17 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { + struct vb2_dc_buf *buf = dbuf->priv; + struct sg_table *sgt = buf->dma_sgt; + + if (!buf->non_coherent_mem) + return 0; + + if (buf->vaddr) + flush_kernel_vmap_range(buf->vaddr, buf->size); + + dma_sync_sgtable_for_device(buf->dev, sgt, direction); + return 0; } -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig 2025-03-03 11:40 ` [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko @ 2025-03-03 15:24 ` Nicolas Dufresne 2025-03-05 7:40 ` Mikhail Rudenko ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Nicolas Dufresne @ 2025-03-03 15:24 UTC (permalink / raw) To: Mikhail Rudenko, Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable Hi Mikhail, Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit : > When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in > commit 129134e5415d ("media: media/v4l2: remove > V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made > no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was > introduced in commit c0acf9cfeee0 ("media: videobuf2: handle > V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained > no-ops, making cache maintenance for non-coherent dmabufs allocated > by > dma-contig impossible. > > Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and > {flush,invalidate}_kernel_vmap_range calls to > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent > buffers. > > Fixes: c0acf9cfeee0 ("media: videobuf2: handle > V4L2_MEMORY_FLAG_NON_COHERENT flag") > Cc: stable@vger.kernel.org > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> > --- > .../media/common/videobuf2/videobuf2-dma-contig.c | 22 > ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > index > a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7 > bc888a84a95d5 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > @@ -427,6 +427,17 @@ static int > vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > enum dma_data_direction > direction) > { > + struct vb2_dc_buf *buf = dbuf->priv; > + struct sg_table *sgt = buf->dma_sgt; > + > + if (!buf->non_coherent_mem) > + return 0; > + > + if (buf->vaddr) > + invalidate_kernel_vmap_range(buf->vaddr, buf->size); What would make me a lot more confortable with this change is if you enable kernel mappings for one test. This will ensure you cover the call to "invalidate" in your testing. I'd like to know about the performance impact. With this implementation it should be identical to the VB2 one. What I was trying to say in previous comments, is that my impression is that we can skip this for CPU read access, since we don't guaranty concurrent access anyway. Both address space can keep their cache in that case. Though, I see RKISP does not use kernel mapping plus I'm not reporting a bug, but checking if we should leave a comment for possible users of kernel mapping in the future ? > + > + dma_sync_sgtable_for_cpu(buf->dev, sgt, direction); > + > return 0; > } > > @@ -434,6 +445,17 @@ static int > vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, > enum dma_data_direction direction) > { > + struct vb2_dc_buf *buf = dbuf->priv; > + struct sg_table *sgt = buf->dma_sgt; > + > + if (!buf->non_coherent_mem) > + return 0; > + > + if (buf->vaddr) > + flush_kernel_vmap_range(buf->vaddr, buf->size); > + > + dma_sync_sgtable_for_device(buf->dev, sgt, direction); > + > return 0; > } > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig 2025-03-03 15:24 ` Nicolas Dufresne @ 2025-03-05 7:40 ` Mikhail Rudenko 2025-03-05 8:12 ` Tomasz Figa 2025-03-09 20:18 ` Mikhail Rudenko 2 siblings, 0 replies; 10+ messages in thread From: Mikhail Rudenko @ 2025-03-05 7:40 UTC (permalink / raw) To: Nicolas Dufresne Cc: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable Hi Nicolas, On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@ndufresne.ca> wrote: > Hi Mikhail, > > Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit : >> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in >> commit 129134e5415d ("media: media/v4l2: remove >> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made >> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was >> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle >> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained >> no-ops, making cache maintenance for non-coherent dmabufs allocated >> by >> dma-contig impossible. >> >> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and >> {flush,invalidate}_kernel_vmap_range calls to >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent >> buffers. >> >> Fixes: c0acf9cfeee0 ("media: videobuf2: handle >> V4L2_MEMORY_FLAG_NON_COHERENT flag") >> Cc: stable@vger.kernel.org >> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> >> --- >> .../media/common/videobuf2/videobuf2-dma-contig.c | 22 >> ++++++++++++++++++++++ >> 1 file changed, 22 insertions(+) >> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> index >> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7 >> bc888a84a95d5 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> @@ -427,6 +427,17 @@ static int >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, >> enum dma_data_direction >> direction) >> { >> + struct vb2_dc_buf *buf = dbuf->priv; >> + struct sg_table *sgt = buf->dma_sgt; >> + >> + if (!buf->non_coherent_mem) >> + return 0; >> + >> + if (buf->vaddr) >> + invalidate_kernel_vmap_range(buf->vaddr, buf->size); > > What would make me a lot more confortable with this change is if you > enable kernel mappings for one test. This will ensure you cover the > call to "invalidate" in your testing. I'd like to know about the > performance impact. With this implementation it should be identical to > the VB2 one. I'll enable kernel mappings and rerun my tests later this week. > What I was trying to say in previous comments, is that my impression is > that we can skip this for CPU read access, since we don't guaranty > concurrent access anyway. Both address space can keep their cache in > that case. Though, I see RKISP does not use kernel mapping plus I'm not > reporting a bug, but checking if we should leave a comment for possible > users of kernel mapping in the future ? I trust Tomasz here, I'd wait for his comment on v4. >> + >> + dma_sync_sgtable_for_cpu(buf->dev, sgt, direction); >> + >> return 0; >> } >> >> @@ -434,6 +445,17 @@ static int >> vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, >> enum dma_data_direction direction) >> { >> + struct vb2_dc_buf *buf = dbuf->priv; >> + struct sg_table *sgt = buf->dma_sgt; >> + >> + if (!buf->non_coherent_mem) >> + return 0; >> + >> + if (buf->vaddr) >> + flush_kernel_vmap_range(buf->vaddr, buf->size); >> + >> + dma_sync_sgtable_for_device(buf->dev, sgt, direction); >> + >> return 0; >> } >> >> -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig 2025-03-03 15:24 ` Nicolas Dufresne 2025-03-05 7:40 ` Mikhail Rudenko @ 2025-03-05 8:12 ` Tomasz Figa 2025-03-09 20:18 ` Mikhail Rudenko 2 siblings, 0 replies; 10+ messages in thread From: Tomasz Figa @ 2025-03-05 8:12 UTC (permalink / raw) To: Nicolas Dufresne Cc: Mikhail Rudenko, Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable On Tue, Mar 4, 2025 at 12:24 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote: > > Hi Mikhail, > > Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit : > > When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in > > commit 129134e5415d ("media: media/v4l2: remove > > V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), > > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made > > no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was > > introduced in commit c0acf9cfeee0 ("media: videobuf2: handle > > V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained > > no-ops, making cache maintenance for non-coherent dmabufs allocated > > by > > dma-contig impossible. > > > > Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and > > {flush,invalidate}_kernel_vmap_range calls to > > vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent > > buffers. > > > > Fixes: c0acf9cfeee0 ("media: videobuf2: handle > > V4L2_MEMORY_FLAG_NON_COHERENT flag") > > Cc: stable@vger.kernel.org > > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> > > --- > > .../media/common/videobuf2/videobuf2-dma-contig.c | 22 > > ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > > index > > a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7 > > bc888a84a95d5 100644 > > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > > @@ -427,6 +427,17 @@ static int > > vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > > enum dma_data_direction > > direction) > > { > > + struct vb2_dc_buf *buf = dbuf->priv; > > + struct sg_table *sgt = buf->dma_sgt; > > + > > + if (!buf->non_coherent_mem) > > + return 0; > > + > > + if (buf->vaddr) > > + invalidate_kernel_vmap_range(buf->vaddr, buf->size); > > What would make me a lot more confortable with this change is if you > enable kernel mappings for one test. This will ensure you cover the > call to "invalidate" in your testing. I'd like to know about the > performance impact. With this implementation it should be identical to > the VB2 one. I agree that it would be good to test that path as well. I wonder if we could somehow do it with one of the vi* drivers... > > What I was trying to say in previous comments, is that my impression is > that we can skip this for CPU read access, since we don't guaranty > concurrent access anyway. Both address space can keep their cache in > that case. Though, I see RKISP does not use kernel mapping plus I'm not > reporting a bug, but checking if we should leave a comment for possible > users of kernel mapping in the future ? We can't skip it for CPU read access, because it may be the first read after the DMA writing to the buffer, so we need to invalidate the caches. That said, on majority of systems this will be a no-op, because it only applies to VIVT and VIPT aliasing caches + only when the kernel mapping is actually used (the buf->vaddr mapping is created on demand). > > > + > > + dma_sync_sgtable_for_cpu(buf->dev, sgt, direction); > > + > > return 0; > > } > > > > @@ -434,6 +445,17 @@ static int > > vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, > > enum dma_data_direction direction) > > { > > + struct vb2_dc_buf *buf = dbuf->priv; > > + struct sg_table *sgt = buf->dma_sgt; > > + > > + if (!buf->non_coherent_mem) > > + return 0; > > + > > + if (buf->vaddr) > > + flush_kernel_vmap_range(buf->vaddr, buf->size); > > + > > + dma_sync_sgtable_for_device(buf->dev, sgt, direction); > > + > > return 0; > > } > > > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig 2025-03-03 15:24 ` Nicolas Dufresne 2025-03-05 7:40 ` Mikhail Rudenko 2025-03-05 8:12 ` Tomasz Figa @ 2025-03-09 20:18 ` Mikhail Rudenko 2025-03-10 9:00 ` Tomasz Figa 2 siblings, 1 reply; 10+ messages in thread From: Mikhail Rudenko @ 2025-03-09 20:18 UTC (permalink / raw) To: Nicolas Dufresne Cc: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable Hi Nicolas, Tomasz, On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@ndufresne.ca> wrote: > Hi Mikhail, > > Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit : >> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in >> commit 129134e5415d ("media: media/v4l2: remove >> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made >> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was >> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle >> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained >> no-ops, making cache maintenance for non-coherent dmabufs allocated >> by >> dma-contig impossible. >> >> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and >> {flush,invalidate}_kernel_vmap_range calls to >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent >> buffers. >> >> Fixes: c0acf9cfeee0 ("media: videobuf2: handle >> V4L2_MEMORY_FLAG_NON_COHERENT flag") >> Cc: stable@vger.kernel.org >> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> >> --- >> .../media/common/videobuf2/videobuf2-dma-contig.c | 22 >> ++++++++++++++++++++++ >> 1 file changed, 22 insertions(+) >> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> index >> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7 >> bc888a84a95d5 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> @@ -427,6 +427,17 @@ static int >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, >> enum dma_data_direction >> direction) >> { >> + struct vb2_dc_buf *buf = dbuf->priv; >> + struct sg_table *sgt = buf->dma_sgt; >> + >> + if (!buf->non_coherent_mem) >> + return 0; >> + >> + if (buf->vaddr) >> + invalidate_kernel_vmap_range(buf->vaddr, buf->size); > > What would make me a lot more confortable with this change is if you > enable kernel mappings for one test. This will ensure you cover the > call to "invalidate" in your testing. I'd like to know about the > performance impact. With this implementation it should be identical to > the VB2 one. > I have re-run my tests on RK3399, with 1280x720 XRGB capture buffers (1 plane, 3686400 bytes). Capture process was pinned to "big" cores running at fixed frequency of 1.8 GHz. Libcamera was modified to request buffers with V4L2_MEMORY_FLAG_NON_COHERENT flag. DMA_BUF_IOCTL_SYNC ioctls were used as appropriate. For kernel mapping effect test, vb2_plane_vaddr call was inserted into rkisp1_vb2_buf_init. The timings are as following: - memcpy coherent buffer: 7570 +/- 63 us - memcpy non-coherent buffer: 1120 +/- 34 us without kernel mapping: - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 428 +/- 15 us - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 422 +/- 28 us with kernel mapping: - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 526 +/- 13 us - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 519 +/- 38 us So, even with kernel mapping enabled, speedup is 7570 / (1120 + 526 + 519) = 3.5, and the use of noncoherent buffers is justified -- at least on this platform. -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig 2025-03-09 20:18 ` Mikhail Rudenko @ 2025-03-10 9:00 ` Tomasz Figa 0 siblings, 0 replies; 10+ messages in thread From: Tomasz Figa @ 2025-03-10 9:00 UTC (permalink / raw) To: Mikhail Rudenko Cc: Nicolas Dufresne, Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable On Mon, Mar 10, 2025 at 5:52 PM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: > > > Hi Nicolas, Tomasz, > > On 2025-03-03 at 10:24 -05, Nicolas Dufresne <nicolas@ndufresne.ca> wrote: > > > Hi Mikhail, > > > > Le lundi 03 mars 2025 à 14:40 +0300, Mikhail Rudenko a écrit : > >> When support for V4L2_FLAG_MEMORY_NON_CONSISTENT was removed in > >> commit 129134e5415d ("media: media/v4l2: remove > >> V4L2_FLAG_MEMORY_NON_CONSISTENT flag"), > >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions were made > >> no-ops. Later, when support for V4L2_MEMORY_FLAG_NON_COHERENT was > >> introduced in commit c0acf9cfeee0 ("media: videobuf2: handle > >> V4L2_MEMORY_FLAG_NON_COHERENT flag"), the above functions remained > >> no-ops, making cache maintenance for non-coherent dmabufs allocated > >> by > >> dma-contig impossible. > >> > >> Fix this by reintroducing dma_sync_sgtable_for_{cpu,device} and > >> {flush,invalidate}_kernel_vmap_range calls to > >> vb2_dc_dmabuf_ops_{begin,end}_cpu_access() functions for non-coherent > >> buffers. > >> > >> Fixes: c0acf9cfeee0 ("media: videobuf2: handle > >> V4L2_MEMORY_FLAG_NON_COHERENT flag") > >> Cc: stable@vger.kernel.org > >> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> > >> --- > >> .../media/common/videobuf2/videobuf2-dma-contig.c | 22 > >> ++++++++++++++++++++++ > >> 1 file changed, 22 insertions(+) > >> > >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> index > >> a13ec569c82f6da2d977222b94af32e74c6c6c82..d41095fe5bd21faf815d6b035d7 > >> bc888a84a95d5 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> @@ -427,6 +427,17 @@ static int > >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > >> enum dma_data_direction > >> direction) > >> { > >> + struct vb2_dc_buf *buf = dbuf->priv; > >> + struct sg_table *sgt = buf->dma_sgt; > >> + > >> + if (!buf->non_coherent_mem) > >> + return 0; > >> + > >> + if (buf->vaddr) > >> + invalidate_kernel_vmap_range(buf->vaddr, buf->size); > > > > What would make me a lot more confortable with this change is if you > > enable kernel mappings for one test. This will ensure you cover the > > call to "invalidate" in your testing. I'd like to know about the > > performance impact. With this implementation it should be identical to > > the VB2 one. > > > > I have re-run my tests on RK3399, with 1280x720 XRGB capture buffers (1 > plane, 3686400 bytes). Capture process was pinned to "big" cores running > at fixed frequency of 1.8 GHz. Libcamera was modified to request buffers > with V4L2_MEMORY_FLAG_NON_COHERENT flag. DMA_BUF_IOCTL_SYNC ioctls were > used as appropriate. For kernel mapping effect test, vb2_plane_vaddr > call was inserted into rkisp1_vb2_buf_init. > > The timings are as following: > > - memcpy coherent buffer: 7570 +/- 63 us > - memcpy non-coherent buffer: 1120 +/- 34 us > > without kernel mapping: > > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 428 +/- 15 us > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 422 +/- 28 us > > with kernel mapping: > > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_START|DMA_BUF_SYNC_READ}): 526 +/- 13 us > - ioctl(fd, DMA_BUF_IOCTL_SYNC, {DMA_BUF_SYNC_END|DMA_BUF_SYNC_READ}): 519 +/- 38 us > > So, even with kernel mapping enabled, speedup is 7570 / (1120 + 526 + 519) = 3.5, > and the use of noncoherent buffers is justified -- at least on this platform. Thanks a lot for the additional testing. Acked-by: Tomasz Figa <tfiga@chromium.org> Best regards, Tomasz ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 2/2] media: rkisp1: Allow non-coherent video capture buffers 2025-03-03 11:40 [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko 2025-03-03 11:40 ` [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko @ 2025-03-03 11:40 ` Mikhail Rudenko 2026-03-23 16:03 ` [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Jacopo Mondi 2 siblings, 0 replies; 10+ messages in thread From: Mikhail Rudenko @ 2025-03-03 11:40 UTC (permalink / raw) To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, Mikhail Rudenko Currently, the rkisp1 driver always uses coherent DMA allocations for video capture buffers. However, on some platforms, using non-coherent buffers can improve performance, especially when CPU processing of MMAP'ed video buffers is required. For example, on the Rockchip RK3399 running at maximum CPU frequency, the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using non-coherent DMA allocation. When doing cache management with DMA_BUF_IOCTL_SYNC, DMA_BUF_SYNC_START and DMA_BUF_SYNC_END operations take around 270 us each. This change allows userspace to request the allocation of non-coherent buffers. Note that the behavior for existing users will remain unchanged unless they explicitly set the V4L2_MEMORY_FLAG_NON_COHERENT flag when allocating buffers. Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> --- drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c index 6dcefd144d5abe358323e37ac6133c6134ac636e..c94f7d1d73a92646457a27da20726ec6f92e7717 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c @@ -1563,6 +1563,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap) q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC; q->lock = &node->vlock; q->dev = cap->rkisp1->dev; + q->allow_cache_hints = 1; ret = vb2_queue_init(q); if (ret) { dev_err(cap->rkisp1->dev, -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 2025-03-03 11:40 [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko 2025-03-03 11:40 ` [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko 2025-03-03 11:40 ` [PATCH v4 2/2] media: rkisp1: Allow non-coherent video capture buffers Mikhail Rudenko @ 2026-03-23 16:03 ` Jacopo Mondi 2026-03-24 19:12 ` Mikhail Rudenko 2 siblings, 1 reply; 10+ messages in thread From: Jacopo Mondi @ 2026-03-23 16:03 UTC (permalink / raw) To: Mikhail Rudenko Cc: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable Hello On Mon, Mar 03, 2025 at 02:40:08PM +0300, Mikhail Rudenko wrote: > This small series adds support for non-coherent video capture buffers > on Rockchip ISP V1. Patch 1 fixes cache management for dmabuf's > allocated by dma-contig allocator. Patch 2 allows non-coherent > allocations on the rkisp1 capture queue. Some timing measurements are > provided in the commit message of patch 2. > > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> I regularly get back to this series everytime I have to reason about the caching policies in vb2.. Is there any reason why it didn't get in ? > --- > Changes in v4: > - rebase to media/next > - use `direction` instead of `buf->dma_dir` in dma_sync_sgtable_* > - Link to v3: https://lore.kernel.org/r/20250128-b4-rkisp-noncoherent-v3-0-baf39c997d2a@gmail.com > > Changes in v3: > - ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access > - invalidate/flush kernel mappings as appropriate if they exist > - use dma_sync_sgtable_* instead of dma_sync_sg_* > - Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@gmail.com > > Changes in v2: > - Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers. > - Add cache management timing information to patch 2 commit message. > - Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@gmail.com > > --- > Mikhail Rudenko (2): > media: videobuf2: Fix dmabuf cache sync/flush in dma-contig > media: rkisp1: Allow non-coherent video capture buffers > > .../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ > .../platform/rockchip/rkisp1/rkisp1-capture.c | 1 + > 2 files changed, 23 insertions(+) > --- > base-commit: b2c4bf0c102084e77ed1b12090d77a76469a6814 > change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba > > Best regards, > -- > Mikhail Rudenko <mike.rudenko@gmail.com> > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 2026-03-23 16:03 ` [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Jacopo Mondi @ 2026-03-24 19:12 ` Mikhail Rudenko 0 siblings, 0 replies; 10+ messages in thread From: Mikhail Rudenko @ 2026-03-24 19:12 UTC (permalink / raw) To: Jacopo Mondi Cc: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner, Tomasz Figa, Marek Szyprowski, Hans Verkuil, Sergey Senozhatsky, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mauro Carvalho Chehab, stable Hi, Jacopo! On 2026-03-23 at 17:03 +01, Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: > Hello > > On Mon, Mar 03, 2025 at 02:40:08PM +0300, Mikhail Rudenko wrote: >> This small series adds support for non-coherent video capture buffers >> on Rockchip ISP V1. Patch 1 fixes cache management for dmabuf's >> allocated by dma-contig allocator. Patch 2 allows non-coherent >> allocations on the rkisp1 capture queue. Some timing measurements are >> provided in the commit message of patch 2. >> >> Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> > > I regularly get back to this series everytime I have to reason about > the caching policies in vb2.. > > Is there any reason why it didn't get in ? My impression is that all the review comments were addressed, but these patches somehow fell through the cracks. I can rebase and post v5 if any maintainer is interested in picking it up. >> --- >> Changes in v4: >> - rebase to media/next >> - use `direction` instead of `buf->dma_dir` in dma_sync_sgtable_* >> - Link to v3: https://lore.kernel.org/r/20250128-b4-rkisp-noncoherent-v3-0-baf39c997d2a@gmail.com >> >> Changes in v3: >> - ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access >> - invalidate/flush kernel mappings as appropriate if they exist >> - use dma_sync_sgtable_* instead of dma_sync_sg_* >> - Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@gmail.com >> >> Changes in v2: >> - Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers. >> - Add cache management timing information to patch 2 commit message. >> - Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@gmail.com >> >> --- >> Mikhail Rudenko (2): >> media: videobuf2: Fix dmabuf cache sync/flush in dma-contig >> media: rkisp1: Allow non-coherent video capture buffers >> >> .../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++ >> .../platform/rockchip/rkisp1/rkisp1-capture.c | 1 + >> 2 files changed, 23 insertions(+) >> --- >> base-commit: b2c4bf0c102084e77ed1b12090d77a76469a6814 >> change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba >> >> Best regards, >> -- >> Mikhail Rudenko <mike.rudenko@gmail.com> >> >> -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-03-24 19:24 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-03 11:40 [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Mikhail Rudenko 2025-03-03 11:40 ` [PATCH v4 1/2] media: videobuf2: Fix dmabuf cache sync/flush in dma-contig Mikhail Rudenko 2025-03-03 15:24 ` Nicolas Dufresne 2025-03-05 7:40 ` Mikhail Rudenko 2025-03-05 8:12 ` Tomasz Figa 2025-03-09 20:18 ` Mikhail Rudenko 2025-03-10 9:00 ` Tomasz Figa 2025-03-03 11:40 ` [PATCH v4 2/2] media: rkisp1: Allow non-coherent video capture buffers Mikhail Rudenko 2026-03-23 16:03 ` [PATCH v4 0/2] Allow non-coherent video capture buffers on Rockchip ISP V1 Jacopo Mondi 2026-03-24 19:12 ` Mikhail Rudenko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox