* [PATCH] media: rkisp1: allow non-coherent video capture buffers @ 2025-01-02 15:35 Mikhail Rudenko 2025-01-03 15:23 ` Laurent Pinchart 0 siblings, 1 reply; 17+ messages in thread From: Mikhail Rudenko @ 2025-01-02 15:35 UTC (permalink / raw) To: Dafna Hirschfeld, Laurent Pinchart, Mauro Carvalho Chehab, Heiko Stuebner Cc: linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, Mikhail Rudenko Currently, the rkisp1 driver always uses coherent DMA allocations for video capture buffers. However, on some platforms, using non-coherent buffers can improve performance, especially when CPU processing of MMAP'ed video buffers is required. For example, on the Rockchip RK3399 running at maximum CPU frequency, the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using non-coherent DMA allocation. CPU usage also decreases accordingly. This change allows userspace to request the allocation of non-coherent buffers. Note that the behavior for existing users will remain unchanged unless they explicitly set the V4L2_MEMORY_FLAG_NON_COHERENT flag when allocating buffers. Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> --- drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c index 6dcefd144d5abe358323e37ac6133c6134ac636e..c94f7d1d73a92646457a27da20726ec6f92e7717 100644 --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c @@ -1563,6 +1563,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap) q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC; q->lock = &node->vlock; q->dev = cap->rkisp1->dev; + q->allow_cache_hints = 1; ret = vb2_queue_init(q); if (ret) { dev_err(cap->rkisp1->dev, --- base-commit: 40ed9e9b2808beeb835bd0ed971fb364c285d39c change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba Best regards, -- Mikhail Rudenko <mike.rudenko@gmail.com> ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-02 15:35 [PATCH] media: rkisp1: allow non-coherent video capture buffers Mikhail Rudenko @ 2025-01-03 15:23 ` Laurent Pinchart 2025-01-14 16:00 ` Mikhail Rudenko 0 siblings, 1 reply; 17+ messages in thread From: Laurent Pinchart @ 2025-01-03 15:23 UTC (permalink / raw) To: Mikhail Rudenko Cc: Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > Currently, the rkisp1 driver always uses coherent DMA allocations for > video capture buffers. However, on some platforms, using non-coherent > buffers can improve performance, especially when CPU processing of > MMAP'ed video buffers is required. > > For example, on the Rockchip RK3399 running at maximum CPU frequency, > the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > non-coherent DMA allocation. CPU usage also decreases accordingly. What's the time taken by the cache management operations ? > This change allows userspace to request the allocation of non-coherent > buffers. Note that the behavior for existing users will remain unchanged > unless they explicitly set the V4L2_MEMORY_FLAG_NON_COHERENT flag when > allocating buffers. > > Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> > --- > drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c > index 6dcefd144d5abe358323e37ac6133c6134ac636e..c94f7d1d73a92646457a27da20726ec6f92e7717 100644 > --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c > +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c > @@ -1563,6 +1563,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap) > q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC; > q->lock = &node->vlock; > q->dev = cap->rkisp1->dev; > + q->allow_cache_hints = 1; > ret = vb2_queue_init(q); > if (ret) { > dev_err(cap->rkisp1->dev, > > --- > base-commit: 40ed9e9b2808beeb835bd0ed971fb364c285d39c > change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-03 15:23 ` Laurent Pinchart @ 2025-01-14 16:00 ` Mikhail Rudenko 2025-01-15 8:31 ` Tomasz Figa ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Mikhail Rudenko @ 2025-01-14 16:00 UTC (permalink / raw) To: Laurent Pinchart Cc: Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Laurent, On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: >> Currently, the rkisp1 driver always uses coherent DMA allocations for >> video capture buffers. However, on some platforms, using non-coherent >> buffers can improve performance, especially when CPU processing of >> MMAP'ed video buffers is required. >> >> For example, on the Rockchip RK3399 running at maximum CPU frequency, >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > What's the time taken by the cache management operations ? Sorry for the late reply, your question turned out a little more interesting than I expected initially. :) When capturing using Yavta with MMAP buffers under the conditions mentioned in the commit message, ftrace gives 437.6 +- 1.1 us for dma_sync_sgtable_for_cpu and 409 +- 14 us for dma_sync_sgtable_for_device. Thus, it looks like using non-coherent buffers in this case is more CPU-efficient even when considering cache management overhead. When trying to do the same measurements with libcamera, I failed. In a typical libcamera use case when MMAP buffers are allocated from a device, exported as dmabufs and then used for capture on the same device with DMABUF memory type, cache management in kernel is skipped [1] [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so DMA_BUF_IOCTL_SYNC from userspace does not work either. So it looks like to make this change really useful, the above issue of cache management for libcamera/DMABUF/videobuf2-dma-contig has to be solved. I'm not an expert in this area, so any advice is kindly welcome. :) [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-14 16:00 ` Mikhail Rudenko @ 2025-01-15 8:31 ` Tomasz Figa 2025-01-15 13:24 ` Mikhail Rudenko 2025-01-15 19:13 ` Nicolas Dufresne 2025-02-27 17:05 ` Jacopo Mondi 2 siblings, 1 reply; 17+ messages in thread From: Tomasz Figa @ 2025-01-15 8:31 UTC (permalink / raw) To: Mikhail Rudenko Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Mikhail and Laurent, On Wed, Jan 15, 2025 at 2:07 AM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: > > > Hi Laurent, > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > >> video capture buffers. However, on some platforms, using non-coherent > >> buffers can improve performance, especially when CPU processing of > >> MMAP'ed video buffers is required. > >> > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > What's the time taken by the cache management operations ? > > Sorry for the late reply, your question turned out a little more > interesting than I expected initially. :) > > When capturing using Yavta with MMAP buffers under the conditions mentioned > in the commit message, ftrace gives 437.6 +- 1.1 us for > dma_sync_sgtable_for_cpu and 409 +- 14 us for > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > buffers in this case is more CPU-efficient even when considering cache > management overhead. > > When trying to do the same measurements with libcamera, I failed. In a > typical libcamera use case when MMAP buffers are allocated from a > device, exported as dmabufs and then used for capture on the same device > with DMABUF memory type, cache management in kernel is skipped [1] > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > DMA_BUF_IOCTL_SYNC from userspace does not work either. Oops, so I believe this is a bug. When an MMAP buffer is allocated in the non-coherent mode, those ops should perform proper cache maintenance. Let me send a patch to fix this in a couple of days unless someone does it earlier. Best regards, Tomasz > > So it looks like to make this change really useful, the above issue of > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > -- > Best regards, > Mikhail Rudenko > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-15 8:31 ` Tomasz Figa @ 2025-01-15 13:24 ` Mikhail Rudenko 2025-01-15 14:46 ` Tomasz Figa 0 siblings, 1 reply; 17+ messages in thread From: Mikhail Rudenko @ 2025-01-15 13:24 UTC (permalink / raw) To: Tomasz Figa Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Tomasz, On 2025-01-15 at 17:31 +09, Tomasz Figa <tfiga@chromium.org> wrote: > Hi Mikhail and Laurent, > > On Wed, Jan 15, 2025 at 2:07 AM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: >> >> >> Hi Laurent, >> >> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: >> >> > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: >> >> Currently, the rkisp1 driver always uses coherent DMA allocations for >> >> video capture buffers. However, on some platforms, using non-coherent >> >> buffers can improve performance, especially when CPU processing of >> >> MMAP'ed video buffers is required. >> >> >> >> For example, on the Rockchip RK3399 running at maximum CPU frequency, >> >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a >> >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using >> >> non-coherent DMA allocation. CPU usage also decreases accordingly. >> > >> > What's the time taken by the cache management operations ? >> >> Sorry for the late reply, your question turned out a little more >> interesting than I expected initially. :) >> >> When capturing using Yavta with MMAP buffers under the conditions mentioned >> in the commit message, ftrace gives 437.6 +- 1.1 us for >> dma_sync_sgtable_for_cpu and 409 +- 14 us for >> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent >> buffers in this case is more CPU-efficient even when considering cache >> management overhead. >> >> When trying to do the same measurements with libcamera, I failed. In a >> typical libcamera use case when MMAP buffers are allocated from a >> device, exported as dmabufs and then used for capture on the same device >> with DMABUF memory type, cache management in kernel is skipped [1] >> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so >> DMA_BUF_IOCTL_SYNC from userspace does not work either. > > Oops, so I believe this is a bug. When an MMAP buffer is allocated in > the non-coherent mode, those ops should perform proper cache > maintenance. Thanks for pointing this out! > Let me send a patch to fix this in a couple of days unless someone > does it earlier. Now that we know that this is a bug, not an API misuse from my side, I can fix this myself and send a v2. Would this be okay for you? > Best regards, > Tomasz > >> >> So it looks like to make this change really useful, the above issue of >> cache management for libcamera/DMABUF/videobuf2-dma-contig has to be >> solved. I'm not an expert in this area, so any advice is kindly welcome. :) >> >> [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 >> [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 >> [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 >> >> -- >> Best regards, >> Mikhail Rudenko >> -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-15 13:24 ` Mikhail Rudenko @ 2025-01-15 14:46 ` Tomasz Figa 2025-01-15 17:29 ` Mikhail Rudenko 0 siblings, 1 reply; 17+ messages in thread From: Tomasz Figa @ 2025-01-15 14:46 UTC (permalink / raw) To: Mikhail Rudenko Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel On Wed, Jan 15, 2025 at 10:30 PM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: > > Hi Tomasz, > > On 2025-01-15 at 17:31 +09, Tomasz Figa <tfiga@chromium.org> wrote: > > > Hi Mikhail and Laurent, > > > > On Wed, Jan 15, 2025 at 2:07 AM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: > >> > >> > >> Hi Laurent, > >> > >> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > >> > >> > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > >> >> Currently, the rkisp1 driver always uses coherent DMA allocations for > >> >> video capture buffers. However, on some platforms, using non-coherent > >> >> buffers can improve performance, especially when CPU processing of > >> >> MMAP'ed video buffers is required. > >> >> > >> >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > >> >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > >> >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > >> >> non-coherent DMA allocation. CPU usage also decreases accordingly. > >> > > >> > What's the time taken by the cache management operations ? > >> > >> Sorry for the late reply, your question turned out a little more > >> interesting than I expected initially. :) > >> > >> When capturing using Yavta with MMAP buffers under the conditions mentioned > >> in the commit message, ftrace gives 437.6 +- 1.1 us for > >> dma_sync_sgtable_for_cpu and 409 +- 14 us for > >> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > >> buffers in this case is more CPU-efficient even when considering cache > >> management overhead. > >> > >> When trying to do the same measurements with libcamera, I failed. In a > >> typical libcamera use case when MMAP buffers are allocated from a > >> device, exported as dmabufs and then used for capture on the same device > >> with DMABUF memory type, cache management in kernel is skipped [1] > >> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > >> DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > Oops, so I believe this is a bug. When an MMAP buffer is allocated in > > the non-coherent mode, those ops should perform proper cache > > maintenance. > > Thanks for pointing this out! > > > Let me send a patch to fix this in a couple of days unless someone > > does it earlier. > > Now that we know that this is a bug, not an API misuse from my side, I > can fix this myself and send a v2. Would this be okay for you? I'd be more than happy :) > > > Best regards, > > Tomasz > > > >> > >> So it looks like to make this change really useful, the above issue of > >> cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > >> solved. I'm not an expert in this area, so any advice is kindly welcome. :) > >> > >> [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > >> [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > >> [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > >> > >> -- > >> Best regards, > >> Mikhail Rudenko > >> > > > -- > Best regards, > Mikhail Rudenko ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-15 14:46 ` Tomasz Figa @ 2025-01-15 17:29 ` Mikhail Rudenko 0 siblings, 0 replies; 17+ messages in thread From: Mikhail Rudenko @ 2025-01-15 17:29 UTC (permalink / raw) To: Tomasz Figa Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel On 2025-01-15 at 23:46 +09, Tomasz Figa <tfiga@chromium.org> wrote: > On Wed, Jan 15, 2025 at 10:30 PM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: >> >> Hi Tomasz, >> >> On 2025-01-15 at 17:31 +09, Tomasz Figa <tfiga@chromium.org> wrote: >> >> > Hi Mikhail and Laurent, >> > >> > On Wed, Jan 15, 2025 at 2:07 AM Mikhail Rudenko <mike.rudenko@gmail.com> wrote: >> >> >> >> >> >> Hi Laurent, >> >> >> >> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: >> >> >> >> > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: >> >> >> Currently, the rkisp1 driver always uses coherent DMA allocations for >> >> >> video capture buffers. However, on some platforms, using non-coherent >> >> >> buffers can improve performance, especially when CPU processing of >> >> >> MMAP'ed video buffers is required. >> >> >> >> >> >> For example, on the Rockchip RK3399 running at maximum CPU frequency, >> >> >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a >> >> >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using >> >> >> non-coherent DMA allocation. CPU usage also decreases accordingly. >> >> > >> >> > What's the time taken by the cache management operations ? >> >> >> >> Sorry for the late reply, your question turned out a little more >> >> interesting than I expected initially. :) >> >> >> >> When capturing using Yavta with MMAP buffers under the conditions mentioned >> >> in the commit message, ftrace gives 437.6 +- 1.1 us for >> >> dma_sync_sgtable_for_cpu and 409 +- 14 us for >> >> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent >> >> buffers in this case is more CPU-efficient even when considering cache >> >> management overhead. >> >> >> >> When trying to do the same measurements with libcamera, I failed. In a >> >> typical libcamera use case when MMAP buffers are allocated from a >> >> device, exported as dmabufs and then used for capture on the same device >> >> with DMABUF memory type, cache management in kernel is skipped [1] >> >> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so >> >> DMA_BUF_IOCTL_SYNC from userspace does not work either. >> > >> > Oops, so I believe this is a bug. When an MMAP buffer is allocated in >> > the non-coherent mode, those ops should perform proper cache >> > maintenance. >> >> Thanks for pointing this out! >> >> > Let me send a patch to fix this in a couple of days unless someone >> > does it earlier. >> >> Now that we know that this is a bug, not an API misuse from my side, I >> can fix this myself and send a v2. Would this be okay for you? > > I'd be more than happy :) Done, see [1]. A review would be appreciated. :) [1] https://lore.kernel.org/all/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@gmail.com/ -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-14 16:00 ` Mikhail Rudenko 2025-01-15 8:31 ` Tomasz Figa @ 2025-01-15 19:13 ` Nicolas Dufresne 2025-02-27 17:05 ` Jacopo Mondi 2 siblings, 0 replies; 17+ messages in thread From: Nicolas Dufresne @ 2025-01-15 19:13 UTC (permalink / raw) To: Mikhail Rudenko, Laurent Pinchart Cc: Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Le mardi 14 janvier 2025 à 19:00 +0300, Mikhail Rudenko a écrit : > Hi Laurent, > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > > Currently, the rkisp1 driver always uses coherent DMA allocations for > > > video capture buffers. However, on some platforms, using non-coherent > > > buffers can improve performance, especially when CPU processing of > > > MMAP'ed video buffers is required. > > > > > > For example, on the Rockchip RK3399 running at maximum CPU frequency, > > > the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > > malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > > non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > What's the time taken by the cache management operations ? > > Sorry for the late reply, your question turned out a little more > interesting than I expected initially. :) > > When capturing using Yavta with MMAP buffers under the conditions mentioned > in the commit message, ftrace gives 437.6 +- 1.1 us for > dma_sync_sgtable_for_cpu and 409 +- 14 us for > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > buffers in this case is more CPU-efficient even when considering cache > management overhead. > > When trying to do the same measurements with libcamera, I failed. In a > typical libcamera use case when MMAP buffers are allocated from a > device, exported as dmabufs and then used for capture on the same device > with DMABUF memory type, cache management in kernel is skipped [1] > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > So it looks like to make this change really useful, the above issue of > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > solved. I'm not an expert in this area, so any advice is kindly welcome. :) The manual coherency hints are not implemented for DMABuf, and libcamera only do dmabuf. Someone will have to look into that. This is also why this API have very low adoption, its breaks easily. > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > -- > Best regards, > Mikhail Rudenko > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-01-14 16:00 ` Mikhail Rudenko 2025-01-15 8:31 ` Tomasz Figa 2025-01-15 19:13 ` Nicolas Dufresne @ 2025-02-27 17:05 ` Jacopo Mondi 2025-02-27 20:46 ` Mikhail Rudenko 2025-02-28 10:00 ` Tomasz Figa 2 siblings, 2 replies; 17+ messages in thread From: Jacopo Mondi @ 2025-02-27 17:05 UTC (permalink / raw) To: Mikhail Rudenko Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Mikhail On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > Hi Laurent, > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > >> video capture buffers. However, on some platforms, using non-coherent > >> buffers can improve performance, especially when CPU processing of > >> MMAP'ed video buffers is required. > >> > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > What's the time taken by the cache management operations ? > > Sorry for the late reply, your question turned out a little more > interesting than I expected initially. :) > > When capturing using Yavta with MMAP buffers under the conditions mentioned > in the commit message, ftrace gives 437.6 +- 1.1 us for > dma_sync_sgtable_for_cpu and 409 +- 14 us for > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > buffers in this case is more CPU-efficient even when considering cache > management overhead. > > When trying to do the same measurements with libcamera, I failed. In a > typical libcamera use case when MMAP buffers are allocated from a > device, exported as dmabufs and then used for capture on the same device > with DMABUF memory type, cache management in kernel is skipped [1] > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > So it looks like to make this change really useful, the above issue of > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > solved. I'm not an expert in this area, so any advice is kindly welcome. :) It would be shame if we let this discussion drop dead.. cache management policies are relevant for performances, specifically for cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 I would like to know from Hans if the decision to disallow cache-hints for dmabuf importers is a design choice or is deeply rooted in other reasons I might be missing. I'm asking because the idea is for libcamera to act solely as dma-buf importer, the current alloc-export-then-import trick is an helper for applications to work around the absence of a system allocator. If the requirement to disable cache-hints for importers cannot be lifted, for libcamera it means we would not be able to use it. > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > -- > Best regards, > Mikhail Rudenko > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-27 17:05 ` Jacopo Mondi @ 2025-02-27 20:46 ` Mikhail Rudenko 2025-02-28 2:58 ` Nicolas Dufresne 2025-02-28 10:00 ` Tomasz Figa 1 sibling, 1 reply; 17+ messages in thread From: Mikhail Rudenko @ 2025-02-27 20:46 UTC (permalink / raw) To: Jacopo Mondi Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Jacopo, On 2025-02-27 at 18:05 +01, Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: > Hi Mikhail > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: >> >> Hi Laurent, >> >> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: >> >> > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: >> >> Currently, the rkisp1 driver always uses coherent DMA allocations for >> >> video capture buffers. However, on some platforms, using non-coherent >> >> buffers can improve performance, especially when CPU processing of >> >> MMAP'ed video buffers is required. >> >> >> >> For example, on the Rockchip RK3399 running at maximum CPU frequency, >> >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a >> >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using >> >> non-coherent DMA allocation. CPU usage also decreases accordingly. >> > >> > What's the time taken by the cache management operations ? >> >> Sorry for the late reply, your question turned out a little more >> interesting than I expected initially. :) >> >> When capturing using Yavta with MMAP buffers under the conditions mentioned >> in the commit message, ftrace gives 437.6 +- 1.1 us for >> dma_sync_sgtable_for_cpu and 409 +- 14 us for >> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent >> buffers in this case is more CPU-efficient even when considering cache >> management overhead. >> >> When trying to do the same measurements with libcamera, I failed. In a >> typical libcamera use case when MMAP buffers are allocated from a >> device, exported as dmabufs and then used for capture on the same device >> with DMABUF memory type, cache management in kernel is skipped [1] >> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so >> DMA_BUF_IOCTL_SYNC from userspace does not work either. >> >> So it looks like to make this change really useful, the above issue of >> cache management for libcamera/DMABUF/videobuf2-dma-contig has to be >> solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > It would be shame if we let this discussion drop dead.. cache > management policies are relevant for performances, specifically for > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > >> >> [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > I would like to know from Hans if the decision to disallow cache-hints > for dmabuf importers is a design choice or is deeply rooted in other > reasons I might be missing. > > I'm asking because the idea is for libcamera to act solely as dma-buf > importer, the current alloc-export-then-import trick is an helper for > applications to work around the absence of a system allocator. > > If the requirement to disable cache-hints for importers cannot be > lifted, for libcamera it means we would not be able to use it. Meanwhile, I have posted a patch, which re-enables cache management ops for non-coherent dmabufs exported from dma-contig allocator [1]. It is currently waiting for review. [1] https://lore.kernel.org/all/20250128-b4-rkisp-noncoherent-v3-1-baf39c997d2a@gmail.com/ > >> [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 >> [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 >> >> -- >> Best regards, >> Mikhail Rudenko >> -- Best regards, Mikhail Rudenko ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-27 20:46 ` Mikhail Rudenko @ 2025-02-28 2:58 ` Nicolas Dufresne 2025-02-28 9:54 ` Hans Verkuil 0 siblings, 1 reply; 17+ messages in thread From: Nicolas Dufresne @ 2025-02-28 2:58 UTC (permalink / raw) To: Mikhail Rudenko, Jacopo Mondi Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Le jeudi 27 février 2025 à 23:46 +0300, Mikhail Rudenko a écrit : > Hi Jacopo, > > On 2025-02-27 at 18:05 +01, Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: > > > Hi Mikhail > > > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > > > > > Hi Laurent, > > > > > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > > > > Currently, the rkisp1 driver always uses coherent DMA allocations for > > > > > video capture buffers. However, on some platforms, using non-coherent > > > > > buffers can improve performance, especially when CPU processing of > > > > > MMAP'ed video buffers is required. > > > > > > > > > > For example, on the Rockchip RK3399 running at maximum CPU frequency, > > > > > the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > > > > malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > > > > non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > > > > > What's the time taken by the cache management operations ? > > > > > > Sorry for the late reply, your question turned out a little more > > > interesting than I expected initially. :) > > > > > > When capturing using Yavta with MMAP buffers under the conditions mentioned > > > in the commit message, ftrace gives 437.6 +- 1.1 us for > > > dma_sync_sgtable_for_cpu and 409 +- 14 us for > > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > > > buffers in this case is more CPU-efficient even when considering cache > > > management overhead. > > > > > > When trying to do the same measurements with libcamera, I failed. In a > > > typical libcamera use case when MMAP buffers are allocated from a > > > device, exported as dmabufs and then used for capture on the same device > > > with DMABUF memory type, cache management in kernel is skipped [1] > > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > > > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > > > So it looks like to make this change really useful, the above issue of > > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > > > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > > > It would be shame if we let this discussion drop dead.. cache > > management policies are relevant for performances, specifically for > > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > > > > > > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > > > I would like to know from Hans if the decision to disallow cache-hints > > for dmabuf importers is a design choice or is deeply rooted in other > > reasons I might be missing. > > > > I'm asking because the idea is for libcamera to act solely as dma-buf > > importer, the current alloc-export-then-import trick is an helper for > > applications to work around the absence of a system allocator. > > > > If the requirement to disable cache-hints for importers cannot be > > lifted, for libcamera it means we would not be able to use it. > > Meanwhile, I have posted a patch, which re-enables cache management ops > for non-coherent dmabufs exported from dma-contig allocator [1]. It is > currently waiting for review. > > [1] https://lore.kernel.org/all/20250128-b4-rkisp-noncoherent-v3-1-baf39c997d2a@gmail.com/ Thanks for you work, this matches what I was referring to missing in my previous reply. I don't think there is any intention to block or deprecate it, but partially enabling leads to problems. Do we need something in the SG allocator to ? Nicolas > > > > > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > > > > > -- > > > Best regards, > > > Mikhail Rudenko > > > > > > -- > Best regards, > Mikhail Rudenko > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-28 2:58 ` Nicolas Dufresne @ 2025-02-28 9:54 ` Hans Verkuil 0 siblings, 0 replies; 17+ messages in thread From: Hans Verkuil @ 2025-02-28 9:54 UTC (permalink / raw) To: Nicolas Dufresne, Mikhail Rudenko, Jacopo Mondi, Tomasz Figa Cc: Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel On 28/02/2025 03:58, Nicolas Dufresne wrote: > Le jeudi 27 février 2025 à 23:46 +0300, Mikhail Rudenko a écrit : >> Hi Jacopo, >> >> On 2025-02-27 at 18:05 +01, Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: >> >>> Hi Mikhail >>> >>> On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: >>>> >>>> Hi Laurent, >>>> >>>> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: >>>> >>>>> On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: >>>>>> Currently, the rkisp1 driver always uses coherent DMA allocations for >>>>>> video capture buffers. However, on some platforms, using non-coherent >>>>>> buffers can improve performance, especially when CPU processing of >>>>>> MMAP'ed video buffers is required. >>>>>> >>>>>> For example, on the Rockchip RK3399 running at maximum CPU frequency, >>>>>> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a >>>>>> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using >>>>>> non-coherent DMA allocation. CPU usage also decreases accordingly. >>>>> >>>>> What's the time taken by the cache management operations ? >>>> >>>> Sorry for the late reply, your question turned out a little more >>>> interesting than I expected initially. :) >>>> >>>> When capturing using Yavta with MMAP buffers under the conditions mentioned >>>> in the commit message, ftrace gives 437.6 +- 1.1 us for >>>> dma_sync_sgtable_for_cpu and 409 +- 14 us for >>>> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent >>>> buffers in this case is more CPU-efficient even when considering cache >>>> management overhead. >>>> >>>> When trying to do the same measurements with libcamera, I failed. In a >>>> typical libcamera use case when MMAP buffers are allocated from a >>>> device, exported as dmabufs and then used for capture on the same device >>>> with DMABUF memory type, cache management in kernel is skipped [1] >>>> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so >>>> DMA_BUF_IOCTL_SYNC from userspace does not work either. >>>> >>>> So it looks like to make this change really useful, the above issue of >>>> cache management for libcamera/DMABUF/videobuf2-dma-contig has to be >>>> solved. I'm not an expert in this area, so any advice is kindly welcome. :) >>> >>> It would be shame if we let this discussion drop dead.. cache >>> management policies are relevant for performances, specifically for >>> cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. >>> >>>> >>>> [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 >>> >>> I would like to know from Hans if the decision to disallow cache-hints >>> for dmabuf importers is a design choice or is deeply rooted in other >>> reasons I might be missing. >>> >>> I'm asking because the idea is for libcamera to act solely as dma-buf >>> importer, the current alloc-export-then-import trick is an helper for >>> applications to work around the absence of a system allocator. >>> >>> If the requirement to disable cache-hints for importers cannot be >>> lifted, for libcamera it means we would not be able to use it. >> >> Meanwhile, I have posted a patch, which re-enables cache management ops >> for non-coherent dmabufs exported from dma-contig allocator [1]. It is >> currently waiting for review. >> >> [1] https://lore.kernel.org/all/20250128-b4-rkisp-noncoherent-v3-1-baf39c997d2a@gmail.com/ I rely heavily on Tomasz Figa for review of such patches. He was OK with the patch, but he suggested waiting a bit for more comments. My plan was to pick it up end of next week if there are no further comments. Regards, Hans > > Thanks for you work, this matches what I was referring to missing in my > previous reply. > > I don't think there is any intention to block or deprecate it, but > partially enabling leads to problems. Do we need something in the SG > allocator to ? > > Nicolas > >> >>> >>>> [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 >>>> [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 >>>> >>>> -- >>>> Best regards, >>>> Mikhail Rudenko >>>> >> >> >> -- >> Best regards, >> Mikhail Rudenko >> > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-27 17:05 ` Jacopo Mondi 2025-02-27 20:46 ` Mikhail Rudenko @ 2025-02-28 10:00 ` Tomasz Figa 2025-02-28 10:18 ` Jacopo Mondi 1 sibling, 1 reply; 17+ messages in thread From: Tomasz Figa @ 2025-02-28 10:00 UTC (permalink / raw) To: Jacopo Mondi Cc: Mikhail Rudenko, Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Jacopo, On Fri, Feb 28, 2025 at 2:11 AM Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: > > Hi Mikhail > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > > > Hi Laurent, > > > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > > >> video capture buffers. However, on some platforms, using non-coherent > > >> buffers can improve performance, especially when CPU processing of > > >> MMAP'ed video buffers is required. > > >> > > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > > > What's the time taken by the cache management operations ? > > > > Sorry for the late reply, your question turned out a little more > > interesting than I expected initially. :) > > > > When capturing using Yavta with MMAP buffers under the conditions mentioned > > in the commit message, ftrace gives 437.6 +- 1.1 us for > > dma_sync_sgtable_for_cpu and 409 +- 14 us for > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > > buffers in this case is more CPU-efficient even when considering cache > > management overhead. > > > > When trying to do the same measurements with libcamera, I failed. In a > > typical libcamera use case when MMAP buffers are allocated from a > > device, exported as dmabufs and then used for capture on the same device > > with DMABUF memory type, cache management in kernel is skipped [1] > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > So it looks like to make this change really useful, the above issue of > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > It would be shame if we let this discussion drop dead.. cache > management policies are relevant for performances, specifically for > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > > > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > I would like to know from Hans if the decision to disallow cache-hints > for dmabuf importers is a design choice or is deeply rooted in other > reasons I might be missing. When DMA-buf is used, the responsibility for cache management is solely on the CPU users' side, so cache-hints don't really apply. It's the exporter (=allocator) who determines the mapping policy of the buffer and provides necessary DMA_BUF_SYNC operations (can be no-op if the buffer is coherent). Best regards, Tomasz > > I'm asking because the idea is for libcamera to act solely as dma-buf > importer, the current alloc-export-then-import trick is an helper for > applications to work around the absence of a system allocator. > > If the requirement to disable cache-hints for importers cannot be > lifted, for libcamera it means we would not be able to use it. > > > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > > > -- > > Best regards, > > Mikhail Rudenko > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-28 10:00 ` Tomasz Figa @ 2025-02-28 10:18 ` Jacopo Mondi 2025-02-28 10:28 ` Tomasz Figa 0 siblings, 1 reply; 17+ messages in thread From: Jacopo Mondi @ 2025-02-28 10:18 UTC (permalink / raw) To: Tomasz Figa Cc: Jacopo Mondi, Mikhail Rudenko, Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Tomasz On Fri, Feb 28, 2025 at 07:00:57PM +0900, Tomasz Figa wrote: > Hi Jacopo, > > On Fri, Feb 28, 2025 at 2:11 AM Jacopo Mondi > <jacopo.mondi@ideasonboard.com> wrote: > > > > Hi Mikhail > > > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > > > > > Hi Laurent, > > > > > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > > > >> video capture buffers. However, on some platforms, using non-coherent > > > >> buffers can improve performance, especially when CPU processing of > > > >> MMAP'ed video buffers is required. > > > >> > > > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > > > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > > > > > What's the time taken by the cache management operations ? > > > > > > Sorry for the late reply, your question turned out a little more > > > interesting than I expected initially. :) > > > > > > When capturing using Yavta with MMAP buffers under the conditions mentioned > > > in the commit message, ftrace gives 437.6 +- 1.1 us for > > > dma_sync_sgtable_for_cpu and 409 +- 14 us for > > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > > > buffers in this case is more CPU-efficient even when considering cache > > > management overhead. > > > > > > When trying to do the same measurements with libcamera, I failed. In a > > > typical libcamera use case when MMAP buffers are allocated from a > > > device, exported as dmabufs and then used for capture on the same device > > > with DMABUF memory type, cache management in kernel is skipped [1] > > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > > > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > > > So it looks like to make this change really useful, the above issue of > > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > > > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > > > It would be shame if we let this discussion drop dead.. cache > > management policies are relevant for performances, specifically for > > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > > > > > > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > > > I would like to know from Hans if the decision to disallow cache-hints > > for dmabuf importers is a design choice or is deeply rooted in other > > reasons I might be missing. > > When DMA-buf is used, the responsibility for cache management is > solely on the CPU users' side, so cache-hints don't really apply. It's > the exporter (=allocator) who determines the mapping policy of the > buffer and provides necessary DMA_BUF_SYNC operations (can be no-op if > the buffer is coherent). This all makes sense. I take it as, for libcamera, users of the FrameBufferAllocator helper (which first exports MMAP buffers from the video device and the imports them back as DMABUF) won't be able to control the cache policies. Now, in the long term, we want FrameBufferAllocator to go away and have either buffers exported by the consumer (likely DRM) or by a system wide buffer allocator (when we'll have one) and have the video devices operate as pure importers. But for the time being the "first export then import" use case is possibile and valid so I wonder if we should consider measures to allow controlling caching policies for this use case too. > > Best regards, > Tomasz > > > > > I'm asking because the idea is for libcamera to act solely as dma-buf > > importer, the current alloc-export-then-import trick is an helper for > > applications to work around the absence of a system allocator. > > > > If the requirement to disable cache-hints for importers cannot be > > lifted, for libcamera it means we would not be able to use it. > > > > > > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > > > > > -- > > > Best regards, > > > Mikhail Rudenko > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-28 10:18 ` Jacopo Mondi @ 2025-02-28 10:28 ` Tomasz Figa 2025-02-28 10:48 ` Jacopo Mondi 0 siblings, 1 reply; 17+ messages in thread From: Tomasz Figa @ 2025-02-28 10:28 UTC (permalink / raw) To: Jacopo Mondi Cc: Mikhail Rudenko, Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel On Fri, Feb 28, 2025 at 7:18 PM Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: > > Hi Tomasz > > On Fri, Feb 28, 2025 at 07:00:57PM +0900, Tomasz Figa wrote: > > Hi Jacopo, > > > > On Fri, Feb 28, 2025 at 2:11 AM Jacopo Mondi > > <jacopo.mondi@ideasonboard.com> wrote: > > > > > > Hi Mikhail > > > > > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > > > > > > > Hi Laurent, > > > > > > > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > > > > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > > > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > > > > >> video capture buffers. However, on some platforms, using non-coherent > > > > >> buffers can improve performance, especially when CPU processing of > > > > >> MMAP'ed video buffers is required. > > > > >> > > > > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > > > > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > > > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > > > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > > > > > > > What's the time taken by the cache management operations ? > > > > > > > > Sorry for the late reply, your question turned out a little more > > > > interesting than I expected initially. :) > > > > > > > > When capturing using Yavta with MMAP buffers under the conditions mentioned > > > > in the commit message, ftrace gives 437.6 +- 1.1 us for > > > > dma_sync_sgtable_for_cpu and 409 +- 14 us for > > > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > > > > buffers in this case is more CPU-efficient even when considering cache > > > > management overhead. > > > > > > > > When trying to do the same measurements with libcamera, I failed. In a > > > > typical libcamera use case when MMAP buffers are allocated from a > > > > device, exported as dmabufs and then used for capture on the same device > > > > with DMABUF memory type, cache management in kernel is skipped [1] > > > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > > > > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > > > > > So it looks like to make this change really useful, the above issue of > > > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > > > > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > > > > > It would be shame if we let this discussion drop dead.. cache > > > management policies are relevant for performances, specifically for > > > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > > > > > > > > > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > > > > > I would like to know from Hans if the decision to disallow cache-hints > > > for dmabuf importers is a design choice or is deeply rooted in other > > > reasons I might be missing. > > > > When DMA-buf is used, the responsibility for cache management is > > solely on the CPU users' side, so cache-hints don't really apply. It's > > the exporter (=allocator) who determines the mapping policy of the > > buffer and provides necessary DMA_BUF_SYNC operations (can be no-op if > > the buffer is coherent). > > This all makes sense. > > I take it as, for libcamera, users of the FrameBufferAllocator helper > (which first exports MMAP buffers from the video device and the > imports them back as DMABUF) won't be able to control the cache > policies. > > Now, in the long term, we want FrameBufferAllocator to go away and > have either buffers exported by the consumer (likely DRM) or by a > system wide buffer allocator (when we'll have one) and have the video > devices operate as pure importers. But for the time being the > "first export then import" use case is possibile and valid so I wonder > if we should consider measures to allow controlling caching policies > for this use case too. Hmm, I may be missing something, but if FrameBufferAllocator does the allocation internally in libcamera, why couldn't it use the V4L2_MEMORY_FLAG_NON_COHERENT REQBUFS/CREATE_BUFS flag? Note that however, once the buffer is allocated and mapped once, on many architectures it must keep the same mapping attributes across the different mappings, due to how the memory hierarchy works. (Not sure if this is what you are asking for here, though.) > > > > > Best regards, > > Tomasz > > > > > > > > I'm asking because the idea is for libcamera to act solely as dma-buf > > > importer, the current alloc-export-then-import trick is an helper for > > > applications to work around the absence of a system allocator. > > > > > > If the requirement to disable cache-hints for importers cannot be > > > lifted, for libcamera it means we would not be able to use it. > > > > > > > > > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > > > > > > > -- > > > > Best regards, > > > > Mikhail Rudenko > > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-28 10:28 ` Tomasz Figa @ 2025-02-28 10:48 ` Jacopo Mondi 2025-02-28 11:19 ` Tomasz Figa 0 siblings, 1 reply; 17+ messages in thread From: Jacopo Mondi @ 2025-02-28 10:48 UTC (permalink / raw) To: Tomasz Figa Cc: Jacopo Mondi, Mikhail Rudenko, Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel Hi Tomasz On Fri, Feb 28, 2025 at 07:28:43PM +0900, Tomasz Figa wrote: > On Fri, Feb 28, 2025 at 7:18 PM Jacopo Mondi > <jacopo.mondi@ideasonboard.com> wrote: > > > > Hi Tomasz > > > > On Fri, Feb 28, 2025 at 07:00:57PM +0900, Tomasz Figa wrote: > > > Hi Jacopo, > > > > > > On Fri, Feb 28, 2025 at 2:11 AM Jacopo Mondi > > > <jacopo.mondi@ideasonboard.com> wrote: > > > > > > > > Hi Mikhail > > > > > > > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > > > > > > > > > Hi Laurent, > > > > > > > > > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > > > > > > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > > > > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > > > > > >> video capture buffers. However, on some platforms, using non-coherent > > > > > >> buffers can improve performance, especially when CPU processing of > > > > > >> MMAP'ed video buffers is required. > > > > > >> > > > > > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > > > > > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > > > > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > > > > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > > > > > > > > > What's the time taken by the cache management operations ? > > > > > > > > > > Sorry for the late reply, your question turned out a little more > > > > > interesting than I expected initially. :) > > > > > > > > > > When capturing using Yavta with MMAP buffers under the conditions mentioned > > > > > in the commit message, ftrace gives 437.6 +- 1.1 us for > > > > > dma_sync_sgtable_for_cpu and 409 +- 14 us for > > > > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > > > > > buffers in this case is more CPU-efficient even when considering cache > > > > > management overhead. > > > > > > > > > > When trying to do the same measurements with libcamera, I failed. In a > > > > > typical libcamera use case when MMAP buffers are allocated from a > > > > > device, exported as dmabufs and then used for capture on the same device > > > > > with DMABUF memory type, cache management in kernel is skipped [1] > > > > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > > > > > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > > > > > > > So it looks like to make this change really useful, the above issue of > > > > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > > > > > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > > > > > > > It would be shame if we let this discussion drop dead.. cache > > > > management policies are relevant for performances, specifically for > > > > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > > > > > > > > > > > > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > > > > > > > I would like to know from Hans if the decision to disallow cache-hints > > > > for dmabuf importers is a design choice or is deeply rooted in other > > > > reasons I might be missing. > > > > > > When DMA-buf is used, the responsibility for cache management is > > > solely on the CPU users' side, so cache-hints don't really apply. It's > > > the exporter (=allocator) who determines the mapping policy of the > > > buffer and provides necessary DMA_BUF_SYNC operations (can be no-op if > > > the buffer is coherent). > > > > This all makes sense. > > > > I take it as, for libcamera, users of the FrameBufferAllocator helper > > (which first exports MMAP buffers from the video device and the > > imports them back as DMABUF) won't be able to control the cache > > policies. > > > > Now, in the long term, we want FrameBufferAllocator to go away and > > have either buffers exported by the consumer (likely DRM) or by a > > system wide buffer allocator (when we'll have one) and have the video > > devices operate as pure importers. But for the time being the > > "first export then import" use case is possibile and valid so I wonder > > if we should consider measures to allow controlling caching policies > > for this use case too. > > Hmm, I may be missing something, but if FrameBufferAllocator does the > allocation internally in libcamera, why couldn't it use the > V4L2_MEMORY_FLAG_NON_COHERENT REQBUFS/CREATE_BUFS flag? -I- might be missing something, but reading the below links that Mikhail reported in a previous email [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 it seems to me that yes, when first exporting you can hint to tell vb2 to allocate from non-coherent/non-contiguous memory, but then when switching the device to importer mode (and import the same buffers it previously exported) the cache sync point would then be skipped, leading to possible synchronization issues between the cpu consumer and the device. > > Note that however, once the buffer is allocated and mapped once, on > many architectures it must keep the same mapping attributes across the > different mappings, due to how the memory hierarchy works. (Not sure > if this is what you are asking for here, though.) > > > > > > > > > Best regards, > > > Tomasz > > > > > > > > > > > I'm asking because the idea is for libcamera to act solely as dma-buf > > > > importer, the current alloc-export-then-import trick is an helper for > > > > applications to work around the absence of a system allocator. > > > > > > > > If the requirement to disable cache-hints for importers cannot be > > > > lifted, for libcamera it means we would not be able to use it. > > > > > > > > > > > > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > > > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > > > > > > > > > -- > > > > > Best regards, > > > > > Mikhail Rudenko > > > > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers 2025-02-28 10:48 ` Jacopo Mondi @ 2025-02-28 11:19 ` Tomasz Figa 0 siblings, 0 replies; 17+ messages in thread From: Tomasz Figa @ 2025-02-28 11:19 UTC (permalink / raw) To: Jacopo Mondi Cc: Mikhail Rudenko, Laurent Pinchart, Dafna Hirschfeld, Mauro Carvalho Chehab, Heiko Stuebner, linux-media, linux-rockchip, linux-arm-kernel, linux-kernel On Fri, Feb 28, 2025 at 7:48 PM Jacopo Mondi <jacopo.mondi@ideasonboard.com> wrote: > > Hi Tomasz > > On Fri, Feb 28, 2025 at 07:28:43PM +0900, Tomasz Figa wrote: > > On Fri, Feb 28, 2025 at 7:18 PM Jacopo Mondi > > <jacopo.mondi@ideasonboard.com> wrote: > > > > > > Hi Tomasz > > > > > > On Fri, Feb 28, 2025 at 07:00:57PM +0900, Tomasz Figa wrote: > > > > Hi Jacopo, > > > > > > > > On Fri, Feb 28, 2025 at 2:11 AM Jacopo Mondi > > > > <jacopo.mondi@ideasonboard.com> wrote: > > > > > > > > > > Hi Mikhail > > > > > > > > > > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote: > > > > > > > > > > > > Hi Laurent, > > > > > > > > > > > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > > > > > > > > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote: > > > > > > >> Currently, the rkisp1 driver always uses coherent DMA allocations for > > > > > > >> video capture buffers. However, on some platforms, using non-coherent > > > > > > >> buffers can improve performance, especially when CPU processing of > > > > > > >> MMAP'ed video buffers is required. > > > > > > >> > > > > > > >> For example, on the Rockchip RK3399 running at maximum CPU frequency, > > > > > > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a > > > > > > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using > > > > > > >> non-coherent DMA allocation. CPU usage also decreases accordingly. > > > > > > > > > > > > > > What's the time taken by the cache management operations ? > > > > > > > > > > > > Sorry for the late reply, your question turned out a little more > > > > > > interesting than I expected initially. :) > > > > > > > > > > > > When capturing using Yavta with MMAP buffers under the conditions mentioned > > > > > > in the commit message, ftrace gives 437.6 +- 1.1 us for > > > > > > dma_sync_sgtable_for_cpu and 409 +- 14 us for > > > > > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent > > > > > > buffers in this case is more CPU-efficient even when considering cache > > > > > > management overhead. > > > > > > > > > > > > When trying to do the same measurements with libcamera, I failed. In a > > > > > > typical libcamera use case when MMAP buffers are allocated from a > > > > > > device, exported as dmabufs and then used for capture on the same device > > > > > > with DMABUF memory type, cache management in kernel is skipped [1] > > > > > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so > > > > > > DMA_BUF_IOCTL_SYNC from userspace does not work either. > > > > > > > > > > > > So it looks like to make this change really useful, the above issue of > > > > > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be > > > > > > solved. I'm not an expert in this area, so any advice is kindly welcome. :) > > > > > > > > > > It would be shame if we let this discussion drop dead.. cache > > > > > management policies are relevant for performances, specifically for > > > > > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that. > > > > > > > > > > > > > > > > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > > > > > > > > > > I would like to know from Hans if the decision to disallow cache-hints > > > > > for dmabuf importers is a design choice or is deeply rooted in other > > > > > reasons I might be missing. > > > > > > > > When DMA-buf is used, the responsibility for cache management is > > > > solely on the CPU users' side, so cache-hints don't really apply. It's > > > > the exporter (=allocator) who determines the mapping policy of the > > > > buffer and provides necessary DMA_BUF_SYNC operations (can be no-op if > > > > the buffer is coherent). > > > > > > This all makes sense. > > > > > > I take it as, for libcamera, users of the FrameBufferAllocator helper > > > (which first exports MMAP buffers from the video device and the > > > imports them back as DMABUF) won't be able to control the cache > > > policies. > > > > > > Now, in the long term, we want FrameBufferAllocator to go away and > > > have either buffers exported by the consumer (likely DRM) or by a > > > system wide buffer allocator (when we'll have one) and have the video > > > devices operate as pure importers. But for the time being the > > > "first export then import" use case is possibile and valid so I wonder > > > if we should consider measures to allow controlling caching policies > > > for this use case too. > > > > Hmm, I may be missing something, but if FrameBufferAllocator does the > > allocation internally in libcamera, why couldn't it use the > > V4L2_MEMORY_FLAG_NON_COHERENT REQBUFS/CREATE_BUFS flag? > > -I- might be missing something, but reading the below links that > Mikhail reported in a previous email > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411 > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > it seems to me that yes, when first exporting you can hint to tell > vb2 to allocate from non-coherent/non-contiguous memory, but then when > switching the device to importer mode (and import the same buffers it > previously exported) the cache sync point would then > be skipped, leading to possible synchronization issues between the cpu > consumer and the device. Yeah, and that is 100% expected. As I said in my previous reply, in the DMA-buf world, it's the responsibility of the users doing the CPU accesses to ensure the correct synchronization (aka explicit sync). > > > > > Note that however, once the buffer is allocated and mapped once, on > > many architectures it must keep the same mapping attributes across the > > different mappings, due to how the memory hierarchy works. (Not sure > > if this is what you are asking for here, though.) > > > > > > > > > > > > > Best regards, > > > > Tomasz > > > > > > > > > > > > > > I'm asking because the idea is for libcamera to act solely as dma-buf > > > > > importer, the current alloc-export-then-import trick is an helper for > > > > > applications to work around the absence of a system allocator. > > > > > > > > > > If the requirement to disable cache-hints for importers cannot be > > > > > lifted, for libcamera it means we would not be able to use it. > > > > > > > > > > > > > > > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829 > > > > > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426 > > > > > > > > > > > > -- > > > > > > Best regards, > > > > > > Mikhail Rudenko > > > > > > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-02-28 12:06 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-02 15:35 [PATCH] media: rkisp1: allow non-coherent video capture buffers Mikhail Rudenko 2025-01-03 15:23 ` Laurent Pinchart 2025-01-14 16:00 ` Mikhail Rudenko 2025-01-15 8:31 ` Tomasz Figa 2025-01-15 13:24 ` Mikhail Rudenko 2025-01-15 14:46 ` Tomasz Figa 2025-01-15 17:29 ` Mikhail Rudenko 2025-01-15 19:13 ` Nicolas Dufresne 2025-02-27 17:05 ` Jacopo Mondi 2025-02-27 20:46 ` Mikhail Rudenko 2025-02-28 2:58 ` Nicolas Dufresne 2025-02-28 9:54 ` Hans Verkuil 2025-02-28 10:00 ` Tomasz Figa 2025-02-28 10:18 ` Jacopo Mondi 2025-02-28 10:28 ` Tomasz Figa 2025-02-28 10:48 ` Jacopo Mondi 2025-02-28 11:19 ` Tomasz Figa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).